{"id":1226,"date":"2020-03-20T12:21:18","date_gmt":"2020-03-20T11:21:18","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=1226"},"modified":"2021-03-24T12:37:28","modified_gmt":"2021-03-24T11:37:28","slug":"a-comprehensive-analysis-of-aws-lambda-function-optimize-spikes-and-prevent-cold-starts","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/a-comprehensive-analysis-of-aws-lambda-function-optimize-spikes-and-prevent-cold-starts\/","title":{"rendered":"A comprehensive analysis of AWS Lambda function: optimize spikes and prevent cold starts"},"content":{"rendered":"
Building Serverless applications means keeping many aspects in mind to avoid latency and produce more reliable and robust applications. In this article, we discuss what to watch for when developing with AWS Lambda, how to avoid common problems and how to exploit some recently introduced features to create more performant, efficient and less costly Serverless applications.<\/span><\/p>\n For years, cold starts have been one of the hottest and most frequently debated topics in the Serverless community.<\/span><\/p>\n Suppose you\u2019ve just deployed a brand new Lambda Function. Regardless of how the function is invoked, a new Micro VM needs to be instantiated, since no existing instance is available to respond to the event. The time needed to set up the Lambda Function runtime, together with your code and all of its dependencies and connections, is commonly called a Cold Start.<\/span><\/p>\n Depending on the runtime you choose, this setup process can take 50 to 200 ms or more before any execution actually starts. Java and .NET Lambdas often experience cold starts that last for several seconds!<\/span><\/p>\n Depending on your use case, cold starts may be a stumbling block that prevents you from adopting the Serverless paradigm. Cold starts should be avoided in scenarios where low latency is a driving factor, e.g. customer-facing applications. Luckily, for many developers this is not a pressing issue, because their workload is predictable and stable or is mainly based on internal calculations, e.g. data processing.<\/span><\/p>\n The AWS documentation provides an example that helps to understand cold start issues related to scaling needs. 
Imagine companies such as JustEat or Deliveroo, which experience very spiky traffic around lunch and dinner time.<\/span><\/p>\n These spikes cause the application to run into limits such as how quickly AWS Lambda is able to scale out once the initial burst capacity is used up. After the initial burst, it can scale up linearly by 500 instances per minute to serve your concurrent requests. Before it can handle incoming requests, each new instance must go through a cold start. The concurrency limit and the high latency due to cold starts could leave your function unable to scale fast enough to deal with incoming traffic, causing new requests to be throttled.<\/span><\/p>\n Function instances<\/span><\/p>\n Open requests<\/span><\/p>\n Concurrency has a regional limit that is shared among the functions in a Region, so this also has to be taken into account when some Lambdas are invoked very frequently. See the table below:<\/span><\/p>\n To ensure that a specific function can always reach a specific level of concurrency, and to restrict the number of instances of a Lambda Function that have access to downstream resources (like a database), you can configure <\/span>reserved concurrency.<\/b> When a function has reserved concurrency enabled, it can always raise its number of instances up to the value specified in the reserved concurrency configuration, regardless of other Lambda Functions\u2019 utilization. 
Reserved concurrency applies to the function as a whole, including versions and aliases.<\/span><\/p>\n To reserve concurrency for a function, follow these simple steps:<\/span><\/p>\n The following example shows how reserved concurrency can help manage throttling.<\/span><\/p>\n Function instances<\/span><\/p>\n Open requests<\/span><\/p>\n Reserved Concurrency can be applied when we have a clear understanding of the maximum possible concurrency, but it cannot prevent cold starts, since every new instance created to sustain that concurrency will still incur one.<\/span><\/p>\n Before AWS released the Provisioned Concurrency feature, avoiding or even reducing cold starts to make Lambdas more responsive was a very difficult task: you had to rely on custom logic to verify whether a Lambda was ready or warming up. Later on, libraries like <\/span>lambda-warmer<\/span><\/a> and <\/span>serverless-plugin-warmup<\/span><\/a> filled the gap, but they still cannot be considered an ideal and clean solution.<\/span><\/p>\n To enable your function to scale without fluctuations in latency, use <\/span>provisioned concurrency<\/b>. It allows you to configure warm instances right from the start and doesn\u2019t require code changes.<\/span><\/p>\n The following example shows a function with provisioned concurrency processing a single spike in traffic.<\/span><\/p>\n Function instances<\/span><\/p>\n Open requests<\/span><\/p>\n While <\/span>provisioned concurrency<\/b> is being allocated, the function scales with the same burst behavior as standard concurrency.<\/span><\/p>\n After it\u2019s allocated, <\/span>Provisioned Concurrency<\/b> serves incoming requests with very low latency. When all provisioned concurrency is in use, the function scales up normally to handle any additional requests. 
Those additional requests will incur cold starts, but they should be few if you have properly configured Provisioned Concurrency.<\/span><\/p>\n Thanks to this new feature, it is now possible to move to serverless workloads that were previously difficult to migrate, such as:<\/span><\/p>\n You can configure Provisioned Concurrency for an Alias or for a Version. If you configure Provisioned Concurrency for the Alias, the associated Versions will inherit the Alias\u2019 Provisioned Concurrency configuration. Otherwise, each Version should have its own configuration. This way, it is possible to assign different concurrency levels to different Lambda Function Versions to match different traffic loads. It is also possible to do A\/B testing by exploiting this functionality.<\/span><\/p>\n Remember that you cannot configure Provisioned Concurrency against the $LATEST version or any Alias that points to it.<\/span><\/p>\n To publish a Lambda Version, open the specific Lambda Function\u2019s details page in the AWS Lambda Console and click \u201cPublish new version\u201d in the \u201cActions\u201d drop-down menu.<\/span><\/p>\n After publishing it, the Lambda version is set:<\/span><\/p>\n At this point, you could configure Provisioned Concurrency for the newly created Version 1 but, for the sake of this article, we will explore how to configure Provisioned Concurrency for a new Alias that points to Version 1.<\/span><\/p>\n Therefore, under the <\/span>Aliases <\/b>section, click on \u201c+ <\/span>Create alias<\/b>\u201d. This opens a modal in which you can create a new Alias that points to the previously created Version 1.<\/span><\/p>\n Once the Alias has been created, you can configure Provisioned Concurrency for it.<\/span><\/p>\n Move to the \u201cProvisioned concurrency\u201d section. 
Here you can configure <\/span>Provisioned Concurrency <\/b>on the newly created Alias, defining a value that represents how many concurrent executions you want to provide out of your Reserved pool. It\u2019s as simple as that; just be aware that this incurs an additional cost, as specified by AWS.<\/span><\/p>\n Once fully provisioned, the \u201cStatus\u201d column, under the \u201cProvisioned concurrency\u201d section, will change to <\/span>Ready<\/b>. Invocations will then be handled by Provisioned Concurrency ahead of standard on-demand concurrency.<\/span><\/p>\n In the graph, we can see that the Lambda invocation is now managed by provisioned instances, and if we check CloudWatch Logs we can easily see that the first invocation reports the <\/span>Init Duration<\/b>, which corresponds to the time Lambda needed to provide the requested number of concurrent executions, together with the standard <\/span>Billed Duration<\/b>.<\/span><\/p>\n You can also see evidence of this in the X-Ray trace for the first invocation.<\/span><\/p>\n As a further enhancement, we can set up <\/span>autoscaling for provisioned concurrency<\/b>. 
When using Application Auto Scaling, you can create a <\/span>target tracking scaling policy<\/b> that adjusts the number of provisioned concurrent executions based on the utilization metric emitted by Lambda.<\/span><\/p>\n On a side note, since there is no clear way to delete the Provisioned Concurrency configuration from the AWS console, you can use the following aws-cli command:<\/span><\/p>\n Use the Application <\/span>Auto Scaling API<\/b> to register an alias as a scalable target and create a scaling policy.<\/span><\/p>\n In the following example, a function scales between a minimum and maximum amount of provisioned concurrency based on utilization; when the number of open requests increases, Application Auto Scaling increases provisioned concurrency in large steps until it reaches the configured maximum.<\/span><\/p>\n Function instances<\/span><\/p>\n Open requests<\/span><\/p>\n The function continues to scale on standard concurrency until utilization starts to drop. When utilization is consistently low, Application Auto Scaling decreases provisioned concurrency in smaller periodic steps (right half of the picture).<\/span><\/p>\n Besides utilization-based scaling, Application Auto Scaling allows you to schedule scaling actions. In both scaling strategies, you first have to register the alias as a scalable target for Application Auto Scaling.<\/span><\/p>\n New or improved AWS Lambda metrics are now available to help identify traffic peaks as well as how your Lambdas respond in terms of concurrency and simultaneous invocations in general.<\/span><\/p>\n ConcurrentExecutions \u2013 The number of function instances that are processing events. If this number reaches your <\/span>concurrent executions limit<\/span><\/a> for the Region or the <\/span>reserved concurrency limit<\/span><\/a> that you configured on the function, additional invocation requests are throttled.<\/span><\/li>\n ProvisionedConcurrentExecutions \u2013 The number of function instances that are processing events on <\/span>provisioned concurrency<\/span><\/a>. 
For each invocation of an alias or version with provisioned concurrency, Lambda emits the current count.<\/span><\/li>\n ProvisionedConcurrencyUtilization \u2013 For a version or alias, the value of ProvisionedConcurrentExecutions divided by the total amount of provisioned concurrency allocated. For example, .5 indicates that 50 percent of allocated provisioned concurrency is in use.<\/li>\n UnreservedConcurrentExecutions \u2013 For an AWS Region, the number of events that are being processed by functions that don\u2019t have reserved concurrency.<\/span><\/li>\n<\/ul>\nCold Starts<\/h2>\n
Legend<\/h6>\n
Throttling possible<\/span><\/p>\n
Reserved Concurrency<\/h2>\n
Burst Concurrency Limits<\/h3>\n
\n
\n
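The scale-out behaviour described in the article (an initial burst, then linear growth of 500 instances per minute, capped by the concurrency limit) can be sketched as a small calculation. This is only an illustrative model: the function names are hypothetical, and the burst and limit values passed in are assumptions you would replace with the figures that apply to your Region and account.

```python
import math
from typing import Optional


def available_instances(minutes_since_spike: int,
                        initial_burst: int,
                        concurrency_limit: int,
                        per_minute: int = 500) -> int:
    """Upper bound on running instances N whole minutes after a spike begins.

    Models the rule from the article: initial burst first, then a linear
    increase of `per_minute` instances each minute, capped by the limit.
    """
    if minutes_since_spike < 0:
        raise ValueError("minutes_since_spike must be non-negative")
    scaled = initial_burst + minutes_since_spike * per_minute
    return min(scaled, concurrency_limit)


def minutes_to_reach(target: int,
                     initial_burst: int,
                     concurrency_limit: int,
                     per_minute: int = 500) -> Optional[int]:
    """Whole minutes needed to reach `target` instances, or None if the
    concurrency limit makes the target unreachable (requests are throttled)."""
    if target > concurrency_limit:
        return None
    if target <= initial_burst:
        return 0
    return math.ceil((target - initial_burst) / per_minute)


if __name__ == "__main__":
    # Assumed values for illustration only: burst of 1000, limit of 5000.
    for minute in range(0, 10, 3):
        print(minute, available_instances(minute, 1000, 5000))
```

Requests arriving faster than this bound grows are the ones at risk of throttling, which is the situation reserved and provisioned concurrency are meant to address.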
Legend<\/h6>\n
Throttling possible<\/span><\/p>\n
Provisioned Concurrency<\/h2>\n
Legend<\/h6>\n
Provisioned concurrency<\/span><\/p>\n
Standard concurrency<\/span><\/p>\n
\n
aws lambda delete-provisioned-concurrency-config --function-name <function-name> --qualifier <version-number\/alias-name><\/pre>\n
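The same configuration can also be managed programmatically. The sketch below builds the request parameters for the Lambda API calls that set and delete Provisioned Concurrency, and rejects the $LATEST qualifier as the article warns. The helper names are hypothetical, and the `apply_config` function assumes that the third-party boto3 package and AWS credentials are available; it is not called here.

```python
def put_config_params(function_name: str, qualifier: str, executions: int) -> dict:
    """Parameters for setting Provisioned Concurrency on a version or alias."""
    # Only the literal $LATEST is caught here; an alias pointing to $LATEST
    # would still be rejected by the service itself.
    if qualifier == "$LATEST":
        raise ValueError("Provisioned Concurrency cannot target $LATEST")
    if executions < 1:
        raise ValueError("executions must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": executions,
    }


def delete_config_params(function_name: str, qualifier: str) -> dict:
    """Parameters matching the delete-provisioned-concurrency-config command."""
    return {"FunctionName": function_name, "Qualifier": qualifier}


def apply_config(params: dict):
    """Send the request; assumes boto3 is installed and credentials are set."""
    import boto3  # third-party, imported lazily so the helpers work without it
    return boto3.client("lambda").put_provisioned_concurrency_config(**params)


if __name__ == "__main__":
    # "my-function" and "live" are placeholder names for illustration.
    print(put_config_params("my-function", "live", 10))
    print(delete_config_params("my-function", "live"))
```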
Auto Scaling API<\/h2>\n
Legend<\/h6>\n
Provisioned concurrency<\/span><\/p>\n
Standard concurrency<\/span><\/p>\n
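Registering an alias as a scalable target and attaching a target tracking policy can be sketched as two request payloads for the Application Auto Scaling API. The function names, the policy name, the 0.7 target value and the capacity bounds are assumptions chosen for illustration; the payloads would be passed to `register_scalable_target` and `put_scaling_policy` on a boto3 `application-autoscaling` client, which is not invoked here.

```python
DIMENSION = "lambda:function:ProvisionedConcurrency"


def resource_id(function_name: str, alias: str) -> str:
    """Resource identifier format used for a Lambda alias."""
    return f"function:{function_name}:{alias}"


def scalable_target(function_name: str, alias: str,
                    min_capacity: int, max_capacity: int) -> dict:
    """Payload that registers the alias as a scalable target."""
    if not 0 <= min_capacity <= max_capacity:
        raise ValueError("expected 0 <= min_capacity <= max_capacity")
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": resource_id(function_name, alias),
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy(function_name: str, alias: str,
                           target_utilization: float = 0.7,
                           policy_name: str = "pc-utilization") -> dict:
    """Payload for a target tracking policy on provisioned concurrency utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": resource_id(function_name, alias),
        "ScalableDimension": DIMENSION,
        "PolicyName": policy_name,  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    }


if __name__ == "__main__":
    print(scalable_target("my-function", "live", 1, 100))
    print(target_tracking_policy("my-function", "live"))
```

The target must be registered before the policy is created, which matches the order the article describes for both utilization-based and scheduled scaling.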
Lambda Metrics<\/h2>\n
Concurrency Metrics<\/h3>\n
\n
ConcurrentExecutions<\/span><\/pre>\n
ProvisionedConcurrentExecutions<\/span><\/pre>\n
ProvisionedConcurrencyUtilization<\/span><\/pre>\n
ProvisionedConcurrentExecutions<\/span><\/pre>\n
.5<\/pre>\n
UnreservedConcurrentExecutions<\/span><\/pre>\n
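The relationship between these metrics is simple arithmetic, sketched below. The function names are hypothetical and the inputs are plain numbers rather than values fetched from CloudWatch.

```python
def provisioned_concurrency_utilization(provisioned_concurrent_executions: int,
                                        allocated: int) -> float:
    """ProvisionedConcurrentExecutions divided by allocated provisioned concurrency."""
    if allocated <= 0:
        raise ValueError("allocated provisioned concurrency must be positive")
    return provisioned_concurrent_executions / allocated


def would_throttle(concurrent_executions: int, limit: int) -> bool:
    """True once ConcurrentExecutions has reached the regional or reserved limit,
    the point at which additional invocation requests are throttled."""
    return concurrent_executions >= limit


if __name__ == "__main__":
    # 5 busy instances out of 10 allocated gives the .5 from the description.
    print(provisioned_concurrency_utilization(5, 10))
    print(would_throttle(1000, 1000))
```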