Recently I managed to accidentally exhaust the pool of concurrent Lambda executions on an account.
We have an S3 bucket full of files that we need to parse and store in a database; a trigger parses each file when it is added. We recently added some code that can extract extra details from the files, and the naive approach was to have a scheduler trigger the Lambdas at a fixed rate. The trouble was that, due to an unrelated issue, the database became overloaded, and the failed invocations kept being retried. This led to a sorcerer’s apprentice style of retry overload.
We now use reserved concurrency combined with an SQS queue to limit how many messages are processed at a time.
Reserved concurrency carves out some of the account’s concurrent Lambda capacity (which defaults to 1000) and dedicates it to a given function, which is then restricted to that amount. I would prefer the ability to set a cap without reserving dedicated capacity, but I can see why it works the way it does.
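For reference, reserved concurrency can be set with a single CLI call (the function name and the value here are placeholders, not ours):

```shell
# Reserve 10 concurrent executions for this function; the remaining
# 990 of the account's default 1000 stay available for everything else.
aws lambda put-function-concurrency \
    --function-name file-parser \
    --reserved-concurrent-executions 10
```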
I have taken advantage of the long weekend to tune the timeout and reserved concurrency values so that I don’t overload my Postgres database. Connection pooling is hard when you have Lambdas.
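The arithmetic behind that tuning is roughly Little’s law: with a reserved concurrency of C and an average invocation duration of D seconds, steady-state throughput is C / D messages per second, and if each invocation holds one Postgres connection, the database sees at most C connections. A quick sketch (the numbers are illustrative, not our actual settings):

```python
def max_throughput(reserved_concurrency: int, avg_duration_s: float) -> float:
    """Steady-state invocations per second under a fixed concurrency cap."""
    return reserved_concurrency / avg_duration_s


def peak_db_connections(reserved_concurrency: int, conns_per_invocation: int = 1) -> int:
    """Worst-case simultaneous Postgres connections held by the Lambdas."""
    return reserved_concurrency * conns_per_invocation


# e.g. 10 reserved executions, 2 s average runtime:
# at most 5 messages/s processed and 10 database connections open.
print(max_throughput(10, 2.0))
print(peak_db_connections(10))
```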
Lessons learned:
– Lambdas are not an unlimited resource.
– Don’t have too many unrelated services in the same account.
– SQS plus Lambda with a dispatching Lambda allows effective rate limiting.
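A minimal sketch of the dispatching side, assuming the messages are S3 object keys and using an injected send function in place of a real SQS client (in production this would be boto3’s `send_message_batch`, which accepts at most 10 entries per call; the consumer Lambda’s reserved concurrency then caps how many are processed at once):

```python
from typing import Callable, Iterable, List

BATCH_LIMIT = 10  # SQS SendMessageBatch accepts at most 10 entries per call


def chunk(items: List[str], size: int = BATCH_LIMIT) -> Iterable[List[str]]:
    """Split a list of message bodies into SQS-sized batches."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def dispatch(keys: List[str], send_batch: Callable[[List[str]], None]) -> int:
    """Enqueue every key in batches and return the number of messages sent."""
    sent = 0
    for batch in chunk(keys):
        send_batch(batch)  # in production: sqs.send_message_batch(...)
        sent += len(batch)
    return sent


# Example with a fake sender that just records the batches:
batches: List[List[str]] = []
total = dispatch([f"file-{i}.csv" for i in range(25)], batches.append)
print(total, len(batches))  # 25 messages in 3 batches (10 + 10 + 5)
```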