Stop Fighting The Last War

There is a classic military problem at the start of a new campaign: the initial goals and missions are based on the doctrine that worked in the previous one. This can have disastrous effects. During the first Gulf War it left a convoy of tanks inoperable at the side of the road, because a key filter needed to be replaced before they could operate in the desert.

I am about to start a new project with a new client, in a domain very different from any I have worked in before. That means checking every assumption I bring with me.

My previous engagement had poorly defined goals, and that is one thing I can correct early in the next project. What problem are we trying to solve? How do we measure progress towards solving it? What are the constraints?

Rate Limiting Lambdas

Recently I accidentally exhausted the pool of concurrent Lambda executions on an account.

We have an S3 bucket full of files that we need to parse and store in a database. New files are parsed by a trigger when they are added. We have since added code that extracts extra details from the files. The naive approach is to have a scheduler trigger the Lambdas at a fixed rate. The problem was that a separate issue overloaded the database. Invocations then failed and were retried, and the retries added yet more load: a sorcerer's apprentice style retry overload.

We now use reserved concurrency combined with an SQS queue to limit how many messages are processed at a time.

Reserved concurrency sets aside part of the account's concurrent Lambda capacity (which defaults to 1000 per account in each Region) for a single function, and also limits that function to that amount. I would prefer to be able to set a cap without dedicating capacity, but I can see why it works this way.
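The following minimal Python sketch makes the "dedicated capacity" point concrete. It assumes the default limit of 1000 and the AWS rule that at least 100 units must stay unreserved for functions without a reservation. The function name and the example reservations are hypothetical and are not our real settings.

```python
from typing import Dict

# Assumed defaults: 1000 is the default concurrency limit per account per Region
# (it can be raised), and AWS keeps at least 100 units unreserved.
DEFAULT_ACCOUNT_LIMIT = 1000
MIN_UNRESERVED = 100


def unreserved_pool(reservations: Dict[str, int],
                    account_limit: int = DEFAULT_ACCOUNT_LIMIT) -> int:
    """Return the concurrency left for every function without a reservation.

    Each reservation is both a floor and a cap for its own function, and it is
    subtracted from the shared pool whether or not that function is running.
    """
    if any(value < 0 for value in reservations.values()):
        raise ValueError("reserved concurrency cannot be negative")
    remaining = account_limit - sum(reservations.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"only {remaining} units left unreserved; at least {MIN_UNRESERVED} required"
        )
    return remaining
```

Suppose you reserve 50 for a hypothetical `parser` function and 20 for an `extractor`. That leaves 930 units shared by everything else in the account and Region. Any runaway function without a reservation can still starve its neighbours of those 930.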

I used the long weekend to tune the timeout and reserved concurrency values so that I don't overload my Postgres database. Connection pooling is hard with Lambdas, because each concurrent execution typically holds its own database connection.
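One way to pick those values is to work backwards from the database. This sketch rests on three assumptions:

- Each concurrent execution holds one Postgres connection.
- Some connections are kept back for other clients.
- The SQS queue feeding the Lambda follows the AWS recommendation of a visibility timeout at least six times the function timeout.

The function names and example numbers are hypothetical.

```python
def max_reserved_concurrency(db_max_connections: int,
                             connections_for_others: int,
                             connections_per_execution: int = 1) -> int:
    """Largest reserved concurrency that cannot exhaust the database's connections."""
    available = db_max_connections - connections_for_others
    if available <= 0 or connections_per_execution <= 0:
        raise ValueError("no connections available for the Lambdas")
    return available // connections_per_execution


def min_visibility_timeout(function_timeout_s: int) -> int:
    """AWS guidance: queue visibility timeout >= 6x the function timeout, so a
    message does not become visible again while an invocation is still on it."""
    return 6 * function_timeout_s


def estimated_messages_per_second(concurrency: int, batch_size: int,
                                  avg_batch_seconds: float) -> float:
    """Rough steady-state throughput once the reserved concurrency is saturated."""
    return concurrency * batch_size / avg_batch_seconds
```

Take Postgres's default `max_connections` of 100 and keep 20 back for other clients. That gives a reserved concurrency of 80. A 30 second function timeout then implies a visibility timeout of at least 180 seconds.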

Lessons learned:

– Lambdas are not an unlimited resource.

– Don’t have too many unrelated services in the same account.

– SQS plus Lambda, fed by a dispatching Lambda, allows effective rate limiting.
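Here is a minimal sketch of that pattern in Python, with these assumptions:

- The dispatching Lambda enqueues one message per S3 key.
- The consuming Lambda's event source mapping has `ReportBatchItemFailures` enabled. Returning `batchItemFailures` then makes SQS retry only the failed messages rather than the whole batch.

`dispatch`, `make_worker` and the `process_key` callback are hypothetical names. `send_batch` stands in for a call such as boto3's `sqs.send_message_batch(QueueUrl=..., Entries=entries)`. It is injected so the sketch runs without AWS.

```python
import json
from typing import Any, Callable, Dict, List

SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def dispatch(keys: List[str],
             send_batch: Callable[[List[Dict[str, str]]], Any],
             batch_size: int = SQS_MAX_BATCH) -> int:
    """Dispatching step: enqueue one message per S3 key, in SQS-sized batches."""
    if not 1 <= batch_size <= SQS_MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {SQS_MAX_BATCH}")
    sent = 0
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        # Entry Ids only need to be unique within a single batch.
        entries = [
            {"Id": str(i), "MessageBody": json.dumps({"key": key})}
            for i, key in enumerate(chunk)
        ]
        send_batch(entries)
        sent += len(entries)
    return sent


def make_worker(process_key: Callable[[str], None]):
    """Build an SQS-triggered handler that reports per-message failures."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
        failures = []
        for record in event.get("Records", []):
            try:
                key = json.loads(record["body"])["key"]
                process_key(key)  # e.g. parse the file and write it to Postgres
            except Exception:
                # Only this message is retried; successful ones are deleted.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Put reserved concurrency on the consumer and the queue absorbs bursts. If the database slows down, messages wait in SQS instead of turning into a pile of concurrent retries. A real dispatcher should also check the `Failed` list in the `send_message_batch` response.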