My team develops and supports the systems that we work on. It is important to know what is normal so that it’s easy to see production problems before they get too serious.
One of the systems that I work on monitors a data source and sends emails out to our subscribers. I am being vague here so as to not breach client confidentiality. This system is a graph of 12 (mostly) micro-services. Knowing that it is healthy is a big undertaking. This is how we do it.
We have used our logging tool (Datadog) to capture the signals that we receive and the messages that we send. These are charted here on a one-week scale:
The left is what we have detected and the right is what we send. The users are interested in different signals so the spikes will be of different shapes. We can see problems anywhere in the network using these two charts. The one on the right should be similar to the one on the left.
Gaps on the left will always be matched by gaps on the right. This allows us to see at a glance what is missing or abnormal. Extra gaps on the left are caused by breaks in the input feeds (which we will then check); gaps that appear only on the right are problems in processing the data.
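The comparison of the two charts can be sketched in code. This is only an illustration, not how our dashboards work: it assumes each chart has been exported as a list of counts per time bucket, and the function name `classify_gaps` and the sample numbers are made up for the example.

```python
from typing import Dict, List


def classify_gaps(received: List[int], sent: List[int]) -> Dict[str, List[int]]:
    """Compare two equally sized series of per-bucket counts.

    A bucket with nothing received points at a break in an input feed.
    A bucket with signals received but nothing sent points at a
    processing problem somewhere in the network.
    """
    if len(received) != len(sent):
        raise ValueError("both series must cover the same time buckets")

    feed_gaps: List[int] = []
    processing_gaps: List[int] = []
    for index, (r, s) in enumerate(zip(received, sent)):
        if r == 0:
            feed_gaps.append(index)
        elif s == 0:
            processing_gaps.append(index)
    return {"feed": feed_gaps, "processing": processing_gaps}


# Hypothetical counts: left chart (received) and right chart (sent).
received = [5, 7, 0, 0, 6, 8, 4]
sent = [3, 4, 0, 0, 0, 5, 2]
print(classify_gaps(received, sent))
# {'feed': [2, 3], 'processing': [4]}
```

A bucket flagged as a processing gap is only a hint. Users are interested in different signals, so a quiet period on the right can be legitimate and still needs a human to look at it.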
We also keep an eye on the errors logged in the past 24 hours. The most frequent error normally requires investigation. Datadog provides a Patterns tool that helps here:
I typically try to fix the most frequent error each day. Here one of the feeds had been broken by a change on the other end.
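The idea of finding the most frequent error can be approximated in a few lines. This is a rough stand-in and not how Datadog's Patterns tool works: it assumes that replacing runs of digits with a placeholder is enough to group similar messages, and the log lines are invented for the example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def to_pattern(message: str) -> str:
    """Replace runs of digits with a placeholder so similar messages group together."""
    return re.sub(r"\d+", "<n>", message)


def top_patterns(messages: Iterable[str], limit: int = 3) -> List[Tuple[str, int]]:
    """Return the most common message patterns with their counts."""
    return Counter(to_pattern(m) for m in messages).most_common(limit)


# Hypothetical error messages from the past 24 hours.
logs = [
    "Feed 12 timed out after 30s",
    "Feed 7 timed out after 30s",
    "Could not parse record 991",
    "Feed 12 timed out after 45s",
]
print(top_patterns(logs, limit=1))
# [('Feed <n> timed out after <n>s', 3)]
```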
Given the level of logging that we use, I can’t remember the last time that I needed a debugger. Unit tests and logs solve problems far more quickly.
When I moved house I rented a van to move my possessions. I own a car but it would not have been practical to own a removal van. I don’t need a van all the time (technically I don’t need a car all the time, but do use it enough to make owning it worthwhile).
This is the model that makes sense for Serverless. For most users it would be cheaper to just rent the service when it is needed. A key point of Serverless is that you pay when you need it and don’t pay when you don’t. This can make the staging and development environment significantly cheaper without extra effort. I have worked on cloud hosted systems that were switched off overnight (and at weekends). This gave a cost saving, but if the start process failed we could be half a day without a working test environment.
If you need to use a service all the time, other options become viable. You can run a server for $1 per day on Heroku.
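The rent-versus-own trade-off comes down to simple arithmetic. In this sketch the $1 per day always-on price is the Heroku figure above, while the pay-per-use price of $0.25 per hour is a made-up number used only for illustration.

```python
def monthly_cost_always_on(days: int = 30, price_per_day: float = 1.0) -> float:
    """Cost of a server that runs all month at a flat daily price."""
    return days * price_per_day


def monthly_cost_on_demand(hours_used: float, price_per_hour: float) -> float:
    """Cost when you only pay for the hours you actually use."""
    return hours_used * price_per_hour


def break_even_hours(price_per_hour: float, days: int = 30, price_per_day: float = 1.0) -> float:
    """Hours of use per month at which both options cost the same."""
    return monthly_cost_always_on(days, price_per_day) / price_per_hour


# Hypothetical pay-per-use price; not a quote from any provider.
HOURLY_PRICE = 0.25
print(break_even_hours(HOURLY_PRICE))              # 120.0
print(monthly_cost_on_demand(40, HOURLY_PRICE))    # 10.0
print(monthly_cost_always_on())                    # 30.0
```

With these assumed prices, a test environment used 40 hours a month costs a third of the always-on server, and renting only stops paying off beyond 120 hours of use.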
Would Rent Infrastructure be a better name than Serverless? This could avoid the “you still have servers” debate.
Recently I have found how quickly you can stand up useful services. My team was asked to set up an SFTP server. Using AWS and S3 we now have a working system 2 days after first being asked for it.
Whilst preparing my book Development I constructed a small toolchain to assemble the ePub, mobi and pdf files.
I have chosen the “Bring Your Own Book” option on Leanpub to give me the maximum flexibility.
From the Development project files I have extracted a GitHub project that can act as a starting point for writing another book: Writers Toolkit.
Currently the scripts to set up and build are Mac-centric, but I would welcome pull requests for other platforms.
The build tools are based upon the wonderful Pandoc. I use this to turn markdown files into ePub and pdf files. The ePub is then converted into a mobi file for Kindle.
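The Pandoc part of such a build can be sketched as follows. This is not the toolkit's actual script: the chapter and output file names are hypothetical, and the ePub to mobi conversion is left out because it depends on which converter you use. It relies only on Pandoc's `-o` option, which picks the output format from the file extension.

```python
import shutil
import subprocess
from typing import List


def pandoc_command(sources: List[str], output: str) -> List[str]:
    """Build a pandoc invocation; the output format follows the file extension."""
    return ["pandoc", *sources, "-o", output]


def build(sources: List[str], output: str) -> None:
    """Run pandoc, failing early with a clear message if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    subprocess.run(pandoc_command(sources, output), check=True)


# Hypothetical chapter files; print the commands rather than running them.
chapters = ["chapter1.md", "chapter2.md"]
for target in ("book.epub", "book.pdf"):
    print(" ".join(pandoc_command(chapters, target)))
```

Calling `build(chapters, "book.pdf")` would run the command for real. PDF output also needs a PDF engine such as LaTeX installed alongside Pandoc.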
The only issue that I have had with Pandoc is trying to convince it to correctly form P2 paragraphs. I had been using the inline ## form for this. The other option, adding --- to the following line, seems to be more reliable.