One of the databases had no data to migrate (the data was transitory and had no value after use).
The last three required the use of the Database Migration Service (DMS). DMS is configured to follow a database and copy any updates across to the new system. This allows for minimal downtime when moving from one database to another, especially when taking a backup and restoring it would be prohibitive.
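As a rough mental model only (this is not how DMS is implemented or configured), following a database amounts to a bulk copy of what exists now, followed by replaying the changes captured since the copy began. The in-memory sketch below assumes a hypothetical `Change` record with `upsert` and `delete` operations; real DMS tasks are set up through AWS itself (in our case, via Terraform).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]


@dataclass
class Change:
    """One entry in a hypothetical change log captured from the source."""
    op: str                    # "upsert" or "delete"
    key: str
    doc: Optional[Doc] = None  # only used for "upsert"


def full_load(source: Dict[str, Doc]) -> Dict[str, Doc]:
    """Phase 1: copy every document that exists when the migration starts."""
    return {key: dict(doc) for key, doc in source.items()}


def apply_changes(target: Dict[str, Doc], changes: List[Change]) -> None:
    """Phase 2: replay captured changes in order so the target keeps up."""
    for change in changes:
        if change.op == "upsert":
            target[change.key] = dict(change.doc or {})
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")


# Usage: copy the snapshot, then apply writes that arrived during the copy.
source = {"a": {"n": 1}, "b": {"n": 2}}
target = full_load(source)
apply_changes(target, [
    Change("upsert", "b", {"n": 3}),
    Change("upsert", "c", {"n": 4}),
    Change("delete", "a"),
])
print(target)  # {'b': {'n': 3}, 'c': {'n': 4}}
```

This is why the downtime is small: the source stays live during the bulk copy, and only the final catch-up has to finish before switching over.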
DMS is very quick to use (our approach was slightly slower, as we had to use Terraform to configure it and could only start the jobs from a Jenkins task).
One flaw we found was in the error handling. One of our databases (unknown to us) contained some text fields with the null character (\u0000). This is something that MongoDB can handle but DocumentDB cannot. The migrations failed and reported the error, but gave no clue as to which records caused it. This was a real problem, as the system we were migrating has around 2 million documents.
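One alternative to hunting for the failures through the migration itself is to scan the source documents for U+0000 directly. A minimal sketch, assuming the documents have already been read into plain Python dicts (for example as returned by a MongoDB driver); the sample data and the use of `_id` as the identifier are illustrative.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple, Union

PathPart = Union[str, int]


def null_char_paths(value: Any, path: Tuple[PathPart, ...] = ()) -> Iterator[Tuple[PathPart, ...]]:
    """Yield the path to every string inside `value` that contains U+0000."""
    if isinstance(value, str):
        if "\u0000" in value:
            yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from null_char_paths(child, path + (key,))
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            yield from null_char_paths(child, path + (index,))


def documents_with_nulls(docs: Iterable[Dict[str, Any]]) -> List[Any]:
    """Return the _id of every document with at least one offending string."""
    return [doc.get("_id") for doc in docs
            if next(null_char_paths(doc), None) is not None]


docs = [
    {"_id": 1, "name": "fine"},
    {"_id": 2, "tags": ["a", "b\u0000c"]},
    {"_id": 3, "meta": {"note": "x\u0000"}},
]
print(documents_with_nulls(docs))      # [2, 3]
print(list(null_char_paths(docs[1])))  # [('tags', 1)]
```

Reporting the path as well as the `_id` shows which field to clean, which the migration error did not.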
We eventually took a brute force approach to find the problem records:
Extract each record to its own file
Read these files, write each one to DocumentDB, and delete the files that worked.
Eventually we were left with the 40 problem records. These were deleted from the source system and manually inserted into the new one.
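The file-based loop described above can be sketched in memory. A successful write plays the role of deleting that record's file, so whatever is left over is the problem set. `fake_documentdb_write` is a hypothetical stand-in for a single-document insert; a real version would call your driver and catch its specific write-error type rather than every `Exception`.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def find_rejected(records: Iterable[Record], write: Callable[[Record], None]) -> List[Record]:
    """Write records one at a time and return only those the target rejects."""
    rejected: List[Record] = []
    for record in records:
        try:
            write(record)
        except Exception:  # narrow to the driver's write-error type in real use
            rejected.append(record)
    return rejected


def fake_documentdb_write(record: Record) -> None:
    """Hypothetical stand-in: rejects top-level strings containing U+0000."""
    for value in record.values():
        if isinstance(value, str) and "\u0000" in value:
            raise ValueError("null character not supported")


records = [{"_id": 1, "text": "ok"}, {"_id": 2, "text": "bad\u0000"}, {"_id": 3, "text": "ok"}]
print([r["_id"] for r in find_rejected(records, fake_documentdb_write)])  # [2]
```

Writing one record at a time is slow at 2 million documents, which is why this is a last resort.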
We later found other problems with DocumentDB that prevented us from using it for the final database (we moved that one to another MongoDB provider).
We now have no MongoDB databases on the platform that is being decommissioned.
Most developers are used to being technical support for friends and family. The latest incident I dealt with was slow to fix.
My mother has a Windows 10 laptop that had started to show the “You are currently running a version of Windows that’s nearing the end of support. We recommend you update to the most recent version of Windows 10 now to get the latest features and security improvements” message.
She started to apply the suggested update, waited a while, and then the update uninstalled itself.
At this point I was called on to help.
I restarted the update process and it eventually reported that an HP utility was no longer compatible and needed to be removed. This triggered a restart cycle.
At the next restart and update it said that FreeAVG needed to be upgraded or removed. I went for the simple option of uninstalling it and downloading a fresh copy. Three reboots later, the update manager repeated the same message about FreeAVG.
This was followed by an uninstall of FreeAVG, a restart and another attempt at the update to 1903. Again it asked for FreeAVG to be uninstalled (which was interesting, as the Add/Remove Programs dialog did not list FreeAVG).
Eventually I found a standalone FreeAVG uninstaller. This worked on the second attempt (hint: when it suggests booting into safe mode, it’s not kidding).
Another reboot/update cycle had the update working. It only took 3 hours from first update to fully upgraded.
I don’t know if this is expected behaviour for a mass-market operating system. If I wrote code that required this level of hand-holding, I would be expected to do the install myself.
How an end user who is not very tech-savvy is expected to get this working is beyond me.
When I got home I tried to update my even older Windows 10 laptop. The 1803 update failed with an error message to look up, and I am now attempting to use the Windows 10 Update Assistant for version 1903 to get my machine updated.
My team develops and supports the systems that we work on. It is important to know what is normal so that it’s easy to see production problems before they get too serious.
One of the systems that I work on monitors a data source and sends emails out to our subscribers. I am being vague here so as not to breach client confidentiality. This system is a graph of 12 (mostly) micro-services. Knowing that it is healthy is a big undertaking. This is how we do it.
We use our logging tool (Datadog) to capture the signals that we receive and the messages that we send. These are charted here on a one-week scale:
The left is what we have detected and the right is what we send. The users are interested in different signals so the spikes will be of different shapes. We can see problems anywhere in the network using these two charts. The one on the right should be similar to the one on the left.
Gaps on the left will always be matched by gaps on the right. This lets us see at a glance what is missing or abnormal. Gaps on the left are caused by breaks in the input feeds (which we then check); gaps on the right with no matching gap on the left are problems in processing the data.
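We read the charts by eye, but the same rule can be written down. A minimal sketch, assuming the two series have already been exported as counts per time bucket (for example, per hour); the function name and sample counts are hypothetical and this does not use any Datadog API.

```python
from typing import List, Sequence, Tuple


def classify_gaps(received: Sequence[int], sent: Sequence[int]) -> List[Tuple[int, str]]:
    """Flag time buckets where either chart would show a gap.

    received[i] and sent[i] are counts for the same time bucket.
    """
    if len(received) != len(sent):
        raise ValueError("both series must cover the same buckets")
    findings: List[Tuple[int, str]] = []
    for bucket, (detected, delivered) in enumerate(zip(received, sent)):
        if detected == 0:
            # Left-hand gap: nothing came in, so nothing can go out. Check the feeds.
            findings.append((bucket, "input"))
        elif delivered == 0:
            # Right-hand gap with no matching left-hand gap: a processing problem.
            findings.append((bucket, "processing"))
    return findings


print(classify_gaps([5, 0, 7, 3], [4, 0, 0, 2]))  # [(1, 'input'), (2, 'processing')]
```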
We also keep an eye on the errors logged in the past 24 hours. The most frequent error normally requires investigation. Datadog provides a Patterns tool that helps here:
I typically try to fix the most frequent error each day. Here one of the feeds had been broken by a change on the other end.
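Datadog's Patterns view does this grouping for us, and its algorithm is not described here. As a crude stand-in, masking the variable parts of each message (IDs, numbers) and counting what remains gives a similar "most frequent error" list. The masking rules and sample messages below are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Masking rules applied in order; UUIDs first so their digits are not masked piecemeal.
MASKS = [
    (re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.IGNORECASE), "<uuid>"),
    (re.compile(r"\b0x[0-9a-f]+\b", re.IGNORECASE), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def pattern_of(message: str) -> str:
    """Replace the variable parts of a log message with placeholders."""
    for regex, placeholder in MASKS:
        message = regex.sub(placeholder, message)
    return message


def top_patterns(messages: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent message patterns with their counts."""
    return Counter(pattern_of(m) for m in messages).most_common(n)


errors = [
    "Feed 12 timed out after 30s",
    "Feed 7 timed out after 31s",
    "Order 123e4567-e89b-12d3-a456-426614174000 not found",
]
print(top_patterns(errors, 1))  # [('Feed <num> timed out after <num>s', 2)]
```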
Given the level of logging that we use I can’t remember the last time that I needed a debugger. Unit tests and logs solve this far quicker.
When I moved house I rented a van to move my possessions. I own a car, but it would not have been practical to own a removal van. I don’t need a van all the time (technically I don’t need a car all the time either, but I use it enough to make owning it worthwhile).
This is the model that makes sense for Serverless. For most users it would be cheaper to rent the service only when it is needed. A key point of Serverless is that you pay when you use it and don’t pay when you don’t. This can make staging and development environments significantly cheaper without extra effort. I have worked on cloud-hosted systems that were switched off overnight (and at weekends). This saved money, but if the start-up process failed we could be without a working test environment for half a day.
If you need to use a service all the time, though, other options become viable. You can run a server for $1 per day on Heroku.
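The rent-or-own trade-off is simple arithmetic. In this sketch, $1 per day is the Heroku figure quoted above; the $0.10 per hour rental price is invented purely for illustration (real serverless services bill per request and execution time rather than per hour).

```python
from typing import Tuple


def monthly_costs(hours_per_day: float, days_used: int, rent_per_hour: float,
                  own_per_day: float = 1.00, days_in_month: int = 30) -> Tuple[float, float]:
    """Return (rented, owned) cost for one month.

    rented: pay only for the hours the service is actually in use.
    owned:  pay a flat daily rate whether or not it is used.
    """
    rented = hours_per_day * days_used * rent_per_hour
    owned = own_per_day * days_in_month
    return rented, owned


def break_even_hours_per_day(rent_per_hour: float, own_per_day: float = 1.00) -> float:
    """Daily usage above which the always-on option becomes cheaper."""
    return own_per_day / rent_per_hour


# A staging environment used 8 hours a day on 22 working days (hypothetical price).
rented, owned = monthly_costs(8, 22, rent_per_hour=0.10)
print(f"rented ${rented:.2f}, owned ${owned:.2f}")  # rented $17.60, owned $30.00
print(break_even_hours_per_day(0.10))               # 10.0
```

Below the break-even usage, renting wins; a service busy around the clock is where owning starts to make sense.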
Would Rent Infrastructure be a better name than Serverless? This could avoid the “you still have servers” debate.
Recently I have found how quickly you can stand up useful services. My team was asked to set up an SFTP server. Using AWS and S3, we had a working system two days after first being asked for it.
Whilst preparing my book Development I constructed a small toolchain to assemble the ePub, mobi and pdf files.
I have chosen the “Bring Your Own Book” option on Leanpub to give me the maximum flexibility.
From the project files for Development I have extracted a GitHub project that can act as a starting point for writing another book: Writers Toolkit.
Currently the setup and build scripts are Mac-centric, but I would welcome pull requests for other platforms.
The build tools are based upon the wonderful Pandoc. I use this to turn markdown files into ePub and pdf files. The ePub is then converted into a mobi file for Kindle.
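The pipeline described above can be sketched as a short driver script. This is not the Writers Toolkit code: the chapter file names are hypothetical, and since the text does not say which tool turns the ePub into a mobi file, Calibre's `ebook-convert` is assumed here. Pandoc picks the output format from the extension given to `-o`, and its PDF output needs a LaTeX engine installed.

```python
import subprocess
from typing import List, Sequence


def build_commands(chapters: Sequence[str], name: str = "book") -> List[List[str]]:
    """Return the command lines for the ePub, PDF and mobi builds, in order."""
    sources = list(chapters)
    return [
        # Pandoc infers the output format from the file extension.
        ["pandoc", "--toc", "-o", f"{name}.epub", *sources],
        # PDF output goes through a LaTeX engine, which must be installed.
        ["pandoc", "--toc", "-o", f"{name}.pdf", *sources],
        # Assumption: Calibre's ebook-convert for ePub -> mobi; runs after the ePub build.
        ["ebook-convert", f"{name}.epub", f"{name}.mobi"],
    ]


def build(chapters: Sequence[str], name: str = "book", dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in build_commands(chapters, name):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


build(["01-introduction.md", "02-testing.md"], name="development")
```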
The only issue that I have had with Pandoc is trying to convince it to correctly form H2 headings. I had been using the inline ## form for this. The other option, adding --- on the following line, seems to be more reliable.