Thoughts on scope of bugfixes

When working on a bugfix, do you handle all of the edge cases?

Typically, if a system is down, you need to do what it takes to get it back up as quickly as possible without making things worse. This can involve leaving rare edge cases as failure messages, with logs that will help fix the problem in future.

This all depends upon the severity of a failure, the speed of deployment and the capacity to fix the problem in the near future.

Sometimes having a manual workaround for a rare, non-time-critical issue is a better use of time than overengineering a solution for a problem that may never happen.

Recently I have had to work on production issues that cannot be recreated in a test environment (without waiting a day or so to set up the test data).

A related type of problem is the bug that could have multiple causes. You think you have recreated the issue and fixed it, only to find it is still broken in production. Having audit logs at info level can help establish exactly what was attempted. Event-sourced systems can recreate what succeeded, but not always the things that failed.
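
One way to capture what was attempted, not only what succeeded, is to write an info-level audit line before the operation runs and another when it finishes or fails. This is a minimal sketch using Python's standard logging module; the logger name "audit", the run_audited helper and its parameters are made up for illustration and do not come from any particular system.

```python
import logging
import uuid

# Hypothetical logger name; route it wherever your audit trail lives.
audit = logging.getLogger("audit")


def run_audited(operation, action, **ids):
    """Run action(), recording the attempt and its outcome at info level.

    ids should hold non-sensitive identifiers only (order id, batch id, ...).
    """
    attempt_id = uuid.uuid4().hex  # ties the attempt line to its outcome line
    audit.info("attempt id=%s op=%s ids=%s", attempt_id, operation, ids)
    try:
        result = action()
    except Exception as exc:
        # Record that the attempt failed, then let the caller handle the error.
        # Only the exception type is logged, since messages can contain user data.
        audit.info("failed id=%s op=%s error=%s",
                   attempt_id, operation, type(exc).__name__)
        raise
    audit.info("succeeded id=%s op=%s", attempt_id, operation)
    return result
```

Because the attempt line is written before the work starts, a failure that never produces an event still leaves a trace of what was tried.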

When logging a failure message, always give enough information to locate the error without leaking PII (personally identifiable information).
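
As a rough illustration, a failure log can carry internal identifiers plus a keyed hash of any sensitive value, so related log lines can still be correlated without the raw value appearing. Everything named here (the "orders" logger, LOG_HASH_KEY, pseudonymise, failure_context, log_failure, and the order/email fields) is hypothetical, and the key would need to come from real secret storage rather than source code.

```python
import hashlib
import hmac
import logging

logger = logging.getLogger("orders")  # hypothetical logger name

# Assumption: in a real system this key is loaded from a secret store.
LOG_HASH_KEY = b"replace-with-a-real-secret"


def pseudonymise(value: str) -> str:
    """Return a short, stable token for a sensitive value using a keyed hash."""
    digest = hmac.new(LOG_HASH_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def failure_context(order_id: str, customer_email: str, step: str) -> dict:
    """Build the fields to log: enough to find the record, no raw PII."""
    return {
        "order_id": order_id,                          # internal id, not PII
        "customer_ref": pseudonymise(customer_email),  # token, not the email
        "step": step,                                  # where the failure happened
    }


def log_failure(order_id: str, customer_email: str, step: str, exc: Exception) -> dict:
    """Log the failure with safe context and return that context."""
    ctx = failure_context(order_id, customer_email, step)
    # Log the exception type only; exception messages often echo user input.
    logger.error("processing failed: %s error=%s", ctx, type(exc).__name__)
    return ctx
```

The same email always maps to the same token, so support can search the logs for a customer by computing the token, without the address itself ever being written out.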
