How To Effectively Report a bug

If you want to report a bug to a (typically an open source) project. Follow these steps:

Check if the bug is already in the tracker. If so see if you could add a new test case to demonstrste the issue.

If it is not in the tracker create a simple test case that demonstrates the problem. Being able to recreate the issue makes it far easier to work on, making it far more likely to be picked to work on.

If you know how to fix the problem check that they accept Pull Requests and supply one. Allow the maintainer to edit it. Its likely that you are going to have missed some internal rules of the project.

Contributions to documentation are really appreciated by maintainers.

Small Improvements

I have been following Codurance Fireside Chats (This is Episode 38).

Public Interest Note: I currently work for Codurance.

In this episode Sandro mentions that there is very little impact that you can have in 20% of the time.
I disagree, you can achieve a big impact with small changes over an extended period.

A good example came when I was supporting a production client system (in addition to other new development work). At the start of each day I would check the most frequent error listed in the logs for the past 24 hours. I would either try and resolve that error or improve the logging so that it can be resolved in the future. This would take less than 15 minutes per day.

There is an interesting statistical effect that happens in any distribution of events. It is extremely unlikely that everything happens at the same frequency. Certain items will dominate the sample.. Items that have identical frequencies are likely to be coupled (such as a system reporting the same error in two places). This is known as the Power Law

By using the Power Law I was able to significantly reduce the volume of error messages. The top 5 formed 50% of the overall messages so in a week the error volume had halved. This made the error logs easier to read. This process repeated for a few months and eventually there were only a handful of errors left that were so infrequent as to not be worth fixing. The next target was the warnings.

This makes a great principle: try and fix or improve one small thing each day. Even if the rest of the day is fairly monotonous at least you have that one small win. By the end of a year you will have hundreds of small wins which should be significant (even if some of them had to be rolled back).

Neo4j apoc

apoc is a set of useful addons for Neo4j

I have been using brew to install neo4j on my machine.

brew install neo4j

This can be combined with

brew services.start neo4j
and

brew services stop neo4j

This typically installs a db on http://localhost:7474/browser

On mac via homebrew you need to look in /usr/local/Cellar/neo4j/4.4.5/libexec
To enable apoc copy apoc-4.4.0.3-core.jar from labs to plugins

You also need to change config/neo4j.conf

I have set the following value:

dbms.security.procedures.unrestricted=apoc.*

Restarting neo4j get this working.

Practical Elixir: Home Server

Recently I have been turning to Elixir to solve some personal projects. It is a great general purpose language.

Like a lot of people I am mostly working from home. This involves working over a VPN to access some of the essential services that the client uses. The VPN client infrequently disconnects which is typically only noticed when one of these services become inaccessible.

It would be useful to be notified that this has happened. I found another Elixir project (https://github.com/navinpeiris/ex_unit_notifier) that had solved the notification problem but specifically for ex_unit. The trick is to install a CLI notification tool and execute that with appropriate parameters.

Once I had this I could use a simple webclient (HTTPoison) to check that a typical page exists that is only available behind the VPN.

This was wrapped in a specific GenServer that reran the check every minute. Having an Application and a Supervisor allows this to be started using iex -S mix.

This solved the problem but did require an iex session to be running. The next step was to use a Release to create something that could be run in the background. This only needed a simple `mix release`.

At this point I now had a service that can run as a background task. Shutting it down was a little clumsy – I’d need to find the shutdown script or manually kill it. It would be far simpler if there was a web interface so that it can be stopped via this (or a generic curl command).

Adding a trivial web interface was a simple matter of adding cowboy_plug to the application and configuring some routes. This allows the application to be cleanly shutdown using System.stop/0. Wrapping this in a task allows the web request to return a result before the shutdown.

Aside cowboy_plug has recently gained a dependency upon :telemetry, which needs to be added to your extra_applications. This section allows your application to request the optionally started Erlang application be run. Remember in Erlang terminology an application is more like a library rather than the whole project, which is called a release.

Now that I had a project that could do something in the background and notify the user the obvious thing was to make the GenServer more generic so that it can run any job on a schedule. This also allows me to keep the specific endpoints I am hitting outside a public repo.

This is the route that led to writing home_server. It now checks and notifies that the machine can connect to the internet (another intermittent issue) and has a config file that if present will monitor that URL and provide a custom message.

https://github.com/chriseyre2000/home_server

This project is still at an early stage. I plan to improve the config file as the current version is somewhat naive. It also needs a better solution for starting the processes from the config file.

Definition of Metastable Failure

Here is the paper that defined the Metastable Failure: https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s11-bronson.pdf

The definition is a system failure mode that will remain even if the trigger has been removed.

A simple example would be an unlimited linear retry strategy. A naive approach to retries is to repeat the call after a small pause. Under normal load this can help with an occasional problem. Under extreme load when the server is failing due to excessive requests the retry policy make things worse. Assuming that the overload is at 100 requests per second, a one second retry will generate 200 requests in second 2

This is what happens:

100 requests per second with an unbounded retry policy

A breaking load becomes persistent. This is why retry policies need to have exponential backoff and a means of giving up.

This is a very simple example, more complex systems can have many ways of getting into these problems.