Working with OpenTelemetry in Elixir

I have created two GitHub projects to demonstrate using OpenTelemetry (admittedly badly).

The first project https://github.com/chriseyre2000/something_to_measure is a simple GenServer with a function that can be used to generate OpenTelemetry spans.

The second project https://github.com/chriseyre2000/span_eater is a simple GenServer that consumes spans on the default port that OpenTelemetry sends them on.

In production you would typically have a sidecar application to capture and rebroadcast the messages.

In OpenTelemetry terms a Span is a time interval during which a given process ran. Spans can be nested. The result is that an observability tool can capture the spans and construct a visualisation of what was happening. Spans are more useful than raw log data because they have a defined meaning, whereas with logs that meaning would need to be inferred.
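To make the nesting idea concrete, here is a minimal sketch that does not use the OpenTelemetry libraries at all. The map shape (`id`, `parent_id`, `name`, `start_ms`, `end_ms`) and the `SpanSketch` module are my own assumptions for illustration; real spans also carry trace ids, attributes, events and status.

```elixir
defmodule SpanSketch do
  # Render spans as an indented tree: each span is listed under its parent,
  # with its duration. Spans are plain maps in a hypothetical, simplified shape.
  def render(spans, parent_id \\ nil, depth \\ 0) do
    spans
    |> Enum.filter(&(&1.parent_id == parent_id))
    |> Enum.sort_by(& &1.start_ms)
    |> Enum.flat_map(fn span ->
      duration = span.end_ms - span.start_ms
      line = String.duplicate("  ", depth) <> "#{span.name} (#{duration}ms)"
      # Children of this span are rendered one level deeper.
      [line | render(spans, span.id, depth + 1)]
    end)
  end
end

spans = [
  %{id: 1, parent_id: nil, name: "handle_request", start_ms: 0, end_ms: 50},
  %{id: 2, parent_id: 1, name: "db_query", start_ms: 5, end_ms: 30},
  %{id: 3, parent_id: 1, name: "render", start_ms: 31, end_ms: 48}
]

spans |> SpanSketch.render() |> Enum.each(&IO.puts/1)
# handle_request (50ms)
#   db_query (25ms)
#   render (17ms)
```

This is roughly the information an observability tool has to work with: because every span names its parent, the tree can be rebuilt without guessing from log text.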

span_eater currently just logs that it has received the message. I am planning to make it more sophisticated, and then build a Livebook to host it in. Currently it is useful for removing the log messages that otherwise get generated:

[info]  client error exporting spans {:failed_connect,
 [{:to_address, {'localhost', 4318}}, {:inet, [:inet], :econnrefused}]}

These messages typically flood the logs of a locally run application that is instrumented to publish OpenTelemetry data.

I have just worked out how to decode the Protobuf data sent over the wire. We can now listen to OpenTelemetry messages sent by our local machine.

Thoughts on Configuration and Supervisors

Back on a previous project I worked with Heroku.
Heroku has a great setup for applications. You deploy to a git repo and have a parallel set of configuration via a UI/API.
If either changes, Heroku redeploys the application.

On a walk this morning I realised that you could recreate some of this behaviour inside your Elixir (or other BEAM-based) application.

Supervision Tree

If you had a supervisor with a one_for_all restart strategy, then when the Config Watcher notices a change it can simply terminate itself, and the supervisor will restart it along with the Worker service that depends upon the configuration.

The Config Watcher both caches the data and periodically checks for changes.
This requires you to keep the configuration watcher specific to the service that uses it to reduce the blast radius of changes.
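Here is a minimal sketch of that idea, under some assumptions of mine: the module names are invented, and `fetch` is a hypothetical zero-arity function standing in for wherever the configuration really lives (a file, an API, and so on). In the demo at the bottom an Agent plays that role.

```elixir
defmodule ConfigSketch.Watcher do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  def get, do: GenServer.call(__MODULE__, :get)

  @impl true
  def init(opts) do
    fetch = Keyword.fetch!(opts, :fetch)
    interval = Keyword.get(opts, :interval, 1_000)
    Process.send_after(self(), :check, interval)
    # Cache the configuration as it was when this process started.
    {:ok, %{fetch: fetch, interval: interval, config: fetch.()}}
  end

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state.config, state}

  @impl true
  def handle_info(:check, state) do
    if state.fetch.() == state.config do
      Process.send_after(self(), :check, state.interval)
      {:noreply, state}
    else
      # Stopping makes the supervisor restart this child and,
      # with :one_for_all, every sibling too.
      {:stop, {:shutdown, :config_changed}, state}
    end
  end
end

defmodule ConfigSketch.Worker do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def config, do: GenServer.call(__MODULE__, :config)

  # Reads the configuration once at start-up; only a restart refreshes it.
  @impl true
  def init(nil), do: {:ok, ConfigSketch.Watcher.get()}

  @impl true
  def handle_call(:config, _from, config), do: {:reply, config, config}
end

defmodule ConfigSketch.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # The watcher starts first so the worker can read from it in init/1.
    children = [{ConfigSketch.Watcher, opts}, ConfigSketch.Worker]
    Supervisor.init(children, strategy: :one_for_all)
  end
end

# Demo: an Agent stands in for the external configuration source.
{:ok, source} = Agent.start_link(fn -> %{greeting: "hello"} end)
fetch = fn -> Agent.get(source, & &1) end
{:ok, _sup} = ConfigSketch.Supervisor.start_link(fetch: fetch, interval: 50)

before_change = ConfigSketch.Worker.config()
Agent.update(source, fn _ -> %{greeting: "hi"} end)
# Give the watcher time to notice the change and the supervisor time to restart.
Process.sleep(300)
after_change = ConfigSketch.Worker.config()
IO.inspect({before_change, after_change})
```

The worker never polls for configuration itself; it is simply restarted with fresh state. Keeping one supervisor like this per service is what limits the blast radius, since only the children under it are restarted.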

This seems like a different pattern of use to the typical supervisor.

First Steps into K8s

I now need to get a better understanding of Kubernetes.
For the last few years I have been working with Docker, sometimes deployed to ECS, and before that the very simple Heroku setup.

I have just quickly skimmed through Kubernetes: Up and Running.

So far: minikube is a small one-node Kubernetes setup that can be deployed to a developer's machine.

kubectl is the command line tool used to start/stop/scale things.

K8s has its own DNS server and a host of self-hosted services. I am also aware that Docker is involved.
There are also the important concepts of pods, DaemonSets and labels.

The next item to read up on is Helm. This seems to be an improved tool for deploying things to K8s.

Schema based Development: Protobuf and Avro

Back in the year 2000 there was a concept around that could have changed development. WSDL schemas allowed a project to define the contract that a service provided. Admittedly most people reverse engineered this from their existing systems and WSDL generated huge SOAP monstrosities. Having a single source that described a contract for a system really helps with integration. Making the contract a defined versioned item ensures that accidental breakages are less frequent.

In other spaces there were similar solutions that got more useful traction. Google had the problem of a large number of systems that needed to interact. They also used a range of programming languages to do this. Their solution was Protobuf. This is a file format that defines the serialization format for messages. Each service would use Protobuf to define the contract and libraries would generate the code to serialize/deserialize from the wire-protocol. It has strict versioning rules:

  • Identifiers cannot be reused (that is the underlying numeric identifier)
  • You can add a field to an existing read contract without breaking anything.
  • A required field may become optional or repeating, but the reverse is not allowed.
  • If you need more changes than the above allows you increment the version of the contract.

Oddly, these are also the rules enforced by Google BigQuery. Either it is isomorphic to Protobuf or it uses Protobuf under the hood!
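The versioning rules listed above can be sketched as a small compatibility check. This is only an illustration: the contract representation (a map of field number to `{name, label}`) and the `ContractSketch` module are my own assumptions, not something a Protobuf library produces. Cases the rules do not mention, such as a field being removed or optional becoming repeated, are allowed through.

```elixir
defmodule ContractSketch do
  # A contract is a map of field number => {name, label}, where label is
  # :required, :optional or :repeated (a hypothetical representation).

  # Returns :ok, or {:error, reasons} listing the rules that were broken.
  def check(old, new) do
    errors =
      Enum.flat_map(old, fn {number, {old_name, old_label}} ->
        case Map.fetch(new, number) do
          # Not covered by the rules above; this sketch lets it pass.
          :error -> []
          # Same number, same name: only the label needs checking.
          {:ok, {^old_name, new_label}} -> label_errors(number, old_label, new_label)
          # Same number, different name: the identifier has been reused.
          {:ok, {new_name, _label}} -> ["field #{number} reused: #{old_name} -> #{new_name}"]
        end
      end)

    if errors == [], do: :ok, else: {:error, errors}
  end

  # An unchanged label is always fine.
  defp label_errors(_number, same, same), do: []
  # Optional or repeated may not become required.
  defp label_errors(number, old, :required), do: ["field #{number}: #{old} cannot become required"]
  # Anything else (for example required -> optional) is accepted.
  defp label_errors(_number, _old, _new), do: []
end

v1 = %{1 => {"id", :required}, 2 => {"email", :required}}
v2 = %{1 => {"id", :required}, 2 => {"email", :optional}, 3 => {"nickname", :optional}}
v3 = %{1 => {"id", :required}, 2 => {"phone", :optional}}

IO.inspect(ContractSketch.check(v1, v2))
# :ok
IO.inspect(ContractSketch.check(v1, v3))
# {:error, ["field 2 reused: email -> phone"]}
IO.inspect(ContractSketch.check(v2, v1))
# {:error, ["field 2: optional cannot become required"]}
```

A failing check is the signal that the change needs a new version of the contract rather than an edit to the existing one.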

There are several styles of Protobuf clients, some of which are smart enough to allow the generation of deltas. This means that when a large message changes and the client already has the previous version, you need only send the updates. This proved to be very useful.

A typical usage of Protobuf is for the client to conform to the contract supplied. However, it is a great system boundary, and it is possible to use this at the edge of Bounded Contexts to translate field names into local concepts. You can also choose to ignore data that you don’t want.
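As a rough sketch of that boundary translation, assume the decoded message arrives as a map with string keys. The field names (`cust_id`, `cust_nm`, `internal_flags`) and the `BoundarySketch` module are invented for the example.

```elixir
defmodule BoundarySketch do
  # Hypothetical mapping from the remote contract's field names to the names
  # used inside this bounded context. Fields not listed here are dropped.
  @field_names %{"cust_id" => :customer_id, "cust_nm" => :customer_name}

  def translate(message) when is_map(message) do
    for {remote, local} <- @field_names, Map.has_key?(message, remote), into: %{} do
      {local, Map.fetch!(message, remote)}
    end
  end
end

incoming = %{"cust_id" => 42, "cust_nm" => "Ada", "internal_flags" => 7}
IO.inspect(BoundarySketch.translate(incoming))
# %{customer_id: 42, customer_name: "Ada"}
```

Only the boundary module knows the remote names, so a rename in the external contract is a one-line change here rather than a search through the whole context.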

Avro is a similar tool, typically used in the Kafka ecosystem. Here it is common to keep the schema in a Schema registry. This makes the asymmetric approach more difficult.

Craft GraphQL APIs in Elixir with Absinthe Part One

I have finally started working through this book. The downside of using an older book is that the samples use what are now ancient versions of Phoenix. I could install the older tools and try to make them work. This may or may not work due to distributed code rot (reliance on a library that no longer exists). Instead I have started from scratch.

Here is the repo that I am working on: https://github.com/chriseyre2000/absinthe_demo

My setup is simple.

I am using Docker for Postgres:

docker run -d -e POSTGRES_PASSWORD=postgres -p 5432:5432 postgres:11

Don’t do this in production, but it’s a quick way to stand up a demo.

With this I created a simple Phoenix application:

mix phx.new a1_new

I have added the following to mix.exs:

      {:absinthe, "~> 1.7.0"},
      {:absinthe_plug, "~> 1.5.8"},
      {:absinthe_phoenix, "~> 2.0.2"},
      {:absinthe_relay, "~> 1.5.2"},

I copied over the repo migration scripts, then ran:

mix ecto.create

mix ecto.migrate

I have also copied the default Schema to the new project.

So far this is enough to get the lookup_type working, once I noticed that you need to query for MenuItem, not menu_item (a layer 8 network error).

I also needed to copy over the Ecto Schema from the sample (and fix up the module name).
Note that null: false is no longer valid Ecto syntax (in fact never has been, it was just not raised as an error before).

The TypeScript Trap

TypeScript is a distinct improvement over JavaScript in that you can find a whole class of problems early. This can lead to the assumption that you need less testing, which is unwarranted.

TypeScript is strictly a preprocessor. The types only exist at compile time. This can result in a system that is capable of getting into states that the compiler would imply are impossible.

I have recently been working with GraphQL and the Relay compiler. In one example I had a type with a field marked as not null. Due to some interesting decisions in the Relay caching (see https://github.com/facebook/relay/issues/2237: performance is preferred over accuracy) it could set the non-null field to null. This was due to some aggressive caching where a query in an unrelated component could replace the value returned. This causes non-local bugs, requiring extensive analysis and testing. One of the possible solutions to this is for the Relay compiler to remove all non-null constraints, something that may be accurate but will require excessive null checks!

TypeScript is not magic: you still need to test the application, and at an aggregated level.

Has Boris Johnson Actually Resigned Yet?

Yesterday Boris Johnson gave a speech which has been claimed to be his resignation. At no point did he state that he had resigned either as the leader of the Conservative Party or as Prime Minister.

He did state that a leadership election for the Conservative Party would shortly be underway and that he would hand over to the new leader in the Autumn. Given that we now know that resigning ministers get three months' pay, it seems as if Boris is going to drag out the process so that he will be paid until the end of the year.

There are a number of investigations into his activities that need to take place (largely to do with his alleged improper Russian interactions). He cannot remain in office while they are happening. In any other setting he would be placed on suspension or put on gardening leave.

Boris Johnson's premiership has demonstrated holes in the British Constitution. Ministers are expected to act honourably and resign if their honour is called into question. This seems to have fallen out of practice. Given the weight that is given to precedent, this cannot be left to stand.