Storybook as a software design tool

I recently found Storybook to be a great means of simplifying the design of a component.
Storybook is a component design tool: you build a catalogue of examples showing how a component can be used.

This is great for ensuring system consistency as it is possible to lay out all possible options for a component. Given that it can be time consuming to achieve certain states by using a full application, having all of the possible states laid out in one place is a big time saver.

I am working on a React application that uses both Relay and Formik. I am just getting the hang of using Formik in Storybook; Relay is more difficult.

In typical open-source fashion, a component exists because someone had a problem to solve. However, when people move on to new projects (or the project is replaced) these components eventually become abandoned. The Relay Storybook components are in this state: they worked once, but are now broken by the continuously moving environment.
This leaves two choices: try to fix the library or work around the limitation.

Currently I am taking the work-around: extracting “pure” components and writing stories for those.
For example, I could put all of a dialog box into one component and isolate it from the button that triggers the effect.

Why are we still messing with the clocks?

Last weekend was the start of British Summer Time, which is described as daylight saving.
This was introduced during the First World War and is best characterised as a politician’s syllogism.

We must do something.

This is something.

We must do this.

The net effect of moving all the clocks an hour earlier is to inflict jet lag on an entire country for 2 weeks. During that time the sun will be rising earlier anyway so it will be as light when you wake up as if the system had not been imposed.

The argument that it helps farmers is demonstrably crazy. Pet owners know how confused animals get at randomly changing meal times; imagine how confused a herd of dairy cows would be!

Back to Dependabot

I have started looking at Dependabot again.

With the loss of the Heroku free tiers the old solution I used no longer works.

The first problem to solve is to detect PRs in need of merging.


declare -a arr=("name1" "name2" "name3")

for i in "${arr[@]}"
do
  gh pr list -R owner/"$i"
done

The above is a bash script that requires the gh CLI tool to be installed and configured with access to your repositories.

This will give you a report of the pending PRs to merge. It may need adapting if you have many repositories or many open PRs.
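To turn the report into per-repository counts, the same loop can ask gh for JSON and count the results. A sketch, assuming a reasonably recent gh CLI (the `--author`, `--json` and `--jq` flags) and using the same placeholder repository names:

```shell
# Count open Dependabot PRs per repository.
# Requires the gh CLI to be installed and authenticated;
# the repository names below are placeholders.
count_dependabot_prs() {
  local repo="$1"
  gh pr list -R "$repo" --author "app/dependabot" --json number --jq 'length'
}

if command -v gh >/dev/null 2>&1; then
  for repo in "owner/name1" "owner/name2" "owner/name3"; do
    echo "$repo: $(count_dependabot_prs "$repo") open Dependabot PRs"
  done
fi
```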

The next step is to start merging them.

Dependabot comment commands are useful here: commenting `@dependabot merge` on a PR will merge it once CI passes.

The step beyond that is tracking how many merged PRs are waiting to be deployed. You don’t want a huge deploy in case it needs to be reverted.
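One way to measure the size of the next deploy, assuming your deploy process leaves a git tag on the last deployed commit (the tag name `production` here is an assumption; adapt it to whatever your pipeline uses):

```shell
# Count commits merged since the last deploy, assuming the deploy
# pipeline tags the deployed commit ("production" is a placeholder name).
count_undeployed() {
  local deployed_ref="$1"
  git rev-list --count "${deployed_ref}..HEAD"
}

# Only run if the ref actually exists in this repository.
if git rev-parse -q --verify production >/dev/null 2>&1; then
  echo "Commits awaiting deploy: $(count_undeployed production)"
fi
```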

You will never be clear of the upgrade treadmill. The best solution is to fully automate it.

To do that you need several things:

– a fast reliable deploy/rollback process

– a sufficient test suite

The best option is to automate the merging of Dependabot PRs that pass all the tests. Beware false positives from other integrations (such as Snyk).
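The gh CLI can do the merging half of this: `gh pr merge --auto` queues a PR so that GitHub merges it once all required checks pass. A sketch, with placeholder repository name and PR number:

```shell
# Queue a PR to merge automatically once all required checks pass.
# The repository name and PR number passed in are placeholders.
merge_when_green() {
  local repo="$1" pr_number="$2"
  gh pr merge "$pr_number" -R "$repo" --auto --squash
}

# Example: merge_when_green "owner/name1" 123
```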

You will also need an automated deploy process. Deploying the latest build every day at a fixed time helps here (it also ensures that you could, at worst, deploy yesterday’s build).

It is possible to limit Dependabot to only 10 open PRs at a time. This can help, but may be problematic in a fast-moving ecosystem like JavaScript.
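The cap is set with `open-pull-requests-limit` in the repository’s Dependabot config. A minimal sketch (the ecosystem and schedule below are assumptions; adjust them to your project):

```yaml
# .github/dependabot.yml (sketch)
version: 2
updates:
  - package-ecosystem: "npm"   # assumed ecosystem
    directory: "/"
    schedule:
      interval: "daily"
    open-pull-requests-limit: 10
```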

Mermaid Diagrams

I will be giving a talk at work about how to use Mermaid diagrams.

Here is a set of examples that make great starting points.

https://mermaid.js.org/syntax/examples.html
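As a taste of the syntax, here is a minimal flowchart (the node labels are invented for illustration):

```mermaid
flowchart TD
    A[Customer requests quote] --> B{Quote accepted?}
    B -- yes --> C[Create sale]
    B -- no --> D[Archive quote]
```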

I have a confession: I can’t draw well.

However, I can create useful diagrams.

Important formula — a diagram is not worth making when:

Cost to create + Cost to maintain > Value of the diagram

Mermaid makes creation and maintenance cheap, which lets you get value from diagrams in cases where other techniques would be prohibitive.

The perfect is the enemy of good enough.

No diagram is perfect.

It’s easier to offer suggestions to improve a diagram than a large text document.

It’s amazing how much you learn building a diagram.

Diagrams need to have a key. (Although with some diagram types the key can be shared)

Each symbol you use needs to have the same meaning everywhere.

Map vs Diagram – a map is a special kind of diagram where space has meaning.

A good book on the subject: Documenting Software Architectures: Views and Beyond.

Build them by copying and editing.

The friends:

These can be a bit more complex: typically they require a command-line tool to transform a text file into an image.

Graphviz (https://graphviz.org/gallery/)
Plantuml (https://plantuml.com/)

For interactive diagrams there is the excellent d3.js (https://d3js.org/).

First look at Rust

I am starting to study Rust as it is used at work on some projects. So far it has been ok, but not overwhelming.

Rust is a semicolon language, and for me that feels like a more significant split than static/dynamic or compiled/interpreted.

This is coming as a shock after Groovy, Typescript and Elixir.

The Rust compiler messages seem less helpful than I’d expect.

When Log Analysis Goes Too Well

Once upon a time I was working on a system that was deployed on customers’ sites. That meant we had no regular access to the system’s logs, which made getting feedback difficult.

One of the developers on the team added an exception logging table which would capture every exception raised by the system along with the full stack trace.

In order to debug a certain error the customer provided us with a full backup of their DB. In addition to fixing the specific issue I managed to get 2 days to look at the exceptions.

By looking at the frequencies I was able to put measures in place to handle 99% of these errors. Most of them were transient problems that the customer had never reported.

When the fixed version of the product was returned to the customer they were happy with it, but could not explain why.
In fact they were so happy that it took two years to convince them to take a new version (we had a six-month release cycle at the time).

Read the Logs

One of my morning rituals at work is to look at a custom summary view of error logs. This allows me to learn what is happening in the system.

Combined with an active drive to clean up the most frequent errors this is a great way to learn what a normal day looks like. Anything new has been caused by a change somewhere or an external failure.

This also gives you a way to estimate how frequently errors are happening compared to the events users are attempting.

Typically I ask in one of our developer slack channels if anyone is aware of the new issue. Sometimes people are already aware of the problem. Frequently there will be further investigation required.

For context, I work in a medium-sized company with multiple monoliths surrounded by a suite of services. Errors in the logs within my team’s scope can be caused by my team’s work or, more frequently, by changes made by other teams.

Note this is mostly about errors: I recommend logging an error when something has genuinely gone wrong, and doing so matters most in the parts of a system where customers are paying for something. Our system has one major split: there are roughly 1000x as many quotes as sales, so the quote side should log less aggressively than the sales side, otherwise quote errors will dominate the logs.

Recently I found a new log message stating that manual intervention was required for a process. I had to ask the person who wrote that code exactly what the manual intervention was and who would do it.

I recommend adding links in the log message to a page in the wiki with the instructions to fix. This can start out as a placeholder.

I also find that info or warning messages can be a great way to prove that a change has worked. I recently added a scheduled retry in 24 hours for a weird refund scenario (you can’t partially refund a credit card transaction that is less than 24 hours old…). By logging the retries it was possible to see how frequently the problem happened and how many manual interventions were required (in this case: few retries, and no interventions).

Thoughts on scope of bugfixes

When working on a bugfix do you handle all of the edge cases?

Typically if a system is down you need to do what it takes to get the system back up as quickly as possible without making things worse. This can involve leaving rare edge cases as failure messages with logs that will help fix the problem in future.

This all depends upon the severity of a failure, the speed of deployment and the capacity to fix the problem in the near future.

Sometimes having a manual workaround for a rare, non-time-critical issue is a better use of time than over-engineering a solution for a problem that may never happen.

Recently I have had to work on production issues that cannot be recreated in a test environment (without waiting a day or so to set up the test data).

A related type of problem is the bug that could have multiple causes. You think you have recreated the issue and fixed it, only to find it is still broken in production. Having audit logs at info level can help establish exactly what was attempted. Event-sourced systems can recreate what succeeded, but not always the things that failed.

When logging a failure message, always give enough information to locate the error without leaking personally identifiable information (PII).