Schema-Based Development: Protobuf and Avro

Back around the year 2000 there was a concept that could have changed development. WSDL schemas allowed a project to define the contract that a service provided. Admittedly, most people reverse-engineered the schema from their existing systems, and WSDL generated huge SOAP monstrosities. Even so, having a single source that describes the contract for a system really helps with integration, and making that contract a defined, versioned item makes accidental breakages less frequent.

In other spaces, similar solutions gained more useful traction. Google had a large number of systems that needed to interact, written in a range of programming languages. Their solution was Protobuf: a schema file format that defines how messages are serialized. Each service uses Protobuf to define its contract, and libraries generate the code to serialize to and deserialize from the wire protocol. It has strict versioning rules:

  • Identifiers cannot be reused (that is, the underlying numeric identifiers).
  • You can add a field to an existing read contract without breaking anything.
  • A required field may become optional or repeated, but the reverse is not allowed.
  • If you need more changes than the above allow, you increment the version of the contract.
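
As a rough illustration, the rules above can be sketched as a small compatibility check. This is not part of any Protobuf library: the `Field` class and the `breaking_changes` function are hypothetical names invented for this example, and a schema is modelled as a plain list of fields. As a simplification, a number that reappears under a different name is treated as reuse, so a genuine rename would be flagged too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    number: int  # the underlying numeric identifier
    name: str
    label: str   # "required", "optional" or "repeated"


def breaking_changes(old: list[Field], new: list[Field]) -> list[str]:
    """List the rule violations when a contract changes from old to new."""
    problems = []
    old_by_number = {f.number: f for f in old}
    for field in new:
        before = old_by_number.get(field.number)
        if before is None:
            continue  # a brand-new identifier: adding a field is allowed
        if before.name != field.name:
            problems.append(
                f"identifier {field.number} reused: {before.name} -> {field.name}"
            )
        if field.label == "required" and before.label != "required":
            problems.append(
                f"{field.name} tightened from {before.label} to required"
            )
    return problems


# Hypothetical contract versions, made up for illustration.
v1 = [Field(1, "id", "required"), Field(2, "email", "required")]
v2 = [Field(1, "id", "required"), Field(2, "email", "optional"),
      Field(3, "nickname", "optional")]
bad = [Field(1, "id", "required"), Field(2, "phone", "required")]

print(breaking_changes(v1, v2))   # relaxing a field and adding one: no violations
print(breaking_changes(v1, bad))  # identifier 2 now means something else
```

If the returned list is not empty, the last rule applies: the change needs a new version of the contract.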

Oddly, these are also the rules enforced by Google BigQuery. Either it is isomorphic to Protobuf or it uses Protobuf under the hood!

There are several styles of Protobuf clients, some of which are smart enough to generate deltas. This means that when a large message changes and the client already has the previous version, you can send only the updates. This proved to be very useful.
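
The delta idea can be sketched without any Protobuf library at all. In the minimal example below, messages are plain dictionaries, and `make_delta` and `apply_delta` are hypothetical helpers invented here. Real clients that support deltas will have their own representation.

```python
def make_delta(previous: dict, current: dict) -> dict:
    """Describe how to turn `previous` into `current`."""
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = sorted(key for key in previous if key not in current)
    return {"set": changed, "clear": removed}


def apply_delta(previous: dict, delta: dict) -> dict:
    """Rebuild the current message from the previous one plus a delta."""
    result = dict(previous)
    result.update(delta["set"])
    for key in delta["clear"]:
        result.pop(key, None)
    return result


# Hypothetical message contents, made up for illustration.
old_message = {"id": 1, "name": "Ada", "tags": ["x", "y"], "note": "draft"}
new_message = {"id": 1, "name": "Ada L.", "tags": ["x", "y"]}

delta = make_delta(old_message, new_message)
print(delta)  # only the changed and removed fields travel over the wire
assert apply_delta(old_message, delta) == new_message
```

The receiver needs the previous version to make sense of the delta, which is why this only works when the client already holds it.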

A typical usage of Protobuf is for the client to conform to the contract supplied. However, it also makes a great system boundary: you can use it at the edge of Bounded Contexts to translate field names into local concepts. You can also choose to ignore data that you don’t want.
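
A minimal sketch of that boundary translation might look like the following. It assumes the incoming message has already been decoded into a dictionary. The field names and the `to_local` helper are hypothetical, chosen only to show the shape of the idea.

```python
# Maps names from the supplier's contract to names used inside this
# Bounded Context. Anything not listed here is deliberately ignored.
# All of these names are made up for illustration.
FIELD_MAP = {
    "cust_id": "customer_id",
    "amt": "order_total",
}


def to_local(message: dict) -> dict:
    """Translate an incoming message into local concepts."""
    return {
        local_name: message[wire_name]
        for wire_name, local_name in FIELD_MAP.items()
        if wire_name in message
    }


incoming = {"cust_id": 42, "amt": 19.99, "internal_routing_key": "eu-west"}
print(to_local(incoming))  # the routing key is dropped at the boundary
```

Keeping the mapping in one place means a rename in the supplier's contract touches only the boundary, not the code behind it.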

Avro is a similar tool, typically used in the Kafka ecosystem. Here it is common to keep the schema in a schema registry. This makes the asymmetric approach described above, where the consumer keeps its own view of the contract, more difficult.
