Viewing Contentful Data in Neo4j

Introduction

This article introduces Contentful (a cloud-hosted, headless Content Management System (CMS)) and the graph database Neo4j. I have written a utility that allows data stored in Contentful to be imported into a Neo4j graph database. I’ll leave detailed explanations of what this can do until after I have explained the two systems. There are probably very few people who are familiar with both of these products, so I will start with an introduction to each.

Contentful

Contentful is a cloud hosted headless CMS.

Contentful does provide a user-interface for the content editors:

ContentfulEditor

This is how the editors enter the various fields that make up a content type.

However, Contentful does not provide a user interface for the “application”. Instead it provides various APIs that allow the developer to use the data in the CMS however they like. This is very powerful: you are not limited by the user interface provided by a traditional CMS. Because the data is served by a REST API (with GraphQL coming soon), you are not restricted to any particular language.

Data in Contentful is partitioned into Spaces. A Space is the equivalent of a distinct database.

Spaces contain three kinds of things:

  • Content Types – These are the schema of the content (each is a list of fields, but fields may reference other content types or be lists of other content types)
  • Entries – Instances of content types. For example the above picture shows a Category content entry.
  • Assets – External images. Contentful acts as a document library and image resizing service for these.

Entries and Assets can be in preview or published states. There are two main API feeds: preview and publish. Publish shows only published items, whereas preview also includes more recent items that are still in preview mode.
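
As a rough sketch of how the two feeds might be selected from Node, assuming the official contentful npm package: the same client is pointed at a different host and given a different token. The helper name feedOptions and the shape of its settings argument are hypothetical; the two host names are the ones Contentful uses for its delivery and preview APIs.

```javascript
// Hypothetical helper: builds the options object for contentful.createClient().
// Published content is served from cdn.contentful.com; preview content is
// served from preview.contentful.com and needs a preview token instead.
function feedOptions({ space, deliveryToken, previewToken }, usePreview) {
  return {
    space,
    accessToken: usePreview ? previewToken : deliveryToken,
    host: usePreview ? 'preview.contentful.com' : 'cdn.contentful.com',
  };
}

// Usage (requires `npm install contentful`; not executed here):
//   const contentful = require('contentful');
//   const client = contentful.createClient(feedOptions(settings, true));
//   client.getEntries().then(result => console.log(result.items.length));
```

Keeping the choice of feed in one small function means the rest of the code does not need to know which feed it is reading.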

The Contentful space that I am using for my examples is “The example project”, which you get by default when you sign up to Contentful. It is the content for a website that explains how to use Contentful from a number of programming languages.

Here are the content types within this space:

ContentfulContentTypes

Creating a user interface for these is beyond the scope of this article but you can find plenty of examples.

Neo4j

Neo4j is a graph database.

Unlike traditional relational databases it works with nodes and relationships.

Nodes are the entities of the system and may have a label and attributes.

Nodes may be connected by relationships (which can be directional). Relationships can also have attributes.

You can query a Neo4j database using a query language called Cypher.

Neo4j also has an API that allows other systems to connect to it; this can be used to query and modify the graph.
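
To make the node and relationship model a little more concrete, here is a minimal sketch of how a program might build Cypher statements that create them. The helper names and the cmsId property are hypothetical, chosen only for this illustration.

```javascript
// Labels and relationship types cannot be passed as Cypher parameters,
// so they are validated before being placed in the statement text.
function safeName(name) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Unsafe Cypher name: ${name}`);
  }
  return name;
}

// Builds a statement that creates one node with the given label and attributes.
function createNode(label, props) {
  return {
    statement: `CREATE (n:${safeName(label)} $props)`,
    parameters: { props },
  };
}

// Builds a statement that links two existing nodes, found by the
// hypothetical cmsId attribute, with a directed relationship.
function createRelationship(fromId, type, toId) {
  return {
    statement:
      'MATCH (a {cmsId: $fromId}), (b {cmsId: $toId}) ' +
      `CREATE (a)-[:${safeName(type)}]->(b)`,
    parameters: { fromId, toId },
  };
}
```

With the official neo4j-driver package, each of these objects could be handed to session.run(statement, parameters).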

SimpleQuery

Here I have written one of the simplest queries possible in Neo4j:

MATCH (a:category) RETURN a

This reads: find me all nodes with the label category and return them. Here I have shown the details of one of the nodes in the browser.

SecondQuery

This is a more complex query:

MATCH(a:category)-[]-(b) RETURN a, b

This reads: find me all nodes that are categories and have any kind of relationship to another node, and return both nodes.

Note the visual nature of the query language. Nodes are contained in normal brackets but are displayed as circles in the browser!

There is far more that can be done with Neo4j and Cypher but this is a good introduction.

How to go from Contentful to Neo4j

Now that I have introduced the two platforms I will explain how to move data from Contentful to Neo4j.

There are some warnings I should give you before you start to use my utility.

  • This will delete the Neo4j database that it is pointed at before loading the data from the Contentful Space. Do not run this on a graph that you don’t have backed up.
  • If the utility can’t migrate a field it will skip it and log it to the output. Let me know if this happens and I will try to correct the utility.
  • I currently write the entire database in one transaction. This may not work well for very large Contentful spaces.
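
The single-transaction limitation in the last point is commonly worked around by committing in batches. This is a minimal sketch of that idea and not what the utility currently does; the function name toBatches is hypothetical.

```javascript
// Splits an array of items (for example Cypher statements) into batches
// of at most `size`, so each batch can be committed in its own transaction.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

The trade-off is that a failure part-way through leaves a partially loaded graph, which is acceptable for a database that is dropped and recreated on every run.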

I typically use Neo4j for analytical databases that get dropped and recreated frequently. This may differ from other people’s usage. It means that I don’t need to worry about database backups or migrations, but I do need to worry about loading the database quickly.

My utility is available in the following github repo:

https://github.com/chriseyre2000/contentful-to-neo4j

The utility is written in JavaScript and uses Node to run the command. There are detailed instructions in the readme. I chose JavaScript because it seems to be the most commonly used language for Contentful.
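
The repo is the authoritative source for how the import works. As a rough illustration of the kind of mapping involved, the sketch below turns one Contentful entry into a node plus a relationship for each link field. It assumes unresolved link objects of the form { sys: { type: 'Link', id } } and a single locale; entryToGraph and the shape of its result are hypothetical.

```javascript
// True when a value is an unresolved Contentful link object.
function isLink(value) {
  return Boolean(value && value.sys && value.sys.type === 'Link');
}

// True for values that can be stored directly as a node attribute.
function isPrimitive(value) {
  return value === null || typeof value !== 'object';
}

// One entry becomes one node; every link field (single or list)
// becomes one or more relationships named after the field.
function entryToGraph(entry) {
  const node = {
    id: entry.sys.id,
    label: entry.sys.contentType.sys.id,
    props: {},
  };
  const relationships = [];
  for (const [name, value] of Object.entries(entry.fields)) {
    const values = Array.isArray(value) ? value : [value];
    if (values.length > 0 && values.every(isLink)) {
      for (const link of values) {
        relationships.push({ from: node.id, type: name, to: link.sys.id });
      }
    } else if (values.every(isPrimitive)) {
      node.props[name] = value;
    }
    // Anything else (for example nested objects) is skipped in this sketch.
  }
  return { node, relationships };
}
```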

SelectAll.PNG

Here is what you get when you ask for all data (note that in a realistically sized Contentful system Neo4j would cap the displayed nodes at 100).

This shows the relationships between the various nodes in the system.

Here is the schema of the graph database:

Schema

This shows the relationships between the various content types.

This is fun to play with and with more detailed queries can be informative.

However there are more powerful queries such as:

Orphans

This returns the asset nodes that are not linked to by any entry. These are orphan images that may have been published by mistake. This is something that the Contentful UI cannot do.
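
The query itself is only shown in the screenshot, so here is one way such an orphan query could be written. This is a sketch rather than the exact query I ran: the label asset is an assumption about how the import names asset nodes, and orphanQuery is a hypothetical helper.

```javascript
// Builds a Cypher query returning nodes with the given label that have no
// relationships at all. The label is validated because Cypher does not allow
// labels to be supplied as parameters.
function orphanQuery(label) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(label)) {
    throw new Error(`Unsafe label: ${label}`);
  }
  return `MATCH (n:${label}) WHERE NOT (n)--() RETURN n`;
}

console.log(orphanQuery('asset'));
// prints: MATCH (n:asset) WHERE NOT (n)--() RETURN n
```

The printed query can be pasted straight into the Neo4j browser.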

 

You can also view the data in text form:

OrphanText

 

Summary

Hopefully this has given you a useful insight into what Contentful and Neo4j can be used for. Feel free to use my utility (at your own risk). It does make managing a Contentful system considerably easier – given that you can freely query the data.

Viewing a Contentful space in Neo4j part 2

I have now completed a minimal version of the mapping tool.

The code can be found in this github repo:

https://github.com/chriseyre2000/contentful-to-neo4j

There are some warnings before you use this script:

  • It will empty the target neo4j database before populating it.
  • All of the data is written to neo4j in a single transaction. This will be corrected soon.
  • I don’t yet handle all primitive field types.
  • I have not yet added links to assets embedded in markdown fields.
  • It has only been tested on a local machine with a local neo4j.

I can’t guarantee that it will work with all Contentful spaces yet. So far I have only tested it on the demo space that comes when you sign up for a free Contentful account.

If you do have problems please send me either a pull request or a failing test case (show me the schema of the problem content type).

Once you have the Neo4j database populated it becomes trivial to find orphaned items. I’ll add some useful sample queries to the documentation.

API vs Library

In my previous post I mentioned that I am trying to write to Neo4j from node.

This is becoming difficult as each of the top two libraries seems to have serious dependency problems.

The Neo4j package does not handle error conditions well due to a missing stacktrace function.

The neo4j-driver package is also broken with a vague “Headers is not defined”.

This brings me to the main topic: when should you choose a library versus directly using an api?

Firstly, does the library do the job? There are a lot of Node packages out there. Some of them are useful.

If the library adds some features then it’s a no-brainer. Especially if the library can maintain a stable contract despite the underlying api changing (The Delphi VCL was a great example of this. The VCL survived the underlying platform moving from 16 to 32 bit without any code changes required).

There is also the matter of the dependency chain that you will pick up with the library. It’s not uncommon for an npm package to have tens of dependencies. This can mean that a poorly maintained library can be broken by someone else’s change (this may be less of a problem with the introduction of package-lock). Dependencies shared between multiple required libraries can also be difficult: they can clash, and you may have to choose which bugs you can live with.

If the usage that you have of an api is simple and the api is stable it may be worth directly calling just the parts that you need.
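
For Neo4j, calling the API directly could look something like the sketch below, which only builds the request so that any HTTP client can send it. It assumes Neo4j 3.x, where the transactional HTTP endpoint is /db/data/transaction/commit, and basic authentication; buildCommitRequest is a hypothetical helper.

```javascript
// Builds the URL and fetch-style options for Neo4j's transactional HTTP
// endpoint. Assumes Neo4j 3.x (/db/data/transaction/commit) and basic auth.
function buildCommitRequest(baseUrl, user, password, statements) {
  const credentials = Buffer.from(`${user}:${password}`).toString('base64');
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/db/data/transaction/commit`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Basic ${credentials}`,
        'Content-Type': 'application/json',
        Accept: 'application/json',
      },
      body: JSON.stringify({ statements }),
    },
  };
}

// Usage with any fetch implementation (not executed here):
//   const { url, options } = buildCommitRequest(
//     'http://localhost:7474', 'neo4j', 'secret',
//     [{ statement: 'MATCH (n) RETURN count(n)' }]);
//   fetch(url, options).then(res => res.json()).then(console.log);
```

This is a single POST with a JSON body, which is the kind of simple and stable usage where skipping the library may be reasonable.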

Viewing a Contentful space in Neo4j – Part 1

Contentful is a really effective headless CMS. Its API does have some limits (you can only query user-defined fields across a single type at a time).

A few years ago I managed to find a way of mapping a Contentful space into a Neo4j graph database. This allows full querying of the data in Contentful. This can be useful for finding where a given image is in use or finding pages that are orphaned.

I am now trying to recreate this library as an open source node project.

I have started by creating a free Contentful account (provided I only have one space and live within the limits this will be fine for my purposes).

To query Contentful I am using the contentful npm package.

So far I can query the assets and content types in my space.
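
A minimal sketch of that step, assuming a client created with the contentful package’s createClient: getContentTypes() and getAssets() each resolve to a collection with an items array. The function name summariseSpace and the shape of its result are hypothetical.

```javascript
// `client` is expected to behave like the object returned by
// contentful.createClient(). It is passed in so the function can also be
// exercised with a stub.
async function summariseSpace(client) {
  const [types, assets] = await Promise.all([
    client.getContentTypes(),
    client.getAssets(),
  ]);
  return {
    // Each content type is reduced to its id and the ids of its fields.
    contentTypes: types.items.map(t => ({
      id: t.sys.id,
      fields: t.fields.map(f => f.id),
    })),
    assetIds: assets.items.map(a => a.sys.id),
  };
}

// Usage (requires `npm install contentful`; not executed here):
//   const client = require('contentful').createClient({ space, accessToken });
//   summariseSpace(client).then(console.log);
```

Collections are paged, so a space with many assets would need repeated calls using the skip and limit query parameters.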

Initially I am going to use a local version of Neo4j (Neo4j community 1.1.6) but will be moving to a Heroku hosted version.

I’ll add to this series as I progress.

Testing React with Enzyme, Jest and Fetch

I have been having fun recently working on a React app. Typically I use TDD.

This can be difficult when you have code that goes:

fetch(url) // url: whatever endpoint the component loads from
  .then(res => res.json())
  .then(res => this.setState(res.aValue))

Testing this gets awkward. Changing the signature is too invasive. You can easily mock the fetch but get a race condition reading the state.

This is the solution that I found works, by abusing JavaScript’s sequencing a bit: do what you need to, then check the state inside a zero-delay timeout:

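// assumes thing is an Enzyme wrapper, done is Jest’s async callback and expectedState is your own value
setTimeout(() => { expect(thing.state()).toEqual(expectedState); done(); }, 0)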
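
The reason this works can be shown without React at all: callbacks on already-resolved promises run before any timer callback, so by the time a zero-delay timeout fires the whole then chain has finished. The sketch below uses plain stand-ins for fetch and for the component; fakeFetch, the URL and the value 42 are all made up for the illustration.

```javascript
// A stand-in for a mocked fetch that resolves immediately.
const fakeFetch = () =>
  Promise.resolve({ json: () => Promise.resolve({ aValue: 42 }) });

// A stand-in for a component: setState merges the update into `state`.
const thing = {
  state: {},
  setState(update) {
    Object.assign(this.state, update);
  },
  load(fetchFn) {
    fetchFn('/hypothetical/url')
      .then(res => res.json())
      .then(res => this.setState({ aValue: res.aValue }));
  },
};

thing.load(fakeFetch);
console.log(thing.state.aValue); // undefined: the promise callbacks have not run yet

setTimeout(() => {
  console.log(thing.state.aValue); // 42: the promise chain completed before this timer
}, 0);
```

This only holds when the mock resolves without real I/O or timers of its own; a mock that itself waits on a timer could still lose the race.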

Thoughts On Contentful Migrations

Contentful have a wonderful headless CMS. It has a UI for the content editors but the developers get a stream of JSON. This means that the consuming application is free to do anything with the output, which is a major contrast to other CMS systems.

Contentful have a CLI tool that allows migrations to be applied to the schemas (and content). This is not the approach that my team has taken (largely because the migration tool did not exist when we needed it).

Our approach is to have a declarative content schema in code. On deployment it compares this to the environment’s current state and attempts to upgrade it.

Upgrades happen on a field level of each content type. Once a content type has been upgraded then the widget type for each field is set.

We don’t attempt to remove fields or migrate data as we have not in two years had the need to automate this (we have manually removed a few fields or redundant content types).

This approach is closer to desired state configuration than migrations. It has the benefit of acting as its own complete documentation.
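
The comparison step can be sketched as a pure function. This is only an illustration of the idea, not our actual deployment code: planUpgrade and the simplified schema shape (content type id mapped to a list of field ids) are hypothetical.

```javascript
// `desired` and `current` map content type ids to arrays of field ids.
// Returns the content types to create and the fields to add. Nothing is
// ever scheduled for removal, matching the approach described above.
function planUpgrade(desired, current) {
  const createTypes = [];
  const addFields = {};
  for (const [typeId, fields] of Object.entries(desired)) {
    if (!Object.prototype.hasOwnProperty.call(current, typeId)) {
      createTypes.push(typeId);
      continue;
    }
    const missing = fields.filter(f => !current[typeId].includes(f));
    if (missing.length > 0) {
      addFields[typeId] = missing;
    }
  }
  return { createTypes, addFields };
}
```

Because the function only reports differences, running it against an environment that already matches the schema yields an empty plan, which is what makes the deployment safe to repeat.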

 

Eight Influential Development Books

Here are eight influential books that relate to software development:

The classic:

20180526_183055

Introducing techniques:

20180526_183109.jpg

The skills around development:

20180526_183115.jpg

How to unit test anything:

20180526_183120.jpg

How to stretch your knowledge:

20180526_183129

How to stretch your knowledge further:

20180526_183135.jpg

How to explain your new ideas to other people:

20180526_183254.jpg

Information about Quality, Gumption Traps and The Scientific Method:

20180526_183142.jpg

Redux CombineReducers is broken

Recently I have started using React Redux. It’s amazing how complex React/Redux makes even the simplest task.

I needed to pass an environment value through to the client side rendered ui. This was to allow the component to vary by environment – something essential in a dev -> qa -> preprod -> live pipeline.

You need to pass this into the store and map it out into the state of your component. All of the documentation and examples seem to care about is state mutation. The trivial case is overlooked.

The app that I am working on has multiple reducers that are combined with combineReducers. This has the interesting side effect of dropping any state that is not included in a reducer.

This is not accidental; it actually warns you (cryptically) that it is doing it.

I solved this for now by creating an idempotentReducer that simply returns the state. This is not a good long term solution.

The documentation on combineReducers does state that it is intended for simple cases. I can’t imagine why passing through static state is not the simplest case.
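
Here is a small sketch of the behaviour and the workaround. So that it runs without Redux installed, combine is a simplified stand-in for combineReducers that keeps only the core behaviour (the next state has a key per reducer and nothing else); passThrough plays the role of my idempotentReducer, and the counter reducer and state values are made up.

```javascript
// Simplified stand-in for Redux's combineReducers: the next state contains
// only the keys that have a reducer.
function combine(reducers) {
  return (state = {}, action) => {
    const next = {};
    for (const key of Object.keys(reducers)) {
      next[key] = reducers[key](state[key], action);
    }
    return next;
  };
}

// Pass-through reducer: returns whatever it is given. The default must not
// be undefined, because Redux rejects reducers that return undefined.
const passThrough = (state = null) => state;

const counter = (state = 0, action) =>
  action.type === 'increment' ? state + 1 : state;

const initial = { environment: 'qa', counter: 0 };
const action = { type: 'increment' };

console.log(combine({ counter })(initial, action));
// { counter: 1 }  (environment has been dropped)
console.log(combine({ counter, environment: passThrough })(initial, action));
// { counter: 1, environment: 'qa' }
```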

Phoenix for Rails Developers – Part 3

I have now completed the book (although not all of the bonus exercises).

https://github.com/chriseyre2000/storex

The remainder of the book demonstrated extensions to the application to create an administration mode. This shows how you extend an existing application.

Overall the book is a gentle introduction to Phoenix and avoids introducing some of the more complex features (there is no attempt to explain OTP or Channels).

The code structure has been easy to follow, with the examples being split into sensibly sized chunks. This is in contrast to Learn React Native, whose code samples were excessive: you needed to type in several pages of code before anything could be checked.

I did find a few typos which the author says are due to be corrected soon (an early view of the contents of the database includes the admin flag, some brackets are missing on some of the redirects and the validate_max_price method breaks when the price is null).

This has been a very useful addition to my Elixir/Phoenix study library.

It is interesting to compare the changes in the Phoenix framework between the Learning Phoenix book and this one. The major difference is the movement of the model from within the Phoenix app itself (storex_web) into a library (storex).

The only variation I took from the book was to add Credo:

{:credo, "~> 0.9.0"}
This is a style and code recommendation tool whose use gets suggested in a number of places.
This adds a number of styling warnings for any application. It can be customised but does encourage good practice.
I have fixed up some of the hints (spacing issues, don’t use cond with only one non-true condition).