Contentful-To-Neo4j version 1.1.0

I have finally updated the library following the feedback that I received at the joint user group. The code now handles all four of the sample Contentful spaces, as well as empty spaces.

The library used to use the content type id as the name of the type in Neo4j. The naming rules of the two systems differ, so I now prefix the id with “type_”. This avoids names with leading digits.
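
A minimal sketch of the prefixing rule described above. The function name labelForContentType and the sample id are illustrative assumptions, not code taken from the library:

```javascript
// Hypothetical helper: turn a Contentful content type id into a Neo4j type name.
// Prefixing with "type_" guarantees the name never starts with a digit.
function labelForContentType(contentTypeId) {
  return `type_${contentTypeId}`;
}

console.log(labelForContentType('2wKn6yEnZewu2SCCkus4as'));
// type_2wKn6yEnZewu2SCCkus4as
```

Always adding the prefix (rather than only when the id starts with a digit) keeps the mapping easy to reverse.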

Contentful to Neo4j in Elixir

I have just finished an initial port of my Contentful to Neo4j library from Node to Elixir.

https://github.com/chriseyre2000/contentful_to_neo4j_ex

This is the same utility but in a very different language. It’s not quite as polished as the Node version (but won’t take long to catch up).

Issues that I have had during the port:

  • My machine had a very old Erlang installation (I had installed it when reading the Erlang chapter of Seven Languages in Seven Weeks, over 7 years ago), which broke HTTPoison, a fairly common HTTP library. The error messages suggested that HTTPoison does not work on a Windows machine (which I have found not to be true).
  • The Contentful Elixir bindings are not very advanced. They don’t return the total number of items that you are paging through. It was not difficult to use the API directly instead.
  • There are lots of Elixir bindings for Neo4j. Very few of them are clearly documented for writing to Neo4j. Eventually I settled on bolt_sips. Bolt is too primitive and neo4j_sips uses a very old version of HTTPoison.
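
The total matters for paging because each collection response from the Contentful delivery API carries total, skip and limit fields alongside items. Here is a minimal sketch of the offset arithmetic, written in JavaScript to match the Node version rather than the Elixir port. The name pageOffsets is hypothetical and does not come from either library:

```javascript
// Hypothetical helper: given the `total` reported by the first response and a
// page size, list the `skip` offsets needed to read every item.
function pageOffsets(total, limit) {
  if (limit <= 0) {
    throw new RangeError('limit must be positive');
  }
  const offsets = [];
  for (let skip = 0; skip < total; skip += limit) {
    offsets.push(skip);
  }
  return offsets;
}

console.log(pageOffsets(250, 100)); // [ 0, 100, 200 ]
```

An empty space simply yields no offsets beyond the first request that reported a total of 0.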

The Elixir error messages are incredibly clear.

Credo is great for ensuring best practices are applied.

Functional programming allows you to test real code without mocks and without spending time fighting promises.

iex is a great REPL environment. You can recompile a module and carry on without restarting everything.

The code is self-documenting. This is the top-level function:

read_all_assets()
|> read_all_entries()
|> process_contentful()
|> write_to_neo4j()

VSCode is a great editor for Elixir. It also works well from the command line:

code .

The above will open VSCode on the project in the current directory.

Sample neo4j queries for the contentful-to-neo4j graph

I have now stabilised the contentful-to-neo4j project. It should be able to handle most Contentful spaces. I still need to tune the transactions for very large datasets, but I have seen no errors so far.

It has gone from being a simple script to an application that has 100% code coverage. I have learnt a lot about JavaScript Promises and how to test with Jest.

Here are some useful neo4j queries:

Find out how many entries of each content type you have published:

MATCH (n) WHERE n.contenttype IS NOT NULL
RETURN n.contenttype, COUNT(n)
ORDER BY n.contenttype

Remember that nodes are drawn as circles, so in a query they need to be surrounded by ().

Find the pages that share a given slug:

MATCH (a {slug: 'common-url'}) RETURN a

Find nodes that have a slug with a specific ending:

MATCH (n) WHERE n.slug ENDS WITH '-kr' RETURN DISTINCT n

Find assets that are not referenced by any entry:

MATCH (image {cmstype: 'Asset'})
WHERE NOT (image)-[]-()
RETURN image

Simple search for orphans:

MATCH (a) WHERE NOT (a)-[]-() RETURN a
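
To make clear what the orphan query computes, here is a minimal in-memory sketch of the same idea: a node is an orphan if it appears at neither end of any relationship. The data shapes and the name findOrphans are assumptions made for illustration and have nothing to do with how Neo4j evaluates the query internally:

```javascript
// Hypothetical in-memory graph: node ids, plus {from, to} pairs for relationships.
// Direction is ignored, mirroring the undirected pattern (a)-[]-().
function findOrphans(nodeIds, relationships) {
  const connected = new Set();
  for (const { from, to } of relationships) {
    connected.add(from);
    connected.add(to);
  }
  return nodeIds.filter((id) => !connected.has(id));
}

const nodes = ['page', 'hero-image', 'old-logo'];
const links = [{ from: 'page', to: 'hero-image' }];
console.log(findOrphans(nodes, links)); // [ 'old-logo' ]
```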

These are simple queries in Neo4j that would be hard to do in the Contentful UI.

Analysing Contentful Spaces in GrapheneDB hosted on Heroku

I have just updated my contentful-to-neo4j project so that by default it will work with a GrapheneDB database hosted on Heroku.

Steps to get this working:

Clone the repo:

git clone https://github.com/chriseyre2000/contentful-to-neo4j

Sign up to Heroku, add a credit card and create an empty Heroku app.

Add the free GrapheneDB add-on (assuming that you have fewer than 1000 content types).

Install the Heroku CLI.

Open a terminal in the directory where you cloned the repo.

The next step may require you to log in to Heroku on the CLI.

heroku git:remote -a YOUR_APP_NAME

Set the two config vars that identify the Contentful space (for example with heroku config:set):

SPACE_ID

CONTENTFUL_ACCESS_TOKEN

Once the app has finished restarting:

heroku run loader -a YOUR_APP_NAME

You should now have a fully populated GrapheneDB database.

You may need to move to a paid tier big enough for your contentful space…

This was surprisingly painless. I only had to add environment variable fallbacks and a Procfile.
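
A minimal sketch of what an environment variable fallback can look like. The variable names are assumptions: GRAPHENEDB_BOLT_URL, GRAPHENEDB_BOLT_USER and GRAPHENEDB_BOLT_PASSWORD are the names I would expect the add-on to set, and the NEO4J_* names are hypothetical. Check the readme for the names the project actually reads:

```javascript
// Assumed variable names: GRAPHENEDB_BOLT_* for the Heroku add-on,
// NEO4J_* (hypothetical) for a local setup, then hard-coded local defaults.
function neo4jConfig(env = process.env) {
  return {
    url: env.GRAPHENEDB_BOLT_URL || env.NEO4J_URL || 'bolt://localhost:7687',
    user: env.GRAPHENEDB_BOLT_USER || env.NEO4J_USER || 'neo4j',
    password: env.GRAPHENEDB_BOLT_PASSWORD || env.NEO4J_PASSWORD || 'neo4j',
  };
}

console.log(neo4jConfig({}).url); // bolt://localhost:7687
```

Passing the environment in as a parameter keeps the function easy to test without touching process.env.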

Viewing Contentful Data in Neo4j

Introduction

This article introduces Contentful (a cloud-hosted, headless Content Management System (CMS)) and the graph database Neo4j. I have written a utility that allows data stored in Contentful to be imported into a Neo4j graph database. I’ll leave detailed explanations of what this can do until after I have explained the two systems. There are probably very few people who are familiar with both of these products, so I will start with an introduction to each.

Contentful

Contentful is a cloud hosted headless CMS.

Contentful does provide a user-interface for the content editors:

ContentfulEditor

This is how the editors enter the various fields that make up a content type.

However, Contentful does not provide a user interface for the “application”. Instead it provides various APIs that allow the developer to use the data in the CMS however they like. This is very powerful in that you are not limited by the user interface provided by a traditional CMS. Given that the data is served by a REST API (with GraphQL coming soon), you are not restricted to any particular language either.

Data in Contentful is partitioned into Spaces. A Space is the equivalent of a distinct database.

Spaces contain three kinds of things:

  • Content Types – These are the schema of the content (each is a list of fields, but fields may reference other content types or be lists of other content types)
  • Entries – Instances of content types. For example the above picture shows a Category content entry.
  • Assets – External images. Contentful acts as a document library and image resizing service for these.

Entries and Assets can be in preview or published states. There are two main API feeds: preview and published. The published feed shows only published items, whereas the preview feed also includes more recent items that are still in preview mode.
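
As a rough illustration of the two feeds, the sketch below builds the entries URL for either one. It assumes the usual hosts (cdn.contentful.com for published content, preview.contentful.com for preview), and the function name entriesUrl and its parameters are hypothetical. The preview feed needs its own preview access token:

```javascript
// Hypothetical helper: build the entries URL for the published or preview feed.
function entriesUrl({ spaceId, accessToken, preview = false, skip = 0, limit = 100 }) {
  const host = preview ? 'preview.contentful.com' : 'cdn.contentful.com';
  const url = new URL(`https://${host}/spaces/${spaceId}/entries`);
  url.searchParams.set('access_token', accessToken);
  url.searchParams.set('skip', String(skip));
  url.searchParams.set('limit', String(limit));
  return url.toString();
}

console.log(entriesUrl({ spaceId: 'my-space', accessToken: 'TOKEN' }));
// https://cdn.contentful.com/spaces/my-space/entries?access_token=TOKEN&skip=0&limit=100
```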

The Contentful space that I am using for my examples is “The example project”, which you get by default when you sign up to Contentful. It is the content for a website that explains how to use Contentful from a number of programming languages.

Here are the content types within this space:

ContentfulContentTypes

Creating a user interface for these is beyond the scope of this article but you can find plenty of examples.

Neo4j

Neo4j is a graph database.

Unlike traditional relational databases it works with nodes and relationships.

Nodes are the entities of the system and may have a label and attributes.

Nodes may be connected by relationships (which can be directional). Relationships can also have attributes.

You can query a neo4j database using a query language called Cypher.

Neo4j also has an api that allows other systems to connect to it – these can be used to query and modify the graph.

SimpleQuery

Here I have written one of the simplest queries possible in neo4j:

MATCH (a:category) RETURN a

This reads: find all nodes with the label category and return them. Here I have shown the details of one of the nodes in the browser.

SecondQuery

This is a more complex query:

MATCH(a:category)-[]-(b) RETURN a, b

This reads: find all category nodes that have any kind of relationship to another node, and return both nodes.

Note the visual nature of the query language. Nodes are written in round brackets and are displayed as circles in the browser!

There is far more that can be done with Neo4j and Cypher but this is a good introduction.

How to go from Contentful to Neo4j

Now that I have introduced the two platforms, I will explain how to move data from Contentful to Neo4j.

There are some warnings I should give you before you start to use it.

  • This will delete the neo4j database that it is pointed at before loading the data from the Contentful Space. Do not run this on a graph that you don’t have backed up.
  • If the utility can’t migrate a field it will skip it and log it to the output. Let me know if this happens and I will try to correct the utility.
  • I currently write the entire database in one transaction. This may not work well for very large contentful spaces. 
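
One possible way to address the single-transaction limitation is to split the work into fixed-size batches and commit each batch separately. This is only a sketch of the batching step, not what the utility currently does. The name chunk and the batch size are assumptions:

```javascript
// Hypothetical helper: split a list of pending writes into batches of `size`,
// so that each batch could be committed in its own transaction.
function chunk(items, size) {
  if (size <= 0) {
    throw new RangeError('size must be positive');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const statements = ['a', 'b', 'c', 'd', 'e'];
console.log(chunk(statements, 2).length); // 3
```

The trade-off is that a failure part way through leaves a partially loaded graph, which is acceptable for a database that is dropped and recreated on every run.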

I typically use Neo4j for analytical databases that get dropped and recreated frequently. This may differ from other people’s usage. It means that I don’t need to worry about database backups or migrations, but I do need to worry about loading the database quickly.

My utility is available in the following github repo:

https://github.com/chriseyre2000/contentful-to-neo4j

The utility is written in JavaScript and runs on Node. There are detailed instructions in the readme. I chose JavaScript because it seems to be the most commonly used language for Contentful.

SelectAll.PNG

Here is what you get when you ask for all data (note that in a realistically sized Contentful system Neo4j would cap the displayed nodes at 100).

This shows the relationships between the various nodes in the system.

Here is the schema of the graph database:

Schema

This shows the relationships between the various content types.

This is fun to play with and with more detailed queries can be informative.

However there are more powerful queries such as:

Orphans

This returns the asset nodes that are not linked to by any entry. These are orphan images that may have been published by mistake. This is something that the contentful UI cannot do.

You can also view the data in text form:

OrphanText

Summary

Hopefully this has given you a useful insight into what Contentful and Neo4j can be used for. Feel free to use my utility (at your own risk). It does make managing a Contentful system considerably easier – given that you can freely query the data.

Viewing a Contentful space in Neo4j – Part 1

Contentful is a really effective headless CMS. Its API does have some limits (you can only query user-defined fields across a single type at a time).

A few years ago I managed to find a way of mapping a contentful space into a Neo4j graph database. This allows full querying of the data in contentful. This can be useful for finding where a given image is in use or finding pages that are orphaned.

I am now trying to recreate this library as an open source node project.

I have started by creating a free contentful account (provided I only have one space and live within the limits this will be fine for my purposes).

To query contentful I am using the contentful npm package.

So far I can query the assets and content types in my space.

Initially I am going to use a local version of Neo4j (Neo4j community 1.1.6), but I will be moving to a Heroku-hosted version.

I’ll add to this series as I progress.

How to use Neo4j in the cloud

Neo4j is an amazingly powerful database. For the right use case it is incredibly fast.

This is how to get up and running:

Sign up to Heroku, create an empty project and provision a GrapheneDB database. For small enough databases this is free.

Look in the configuration tab for the connection string URL. Keep this secret, as it contains the username and password for write access to the graph database. You can paste it into a browser to get access to the Neo4j console.

Now you will want to put some data in. The fastest way is to import from a csv file.

Here is a sample upload script (currently untested):

USING PERIODIC COMMIT 1000 LOAD CSV FROM 'http://somewebsite/data.csv' AS data MERGE (:Test {id: data[0]})

The data must be on a publicly accessible website. I would recommend using an Amazon S3 bucket (or a Dropbox folder), with a UUID as the folder name. The file only needs to be available for the duration of the import and, given that S3-hosted websites don’t expose a directory listing, the URL will be almost impossible to guess.
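
A minimal sketch of generating such a folder name, using the randomUUID function from Node's built-in crypto module. The helper name importUrl and the base URL are placeholders for illustration:

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: build a hard-to-guess URL for a temporary CSV upload.
// `baseUrl` is a placeholder for wherever the bucket or folder is served from.
function importUrl(baseUrl, fileName) {
  return `${baseUrl}/${randomUUID()}/${fileName}`;
}

// Prints something like http://somewebsite/<random-uuid>/data.csv
console.log(importUrl('http://somewebsite', 'data.csv'));
```

The same generated URL would then be used both for the upload and in the import statement.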

Neo4j cli

It appears that there is a very powerful Neo4j CLI called cycli:

https://github.com/nicolewhite/cycli

$ cycli --help
Usage: cycli [OPTIONS]

Options:
  -v, --version            Show cycli version and exit.
  -h, --host TEXT          The host address of Neo4j.
  -P, --port TEXT          The port number on which Neo4j is listening.
  -u, --username TEXT      Username for Neo4j authentication.
  -p, --password TEXT      Password for Neo4j authentication.
  -t, --timeout INTEGER    Set a global socket timeout for queries.
  -l, --logfile FILENAME   Log every query and its results to a file.
  -f, --filename FILENAME  Execute semicolon-separated Cypher queries from a
                           file.
  -s, --ssl                Use the HTTPS protocol.
  -r, --read-only          Do not allow any write queries.
  --help                   Show this message and exit.



This looks like a great way to get data imported into a remote GrapheneDB from a build tool.