How to use Neo4j in the cloud

Neo4j is an amazingly powerful database. For the right use case it is incredibly fast.

This is how to get up and running:

Sign up to Heroku, create an empty app and provision a GrapheneDB database. For small enough databases this is free.

Look in the configuration tab for the connection string URL. Keep this secret, as it contains the username and password for write access to the graph database. You can paste it into a browser to open the Neo4j console.
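If you prefer the command line, the add-on should expose the same URL as a Heroku config var (GRAPHENEDB_URL is the name I’d expect, but check heroku config for what your app actually has):

$ heroku config:get GRAPHENEDB_URL --app my-app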

Now you will want to put some data in. The fastest way is to import from a CSV file.

Here is a sample import script (currently untested):

USING PERIODIC COMMIT 1000
LOAD CSV FROM 'http://somewebsite/data.csv' AS data
MERGE (:Test {id: data[0]})

The data must be on a publicly accessible website. I would recommend an Amazon S3 bucket (or a Dropbox folder), but use a UUID for the folder name. The file only needs to be available for the duration of the import, and since S3-hosted websites don’t expose a directory listing, the URL will be almost impossible to guess.
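A rough sketch of that upload with the AWS CLI (the bucket name is a placeholder, and it assumes the bucket allows public-read objects):

$ PREFIX=$(uuidgen)
$ aws s3 cp data.csv "s3://my-import-bucket/$PREFIX/data.csv" --acl public-read
$ echo "https://my-import-bucket.s3.amazonaws.com/$PREFIX/data.csv"

Remember to delete the file once the import has run.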

Neo4j CLI

It appears there is a very powerful Neo4j CLI called cycli:

https://github.com/nicolewhite/cycli

$ cycli --help
Usage: cycli [OPTIONS]

Options:
  -v, --version            Show cycli version and exit.
  -h, --host TEXT          The host address of Neo4j.
  -P, --port TEXT          The port number on which Neo4j is listening.
  -u, --username TEXT      Username for Neo4j authentication.
  -p, --password TEXT      Password for Neo4j authentication.
  -t, --timeout INTEGER    Set a global socket timeout for queries.
  -l, --logfile FILENAME   Log every query and its results to a file.
  -f, --filename FILENAME  Execute semicolon-separated Cypher queries from a
                           file.
  -s, --ssl                Use the HTTPS protocol.
  -r, --read-only          Do not allow any write queries.
  --help                   Show this message and exit.



This looks like a great way to get data imported into a remote GrapheneDB from 
a build tool.
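Something like this should do it (untested; the host, port and credentials are placeholders, using the flags from the help text above):

$ cycli -s -h my-db.graphenedb.com -P 24789 -u neo4j -p secret -f import.cypher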


Life with Opsgenie

Opsgenie seems to dominate my life for a week every other month.

OK, it’s a support rota.

However it’s a great way of getting teams to do things. Raise something as an alert and it can’t be ignored. Raising or closing an alert is a simple HTTP POST, so automation is trivial.
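For example, with curl (assuming the v2 alerts API and a GenieKey API key; the alias is what lets the close call find the alert it raised):

$ curl -X POST https://api.opsgenie.com/v2/alerts \
    -H "Authorization: GenieKey $OPSGENIE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"message": "Nightly import failed", "alias": "nightly-import"}'

$ curl -X POST "https://api.opsgenie.com/v2/alerts/nightly-import/close?identifierType=alias" \
    -H "Authorization: GenieKey $OPSGENIE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{}'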

Integration with statuspage.io as a webhook is very powerful.

BigQuery

This is an unusual kind of database.

To start with it’s very cheap to use – you generally only pay for the columns of data read. You can get a lot of data processing for a few pence.

It does have some interesting other characteristics:

You can only read and append data; you can’t update or delete specific rows.

It’s not always very fast (it’s for analytics, not transactional data) and it does have periodic outages.

Tables can contain nested, repeated fields.

Tables are meant to be extended – you can add an optional or repeated column, but you can’t change the type of an existing one.

Copying tables is effectively free.

You can replace the entire contents of a table with a select result (or create a new table from a select result).

This means that you need a different strategy for loading data into these tables. The trick is to find a means of making inserts idempotent without reading too much data.

This could involve writing to a staging table and then using an insert-select to copy only the new, distinct rows back into the master.
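A sketch of that pattern with the bq command line tool (untested; the dataset, table and column names are placeholders, and it assumes an id column that uniquely identifies a row):

$ bq load --replace mydataset.staging batch.csv
$ bq query --use_legacy_sql=false \
    --destination_table=mydataset.master --append_table \
    'SELECT * FROM mydataset.staging
     WHERE id NOT IN (SELECT id FROM mydataset.master)'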

You don’t have to use the rest of the Google infrastructure to load into BQ – we use Heroku scheduler tasks. This means we pay pennies to load data into a storage system that costs pennies to run. This can completely change the economics of software development. The most significant cost is now developer and management time – it can be cheaper to delegate authority for data creation to the developer (only ask if it will cost more than $100 per month) than to hold a meeting to request approval.

This cloud economics also gets fun when deciding when to optimise a slow process. If dialling up the size of the box you use adds 7 pence to the month’s bill, how many months would that process need to run before it was worth spending 2 hours of developer time to save that cost? (At, say, £50 an hour, those 2 hours cost £100 – around 1,400 months, well over a century, of the bigger box.)