Northern Powerhouse

Brexit is going to happen in the next few years. Since the Safe Harbour provisions failed, businesses have been keeping cloud data in the EU region.

Currently the big cloud providers (Amazon, Microsoft, Google) don’t have public cloud data centres in the UK. The closest are in Germany and Ireland.

Would it be a good idea for the British government to approach Amazon, Microsoft and Google to build cloud data centres in the Northern Powerhouse area? Once these are in the UK we could move local and national government IT onto these platforms. That commitment could be the incentive for the cloud providers to build their platforms here. With care this could reduce the infrastructure costs of government.


How to use Neo4j in the cloud

Neo4j is an amazingly powerful database. For the right use case it is incredibly fast.

This is how to get up and running:

Sign up to Heroku, create an empty project and provision a GrapheneDB database. For small enough databases this is free.

Look in the configuration tab for the connection string URL. Keep this secret as it contains the username and password for write access to the graph database. You can paste it into a browser to open the Neo4j console.
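Assuming the connection string follows the usual http://user:password@host:port shape, the credentials can be pulled out with Python’s standard library. The URL below is a made-up example; the real value comes from your Heroku config tab:

```python
from urllib.parse import urlparse

# A made-up example URL in the shape a GrapheneDB add-on typically
# provides; the real value comes from the Heroku config vars tab.
url = "http://app123:s3cret@hobby-example.dbs.graphenedb.com:24789"

parts = urlparse(url)
print(parts.username)  # → app123
print(parts.hostname)  # → hobby-example.dbs.graphenedb.com
print(parts.port)      # → 24789
```

This is handy when a driver or tool wants the host, username and password as separate settings rather than one URL.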

Now you will want to put some data in. The fastest way is to import from a CSV file.

Here is a sample upload script (currently untested):

USING PERIODIC COMMIT 1000
LOAD CSV FROM 'http://somewebsite/data.csv' AS data
MERGE (:Test {id: data[0]})

The data must be on a publicly accessible website. I would recommend using an Amazon S3 bucket (or a Dropbox folder), but use a UUID for the folder name. It only needs to be available for the duration of the import, and given that S3-hosted websites don’t expose directory listings, the URL will be almost impossible to guess.
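The unguessable folder name is one line of Python. A minimal sketch (the bucket name is a placeholder, and the actual upload/delete would be done with something like boto3):

```python
import uuid

bucket = "my-import-bucket"  # placeholder bucket name
folder = uuid.uuid4().hex    # 32 hex characters, effectively unguessable
csv_url = f"http://{bucket}.s3.amazonaws.com/{folder}/data.csv"

# Upload data.csv to that key (e.g. with boto3), run the LOAD CSV
# import pointing at csv_url, then delete the object afterwards.
print(csv_url)
```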


Neo4j cli

There appears to be a very powerful Neo4j CLI called cycli:

https://github.com/nicolewhite/cycli

$ cycli --help
Usage: cycli [OPTIONS]

Options:
  -v, --version            Show cycli version and exit.
  -h, --host TEXT          The host address of Neo4j.
  -P, --port TEXT          The port number on which Neo4j is listening.
  -u, --username TEXT      Username for Neo4j authentication.
  -p, --password TEXT      Password for Neo4j authentication.
  -t, --timeout INTEGER    Set a global socket timeout for queries.
  -l, --logfile FILENAME   Log every query and its results to a file.
  -f, --filename FILENAME  Execute semicolon-separated Cypher queries from a
                           file.
  -s, --ssl                Use the HTTPS protocol.
  -r, --read-only          Do not allow any write queries.
  --help                   Show this message and exit.



This looks like a great way to get data imported into a remote GrapheneDB from 
a build tool.
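From a build script the simplest approach is to shell out to cycli with a query file. A sketch that just assembles the command line from the flags in the --help output above (host, credentials and filename are placeholders; real credentials should come from environment variables):

```python
def cycli_command(host, port, username, password, query_file):
    """Assemble the argument list for a cycli invocation.

    Flags match the cycli --help output; --filename runs
    semicolon-separated Cypher queries from a file.
    """
    return [
        "cycli",
        "--host", host,
        "--port", str(port),
        "--username", username,
        "--password", password,
        "--ssl",                 # assuming an HTTPS endpoint
        "--filename", query_file,
    ]

cmd = cycli_command("hobby-example.dbs.graphenedb.com", 24789,
                    "app123", "s3cret", "import.cql")
# subprocess.run(cmd, check=True) would execute it in a build step.
```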


Life with Opsgenie

Opsgenie seems to dominate my life for a week every other month.

OK, it’s a support rota.

However it’s a great way of getting teams to do things. Raise it as an alert and it can’t be ignored. Raising or closing an alert is a simple HTTP POST, so automation is trivial.
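That POST really is small. A minimal sketch against the Opsgenie v2 alerts endpoint (the endpoint and header shape are worth checking against the current Opsgenie docs; the alert details here are made up):

```python
import json

# Opsgenie v2 REST endpoint for creating alerts.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"

def build_alert(message, alias, priority="P3"):
    """Build the JSON body for creating an alert.

    The alias lets a later close request target the same alert,
    which is what makes automated raise/close loops trivial.
    """
    return json.dumps({"message": message, "alias": alias,
                       "priority": priority})

body = build_alert("Disk filling on db01", alias="db01-disk")
# requests.post(OPSGENIE_ALERTS_URL, data=body,
#               headers={"Content-Type": "application/json",
#                        "Authorization": "GenieKey <api-key>"})
```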

Integration with statuspage.io as a webhook is very powerful.

Bigquery

This is an unusual kind of a database.

To start with it’s very cheap to use – you generally only pay for the columns of data read. You can get a lot of data processing for a few pence.

It does have some interesting other characteristics:

You can only read and append data; you can’t update it or delete specific rows.

It’s not always very fast (it’s for analytics, not transactional data) and it does have periodic outages.

It can contain nested repeated rows.

Tables are meant to be extended – you can add an optional or repeated column, but you can’t change the type of an existing one.

Copying tables is effectively free.

You can replace the entire contents of a table with a select result (or create a new table from a select result).

This means that you need a different strategy for loading data into these tables. The trick is to find a means of making inserts idempotent without reading too much data.

This could involve writing to a staging table and using a copy insert to move distinct data back into the master.
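One way to sketch that: since rows can’t be updated in place, rebuild the master from the union of master and staging, keeping only distinct rows, and write the result back over the master with a truncating write. The table names are placeholders; this just builds the query string, and the write disposition would be set on the BigQuery job configuration:

```python
def dedupe_query(master, staging):
    """Build a query whose result replaces the master table.

    Run the job with writeDisposition=WRITE_TRUNCATE so the result
    overwrites master; because duplicates are collapsed, re-running
    a load of the same staging data is idempotent.
    """
    return (
        "SELECT DISTINCT * FROM ("
        f"SELECT * FROM {master} "
        f"UNION ALL SELECT * FROM {staging})"
    )

query = dedupe_query("mydataset.master", "mydataset.staging")
```

The DISTINCT has to wrap the whole union – deduplicating each table separately would still leave rows that appear in both.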

You don’t have to use the rest of the Google infrastructure to load into BQ – we use Heroku scheduler tasks. This means we pay pennies to load data into a storage system that costs pennies to run. This can completely change the economics of software development. The most significant cost is now developer and management time – it can now be cheaper to delegate authority for data creation to the developer (only ask if it will cost more than $100 per month) than to hold a meeting to request approval.

This cloud economics also gets fun when deciding whether to optimise a slow process. If you could dial up the size of the box that you use to add 7 pence to the month’s bill, how many months would that process need to run before it was worth spending two hours of developer time to save that cost?
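That trade-off is a one-line calculation. A sketch with illustrative numbers (the £50/hour developer rate is an assumption):

```python
def payback_months(dev_hours, hourly_rate_pence, monthly_saving_pence):
    """Months the optimisation must run before it pays for itself."""
    return (dev_hours * hourly_rate_pence) / monthly_saving_pence

# Two hours at an assumed £50/hour, against a 7p/month saving:
months = payback_months(2, 5000, 7)
print(round(months))  # → 1429 months, i.e. over a century
```

At those numbers the optimisation never pays for itself – which is exactly the point.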

A Life In The Cloud

I have spent the last four years working with cloud infrastructure. My employer has no critical servers in house.
We use a lot of cloud services. They vary in their characteristics. What we want from them:
  • A pricing model that scales with our success and allows for spiky traffic.
  • A stable interface. If you want to change something, make it backwards compatible or give us notice of the change.
  • Have a status page similar to status.io that is accurately updated.
  • Respond to support requests in hours, not days.
  • Use webhooks rather than emails to announce outages, maintenance windows and upgrades.
  • We need to use an API to deploy or configure your service.
  • We need an API to reconcile any billing transactions with payments.
  • We will have multiple environments (qa, perf, staging, live) and only expect full price and support on live and perf.
Currently we deal with around a dozen suppliers and none of them provide all of the above.

Why don’t cloud service providers like Accountants?

Recently I have been working on a data warehouse project that is trying to extract billing and payment information from a range of cloud services. In all cases it seems that the data needed by the accountants has not been given the care that it needs.

The industry practice seems to be to log into a portal and download a spreadsheet of data that may or may not be accurate or timely. Any APIs exposing this data seem to only work on a per-order basis, which means that an attempt to keep a database up to date could require geometrically increasing processing times (you need to check each order for changes to a transaction). These APIs tend to be heavily restricted on the volume of data per call (hundreds of transactions per call); combine this with a frequently undocumented rate limit and catching up on historical data is difficult at best.
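A cursor on “last modified” is what avoids that geometric re-scan: the client only ever re-reads what actually changed. A sketch of the client side against a hypothetical API (the endpoint, parameter names and page size are all made up to illustrate the shape):

```python
def next_page_params(cursor, page_size=500):
    """Parameters for a hypothetical 'transactions modified since'
    endpoint: each page returns transactions changed after cursor,
    so catch-up cost is proportional to what changed, not to the
    total order history.
    """
    return {"modified_since": cursor, "limit": page_size}

params = next_page_params("2016-05-01T00:00:00Z")
# Loop: fetch a page, upsert the rows locally, advance the cursor
# to the latest modified timestamp seen, repeat until a page is empty.
```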

If you are providing services that accountants need, think about providing an API that makes reconciliation easy. Reconciliation is important, especially if you want to get your tax calculations right.

Dev-ops or No-ops

The team that I am working on has been described by our outgoing contractors as a DevOps team. We develop, deploy and support the site. We add monitoring to everything we can (sometimes after it goes wrong once). We are an agile team loosely working in a Scrum setup. The trick is to always dedicate one pair (or half a pair if the team is odd-numbered that week) to the operations channel. The team member on the support rota gets first option on the operations tickets that week.

Dev-Ops is normally about having a dedicated team that performs automation and monitoring of systems. We do have that happening, but from within the team. There is no passing it to an “other” team. In that sense we have achieved No-Ops – there literally is no ops team that we can pass problems off to (OK, we have various vendors to manage, but that is a different matter).