What happened to the Cloud CI Providers?

2017 brings the decommissioning of two cloud CI providers.

Snap CI and Bamboo Cloud are both shutting down.

Snap CI provided a build pipeline (triggered by a GitHub webhook) with log and artefact viewing. It was all configured through a clean UI, which allowed environment variables to be stored either as plaintext or securely.

Finding a replacement is not simple. The alternatives seem either to lack a UI for specifying pipelines or to be much more expensive.

The roll-your-own options are complex, possibly involving a Docker-hosted Travis CI.

 

Lotus Improv equivalent in a Browser

This open source pivot table (http://nicolas.kruchten.com/pivottable/examples/local.html) brings the power of Lotus Improv to the web.

Lotus Improv was a groundbreaking spreadsheet based on pivot tables rather than a simple grid. Lotus killed it because it was cannibalising its own Lotus 1-2-3 sales.

The pivot table above allows you to load a CSV file (say the output of a BigQuery query) and then perform ad-hoc pivot analysis on it. By dragging the column headings around you get to see your data summarised in your browser. The sample could easily be extended to run a report from, say, an S3 bucket. This means that we could email links to pivot tables around…

How to use Neo4j in the cloud

Neo4j is an amazingly powerful database. For the right use case it is incredibly fast.

This is how to get up and running:

Sign up to Heroku, create an empty app and provision a GrapheneDB database. For small enough databases this is free.

Look in the configuration tab for the connection string URL. Keep this secret, as it contains the username and password for write access to the graph database. You can paste it into a browser to get access to the Neo4j console.
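
To reach the database from code rather than the browser console, something like the following sketch should work with the official Python driver. The config var names here are my assumption, so check what the GrapheneDB add-on actually sets on your app.

# Minimal connectivity check, assuming the add-on exposes Bolt connection
# details in the config vars below (the names may differ between plans).
import os

from neo4j import GraphDatabase  # pip install neo4j

uri = os.environ["GRAPHENEDB_BOLT_URL"]
user = os.environ["GRAPHENEDB_BOLT_USER"]
password = os.environ["GRAPHENEDB_BOLT_PASSWORD"]

driver = GraphDatabase.driver(uri, auth=(user, password))

with driver.session() as session:
    nodes = session.run("MATCH (n) RETURN count(n) AS nodes").single()["nodes"]
    print(f"Connected - the database currently holds {nodes} nodes")

driver.close()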

Now you will want to put some data in. The fastest way is to import from a CSV file.

Here is a sample upload script (currently untested):

USING PERIODIC COMMIT 1000
LOAD CSV FROM 'http://somewebsite/data.csv' AS data
MERGE (:Test {id: data[0]})

The data must be on a publicly accessible website. I would recommend using an Amazon S3 bucket (or a Dropbox folder), but use a UUID for the folder name. The file only needs to be available for the duration of the import, and given that S3-hosted websites don't expose a directory listing, the URL will be almost impossible to guess.
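
Here is a rough boto3 sketch of that upload-and-tidy-up dance. The bucket name and file name are placeholders, and note that buckets created with the newer default settings may block public ACLs.

# Publish the CSV under a random UUID prefix, run the import, then delete it.
# "my-import-bucket" and data.csv are placeholders.
import uuid

import boto3  # pip install boto3

s3 = boto3.client("s3")
bucket = "my-import-bucket"
key = f"{uuid.uuid4()}/data.csv"

# Publicly readable, but only reachable if you know the UUID.
s3.upload_file("data.csv", bucket, key, ExtraArgs={"ACL": "public-read"})
print(f"Point LOAD CSV at https://{bucket}.s3.amazonaws.com/{key}")

# ... run the Cypher import above against the GrapheneDB instance ...

# Remove the file as soon as the import has finished.
s3.delete_object(Bucket=bucket, Key=key)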

 

 

Neo4j CLI

It appears there is a very powerful Neo4j CLI called cycli:

https://github.com/nicolewhite/cycli

$ cycli --help
Usage: cycli [OPTIONS]

Options:
  -v, --version            Show cycli version and exit.
  -h, --host TEXT          The host address of Neo4j.
  -P, --port TEXT          The port number on which Neo4j is listening.
  -u, --username TEXT      Username for Neo4j authentication.
  -p, --password TEXT      Password for Neo4j authentication.
  -t, --timeout INTEGER    Set a global socket timeout for queries.
  -l, --logfile FILENAME   Log every query and its results to a file.
  -f, --filename FILENAME  Execute semicolon-separated Cypher queries from a
                           file.
  -s, --ssl                Use the HTTPS protocol.
  -r, --read-only          Do not allow any write queries.
  --help                   Show this message and exit.



This looks like a great way to get data imported into a remote GrapheneDB from a build tool.
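
A build step could drive it with something like the sketch below, using only the flags from the help text above; the environment variable names and the Cypher file name are placeholders.

# Replay a file of semicolon-separated Cypher statements against a remote
# GrapheneDB instance. Host, port, credentials and file name are placeholders.
import os
import subprocess

subprocess.run(
    [
        "cycli",
        "--host", os.environ["NEO4J_HOST"],
        "--port", os.environ["NEO4J_PORT"],
        "--username", os.environ["NEO4J_USER"],
        "--password", os.environ["NEO4J_PASSWORD"],
        "--ssl",                     # GrapheneDB endpoints are served over HTTPS
        "--filename", "import.cyp",  # the semicolon-separated Cypher statements
    ],
    check=True,
)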

 

Life with Opsgenie

Opsgenie seems to dominate my life for a week every other month.

OK, it’s a support rota.

However, it's a great way of getting teams to do things. Raise something as an alert and it can't be ignored. Raising or closing an alert is a simple HTTP POST, so automation is trivial.
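
For example, raising and then closing an alert from Python looks roughly like this. I'm assuming the v2 alerts endpoint here, and the API key and alias are placeholders.

# Raise and close an Opsgenie alert over plain HTTP.
# OPSGENIE_API_KEY and the alias are placeholders; check the current API docs.
import os

import requests  # pip install requests

API = "https://api.opsgenie.com/v2/alerts"
HEADERS = {"Authorization": f"GenieKey {os.environ['OPSGENIE_API_KEY']}"}

# Raise: one POST with a message and a stable alias so we can close it later.
requests.post(
    API,
    headers=HEADERS,
    json={"message": "Nightly data load failed", "alias": "nightly-data-load"},
).raise_for_status()

# Close: another POST addressed to the alert by its alias.
requests.post(
    f"{API}/nightly-data-load/close",
    headers=HEADERS,
    params={"identifierType": "alias"},
    json={"note": "Load re-run successfully"},
).raise_for_status()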

Integration with statuspage.io as a webhook is very powerful.

BigQuery

This is an unusual kind of a database.

To start with, it's very cheap to use – you generally only pay for the columns of data you read. You can get a lot of data processing for a few pence.

It does have some interesting other characteristics:

  • You can only read and write (append) data; you can't update or delete specific rows.
  • It's not always very fast (it's for analytics, not transactional data) and does have periodic outages.
  • Tables can contain nested, repeated fields.
  • Tables are meant to be extended – you can add an optional (nullable) or repeated column but can't change the type of an existing one.
  • Copying tables is effectively free.
  • You can replace the entire contents of a table with a select result (or create a new table from a select result).

This means that you need a different strategy for loading data into these tables. The trick is to find a means of making inserts idempotent without reading too much data.

This could involve writing to a staging table and then appending only the distinct rows that are not already present back into the master table, as in the sketch below.
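
A minimal sketch of that staging-to-master step with the google-cloud-bigquery client – the project, dataset, table and id column names are placeholders, and it assumes a flat schema with a single id key.

# Append to the master table only those staging rows whose id is not already
# present, by writing a query result into master with WRITE_APPEND.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string("my-project.warehouse.master"),
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

sql = """
SELECT DISTINCT s.*
FROM `my-project.warehouse.staging` AS s
LEFT JOIN `my-project.warehouse.master` AS m ON s.id = m.id
WHERE m.id IS NULL
"""

client.query(sql, job_config=job_config).result()  # blocks until the copy is done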

You don't have to use the rest of the Google infrastructure to load into BigQuery – we use Heroku scheduler tasks. This means we pay pennies to load data into a storage system that costs pennies to run. This can completely change the economics of software development. The most significant cost is now developer and management time – it can be cheaper to delegate authority for data creation to the developer (only ask if it will cost more than $100 per month) than to hold a meeting to request approval.
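
The kind of script a scheduler task runs is nothing special – here is a sketch, with placeholder names, that appends a freshly fetched CSV to the staging table and leaves the idempotent step above to de-duplicate.

# Append a CSV export to the staging table; the staging-to-master query above
# keeps the master free of duplicates. Table and file names are placeholders.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

with open("todays_export.csv", "rb") as handle:
    load_job = client.load_table_from_file(
        handle,
        bigquery.TableReference.from_string("my-project.warehouse.staging"),
        job_config=job_config,
    )

load_job.result()  # wait for the load to finish before the dyno exits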

Cloud economics also gets fun when deciding whether to optimise a slow process. If dialling up the size of the box you use adds 7 pence to the month's bill, how many months would that process need to run before it is worth spending two hours of developer time to save that cost? (At, say, £50 an hour those two hours cost £100 – roughly 1,400 months of the 7p saving.)

A Life In The Cloud

I have spent the last four years working with cloud infrastructure. My employer has no critical servers in house.
We use a lot of cloud services. They vary in their characteristics. What we want from them:
  • A pricing model that scales with our success and allows for spiky traffic.
  • A stable interface. If you want to change something, make it backwards compatible or give us notice of the change.
  • Have a status page similar to status.io that is accurately updated.
  • Respond to support requests in hours not days.
  • Use webhooks rather than emails to announce outages, maintenance windows and upgrades.
  • We need to use an API to deploy or configure your service.
  • We need an API to reconcile any billing transactions with payments.
  • We will have multiple environments (qa, perf, staging, live) and only expect full price and support on live and perf.
Currently we deal with around a dozen suppliers and none of them provide all of the above.

Why don’t cloud service providers like Accountants?

Recently I have been working on a data warehouse project that is trying to extract billing and payment information from a range of cloud services. In all cases it seems that the data needed by the accountants has not been given the care that it needs.

The industry practice seems to be to log into a portal and download a spreadsheet of data that may or may not be accurate or timely. Any APIs exposing this data seem to work only on a per-order basis, which means that keeping a database up to date can require ever-growing processing times (you need to re-check every order for changes to its transactions). These APIs also tend to be heavily restricted in the volume of data per call (hundreds of transactions per call); combine that with a frequently undocumented rate limit and catching up on historical data becomes difficult at best.

If you are providing services that accountants need, think about providing an API that makes reconciliation easy. Reconciliation is important, especially if you want to get your tax calculations right.