This is an unusual kind of database.
To start with, it’s very cheap to use – you generally pay only for the columns of data your queries read. You can get a lot of data processing for a few pence.
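Because billing follows the columns a query touches, you can estimate what a query will cost before running it. Here’s a minimal sketch using the google-cloud-bigquery Python client (the dataset, table and column names are hypothetical); a dry run reports the bytes that would be scanned without executing anything.

```python
from google.cloud import bigquery

client = bigquery.Client()

# A dry run asks BigQuery to plan the query and report the bytes it
# would scan, without running it or billing for it.
config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT user_id, created_at FROM mydataset.events",  # hypothetical table
    job_config=config,
)

# Only the referenced columns count, so selecting two columns from a
# wide table scans a fraction of what SELECT * would.
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")
```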
It does have some other interesting characteristics:
You can only read and append data; you can’t update or delete specific rows.
It’s not always very fast (it’s built for analytics, not transactional workloads) and it has periodic outages.
Tables can contain nested, repeated records.
Tables are meant to be extended – you can add an optional or repeated column but can’t change the type of an existing one.
Copying tables is effectively free.
You can replace the entire contents of a table with a select result (or create a new table from a select result).
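To sketch that last point: in the Python client, replacing a table with a select result is just a query with a destination table and a truncating write disposition (the table names here are hypothetical).

```python
from google.cloud import bigquery

client = bigquery.Client()

# Write the query result over the destination table, replacing its
# entire contents; the same call creates the table if it's missing.
config = bigquery.QueryJobConfig(
    destination="myproject.mydataset.daily_visits",  # hypothetical
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(
    "SELECT user_id, COUNT(*) AS visits FROM mydataset.events GROUP BY user_id",
    job_config=config,
).result()  # block until the job finishes
```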
This means you need a different strategy for loading data into these tables. The trick is to find a way of making inserts idempotent without reading too much data.
One approach is to write each batch to a staging table and then use a copying insert to move only the distinct new rows back into the master.
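A minimal sketch of that pattern, assuming each row carries a unique id column (the table and file names are hypothetical): load the batch into a staging table, then append only the rows the master hasn’t seen. Re-running either step with the same batch changes nothing, and the anti-join reads only the id column of the master.

```python
from google.cloud import bigquery

client = bigquery.Client()

# 1. Load the batch into the staging table. WRITE_TRUNCATE means a
#    re-run of the same batch simply overwrites the staging table.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    autodetect=True,
)
with open("batch.json", "rb") as f:  # hypothetical batch file
    client.load_table_from_file(
        f, "mydataset.events_staging", job_config=load_config
    ).result()

# 2. Append only the staging rows whose id is not already in the
#    master. Only the master's id column is scanned, and running this
#    twice appends nothing the second time – the insert is idempotent.
query_config = bigquery.QueryJobConfig(
    destination="myproject.mydataset.events",  # hypothetical master table
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
client.query(
    """
    SELECT s.*
    FROM mydataset.events_staging AS s
    WHERE s.id NOT IN (SELECT id FROM mydataset.events)
    """,
    job_config=query_config,
).result()
```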
You don’t have to use the rest of the Google infrastructure to load into BQ – we use Heroku Scheduler tasks. This means we pay pennies to load data into a storage system that costs pennies to run. That can completely change the economics of software development: the most significant cost is now developer and management time, so it can be cheaper to delegate authority for data creation to the developer (only asking for approval if it will cost more than $100 per month) than to hold a meeting to request it.
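The loader itself can be tiny – here’s a sketch of the sort of script a Heroku Scheduler task might run (the table name is hypothetical, and the hard-coded row stands in for wherever your data really comes from):

```python
from google.cloud import bigquery

def load_latest_rows():
    """Run hourly by Heroku Scheduler (or cron, or anything else)."""
    client = bigquery.Client()
    rows = [
        {"id": "evt-1", "user_id": 42, "created_at": "2015-06-01T12:00:00"},
    ]  # in practice: fetched from your app's database or an API
    # Stream the rows into the staging table; errors come back
    # per-row rather than failing the whole batch.
    errors = client.insert_rows_json("mydataset.events_staging", rows)
    if errors:
        raise RuntimeError(f"failed inserts: {errors}")

if __name__ == "__main__":
    load_latest_rows()
```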
These cloud economics also get fun when deciding whether to optimise a slow process. If dialling up the size of the box you use adds 7 pence to the month’s bill, how many months would that process need to run before it was worth spending 2 hours of developer time to save that cost?
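To make that concrete: assuming (purely for illustration) a fully loaded developer cost of £50 per hour, 2 hours of optimisation costs £100; at 7 pence saved per month it takes £100 / £0.07 ≈ 1,430 months – over a century – to break even. Let the slow process stay slow.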