The Good, the Bad and the Ugly in Cybersecurity – Week 35
August 27, 2021Replicating from Cloud Spanner to BigQuery at scale
August 27, 2021As enterprises compete for market share, their need for real-time insights has given rise to increased demand for transactional databases to support data analytics. Whether it’s to provide dashboards that inform rapid decision-making, to perform analysis on a lookup table stored in a transactional database, or to conduct complex hybrid transactional analytical workloads, there is a growing demand for analyzing large volumes of data in transactional databases.
Cloud Spanner is Google Cloud’s fully managed relational database for transactional workloads, and today, with the general availability of Spanner federated queries with BigQuery, it gets even more powerful. BigQuery is Google Cloud’s market-leading serverless, highly scalable, multi-cloud data warehouse that makes analytics easy by bringing together data from multiple sources. With Spanner’s BigQuery federation, you can query data residing in Spanner in real time without moving or copying the data, bridging the gap between operational data and analytics and creating a unified data lifecycle.
It’s already been possible to use tools like Dataflow to copy data from Spanner over to BigQuery, but if you haven’t set up these ETL workflows — or you simply need to do a quick lookup on data that’s in Spanner — you can now take advantage of BigQuery’s query federation support to run real-time queries on data that’s stored in Spanner.
In this post, we’ll look at how to set up a federated query in BigQuery that accesses data stored in Spanner.
How to run a federated query
Suppose you’re an online retailer that uses Spanner to store your shopping transactions, and you have a Customer 360 application built in BigQuery. You can now use federated queries to include this customer’s shopping transactions in your Customer 360 application without needing to copy the data over to BigQuery from Spanner.
To run a Customer 360 query in BigQuery that includes the shopping transactions that are stored in Spanner, follow these steps:
-
Launch BigQuery and choose the Google Cloud project that contains the Spanner instance that includes the shopping transactions database.
-
Set up an external data source for the Spanner shopping database in BigQuery. You’ll need to have bigquery.admin permissions to set this up.
-
Write a query in BigQuery that accesses the shopping data in the Spanner data source. If you’d like other users to access this external data source in BigQuery, simply grant them permission to use the connection resource you just created.
Setting up an external data source
To setup a Spanner external data source, select “Add data” and choose “External Data Source”