Migrating Teradata and other data warehouses to BigQuery

Red Hat Drives Cloud-Native Flexibility, Enhances Operational Security with Latest Version of Red Hat Enterprise Linux 7
August 6, 2019
Red Hat Recognized as a Leader by Independent Research Firm in Infrastructure Automation Platforms Evaluation
August 7, 2019

Data Analytics


Traditional, on-premises data warehouses collect and store what is often an organization’s most valuable data–which helps drive growth and innovation. Organizations depend on this data to make informed and timely decisions that can shape the future of their business. But we know that traditional data warehouses can be expensive, hard to maintain, and unable to keep up with business needs.

As data rapidly increases in volume, velocity and variety, it’s especially hard to get business needs met. We know that businesses are turning to BigQuery, our highly scalable and serverless enterprise data warehouse, to perform fast, real-time analysis of their data.

When migrating your data warehouse, you’re moving what’s essentially the center of gravity of your entire data analytics and business intelligence environment. Many business applications depend on your data warehouse for reports, data feeds, and dashboards, and the users of these business applications expect to have minimal to no disruption during the migration. With all this in mind, we’ve created a new data warehouse migration guide to help walk you through data warehouse migrations with as little complexity and risk as possible. In the guide, you’ll find prescriptive, end-to-end guidance to securely migrate legacy data warehouses to BigQuery. Though the guide contains some sections specific for migrations from Teradata, you’ll be able to use the vast majority of the guide for any enterprise data warehouse migration.

Building the migration framework
A migration can be a complex and lengthy endeavor, but it can be made simpler with planning. As part of the migration guide, you’ll find our suggested structured framework for data warehouse migrations, based on Agile principles. The framework facilitates the application of project management best practices, helping to bring incremental and tangible business value while managing risk and minimizing disruptions.

The framework adheres to the phases shown in the following diagram, with more details below:

Migration Framework.png

1. Prepare and discover: In this initial phase, the focus is on preparation and discovery. It’s about affording yourself and your stakeholders an early opportunity to discover the use cases you’re planning for BigQuery, raise initial concerns, and, importantly, conduct an initial analysis around the expected benefits.

2. Assess and plan: The assess-and-plan phase is about taking the input from the prepare-and-discover phase, assessing that input, and then using it to plan for the migration. This phase can be broken down into the following tasks:

  1. Assess the current state

  2. Catalog and prioritize use cases

  3. Define measures of success

  4. Create a definition of “done”

  5. Design and propose a proof-of-concept (POC), short-term state, and ideal end state

  6. Create time and cost estimates

  7. Identify and engage a migration partner (if applicable)

Find more details here on these tasks.

3. Execute: After you’ve gathered information about your legacy data warehouse platform, and created a prioritized backlog of use cases, you can group the use cases into workloads and proceed with the migration in iterations.

An iteration can consist of a single use case, a few separate use cases, or a number of use cases pertaining to a single workload. Which option you choose for the iteration depends on the interconnectivity of the use cases, any shared dependencies, and the resources you have available to undertake the work. For example, a use case might have the following relationships and dependencies:

  • Purchase reporting can stand alone and is useful for understanding monies spent and requesting discounts.

  • Sales reporting can stand alone and is useful for planning marketing campaigns.

  • Profit and loss reporting, however, is dependent on both purchases and sales, and is useful for determining the company’s value.

With each use case, you’ll want to decide whether it will be offloaded or fully migrated. Offloading focuses on time to delivery, where speed is the top priority, and fully migrating is about ensuring all upstream dependencies are also migrated. The following diagram shows the execution process and flow in greater detail:

overview - iteration.png

During the execute phase, the work to fully migrate or offload the use case or workload should focus on one or more of the following steps. Our guide includes documents dedicated to each of these steps:

  1. Setup and data governance: Setup is the foundational work that’s required in order to let the use cases run on Google Cloud Platform (GCP). Setup can include configuration of your GCP projects, network, virtual private cloud (VPC), and data governance. Data governance is a principled approach to manage data during its lifecycle, from acquisition to use to disposal. Take a look at the data governance document to help define your governance program in the cloud, which should include an outline of the policies, procedures, responsibilities, and controls surrounding your data activities.
  2. Migrate schema and data: The schema and data transfer document provides extensive information on how you can move your data to BigQuery and offers recommendations for updating your schema to take full advantage of BigQuery’s features. The associated quickstart guides you step by step through an actual schema and data migration from Teradata to BigQuery.
  3. Translate queries: The query translation document addresses some of the challenges that you might encounter while migrating SQL queries from Teradata to BigQuery, and explains when SQL translation is required. The associated quickstart simplifies this and takes you through an exercise to translate some queries using the Teradata SQL to the standard ISO:2011 SQL supported by BigQuery, starting with manual translation and evolving into a more automated approach. The SQL translation reference details the similarities and differences in SQL syntax between Teradata and BigQuery.
  4. Migrate business applications: Depending on your organization, your business applications might include dashboards, reports and operational pipelines. The reporting and analysis document explains how you can take advantage of the full suite of business intelligence tools and applications integrated with BigQuery–this includes the reporting and analysis applications that you may be using with your legacy data warehouse.
  5. Migrate data pipelines: The data pipelines document helps you understand what a data pipeline is, what procedures and patterns it can employ, and which migration options and technologies are available in relation to the larger data warehouse migration.
  6. Optimize performance: The performance optimization document helps you understand the factors that can impact performance in BigQuery and helps you apply essential techniques to improve it.
  7. Verify and validate: At the end of each iteration, validate that the use case was successfully migrated according to your definition of done, and verify that data governance concerns have been met, the schema and data have been migrated, and that business applications are producing the expected results.

Understanding the migration architecture
After each iteration in the execution phase, you’ll likely have some use cases offloaded to BigQuery, some fully migrated, and some still in your on-premises data warehouse. This iterative approach is enabled by an architecture where both your data warehouse and BigQuery can be actively used in parallel. This architecture allows you to take data warehouse migration one step at a time, breaking down its complexity and reducing risk.

The next diagram illustrates the architecture, showing Teradata working on-premises and BigQuery on GCP, where both can ingest from the source systems, integrate with your business applications, and provide access to the users who need it. Importantly, you can also see in the diagram that data is synchronized from Teradata to BigQuery.

Ref architecture -  during the migration.png

The data warehouse migration guide provides a wealth of prescriptive guidance so you can structure your migration project carefully and undertake each one of its challenges in a systematic manner. Our professional services organization and our partners are ready to assist you further in your migration journey, no matter how complex it may be. And check out our migration offer for help creating a streamlined path to a modern data warehouse.

Leave a Reply

Your email address will not be published. Required fields are marked *