Architect your data lake on Google Cloud with Data Fusion and Composer


* Data Fusion's out-of-the-box source connector for API sources (the HTTP source plugin) supports basic authentication (ID and password) and OAuth2-based authentication for source APIs.
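To make the two authentication schemes concrete, here is a minimal sketch of the request header each one produces on the wire. It illustrates the schemes themselves and is not the HTTP source plugin's internal code; the function names are hypothetical, and the credentials and token are placeholders.

```python
import base64


def basic_auth_header(user_id: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (hypothetical helper).

    The ID and password are joined with a colon and base64-encoded.
    """
    raw = f"{user_id}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def oauth2_bearer_header(access_token: str) -> dict:
    """Build the header for an OAuth2 bearer token (hypothetical helper).

    Assumption: the access token has already been obtained from the
    API's authorization server; that exchange is not shown here.
    """
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(basic_auth_header("etl_user", "example-password"))
    print(oauth2_bearer_header("example-access-token"))
```

In both cases the credential travels in the `Authorization` header, so the source API should only be called over HTTPS.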

RDBMS

No landing zone is used in this architecture for data from on-premises RDBMS systems. Data Fusion pipelines read directly from the source RDBMS using the JDBC connectors available out of the box. This approach works because those sources held no sensitive data that needed to be restricted from ingestion into the data lake.
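The decision described above can be sketched as a small routing rule. This is an illustration under stated assumptions, not part of Data Fusion: the `RdbmsSource` class, the path labels, and the example host are all hypothetical, and the PostgreSQL JDBC URL format is used only as a familiar example of what a JDBC connector expects.

```python
from dataclasses import dataclass


@dataclass
class RdbmsSource:
    """Hypothetical description of an on-premises RDBMS source."""
    name: str
    host: str
    port: int
    database: str
    has_sensitive_data: bool


def ingestion_path(source: RdbmsSource) -> str:
    """Pick an ingestion path for a source.

    Assumption: presence of restricted data is the only deciding factor.
    Sources without it are read directly over JDBC, as in this
    architecture; anything else needs a separate design step.
    """
    if source.has_sensitive_data:
        return "stage_and_filter_first"  # hypothetical label
    return "direct_jdbc"


def postgres_jdbc_url(source: RdbmsSource) -> str:
    """Build a PostgreSQL-style JDBC URL (example database type only)."""
    return f"jdbc:postgresql://{source.host}:{source.port}/{source.database}"


if __name__ == "__main__":
    orders = RdbmsSource("orders", "db.example.internal", 5432, "sales", False)
    print(ingestion_path(orders))
    print(postgres_jdbc_url(orders))
```

If a source did hold restricted data, the direct-read shortcut would no longer apply and the source would need its own handling before ingestion.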

Summary

To recap, GCP provides a comprehensive set of services for data and analytics, and several service options are available for each task. Deciding which option suits your scenario requires weighing a few factors that will influence your choices.

In this article, I have outlined the considerations to weigh when choosing the right GCP services to design a data lake.

I have also described a GCP architecture for a data lake that ingests data from a variety of hybrid sources, designed with ETL developers in mind as the key persona for skill set availability.

What next?

In the next article in this series, I will describe in detail the solution design for ingesting structured data into the data lake, based on the architecture described in this article. I will also share the source code for this solution.

Learning Resources

If you are new to the tools used in the architecture described in this blog, I recommend the following links to learn more about them.

Data Fusion

Watch this 3-minute video for a byte-sized overview of Data Fusion, or listen to a more detailed talk from Cloud Next. Then try your hand at Data Fusion by following this Codelab to ingest CSV data into BigQuery.

Composer

Watch this 4-minute video for a byte-sized overview of Composer, or watch this detailed video from Cloud OnAir. Want to try your hand? Follow these Quickstart instructions.

BigQuery

Watch this quick 4-minute video for an overview, then try BigQuery for free using the BigQuery sandbox (subject to sandbox limits).

Try your hand with the Codelabs for BigQuery UI navigation and data exploration, and for loading and querying data with the bq command-line tool.

Have a play with BigQuery Public Datasets and query the Wikipedia dataset in BigQuery.

Stay tuned for part 2: “Framework for building a configuration driven Data Lake using Data Fusion and Composer”
