* Data Fusion out of box source connector for API sources (i.e., HTTP source plugin) supports basic authentication (id/password based) and OAUTH2 based authentication of source APIs.
No landing zone is used in this architecture for data from on-premise RDBMS systems. Data Fusion pipelines are used to directly read from source RDBMS using JDBC connectors available out of the box. This is considering there was no sensitive data in those sources that needs to be restricted from being ingested into the data lake.
To recap, GCP provides a comprehensive set of services for Data and Analytics and there are multiple service options available for each task. Deciding which service option is suitable for your unique scenario requires you to consider a few factors that will influence the choices you make.
In this article, I have provided some insight into the considerations you need to make to decide the right GCP service for your needs in order to design a data lake.
Also, I have described the GCP architecture for a data lake that ingests data from a variety of hybrid sources, with ETL developers being the key persona in mind for skill set availability.
In the next article in this series, I will describe in detail the solution design to ingest structured data into the data lake based on the architecture described in this article. Also, I will share the source code for this solution.
If you are new to the tools used in the architecture described in this blog, I recommend the following links to learn more about them.
Watch this 3 min video for a byte sized overview of Data Fusion or listen to a more detailed talk from Cloud Next. Then try your hand at Data Fusion by following this Code Lab to Ingest CSV data to BigQuery.
Stay tuned for part 2: “Framework for building a configuration driven Data Lake using Data Fusion and Composer”