

Several file formats are preferred for use in a data lake.

#Data lake architecture drivers
Let us understand what comprises a data lake by discussing its architecture. The following diagram represents a high-level data lake architecture with standard terminology. In general, a data lake architecture comprises three main components, or layers.

We will discuss the sources from a data lake perspective. Sources are the providers of business data to the data lake. ETL or ELT tools retrieve data from the various sources for further processing. Sources fall into two types, based on their structure and format as seen by the ETL process.

The first type consists of sources that share the same format and structure. Example: sources that all come from MS SQL Server databases.

The second type consists of sources with different data formats and structures. Aggregating such sources into consolidated data for processing is tricky for ETL professionals. Example: sources from flat files, NoSQL databases, RDBMS, and industry-standard formats such as HL7, SWIFT, and EDI, which have predefined data formats.

Data lake architectures mostly use sources from the following:

Transactional business applications such as ERP, CRM, SCM, or accounting systems, which are used to capture business transactions. These are mainly database-backed or file-based data stores that hold transaction data. The data lake connects to these applications through connectors, adapters, APIs, or web services for ETL. Example: SAP ERP, Oracle Apps, QuickBooks.

An existing enterprise data warehouse (EDW), which the data lake may draw on to create a consolidated data reference in combination with other sources of data. The EDW can be a standard RDBMS-based warehouse or a cloud-based data warehouse. In most scenarios, ETL tools connect to the relevant databases through connectors or ODBC/JDBC drivers to extract data from the EDW.
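The idea of pulling records from sources with different formats into one consolidated set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data, the field names, and the `consolidate` helper are hypothetical, and an in-memory SQLite database stands in for a relational source that a real ETL tool would reach through an ODBC or JDBC driver.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for two differently structured sources.
FLAT_FILE = "order_id,customer,amount\n1,Acme,120.50\n2,Globex,75.00\n"
DOCUMENTS = '[{"id": 3, "cust": {"name": "Initech"}, "total": 42.0}]'


def read_flat_file(text):
    """Flat-file source: every CSV value arrives as a string and needs casting."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": float(row["amount"]),
            "source": "flat_file",
        }


def read_documents(text):
    """Document-style source: nested fields with different names."""
    for doc in json.loads(text):
        yield {
            "order_id": doc["id"],
            "customer": doc["cust"]["name"],
            "amount": float(doc["total"]),
            "source": "documents",
        }


def read_rdbms(conn):
    """Relational source (SQLite used here as a stand-in for a driver-based connection)."""
    query = "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    for order_id, customer, amount in conn.execute(query):
        yield {
            "order_id": order_id,
            "customer": customer,
            "amount": amount,
            "source": "rdbms",
        }


def make_demo_db():
    """Build a tiny in-memory database so the example is self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
    conn.execute("INSERT INTO orders VALUES (4, 'Umbrella', 310.25)")
    return conn


def consolidate():
    """Map every source onto one common record shape and merge the results."""
    conn = make_demo_db()
    try:
        records = [
            *read_flat_file(FLAT_FILE),
            *read_documents(DOCUMENTS),
            *read_rdbms(conn),
        ]
    finally:
        conn.close()
    return sorted(records, key=lambda r: r["order_id"])


if __name__ == "__main__":
    for record in consolidate():
        print(record)
```

Each reader hides one source's quirks (string-typed CSV values, nested document fields, SQL rows) behind the same record shape, which is the part that becomes harder as the number of differing formats grows.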
There is a well-known analogy between a data lake and a real lake, from Pentaho CTO James Dixon, who coined the term "data lake". A data lake resembles a lake, where water flows in from various sources and stays in its native form. A data mart resembles a packaged bottle of water: just as bottled water goes through several filtration and purification steps, the data in a data mart has been processed.
