Data Integration

According to Wikipedia:

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, which include both commercial (such as when two similar companies need to merge their databases) and scientific (combining research results from different bio-informatics repositories, for example) domains.

What does it actually take to deliver Data Integration at the conceptual level? Assuming the data eventually resides in [[Purposely-built Databases]], a few fundamental components are needed:

  • We need mechanisms for [[Data Ingestion]] and [[Data Delivery]] to move data from source to destination.
  • We need [[Data Processing]] to transform or prepare datasets for validation, performance, cost, and business-logic purposes.
  • We may need [[Data Discovery]] to identify candidate sources and destinations across a very large data landscape, for [[Data Management and Governance]] purposes.
  • We need [[Data Pipeline Management]] to design decoupled, immutable, idempotent processing steps so that pipelines remain operationally supportable at scale (see the sketch after this list).
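To make the idempotency point concrete, here is a minimal Python sketch of one decoupled pipeline step (ingest, transform, deliver). The directory names, the `transform` function, and the content-hash naming scheme are illustrative assumptions, not any specific tool's API: the key idea is that the output location is a deterministic function of the input, so re-running the step after a failure cannot duplicate output.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical locations for illustration only.
SOURCE_DIR = Path("source")
DEST_DIR = Path("destination")

def transform(record: dict) -> dict:
    """Placeholder business logic: normalize keys to lowercase."""
    return {k.lower(): v for k, v in record.items()}

def deterministic_id(raw: bytes) -> str:
    """Content hash makes the output key a pure function of the input."""
    return hashlib.sha256(raw).hexdigest()

def process_file(src: Path) -> None:
    """One decoupled pipeline step: ingest -> transform -> deliver.

    Idempotent: the destination key is derived from the input content,
    so a retry detects already-delivered output and skips it instead
    of producing duplicates.
    """
    raw = src.read_bytes()
    out_path = DEST_DIR / f"{deterministic_id(raw)}.json"
    if out_path.exists():  # already delivered; safe to skip on retry
        return
    record = json.loads(raw)
    out_path.write_text(json.dumps(transform(record), sort_keys=True))

if __name__ == "__main__":
    DEST_DIR.mkdir(exist_ok=True)
    for src in sorted(SOURCE_DIR.glob("*.json")):
        process_file(src)
```

Because each step reads immutable inputs and writes to a deterministic key, steps can be rerun, reordered across files, or owned by separate components without coordinating shared state, which is what makes the pipeline supportable at scale.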

Data Integration is a core part of the data value chain for businesses. It also enables [[Visualization and Analytics]] and AI capabilities such as [[AutoML]].
