Lineage

Introduction to OvalEdge Lineage

Summary

Data lineage is the provenance of data in an enterprise as it moves across various systems. Lineage is defined relative to a reference object and has the following components:

  • Upstream data lineage: represents the source data systems, along with the intermediate hops the data passes through as it undergoes transformation and reaches the reference object.
  • Downstream data lineage: represents the intermediate hops of transformation the data passes through after the reference object, before it reaches the downstream target or destination data system objects.
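
The two components can be pictured as traversals of a directed graph of data flows. The following is a minimal sketch in Python, not OvalEdge code: the object names, the EDGES list, and the helper functions are all hypothetical, chosen only to illustrate how upstream and downstream lineage relate to a reference object.

```python
from collections import defaultdict, deque

# Hypothetical data flows: each pair is (source object, target object).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "reports.churn_dashboard"),
    ("erp.orders", "warehouse.fact_orders"),
    ("warehouse.fact_orders", "reports.churn_dashboard"),
]


def build_adjacency(edges):
    """Index the flows in both directions so either side can be walked."""
    upstream = defaultdict(set)    # target -> objects feeding it
    downstream = defaultdict(set)  # source -> objects it feeds
    for source, target in edges:
        upstream[target].add(source)
        downstream[source].add(target)
    return upstream, downstream


def reachable(start, adjacency):
    """Breadth-first walk: every object reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


UPSTREAM, DOWNSTREAM = build_adjacency(EDGES)


def upstream_lineage(reference):
    """Sources and intermediate hops that lead to the reference object."""
    return reachable(reference, UPSTREAM)


def downstream_lineage(reference):
    """Intermediate hops and targets that the reference object feeds."""
    return reachable(reference, DOWNSTREAM)
```

With warehouse.dim_customer as the reference object, upstream_lineage returns the CRM source and its staging hop, while downstream_lineage returns the report that consumes the table.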

OvalEdge supports data lineage, referred to hereafter as lineage, at the metadata level for the objects that are part of the data catalog. The objects involved in lineage are:

  • Source objects. These represent the objects from which the data flows to downstream data objects.
  • Target objects. These represent the objects to which the data flows from upstream data objects.
  • Datasets. These are query elements, such as stored procedures, functions, views, and SQL queries, that are involved in the transformation of data as it moves from source objects to target objects.
  • Entity Relationships. These represent the relationships between different objects in the datasets. The relationships are inferred from join statements that are part of the dataset.
  • Associated objects. These are objects that are part of the datasets. These can be grouped into:
    • Direct associated objects, which are involved in the data movement and transformation and are referred to in the dataset output.
    • Indirect associated objects, which are involved in the dataset but are not referred to in the output of the dataset code.
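
The distinction between direct and indirect associated objects can be sketched with a small model. This is an illustrative Python example under stated assumptions, not the OvalEdge data model: the Dataset class, its fields, and the table names are hypothetical. It assumes a query that loads customer names into a target table while joining a regions table only to filter rows, so the regions table never appears in the output.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model of a query element that moves data to a target."""
    name: str
    target: str
    # target column -> (source table, source column) that feeds it
    output_columns: dict
    # every table the query text mentions, whether or not it feeds the output
    referenced_tables: set
    # join conditions as ((table, column), (table, column)) pairs
    joins: list = field(default_factory=list)

    def direct_objects(self):
        """Tables whose columns are referred to in the dataset output."""
        return {table for table, _ in self.output_columns.values()}

    def indirect_objects(self):
        """Tables used by the dataset (joins, filters) but absent from its output."""
        return self.referenced_tables - self.direct_objects()

    def entity_relationships(self):
        """Relationships inferred from the join conditions."""
        return [f"{lt}.{lc} = {rt}.{rc}" for (lt, lc), (rt, rc) in self.joins]


# Assumed query: load customer names, joining regions only to filter rows.
load_active_customers = Dataset(
    name="load_active_customers",
    target="warehouse.dim_customer",
    output_columns={"customer_name": ("crm.customers", "name")},
    referenced_tables={"crm.customers", "crm.regions"},
    joins=[(("crm.customers", "region_id"), ("crm.regions", "id"))],
)
```

Here crm.customers is a direct associated object because one of its columns reaches the output, crm.regions is an indirect associated object, and the join condition yields one inferred entity relationship between the two tables.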

Lineage is supported for the following data object types:

  • Tables
  • Table Columns
  • Files
  • File columns
  • Reports
  • Report Columns
  • Code and configuration objects such as views, stored procedures, functions, triggers, package bodies, and JSON, XML, YAML, .csv, and .config files

See the article View Data Lineage to learn more about lineage.