A few points are important to understand when building lineage:
- By default, the lineage for a dataset is scoped to the database to which it belongs. Each object specified in the dataset query is searched for within that database and then added to the lineage.
- If the dataset query refers to a data object in another database by its fully qualified name, the object is resolved and included in the lineage. This produces data lineage across different databases.
- If the dataset query does not use the fully qualified name of an external data object, and the object cannot be found by name in the parent database, then a temporary lineage object is created within the system to represent it and keep the lineage unbroken.
The options available to users to avoid the creation of temporary lineage objects are:
- Configure a default list of databases that will be searched for the external objects. The sequence of the search will be first within the parent database and then within the list of databases mentioned in the configuration property, in the order specified.
- Manually correct the relevant dataset query to add the fully qualified name of the data object and regenerate the lineage for the dataset.
The types of temporary lineage objects created include temp lineage schemas, temp lineage tables, temp lineage columns, temp lineage files, and temp lineage file columns.
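The search order described above can be illustrated with a minimal Python sketch. This is not OvalEdge code: the `Catalog` class, the `resolve` function, and the `TEMP_LINEAGE` placeholder name are hypothetical, and the sketch assumes a simplified `database.object` naming scheme. It also assumes that a fully qualified name that cannot be found falls back to a temporary object, which the text above does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory catalog: database name -> set of object names."""
    databases: dict[str, set[str]] = field(default_factory=dict)
    temp_objects: list[str] = field(default_factory=list)


def resolve(catalog, parent_db, name, default_dbs=()):
    """Return (database, object) for a name referenced in a dataset query."""
    if "." in name:
        # Fully qualified name: look it up directly in the named database.
        db, obj = name.split(".", 1)
        if obj in catalog.databases.get(db, set()):
            return db, obj
    else:
        # Search the parent database first, then the configured list in order.
        for db in (parent_db, *default_dbs):
            if name in catalog.databases.get(db, set()):
                return db, name
    # Nothing matched: record a temporary lineage object as a placeholder.
    # (Assumption: unresolved qualified names are handled the same way.)
    catalog.temp_objects.append(name)
    return "TEMP_LINEAGE", name
```

With a configured list of `["hr", "crm"]`, an unqualified `customers` that exists in both databases resolves to `hr`, because the list is searched in the order specified. With no configured list, the same name ends up as a temporary lineage object.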
Lineage between data objects can be built in multiple ways:
- Auto Lineage
- Manual Lineage
- Lineage Maintenance Advanced Job
See the article Lineage Building for more details on scheduling and validating the lineage.
Lineage Versioning
Lineage versioning is a new feature in OvalEdge that helps users understand how the lineage of data objects changes over time as the data system changes to support new business or technical requirements. It is supported at the object and column levels for HQL files and RDBMS.
Whenever a dataset is modified in the source system, a new lineage version for it is built in the OvalEdge platform, which will be shown first by default.
To activate lineage versioning:
- Go to Administration > Configuration > Lineage, find the key ‘add.dataset.lineage.version’, and set its value to ‘true’. This displays a new option, ‘Version History’, in the menu on the Data Catalog > Queries page.
- Click the Version History option, which displays the list of lineage versions available for the dataset. When you select a version, it becomes the active lineage version, and all the tabs displaying the information are updated accordingly.
- Additionally, if the data asset is saved to MyWatchlist, you are notified of any lineage changes associated with that object.
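The versioning behavior can be pictured with a small Python model. This is only an illustrative sketch, not how OvalEdge stores lineage: the `DatasetLineage` class and its method names are hypothetical. It also assumes that version numbers start at 1 and that building a new version resets the display to the newest one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetLineage:
    """Hypothetical model of a dataset's lineage version history."""
    versions: list = field(default_factory=list)  # stored oldest first
    selected: Optional[int] = None  # index chosen from the version history

    def on_dataset_modified(self, edges):
        """Build a new lineage version when the source dataset changes."""
        self.versions.append({"version": len(self.versions) + 1, "edges": edges})
        self.selected = None  # assumption: the newest version is shown again

    def history(self):
        """Return the available versions, newest first."""
        return list(reversed(self.versions))

    def select(self, version):
        """Make the given version number the one that is displayed."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown lineage version: {version}")
        self.selected = version - 1

    def displayed(self):
        """Return the selected version, or the newest one by default."""
        index = len(self.versions) - 1 if self.selected is None else self.selected
        return self.versions[index]
```

After two modifications, `displayed()` returns version 2 until `select(1)` is called, which mirrors choosing an older entry from Version History.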