Lineage

Build Lineage

A few points are important to understand when building lineage.

  • By default, the lineage scope of a dataset is the database to which it belongs. Each object referenced in the dataset query is searched for within that database and then added to the lineage.
  • If a dataset query refers to a data object in another database by its fully qualified name, the object is resolved and included in the lineage. This results in data lineage across different databases.
  • If the dataset query does not use the fully qualified name of an external data object, and the object cannot be found by name in the parent database, temporary lineage objects are created within the system to represent it and keep the lineage unbroken.

The options available to users to avoid the creation of temporary lineage objects are:

  • Configure a default list of databases to be searched for external objects. The search runs first within the parent database and then through the databases listed in the configuration property, in the order specified.
  • Manually correct the relevant dataset query to add the fully qualified name of the data object and regenerate the lineage for the dataset.

The types of temporary lineage objects created include temp lineage schemas, temp lineage tables, temp lineage columns, temp lineage files, and temp lineage file columns.
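
The resolution rules described above can be illustrated with a minimal sketch. This is not the product's implementation. The in-memory CATALOG, the Resolution record, the resolve function, and the two-part "database.object" form of a fully qualified name are all hypothetical and used only for illustration. The source does not say what happens when a fully qualified name cannot be found, so the sketch assumes it also falls back to a temporary lineage object.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory catalog: database name -> names of its data objects.
CATALOG = {
    "sales": {"orders", "customers"},
    "finance": {"invoices"},
    "staging": {"raw_orders"},
}


@dataclass
class Resolution:
    database: Optional[str]  # None when no catalog object matched
    name: str
    temporary: bool          # True when a temp lineage object stands in


def resolve(reference, parent_db, default_dbs=(), catalog=CATALOG):
    """Resolve one object reference taken from a dataset query."""
    # Fully qualified reference (assumed form: "database.object"):
    # look it up directly, which allows lineage across databases.
    if "." in reference:
        db, name = reference.split(".", 1)
        if name in catalog.get(db, set()):
            return Resolution(db, name, False)
        # Assumption: an unresolved qualified name becomes a temp object.
        return Resolution(None, reference, True)

    # Unqualified reference: search the parent database first, then the
    # configured default databases in the order they are listed.
    for db in (parent_db, *default_dbs):
        if reference in catalog.get(db, set()):
            return Resolution(db, reference, False)

    # Nothing matched: represent the object with a temporary lineage object.
    return Resolution(None, reference, True)
```

In this sketch, resolve("raw_orders", "sales") yields a temporary object, while resolve("raw_orders", "sales", ["finance", "staging"]) finds the object in the staging database. Using the fully qualified name "staging.raw_orders" in the query resolves it without any configured list.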

Lineage between data objects can be built in multiple ways:

  1. Auto Lineage: In this mode, the system builds lineage by parsing all the available elements in the data catalog. The primary objects required for data lineage are datasets, which the system parses to understand and build the lineage.

    - Auto-lineage is configured in the Crawler settings, and the availability of the option depends on the license in use.
    - To build lineage, select the database in the Crawler view, click the 9 dots, and select ‘Build Lineage’.

  2. Manual Lineage: Where auto-lineage is not supported due to license constraints, or where the datasets involved in data transformation and movement are not accessible, users can establish lineage for objects manually by clicking the ‘Edit’ button in the Lineage tab.
  3. Lineage Maintenance Advanced Job.