Lineage between data objects can be built in multiple ways:
- Auto Lineage
- Manual Lineage
- Lineage Maintenance Advanced Job
Auto Lineage
In this mode, lineage is built by the system by parsing all the available elements in the data catalog. The primary objects required for data lineage are datasets. These are parsed by the system to understand and build lineage.
- Auto-lineage is configured in the Crawler settings; whether the option is available depends on the license in use.
- To build lineage for a database, select the database in the Crawler view, click the nine dots, and select ‘Build Lineage’.
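To give a feel for what parsing a dataset query for lineage involves, here is a minimal sketch in Python. It is not the product's parser: the function name `extract_lineage`, the sample query, and the table names are hypothetical, and the regular expressions only cope with a plain INSERT INTO ... SELECT statement.

```python
import re


def extract_lineage(sql: str) -> list[tuple[str, str]]:
    """Return (source, target) table pairs for a simple INSERT INTO ... SELECT.

    Illustrative only: a real lineage parser handles far more SQL syntax.
    """
    target_match = re.search(r"\binsert\s+into\s+([\w.]+)", sql, flags=re.IGNORECASE)
    if target_match is None:
        return []  # no target table, so no lineage edge can be derived
    target = target_match.group(1)

    # Tables named after FROM or JOIN are treated as sources.
    found = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)

    # Keep the first occurrence of each source, preserving order.
    sources = []
    for name in found:
        if name not in sources:
            sources.append(name)
    return [(source, target) for source in sources]


# Hypothetical dataset query used only for this example.
sample_query = """
INSERT INTO sales.customer_summary (customer_id, email)
SELECT c.customer_id, e.email
FROM sales.customer c
JOIN sales.email_address e ON e.customer_id = c.customer_id
"""

edges = extract_lineage(sample_query)
for source, target in edges:
    print(f"{source} -> {target}")
```

Each pair is one lineage edge from a source table to the target table. A query that this simple approach cannot interpret yields no edges, which is comparable in spirit to a dataset failing to parse.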
Manual Lineage
Users can establish lineage manually in cases where:
- Auto-lineage is not supported due to license constraints
- The datasets involved in data transformation and movement are not accessible
To do so, open the Lineage tab of the object and click the ‘Edit’ button.
Lineage Maintenance Advanced Job
Follow the steps given below:
- Go to Advanced Tools 🡪 Lineage Maintenance and select the Object Type, Database, Schema, and Table. Once these details are provided, filters appear for adding Source and Target objects.
- Once the source and target objects are provided, click ‘View’ to see the lineage built.
Example: The following figure displays a sample destination lineage modification to a selected table EmailAddress.
Additional Functions For Adding Lineage
Three additional functions are available when adding lineage to a source or destination.
- Select a lineage, click the nine dots, and select the required option.
- Select Edit Query to edit the query in the query sheet, then save.
- Click Edit Transformation Notes to add any notes related to the query edit.
Add/Edit Column Info
- Use this function to add and edit the column mapping between source and destination for the selected object.
- Select Mapping Column via AI to map the columns.
- Map the columns as required and save.
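A column mapping can be thought of as a lookup from each source column to a destination column. The sketch below is a simple name-matching stand-in written in Python; it does not describe how the Mapping Column via AI option works internally. The function name `suggest_column_mapping` and the column names are hypothetical.

```python
def suggest_column_mapping(source_columns, destination_columns):
    """Suggest a destination column for each source column by normalised name.

    Names are compared after removing underscores and lowercasing.
    Source columns with no match map to None so a user can fill them in.
    """
    def normalize(name):
        return name.replace("_", "").lower()

    destination_by_key = {normalize(name): name for name in destination_columns}
    return {
        source: destination_by_key.get(normalize(source))
        for source in source_columns
    }


# Hypothetical columns for a source table and a destination table.
source_columns = ["EmailAddressID", "EmailAddress", "ModifiedDate"]
destination_columns = ["email_address_id", "email_address", "rowguid"]

mapping = suggest_column_mapping(source_columns, destination_columns)
for source, destination in mapping.items():
    print(source, "->", destination if destination else "(unmapped)")
```

Columns left unmapped by an automatic suggestion are the ones that still need to be mapped by hand before saving.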
Validating Auto-Lineage Results
Once auto-lineage has been configured in the Crawler settings, the system runs lineage-building jobs. The results of these jobs can be viewed, validated, and managed using the ‘Build Auto Lineage’ advanced job.
- Select Advanced Tools.
- Select the Build Auto Lineage tile.
- Select the connection name from the drop-down.
The connection name is the name of the database connection as shown on the Crawler page. The list of datasets for which lineage has been built is then displayed.
Each dataset may have any one of the following statuses:
- SUCCESS_LINEAGE_BUILD: This status indicates that the lineage for the dataset has been built successfully.
- SUCCESS_LINEAGE_PARTIALLY_BUILD: This status indicates that one or more queries that are part of the dataset have failed.
- FAILED_EXCEPTION: This indicates that the lineage could not be built due to parsing errors. Users can correct the query of the dataset and retry lineage building.
- SUCCESS_LINEAGE_FIXED: This indicates that the lineage was built successfully after the manual correction of the dataset query by a user.
When lineage is built from the dataset queries, a few queries may fail to be parsed by the algorithm used. In such cases, users can correct the queries based on the debugging information provided and then build the lineage for the dataset.
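The four statuses above can be modelled as an enumeration to decide which datasets need attention. The following Python sketch is illustrative: the status names come from the list above, while `datasets_to_review` and the dataset names are hypothetical, and the product does not necessarily expose results in this form.

```python
from enum import Enum


class LineageStatus(Enum):
    SUCCESS_LINEAGE_BUILD = "SUCCESS_LINEAGE_BUILD"
    SUCCESS_LINEAGE_PARTIALLY_BUILD = "SUCCESS_LINEAGE_PARTIALLY_BUILD"
    FAILED_EXCEPTION = "FAILED_EXCEPTION"
    SUCCESS_LINEAGE_FIXED = "SUCCESS_LINEAGE_FIXED"


# Statuses for which at least one query still has to be corrected.
NEEDS_CORRECTION = {
    LineageStatus.FAILED_EXCEPTION,
    LineageStatus.SUCCESS_LINEAGE_PARTIALLY_BUILD,
}


def datasets_to_review(results: dict[str, str]) -> list[str]:
    """Return the names of datasets whose queries need manual correction."""
    return sorted(
        name
        for name, status in results.items()
        if LineageStatus(status) in NEEDS_CORRECTION
    )


# Hypothetical job results keyed by dataset name.
job_results = {
    "load_email_address": "FAILED_EXCEPTION",
    "load_customer": "SUCCESS_LINEAGE_BUILD",
    "load_orders": "SUCCESS_LINEAGE_PARTIALLY_BUILD",
    "load_products": "SUCCESS_LINEAGE_FIXED",
}

print(datasets_to_review(job_results))
```

Only failed and partially built datasets are returned, since those are the ones whose queries still need to be corrected and revalidated.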
Building Lineage for Failed or Partially Successful Datasets
The queries of a failed or partially successful dataset must be corrected by the user, validated to ensure that they can be parsed, and then processed to build lineage.
- Choose a failed dataset query and click the view query icon to view the query.
- The query window that opens has three panels. The query source is displayed in the left panel. The right panel is used for editing the query locally, without affecting the query in the source system.
- Copy the source query to the right panel by clicking the copy icon.
- The bottom panel shows the debug trace for the validation action. Edit the query to address the issues displayed and click the Validate button. Repeat this process until validation succeeds.
- Click “Save And Lineage” to build the lineage. The lineage built for the datasets can be viewed in the respective tabs of the objects that are part of it.
- The status will change to SUCCESS_LINEAGE_CORRECTED.
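The edit-and-validate cycle described in the steps above can be sketched as a loop that keeps a trace of each attempt. This is a rough Python illustration under stated assumptions: `validate_query` is a hypothetical stand-in with two trivial checks, not the product's validator, and the queries are made up.

```python
def validate_query(sql: str) -> list[str]:
    """Hypothetical validator: return a list of problems, empty when valid."""
    problems = []
    if sql.count("(") != sql.count(")"):
        problems.append("unbalanced parentheses")
    if not sql.strip().lower().startswith(("select", "insert", "create", "with")):
        problems.append("statement does not start with a recognised keyword")
    return problems


def first_valid_revision(revisions):
    """Validate each revision in order and stop at the first one that passes.

    Returns (query, trace); query is None if no revision passes.
    The trace plays the role of the debug output, one entry per attempt.
    """
    trace = []
    for attempt, sql in enumerate(revisions, start=1):
        problems = validate_query(sql)
        trace.append((attempt, problems))
        if not problems:
            return sql, trace
    return None, trace


# Hypothetical successive edits of a failing query.
revisions = [
    "SELECT id, UPPER(email FROM staging.email",   # missing closing parenthesis
    "SELECT id, UPPER(email) FROM staging.email",  # corrected
]

valid_query, trace = first_valid_revision(revisions)
for attempt, problems in trace:
    print(attempt, problems or "valid")
```

Only a revision that produces an empty problem list would be passed on to the step that builds lineage.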
Temporary (temp) lineage is created when a table or file referenced in a query is not found in the crawled database.
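The rule for temp lineage amounts to a membership check against the crawled objects. The Python sketch below illustrates that idea only; `classify_lineage_objects`, the "catalog" and "temp" labels, and the object names are hypothetical and are not taken from the product.

```python
def classify_lineage_objects(referenced, crawled):
    """Label each referenced table or file as 'catalog' or 'temp'.

    An object is 'temp' when it is not among the crawled objects.
    Names are compared case-insensitively.
    """
    crawled_lookup = {name.lower() for name in crawled}
    return {
        name: "catalog" if name.lower() in crawled_lookup else "temp"
        for name in referenced
    }


# Hypothetical crawled objects and objects referenced by a dataset query.
crawled_objects = ["Person.EmailAddress", "Person.Person"]
referenced_objects = ["person.emailaddress", "staging.email_import.csv"]

labels = classify_lineage_objects(referenced_objects, crawled_objects)
for name, label in labels.items():
    print(name, "->", label)
```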