Data lineage is the flow of data across different systems from source to destination. Additionally, it includes the transformations of the data experienced along the way, such as how and why the data is changed. A data lineage is a visual representation of how data flows from start to end, in order to uncover its whole lifecycle.
The supported data set types to build lineage are SQL, Views, Store procedures, Triggers, Functions, Packages, and Package bodies.
In the OvalEdge application, there are two ways to build lineage, i.e: Manual Lineage Building, and Build Auto Lineage.
Manual Lineage Building: It is the process of building lineage by adding the source and destination objects manually.
Build Auto lineage: In the OvalEdge application Build Auto Lineage option is used to build lineage automatically by parsing queries/ source code. While building the lineage from queries, in case of failure of a query parsing the tables and columns associated with it to build the lineage, those failed queries are viewed, manually corrected, and processed using Build Auto Lineage.
Build Auto Lineage can be performed in two ways within the OvalEdge application.
- Administration > Crawler > Nine Dots > Build Lineage (for Auto Lineage License type).
- Advanced Tools > Build Auto lineage> Select the connector to build lineage (for both Standard and Auto Lineage License types).
- Manual Lineage Building, Data set Process (Through Query parsing one by one, Auto Lineage)
Building Auto Lineage
When you navigate to Advanced tools > Build Auto Lineage, you will get a pop-up to select the connection for which you want to build the lineage. After selecting the connection, the build auto lineage will appear with all the schemas available within the connection and the related information such as query or source code (Reports/ ETLs).
The Build Auto Lineage displays information such as Schema Name, Query Id, Job Type, Query, Query Name, Title, Last Run On, Status, Error Message, Last Modified Date, and Created Date for the selected connection.
Fields |
Description |
Schema Name |
Displays the name of schema having queries within the selected Database. Example: HR is the schema name having query Id 1063, which is available in the selected Database Mysql-A. |
Query Id |
Displays the Query Ids available within the schema. |
Job Type |
Displays what type of query it is, e.g.: SQL, procedure, etc. |
Code |
Displays the Query. |
Code Name |
Displays the name of the query. |
Title |
Displays the title of the query. |
Last Lineage Build Date |
Displays the latest date on which the query was run to build the lineage. |
Status |
Displays the status of the lineage Example: Success lineage build, Success with temp lineage, and Success lineage does not exist. |
Correct Query |
The correct query column is enabled with a tick mark and an eye icon against each Query. You can click on the tick mark to correct and validate the query and on the eye icon to view the query and its results. |
Error Message |
Displays the error message in case the query gets failed. |
Last Modified Date |
Displays the date when any changes are done in the query. |
Created Date |
Displays the date when the query was created in the OvalEdge application. |
You can also view the below information on the upper grid of the Build Auto Lineage. |
|
Connection Name |
Displays the selected Connection Name. |
Queries Loaded |
Displays the total number of queries available in the selected Database. |
Queries Processed |
Displays the number of queries processed to build lineage. |
Queries Unprocessed |
Displays the number of unprocessed queries, i.e., the queries which are not processed to build lineage. |
Last Job Status |
Displays the status of the latest run query. |
Last Run Date |
Displays the latest date on which the query was run to build lineage. |
User Action
You can perform various actions on the Build Auto Lineage.
Change Connection
A Change Connection button is available on the Build Auto Lineage to change the selected connection type.
To change the Connection:
- Click on the Change Connection button, to display the Build Lineage for Connection pop-up window.
- On the Build Lineage for Connection pop-up window, Select the connection from the drop-down menu you want to view, which will generate the Server Name Automatically.
- Click Ok, to view the details of the selected connection on the Build Auto Lineage.
Example: If you want to change the connection name from Mysql-A to Oracle, you can click on the Change Connection button. On the Build Lineage for Connection pop-up window, select Oracle from the Select Connection drop-down menu. Click on the Ok button to view the queries available in the Oracle Connection.
Additional User Actions
The Build Auto Lineage allows you to build lineage for different scenarios using the Nine dots icon.
For RDBMS connection:
- Build Lineage for selected Queries: It allows to build lineage for one or more selected queries from the list.
- Build Lineage for new/changed queries: It allows to build the lineage for new/ changed queries. Whenever you make changes to a query within a remote database, this feature allows you to build lineage for the new or changed query after crawling the database.
- Build Lineage for all queries: It allows to process all the available queries to build lineage. It will rebuild the lineage again for the previous processed queries.
- Fetch Query Log and build lineage: It allows you to fetch the query for building lineage, which is performed manually in the remote connection.
- Export Source Code to File: It allows to export the ZIP file to the OvalEdge Labs for the failed lineage at the clients’ environment. Here instead of accessing or requesting access for the client machine now you can export the failed lineage and investigate in the OvalEdge Lab and once after investigation, the corrected lineage and manually built lineage are imported to the client’s environment in JSON format.
- Import Source Code from File: Using the ‘Import Source Code’, you can import a JSON file to fix the lineage query issue or to build lineage manually. A JSON file can be directly imported into the client environment. Once the JSON is imported, lineage source code status is displayed as Success manually built from OvalEdge Lab (For other errors) or Success lineage built from OvalEdge Lab (for errors related to query).
Note: For the Report/ ETL connections, instead of query, you will get the source code option to build the lineage.
Build lineage for selected queries
You can build the lineage for one or more queries selected from the list.
To build the lineage for selected queries,
- Navigate to Advanced Tools > Build Auto Lineage.
- Click on the check box to select a query/s and then click on the Nine dots.
- Click on Build Lineage for selected Queries, to display a successful Job submission message.
- After refreshing the page you can see the Last Job Status as INIT and the same status can also be viewed on the Jobs Module. Once the status of the Job is successful, the SUCCESS status will be updated on the Jobs as well as on the Build Auto Lineage.
Build lineage for new/changed queries
In case of any changes in the existing query or the addition of some new queries in the remote database, to build the lineage for the changed or new queries you can use this Nine dots feature.
To build lineage for new/changed queries,
- Navigate to Advanced Tools > Build Auto Lineage.
- Go to the Nine dots icon > Build lineage for new/changed queries, to view the confirmation pop-up.
- Click on Submit to build lineage, which will display a successful job submission message.
Build lineage for all queries
This feature allows you to process all the available queries including the queries for that lineage is already built. It will rebuild the lineage again.
To Build lineage for all queries,
- Navigate to Advanced Tools > Build Auto Lineage.
- Go to the Nine dots icon > click on Build lineage for all queries to view a successful job completion message.
- A job will be initialized to process all queries for building lineages.
Fetch Query Log and build lineage
It allows you to build lineage for SnowFlake, Oracle, SQL Server, Mysql, and MariaDB connections.
The major part of OvalEdge is to process the query log settings to build the data lineage.
Query Logs are stored in Files as well as System tables; the difference between them is that files log conveys the action in a normal sense, whereas table log explains in query language defining the parameters, statistics involved like memory consumption, etc., Ex: When a database is crawled to use, file store this info as starting up the database, whereas table stores memory address, command type, a session in which it took place, objects related to query, etc.,
We can configure different options under the settings in the administrator module like Including any specific Query, providing the lookback period on the query for a particular database, Exclude any users for the query, and applying it to a specific Schema.
The fetch Query Log feature allows you to fetch the query which is executed or processed in the backend and then build the lineage along with the queries available. The first step is to set the crawler for the ‘Look back period.’ Look back period feature allows the query box to save the queries for those specified number of days and allows users to fetch queries that have been processed in the backend for that number of days ago. For example, the lookback period can be set to 1 day, 2 days, or more. The maximum Look back period is seven days.
To set up a look back period in the Crawler > Query Log Setting,
- Navigate to Administration > Crawler.
- Select a connection from the Crawler.
- Click on the Nine dots.
- Click on Settings.
- Enter the lookback period and Click on the Save Changes to save the settings.
To build lineage by fetching query log,
- Navigate to Advanced Tools > Build Auto Lineage.
- Select a connection and then click on the Nine dots.
- Click on the Fetch Query Log and Build Lineage. It displays a pop-up message as the job is submitted successfully. A user may navigate to the Job module to see the status of the jobs submitted.
Note: Now, depending upon the setting of the lookback period, it will fetch the query which is executed in the backend and then build the lineage without doing any recrawling. Crawling will process all the query fetching from the backend and make it available to the front end and then build the lineage accordingly.
Export and Import functionality using OvalEdge Lab
The feature of exporting lineages in the form of ZIP or JSON to the OvalEdge Labs is to investigate the failed lineage in the clients’ environment. Instead of accessing or requesting access for the client machine, you can export the failed lineage and investigate in the OvalEdge Lab, and once after investigation, the corrected lineage and manually built lineage are imported to the client’s environment. Along with export functionality, the existing matching algorithm for the tables can be used to build the lineage.
Export Source Code to File
Using the export option provided in Build Lineage > Nine dots, you can export the failed source codes in zip format and add them to the ticket, which will be shared with OvalEdge Lab through HubSpot.
To Export source code to file
- Navigate to Advanced tools > Build Auto Lineage.
- Select the connection name from the drop-down.
- Click on the check box to select the query.
- Click on the Nine-dots > Export source code to file.
Note: To make the instance OvalEdge Lab in the configurations key is.ovaledge.lab in true.
To make it client environment keep is.ovaledge.lab in false.
- Go to Administration > Advanced Job.
- Select the “Process SQL Statement” advanced Job and click on the edit icon to enter the File Path.
File Path: we need to give the zip file path, including the zip file name.
Password: provide the username of the application. (The name of the user who has downloaded the Zip file will be the password) - Click on the Run Advanced Job button. A popup window appears and click on the Submit button. A pop-up window with a success message is displayed with Job Id.
Export JSON: Using the export option in Build Lineage Screen | Nine dots, the developer or tester can export JSON from OvalEdge Lab after fixing the lineage issue.
Configuration Settings: Users need to set up the key (is.ovaledge.lab = true) in the configuration module before exporting the source code to the OvalEdge.
Import Source Code from File
Using the ‘Import Source Code’, users can import a JSON file to fix the lineage query issue or to build lineage manually. A JSON file can be directly imported into the client environment. Once the JSON is imported, lineage source code status is displayed.
You can check if the lineage is correctly built or not. The query corrected in the OvalEdge lab is seen in the correct query screen.
The Build Auto Lineage option allows you to execute the failed queries for building the lineage.
To build the lineage on failed queries,
- Navigate to Advanced tools > Build Auto Lineage.
- Select the connection name from the drop-down.
Note: The connection name is the associated name of the database connection (from the crawler) for which queries are written and imported from either a file or server logs. - Server Name will be displayed, and click on OK to view all the available queries.
- Click on the filter icon available on the status column and select the failed query check box, to view all the failed queries.
- Click on the check box to select a failed query. Click on the tick mark to open the Correct Query window.
Note: The Correct query window is divided into two sections. The failed query is displayed on the left pane of the window and an empty pane on the right for the user to correct the query. - Click on the copy icon to copy the query from the left pane to the right pane.
Note: There Nine dots icon available on the Query sheet window will help you to replace and format query in case of wrong/ unstructured query. The replace button is used to replace a text or command or string with the code of a query. This replacement of any string will impact on a global level in the application. While building the lineage, it will check if any replacement string is available. If it is available, then it builds the lineage according to the new string (new syntax, text command, etc.). The rest icon is also available on the correct query page to remove the copied query from the right pane as well as to remove the result.
Note: Correct query window has a Locking feature to lock the scrolling of the page. When the Lock is open, both the windows can be scrolled simultaneously. When the lock is closed, one window can be scrolled at a time. - Click on validate button to validate the query.
- The inline Result window at the bottom of the screen shows the error message. Edit the query manually, and re-select Validate to view the result as “Manually Corrected” (if the query is successfully parsed). Then click on the Save and Build Lineage button to build the lineage, to display the success message as Query updated successfully. You can view the lineage on the Data Catalog > Data Object > Lineage tab.
Manual Lineage Building
Lineage Shutter is a feature of adding a manual lineage to a query through the Query correction. There could be a case when the user is aware of what could be the associations, relationships, and data movements by examining the query but is unable to validate it through the query correction screen. In such cases, you are provided with an option to build his lineage on the queries. If you are not aware of building the lineage, you can simply export the query to OvalEdge Labs, where the CSM team will examine the query, build lineage on it, and get back to them. Most of the time, users are not comfortable sharing the machine to just validate and build lineage, so this feature might help them to save time and maintain their confidentiality.
To build lineage by using Lineage Shutter
- Navigate to Advanced tools > Build Auto Lineage.
- Select the connection name from the drop-down, which will display the server Name, and click on OK.
- Click on the Correct query button of the query to display the query correction page for which you want to build the lineage.
- Click on the Build Lineage shutter icon, which will the Build Lineage pop-up, which enables the user to write his lineage, Relationships, and Associations.
Note: For the user at the client machine to build the lineage, you can click on the Build button so that the inputs will be converted as Manual Lineage. For the users at OvalEdge Labs, you can just close the shutter after filling associations, relationships, and lineage fields, no need to click on build. You can just validate the query and export the JSON to the Client.
Lineage Validation
Once the code in the failed queries has been updated to Build Lineage, you can validate the lineage built by following the below steps:
- Navigate to Advanced Tools > Build Lineage.
- Click on the Query Title for which lineage was updated through a corrected query.
- Click on the Lineage Button on the Query. You can see the updated lineage on the Source and target.
Lineage Versioning
Lineage versioning in OvalEdge helps you to understand how the lineage of the data objects changes over time as the data system changes to support new business or technical requirements. This is supported for all the RDBMS connectors. Whenever a dataset is modified in the source system, a new lineage version for it is built in the OvalEdge platform and by default, the latest version will be displayed.
Lineage versioning is supported at a connection level and can be enabled/ disabled by different configurations.
To activate lineage versioning:
- Go to Administration > Configuration > set the value as ‘true’ for the key ‘add. dataset.lineage.version’ and add the connection id for the key ‘versioning.for.lineage.connection’ to display the ‘Version History’ and the ‘updated connection details (You can add multiple connection ids separated by comma(,))’ respectively.
- For the configuration ‘versioning.for.lineage.connection’, enter the connection Ids followed by a comma for the values, and then save.
Viewing Lineage Versioning
- On the Data Catalog > query > click on the updated query to view the query details. You will see the version history table with all the versions.
- Clicking on the Version History option displays the list of lineage versions available for the dataset. Selecting a version will set the lineage version as the selected version, and all the tabs displaying the information will be updated accordingly. (To get the lineage for the latest version, you need to rebuild the lineage using the Build lineage for selected queries to view the association, references, and lineage.
Copyright © 2019, OvalEdge LLC, Peachtree Corners GA USA