Connectivity Summary
An out-of-the-box connector is available for Airflow to support crawling datasets (i.e., Airflow DAGs and tasks) and lineage building.
The connectivity to Airflow is via JDBC, which is included in the platform.
Connector Capabilities
The connector capabilities are listed below:
Crawling
Supported Objects: Jobs
Remarks: The connector fetches all Airflow DAGs and tasks from the Airflow metadata database (AirflowDB).
Please see the Crawling Data article for more details on crawling.
Lineage Building
| Operation | Details |
| --- | --- |
| Table - Table Lineage | Supported |
| Table - File Lineage | Supported |
| File - Table Lineage | Supported |
| Column - File Column Lineage | Supported |
Querying
| Operation | Details |
| --- | --- |
| Select | Supported |
| Insert | Not supported by default |
| Update | Not supported by default |
| Delete | Not supported by default |
| Joins within database | Supported |
| Joins outside database | Supported |
| Aggregations | Supported |
| Group By | Supported |
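As an illustration of the querying operations listed above, the sketch below runs a SELECT with a join and a GROUP BY against the Airflow metadata database. This is a minimal sketch, not an OvalEdge API: the connection URL, credentials, and the assumption of the standard Airflow metadata tables (dag, task_instance) are placeholders.

```python
# Minimal sketch, assuming a standard Airflow metadata schema reachable via SQLAlchemy.
# The connection URL, credentials, and host are hypothetical placeholders.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://airflow_user:airflow_pass@10.0.0.10:5432/airflow")

# A SELECT combining a join and a GROUP BY, matching the supported operations above:
# count task instances per unpaused DAG.
query = text("""
    SELECT d.dag_id, COUNT(ti.task_id) AS task_instances
    FROM dag AS d
    JOIN task_instance AS ti ON ti.dag_id = d.dag_id
    WHERE d.is_paused = FALSE
    GROUP BY d.dag_id
""")

with engine.connect() as conn:
    for dag_id, task_instances in conn.execute(query):
        print(dag_id, task_instances)
```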
Pre-requisites
To use the connector, the following must be available:
- Connection details as specified in the following section.
- A service account for crawling. The minimum privileges required are:

| Operation | Access Permission |
| --- | --- |
| Connection validate | Should have permission for the specified path |
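A quick way to sanity-check the "Connection validate" permission is to confirm that the DAG path is readable by the service account. The sketch below is a minimal, hypothetical check and is not part of the OvalEdge connector; the path shown is a placeholder.

```python
# Minimal sketch: verify that the crawling service account can read the DAG path.
# The path below is a hypothetical placeholder; use your own local DAG path.
import os

dag_path = "/home/ovaledge/dags"

if os.path.isdir(dag_path) and os.access(dag_path, os.R_OK):
    print(f"Read permission OK for {dag_path}")
else:
    print(f"Missing or unreadable DAG path: {dag_path}")
```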
Connection Details
The following are the connection settings that should be added for connecting to Airflow:
- Connection (Database) Type: AirflowDB
- License Type: Standard or Auto Lineage
- Connection Name: Select a connection name for Airflow. The name that you specify is a reference name to easily identify the Airflow connection in OvalEdge. Example: Airflow1
- Server: IP address of the Airflow server
- Remote Dag Path: Enter the path on the Airflow server where all the DAGs (Python files) are located
- Local Dag Path: Enter the path on the local/OvalEdge server where all the DAGs (Python files) are present. Both paths must contain the same number of DAG files (see the sketch after this list)
- Username: Provide a valid username
- Password: Provide a valid password
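Because the remote and local DAG paths must contain the same number of DAG files, a simple pre-check can avoid a failed connection attempt. The sketch below compares the two counts; it uses the third-party paramiko library for the remote listing, and the host, credentials, and paths are hypothetical placeholders, not values prescribed by the connector.

```python
# Minimal sketch, assuming SSH access to the Airflow server via paramiko.
# Host, credentials, and paths are placeholders.
import fnmatch
import glob
import os

import paramiko

remote_host = "10.0.0.10"               # Airflow server IP
remote_dag_path = "/home/airflow/dags"  # Remote Dag Path
local_dag_path = "/home/ovaledge/dags"  # Local Dag Path

# Count DAG (.py) files on the Airflow server over SFTP.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(remote_host, username="airflow_user", password="secret")
sftp = client.open_sftp()
remote_count = len(fnmatch.filter(sftp.listdir(remote_dag_path), "*.py"))
client.close()

# Count DAG (.py) files on the local/OvalEdge server.
local_count = len(glob.glob(os.path.join(local_dag_path, "*.py")))

print(f"remote={remote_count}, local={local_count}, match={remote_count == local_count}")
```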
Once connectivity is established, additional configurations for Crawling and Profiling can be specified.

Points to be noted
- Airflow requires its DAG path on the remote (Airflow) server, its DAG path on the local (OvalEdge) server, the IP address, the username, and the password. All of these fields are mandatory. The connection will be successful only if the connection details are correct and the Local DAG Path is valid.
- All the DAGs must be copied from the remote server (Airflow) ( QA1/QA2 localpath /home/ovaledge/dags ) to the local server (OvalEdge) (dags https://sqldll.s3.us-east-2.amazonaws.com/dags.zip ).
- Airflow DAGs are considered as datasets. The tasks of each DAG are considered as child datasets.
- There is Python code associated with each DAG, which must be copied from Airflow to OvalEdge. The connector reads the Python code and creates a dataset; reading the Python code succeeds only if the remote DAG path is correct and the corresponding local DAG file exists. A generic example of such a DAG file is shown after this list.
- The Airflow web UI can be accessed using the URL http://{host:port}/admin/airflow/login
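For reference, the following is a generic Airflow DAG file of the kind that would sit under the remote DAG path and be copied to the local DAG path for the connector to read. The DAG ID, task names, schedule, and the Airflow 2.x import path are illustrative assumptions; in OvalEdge the DAG would surface as a dataset and its tasks as child datasets.

```python
# A generic, illustrative DAG file (names and schedule are placeholders).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extract step")


def load():
    print("load step")


with DAG(
    dag_id="example_etl_dag",           # crawled as a dataset in OvalEdge
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task           # tasks crawled as child datasets
```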