Connectivity Summary
An out-of-the-box connector is available for Azure Data Factory. It supports crawling datasets (Dataflows, Pipelines, Activities, Linked Services, and Datasets) and building lineage.
Crawling: Crawling is the process of collecting information about data from various data sources, such as on-premises and cloud databases, Hadoop, visualization software, and file systems. When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository.
Data Sources: The OvalEdge crawler integrates with various data sources so users can extract metadata and build a data catalog. This document describes how to connect to your Azure Data Factory instance and crawl its Dataflows, Pipelines, Activities, Linked Services, and Datasets.
Connect to the Data: Before you can crawl and build a data catalog, you must first connect to your data. OvalEdge requires a separate connection for each data source type, configured with the corresponding source credentials. Once a connection is established, a single click of the Crawl button starts the crawling process.
Prerequisites
The following are prerequisites for connecting to Azure Data Factory.
To create a service in the Azure Portal:
- Navigate to portal.azure.com.
- Click the Create a resource button.
- Enter the instance name, select the resource group, and click Review + create.
- Open the instance and click Properties.

Note: The instance properties display the Subscription ID, Tenant ID, and Resource Group Name.

The Client ID and Client Secret are gathered from an App Registration:
- In Azure Active Directory, select App registrations from the left pane.
- Select New registration.
- Once you have registered an application, click View API permissions.
- Select Add a permission.
- Select Microsoft Graph.
- Select Yes to confirm your choice.
- Click Certificates & secrets in the left pane.
- Select the New client secret button.
- Provide a description for the client secret and the duration for which it will be valid, then click Add.
- Copy the string in the Value column. You won't be able to retrieve it after you perform another operation or leave this page.
- Click Overview in the left pane and copy the Application (client) ID.
- Assign the Contributor role on the Data Factory to the app registration you created (the credentials gathered here are used in the token sketch below).
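With the values gathered above, you can sanity-check the app registration before configuring the connector. The following is a minimal Python sketch, not part of the OvalEdge product, that requests a token via the OAuth2 client-credentials flow; the placeholder values are hypothetical stand-ins for your own Tenant ID, Client ID, and Client Secret.

```python
import requests

# Placeholders for the values gathered above (hypothetical examples).
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"

# OAuth2 client-credentials flow against Azure AD. The token is scoped
# to Azure Resource Manager, which fronts the Data Factory REST API.
token_url = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token"
resp = requests.post(
    token_url,
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://management.azure.com/.default",
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
```

A successful response containing an access_token confirms that the Client ID, Client Secret, and Tenant ID are valid.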
Connectivity via API
Connectivity to Azure Data Factory is established through its REST APIs; support for these APIs is included in the platform.
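The connector issues these calls internally; the sketch below only illustrates the kind of Azure Resource Manager request involved. It assumes the access_token from the previous sketch and hypothetical placeholder values for the subscription and resource group.

```python
import requests

SUBSCRIPTION_ID = "<subscription-id>"   # from the instance properties
RESOURCE_GROUP = "<resource-group-name>"
API_VERSION = "2018-06-01"              # the version the connector supports

# List the Data Factory instances in the resource group.
url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.DataFactory/factories"
)
resp = requests.get(
    url,
    headers={"Authorization": f"Bearer {access_token}"},
    params={"api-version": API_VERSION},
)
resp.raise_for_status()
for factory in resp.json().get("value", []):
    print(factory["name"])
```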
Version
The connector currently supports the following versions of Azure Data Factory:
| Edition | Version | Support |
|---|---|---|
| Version | 2018-06-01 | Supported |
User Permission
An admin/service account is required for crawling and building lineage. The minimum privileges required are:
| Operation | Access Permission |
|---|---|
| Connection validate | Read |
| Crawl datasets | Read |
Technical Specification
The connector capabilities are shown below:
Crawling
| Feature | Supported Objects | Remarks |
|---|---|---|
| Crawling | ADF_Schema | A static schema created by OvalEdge to display the datasets |
| | ADF_FACTORY | The factory is the main dataset; it contains Dataflows and Pipelines |
| | ADF_DATAFLOW | Contains the lineage |
| | ADF_PIPELINE | Contains Activities |
| | ADF_ACTIVITY | Only the Copy activity contains lineage |
| | ADF_DATASET | Contains the file and table information |
| | ADF_LINKED_SERVICE | Contains the connection information for a Dataset |
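To see where these objects come from, each of the crawled types (other than the static ADF_Schema) maps to a child collection of the factory in the Data Factory REST API. A minimal sketch, continuing from the previous examples and assuming a hypothetical factory name:

```python
FACTORY_NAME = "<factory-name>"  # hypothetical

base = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.DataFactory/factories/{FACTORY_NAME}"
)
headers = {"Authorization": f"Bearer {access_token}"}

# pipelines -> ADF_PIPELINE (activities are nested inside each pipeline),
# dataflows -> ADF_DATAFLOW, datasets -> ADF_DATASET,
# linkedservices -> ADF_LINKED_SERVICE.
for collection in ("pipelines", "dataflows", "datasets", "linkedservices"):
    resp = requests.get(
        f"{base}/{collection}",
        headers=headers,
        params={"api-version": "2018-06-01"},
    )
    resp.raise_for_status()
    print(collection, [item["name"] for item in resp.json().get("value", [])])
```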
Lineage Building
| Lineage Entities | Details |
|---|---|
| ADF_DATAFLOW table lineage / file lineage | Supported |
| ADF_ACTIVITY (Copy activity) table lineage / file lineage | Supported |
| ADF_ACTIVITY (Copy activity) column lineage / file column lineage | Supported |
| Lineage sources | ADF_DATAFLOW, ADF_ACTIVITY (Copy activity) |
Querying
| Operation | Details |
|---|---|
| Select | Supported |
| Insert | Not supported by default |
| Update | Not supported by default |
| Delete | Not supported by default |
| Joins within database | Supported |
| Joins outside database | Not supported |
| Aggregations | Supported |
| Group By | Supported |
| Order By | Supported |
By default, the service account provided for the connector is used for all query operations. If the service account has write privileges, then Insert/Update/Delete queries can be executed.
Connection Details
To connect to Azure Data Factory using the OvalEdge application, complete the following steps.
- Log in to the OvalEdge application.
- Navigate to Administration > Crawler module.
- Click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
- Select the connection type as Azure Data Factory. The Manage Connection pop-up window with the Azure Data Factory-specific details is displayed.
| Field Name | Mandatory/Optional | Description |
|---|---|---|
| Connection Name | Mandatory | Enter a name for the connection. The name specified in the Connection Name textbox is used to reference the Azure Data Factory connection in the OvalEdge application. |
| License Type | Mandatory | Choose the license type: Standard or Auto Lineage. |
| Client Id | Mandatory | After registering your application, the application (client) ID is shown on the app registration's Overview page. |
| Client Secret | Mandatory | The application needs a client secret to prove its identity when requesting a token. For security reasons, Microsoft limits client secret lifetimes to a maximum of 24 months and strongly recommends a duration of less than 12 months. |
| Tenant Id | Mandatory | The tenant ID identifies the Azure AD tenant to use for authentication. It is also referred to as the directory ID. |
| Subscriber Id | Mandatory | The subscription ID is a GUID that uniquely identifies your subscription to Azure services. |
| Resource Group Name | Mandatory | Select the resource group that contains your Data Factory instance. |
| API Version | Mandatory | Currently supported: 2018-06-01. |
How to Validate the Lineage
Data Flows:
Sources and sinks contain the dataset reference name.
- The source reference name contains the source Table/File information.
- The destination reference name contains the target Table/File information.
- Transformations and scripts contain column information, which is not currently supported.
Pipelines: (only the Copy activity supports lineage)
Inputs and outputs contain the dataset reference name.
- The inputs reference name contains the source Table/File information.
- The outputs reference name contains the target Table/File information.
- Type Properties contain the column-level lineage information.
The reference name is resolved against the corresponding ADF_DATASET definition: if the dataset's type properties contain a location, it refers to a file; if they contain a table name and schema, it refers to a table.
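As an illustration, here is how the table-level and column-level lineage described above can be read from a trimmed Copy activity definition. The JSON shape (inputs/outputs as dataset references, a TabularTranslator with column mappings) follows the Data Factory format; the activity, dataset, and column names are hypothetical.

```python
import json

copy_activity = json.loads("""
{
  "name": "CopyBlobToSql",
  "type": "Copy",
  "inputs":  [{"referenceName": "SourceBlobDataset", "type": "DatasetReference"}],
  "outputs": [{"referenceName": "SinkSqlDataset", "type": "DatasetReference"}],
  "typeProperties": {
    "translator": {
      "type": "TabularTranslator",
      "mappings": [
        {"source": {"name": "cust_id"}, "sink": {"name": "customer_id"}}
      ]
    }
  }
}
""")

if copy_activity["type"] == "Copy":  # only Copy activities carry lineage
    sources = [ref["referenceName"] for ref in copy_activity["inputs"]]
    targets = [ref["referenceName"] for ref in copy_activity["outputs"]]
    print(sources, "->", targets)  # table/file-level lineage endpoints

    # Column-level lineage comes from the translator mappings.
    translator = copy_activity["typeProperties"]["translator"]
    for m in translator["mappings"]:
        print(m["source"]["name"], "->", m["sink"]["name"])
```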
About @pipeline() and @dataset() values (see the sketch after this list):
- @pipeline(): the value is supplied by a pipeline parameter.
- @dataset(): the value is supplied by a dataset parameter.
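A minimal sketch of where these expressions appear, with hypothetical names: a dataset property resolved from a dataset parameter, and an activity's dataset reference supplying that parameter from a pipeline parameter.

```python
# Dataset typeProperties: the file name is resolved from a dataset parameter.
dataset_type_properties = {
    "location": {
        "type": "AzureBlobStorageLocation",
        "fileName": "@dataset().sourceFileName",
    }
}

# Activity input: the dataset parameter is fed from a pipeline parameter.
activity_input = {
    "referenceName": "SourceBlobDataset",
    "type": "DatasetReference",
    "parameters": {"sourceFileName": "@pipeline().parameters.fileName"},
}
```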
Script:
A script is simply a SQL query; it contains the source and destination details from which the lineage is built, as in the example below.
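For instance, a script of the following shape (hypothetical table names) yields lineage from its INSERT INTO and FROM clauses:

```python
# A hypothetical script: a plain SQL query whose INSERT INTO clause
# names the lineage destination and whose FROM clause names the source.
script = """
INSERT INTO sales_dw.dbo.fact_orders   -- lineage destination
SELECT order_id, amount
FROM staging.dbo.orders                -- lineage source
"""
```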
FAQs
- What should I know when upgrading my version?
Currently, the connector supports only one API version of Azure Data Factory, i.e., 2018-06-01.