Azure Data Factory

Connectivity Summary

An out-of-the-box connector is available for Azure Data Factory. It supports crawling datasets (Dataflows, Pipelines, Activities, Linked Services, and Datasets) and building lineage.


Crawling: Crawling is the process of collecting information about data from various data sources such as on-premises and cloud databases, Hadoop, visualization software, and file systems. When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository.

Data Sources: The OvalEdge crawler integrates with various data sources to help users extract metadata and build a data catalog. This document describes how to connect to your Azure Data Factory instance and crawl the Dataflows, Pipelines, Activities, Linked Services, and Datasets built in it.

Connect to the Data: Before you can crawl and build a data catalog, you must first connect to your data. OvalEdge requires a separate connection for each data source type, with the source credentials entered for each connection. Once a data connection is made, a simple click of the Crawl button starts the crawling process.

Prerequisites

The following are prerequisites for connecting to Azure Data Factory.

To create a service in the Azure Portal

  1. Navigate to portal.azure.com.
  2. In the Azure Portal, create an instance:
  3. Click the Create a resource button.
  4. Enter the instance name, select the resource group, and click Review + create.
  5. Click on the instance, and then click Properties.
    * Note: The instance properties page displays the Subscription ID, Tenant ID, and Resource Group Name.

  6. Gather the Client ID and Client Secret from App Registration:
  • In Azure Active Directory, select App registrations from the left pane.
  • Select New registration.
  • On the Register an application page, enter a meaningful application name to display to users.
  • Once you have registered the application, click View API permissions.
  • Select Add a permission.
  • Select Microsoft Graph.
  • Select Delegated permissions.
  • Select Yes to confirm your choice.
  • Click Certificates & secrets in the left pane.
  • Select the New client secret button.
  • Provide a description for the client secret, select the duration for which it will be valid, and click Add.
  • Copy the string under the Value column. You won't be able to retrieve it after you perform another operation or leave this page.
  • Click Overview in the left pane and copy the client ID.
  • Assign the Contributor role to the app registration created for Data Factory.
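The service-principal values gathered above (Tenant ID, Client ID, Client Secret) are what any client uses to request an OAuth2 token for the Azure management API via the client-credentials flow. As a rough illustration (this is not OvalEdge's internal code; the values passed in are placeholders), the token request can be built like this:

```python
def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, dict]:
    """Return the Microsoft identity platform token endpoint URL and the form
    body for the client-credentials grant, scoped to Azure Resource Manager
    (the API surface that Azure Data Factory's REST API sits behind)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ARM scope; the returned bearer token authorizes ADF management calls.
        "scope": "https://management.azure.com/.default",
    }
    return url, body
```

POSTing this form body to the returned URL yields a bearer token used in the `Authorization` header of subsequent API calls.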
     

Connectivity via API

Connectivity to Azure Data Factory is established through its REST APIs, which are included in the platform.

Version

The connector currently supports the following versions of Azure Data Factory:

Edition | Version    | Support
Version | 2018-06-01 | Supported

User Permission

An admin/service account is required for crawling and building lineage. The minimum privileges required are:

Operation           | Access Permission
Connection validate | Read
Crawl datasets      | Read

Technical Specification

The connector capabilities are shown below:

Crawling

Feature  | Supported Objects  | Remarks
Crawling | ADF_SCHEMA         | A static schema created by OvalEdge to display the datasets
         | ADF_FACTORY        | The factory is the main dataset that contains Dataflows and Pipelines
         | ADF_DATAFLOW       | Contains the lineage
         | ADF_PIPELINE       | Contains Activities
         | ADF_ACTIVITY       | Only the Copy activity contains lineage
         | ADF_DATASET        | Contains the file and table information
         | ADF_LINKED_SERVICE | Contains the connection information of a Dataset

Lineage Building

Lineage Entities                                                  | Details
ADF_DATAFLOW table lineage / file lineage                         | Supported
ADF_ACTIVITY (Copy activity) table lineage / file lineage         | Supported
ADF_ACTIVITY (Copy activity) column lineage / file column lineage | Supported

Lineage sources: ADF_DATAFLOW, ADF_ACTIVITY (Copy activity)

Querying 

Operation              | Details
Select                 | Supported
Insert                 | Not supported by default
Update                 | Not supported by default
Delete                 | Not supported by default
Joins within database  | Supported
Joins outside database | Not supported
Aggregations           | Supported
Group By               | Supported
Order By               | Supported

By default, the service account provided for the connector is used for any query operations. If the service account has write privileges, Insert / Update / Delete queries can be executed.

Connection Details

To connect to the Azure Data Factory using the OvalEdge application, complete the following steps.

  1. Log in to the OvalEdge application.
  2. Navigate to Administration > Crawler module.
  3. Click the + icon; the Manage Connection with Search Connector pop-up window is displayed.
  4. Select the connection type as Azure Data Factory. The Manage Connection pop-up window with the Azure Data Factory-specific details is displayed.

Connection Name (Mandatory): Enter the name of the connection. The connection name specified in the Connection Name textbox will be used to reference the Azure Data Factory connection in the OvalEdge application.

License Type (Mandatory): You can choose the License Type as Standard or Auto Lineage.

Client Id (Mandatory): After registering your application, the application ID (or client ID) is displayed in the application's properties. To retrieve it:

  1. Log in to your Azure account.
  2. Select Azure Active Directory in the left sidebar.
  3. Click Enterprise applications.
  4. Click All applications.
  5. Select the application you created.
  6. Click Properties.
  7. Copy the Application ID.

Client Secret (Mandatory): The application needs a client secret to prove its identity when requesting a token. For security reasons, Microsoft limits client secrets to a validity of no more than 24 months and strongly recommends that you set this to a value of less than 12 months. To create one:

  1. Log in to your Azure account.
  2. Select Azure Active Directory in the left sidebar.
  3. Click App registrations.
  4. Select the application you created.
  5. Click All settings.
  6. Click Keys.
  7. Type the key description and select the duration.
  8. Click Save.
  9. Copy and store the key value. You won't be able to retrieve it after you leave this page.

Tenant Id (Mandatory): The tenant ID identifies the Azure AD tenant to use for authentication. It is also referred to as the directory ID. To retrieve it:

  1. Log in to your Azure account.
  2. Select Azure Active Directory in the left sidebar.
  3. Click Properties.
  4. Copy the Directory ID.

Subscriber Id (Mandatory): The subscription ID is a GUID that uniquely identifies your subscription to use Azure services. To retrieve it:

  1. Log in to your Azure account.
  2. Select Subscriptions in the left sidebar.
  3. Select the subscription that is needed.
  4. Click Overview.
  5. Copy the Subscription ID.

Resource Group Name (Mandatory): Select the resource group based on the need.

API Version (Mandatory): Currently, only 2018-06-01 is supported.
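Since every field above is mandatory, a client could sanity-check a connection configuration before attempting to connect. A minimal, illustrative sketch (field labels mirror this documentation; OvalEdge's actual validation may differ):

```python
# Mandatory connection fields, as listed in the documentation above.
REQUIRED_FIELDS = (
    "Connection Name", "License Type", "Client Id", "Client Secret",
    "Tenant Id", "Subscriber Id", "Resource Group Name", "API Version",
)

def missing_fields(conn: dict) -> list:
    """Return the mandatory fields that are absent or blank in a connection config."""
    return [f for f in REQUIRED_FIELDS if not conn.get(f)]
```

Running this before submitting the form-equivalent payload surfaces configuration gaps early instead of failing at connection-validate time.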

How to Validate the Lineage

Data Flows:

Sources and sinks contain the dataset reference name.

  • The source reference name contains the source table/file information.
  • The destination (sink) reference name contains the target table/file information.
  • Transformations and scripts contain the columns; these are not supported at present.

   Pipelines: (only the Copy activity supports lineage)

  Inputs and outputs contain the reference names:

  • The inputs reference name contains the source table/file information.
  • The outputs reference name contains the target table/file information.
  • Type properties contain the column-level lineage information.

The reference name is read from the ADF_DATASET type. If its typeProperties contains a location, the dataset is a file; if it contains a table name and schema, it is a table.
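The file-versus-table rule described above can be sketched as a small classifier over a dataset's typeProperties JSON. The property names used here (location, schema, table, tableName) follow common ADF dataset definitions; treat this as an illustration rather than OvalEdge's implementation:

```python
def classify_dataset(type_properties: dict) -> str:
    """Classify an ADF dataset's typeProperties as 'file' or 'table'.

    File-based datasets (e.g. DelimitedText) carry a 'location' object;
    database datasets carry a table name (and usually a schema)."""
    if "location" in type_properties:
        return "file"
    if "tableName" in type_properties or (
            "table" in type_properties and "schema" in type_properties):
        return "table"
    return "unknown"
```
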

About @pipeline() and @dataset() values:

  • @pipeline(): the value is available in the pipeline parameters.
  • @dataset(): the value is available in the dataset parameters.
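Resolving these expressions amounts to a lookup into the pipeline or dataset parameters. A sketch of that lookup (the regex accepts only the two simple forms named above; real ADF expressions can be considerably more complex):

```python
import re

def resolve_reference(expr: str, pipeline_params: dict, dataset_params: dict):
    """Substitute @pipeline().parameters.<name> and @dataset().<name>
    expressions with their parameter values; pass literals through unchanged."""
    m = re.fullmatch(r"@pipeline\(\)\.parameters\.(\w+)", expr.strip())
    if m:
        return pipeline_params.get(m.group(1))
    m = re.fullmatch(r"@dataset\(\)\.(\w+)", expr.strip())
    if m:
        return dataset_params.get(m.group(1))
    return expr  # not a parameter reference
```
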

   Script:

The script is a SQL query; when it contains the correct source and destination details, the lineage is built from it.
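As an illustration of pulling source and destination details out of such a script, here is a deliberately rough sketch. Real SQL parsing needs a proper parser; this regex-based version only handles simple INSERT…SELECT statements:

```python
import re

def script_lineage(sql: str):
    """Extract the target table (INSERT INTO ...) and source tables
    (FROM/JOIN ...) from a simple SQL script, for lineage purposes."""
    target = re.search(r"insert\s+into\s+([\w.\[\]]+)", sql, re.IGNORECASE)
    sources = re.findall(r"(?:from|join)\s+([\w.\[\]]+)", sql, re.IGNORECASE)
    return (target.group(1) if target else None, sources)
```
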

FAQs

  1. What should I know when upgrading my version?
         Currently, ADF supports only one version, i.e., 2018-06-01.