ETLs

Azure Data Factory

Azure Data Factory (ADF) is an ETL service provided by Microsoft Azure. It allows you to create, schedule, and manage data pipelines that move data between supported on-premises and cloud-based data stores. Azure Data Factory also enables you to compose data storage, movement, and processing services into automated data workflows.

OvalEdge uses the API to connect to the Azure Data Factory source, which allows the user to crawl objects (Workflows, Tasks, Mappings, Transformations, etc.) and build Lineage.


Connector Capabilities

The following is the list of objects and data types supported by the Azure Data Factory connector.

Functionality: Crawling

Supported Objects:

  • Workflows
  • Tasks
  • Worklets
  • Mappings
  • Mapplets
  • Transformations

Prerequisites

The following are the prerequisites required for establishing a connection between the connector and the OvalEdge application. 

Connectivity via API

The connectivity to Azure Data Factory is via APIs, which are included in the platform. 

Version

The connector currently supports the following version of Azure Data Factory:

API Version   Support
2018-06-01    Supported
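As an illustration of the API connectivity, the sketch below builds the Azure Resource Manager URL that lists the pipelines of one data factory with API version 2018-06-01. It is a minimal example and not OvalEdge's implementation; the function name and the sample subscription, resource group, and factory values are hypothetical.

```python
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-06-01"  # the API version listed as supported above


def pipelines_url(subscription_id: str, resource_group: str, factory_name: str) -> str:
    """Build the ARM URL that lists the pipelines of one data factory."""
    path = (
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory_name, safe='')}"
        "/pipelines"
    )
    return f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': API_VERSION})}"


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    print(pipelines_url("00000000-0000-0000-0000-000000000000", "my-rg", "my-factory"))
```

A GET request to this URL needs a bearer token in the Authorization header; without one, the service rejects the call.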

Configuring Environment Variables

Configuring environment names enables you to select the appropriate environment from the drop-down list when adding a connector. This allows for consistent crawling of schemas across different environments, such as production (PROD), staging (STG), or temporary environments. It also facilitates schema comparisons and assists in application upgrades by providing a temporary environment that can be deleted later if needed.

Before establishing a connection, it is important to configure the environment names for the specific connector. If your environments have been configured, skip this step.

Steps to Configure the Environment

  1. Log into the OvalEdge application.
  2. Navigate to Administration > System Settings.
  3. Select the Connector tab.
  4. Find the key name “connector.environment”.
  5. Enter the desired environment values (PROD, STG) in the Value column.
  6. Click ✔ to Save.

Create a service in the Azure Portal

  1. Navigate to portal.azure.com.
  2. The first step is to create an instance in the Azure Portal.
  3. Click the Create a resource button.
  4. Enter the instance name, select the resource group, click Review + create, and then create the instance.
  5. Click the instance, and then click Properties.
    * Note: The instance properties show the Subscription ID, Tenant ID, and Resource Group Name.

  6. Gather the Client ID and Client Secret from App registrations:
    1. In Microsoft Entra ID (previously, Azure Active Directory), select App registrations from the left pane.
    2. Select New registration.
    3. In the Register an application pane, enter a meaningful application name to display to users.
    4. Once you have registered the application, click View API permissions.
    5. Select Add a permission.
    6. Select Microsoft Graph.
    7. Select Delegated permissions.
    8. Select Yes to confirm your choice.
    9. Click Certificates & secrets in the left pane.
    10. Select the New client secret button.
    11. Provide a description for the client secret and the duration for which it will be valid, and then click Add.
    12. Copy the string in the Value column. You won't be able to retrieve it after you perform another operation or leave this page.
    13. Click Overview in the left pane and copy the client ID.
    14. Assign an appropriate role on the Data Factory to the app registration you created.
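
The Client ID, Client Secret, and Tenant ID gathered above are the inputs to the OAuth 2.0 client credentials flow of the Microsoft identity platform. The sketch below shows how such a token request could be built with the Python standard library. It is an assumption about how a client might authenticate, not a description of OvalEdge internals; the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Scope for Azure Resource Manager, which fronts the Data Factory API.
        "scope": "https://management.azure.com/.default",
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the bearer token from the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

Calling fetch_access_token(build_token_request(...)) with real values performs a network call; keep the client secret out of source control and logs.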

    Service Account with Minimum Permissions

    An admin or service account is required for crawling and building lineage. The minimum privileges required are:

    Operation             Access Permission
    Connection validate   Read
    Crawl datasets        Read

    Establish a Connection

    To establish a connection, complete the following steps:

    1. Log into the OvalEdge application, navigate to the Administration module, and click on Connectors.
    2. Click the + icon (New Connector). The Add Connector pop-up is displayed.
    3. Search for or click the desired connector. The Add Connector pop-up is displayed with the details of the selected connector.

    The fields on the pop-up are listed below. Each field name is followed by its description.

    Connector Type

    By default, the selected connector type is displayed as Azure Data Factory.

    If required, you can change the connector type from the drop-down list. The fields displayed depend on the connector type selected.

    Connector Name*

    Enter a connection name for Azure Data Factory. The name identifies the Azure Data Factory connection in OvalEdge.

    Example: Azure Data Factory_test

    Credential Manager

    A credential manager enhances security by storing API keys, passwords, certificates, and other sensitive data securely. It also helps to manage access, rotate secrets, and audit their use.

    OE Credential Manager: The Azure Data Factory connection is configured with the service account credentials, which OvalEdge uses in real time when it establishes a connection to Azure Data Factory. Users need to enter the credentials manually if the OE Credential Manager option is selected.

    HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge.

    AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server and fetched from the AWS Secrets Manager to OvalEdge.

    Azure Key Vault: For details, refer to the Azure Key Vault documentation.

    For more information on Credential Manager, refer to Credential Manager.

    License Add-Ons

    All the connectors will have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a data source. 

    OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.

    • Select the Auto Lineage Add-On license that enables the automatic construction of the Lineage of data objects for a connector with the Lineage feature. 
    • Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.
    • Select the Data Access Add-On license that will enforce connector access via OvalEdge with the Remote Data Access Management (RDAM) feature enabled. 

    Client Id*

    After registering your application, you can find the application ID (or client ID) as follows:

    1. Log in to your Azure account.
    2. Select the Microsoft Entra ID (previously, Azure Active Directory) in the left sidebar.
    3. Click Enterprise applications.
    4. Click All applications.
    5. Select the application that you have created. Click Properties and copy the Application ID.

    Connector Environment

    The environment drop-down list allows you to select the environment configured for the connector.

    For example, PROD or STG (based on the items configured in OvalEdge for connector.environment).

    The purpose of the environment field is to help you identify which type of system environment (Production, STG, or QA) the connector is connecting to.

    Note: The steps to set up environment variables are explained in the prerequisites section.

    Client Secret*

    The application needs a client secret to prove its identity when requesting a token. For security reasons, Microsoft limits the lifetime of a client secret to 24 months or less and strongly recommends that you set it to less than 12 months.

    1. Log in to your Azure account.
    2. Select the Microsoft Entra ID (previously, Azure Active Directory) in the left sidebar.
    3. Click App registrations.
    4. Select the application which you have created.
    5. Click on All settings.
    6. Click on Keys.
    7. Type the Key description and select the Duration.
    8. Click Save.
    9. Copy and store the key value. You won't be able to retrieve it after you leave this page.

    Tenant Id*

    The tenant ID identifies the Microsoft Entra ID (previously, Azure AD) tenant to use for authentication. It is also referred to as the directory ID.

    1. Log in to your Azure account.
    2. Select the Microsoft Entra ID (previously, Azure Active Directory) in the left sidebar.
    3. Click Properties.
    4. Copy the directory ID.

    Subscriber Id*

    The subscription ID is a GUID that uniquely identifies your subscription to use Azure services.

    1. Log in to your Azure account.
    2. Select Subscriptions in the left sidebar.
    3. Select the required subscription.
    4. Click Overview.
    5. Copy the Subscription ID.
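
Because the subscription ID and the tenant (directory) ID are GUIDs, a quick format check before pasting them into the connector form can catch copy errors. The helper below is a hypothetical sketch using Python's standard uuid module; it only checks the format, not whether the ID exists in Azure.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in the canonical 8-4-4-4-12 form."""
    candidate = value.strip().lower()
    try:
        # uuid.UUID also accepts forms without hyphens, so compare the
        # canonical rendering with the input to insist on the hyphenated form.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False


if __name__ == "__main__":
    # Placeholder value, not a real subscription or tenant ID.
    print(is_guid("123e4567-e89b-12d3-a456-426614174000"))
```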

    Resource Group Name*

    Select the required resource group.

    Api Version*

    Currently, only API version 2018-06-01 is supported.

    Pipeline Types to crawl

    Select the pipeline type from the drop-down list.

    Default Governance Roles*

    Users can select a specific user or a team for each governance role (Steward, Custodian, Owner) assigned to manage the data asset.

    Note: The drop-down list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security | Governance Roles section.  

    Admin Roles*

    Select the required admin roles for this connector.

    • To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, and then click on the Apply button.
      The responsibility of the Integration Admin includes configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.
    • To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click on the Apply button.
      The Security and Governance Admin is responsible for:
      • Configuring role permissions for the connector and its associated data objects.
      • Adding admins to set permissions for roles on the connector and its associated data objects.
      • Updating governance roles.
      • Creating custom fields.
      • Developing Service Request templates for the connector.
      • Creating Approval workflows for the templates.

    Select Bridge

    When OvalEdge is deployed as a SaaS application, a solution is required to get through the customer firewall. That solution is the OvalEdge Bridge. A bridge is a type of firewall that operates at the network layer.

    • When a bridge has been set up, it will be displayed here in a dropdown menu. Users can select the required Bridge ID.
    • The user can select "NO BRIDGE" when it is not configured.

    For more information, refer to Bridge Overview.

    Note: * (asterisk) indicates a mandatory field required to establish a connection. Once all the parameters are entered, you can validate the details and save the connection, which is then displayed on the Connector Home page.
    Note: You can either save the connection details first, or validate the connection first and then save it.

    1. Click on the Validate button to validate the connection details.
    2. Click on the Save button to save the connection. Alternatively, the user can also directly click on the Save & Configure button that displays the Connection Settings pop-up window to configure the settings for the selected Connector. The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.

    Connection Validation Errors

    Error Message: “If you fail to establish a connection, Please check the credentials(Client id, Client secret, etc.,)”

    Description: Displayed when the Client Id or Client Secret is invalid.
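
When troubleshooting a failed validation outside OvalEdge, the HTTP status returned by a direct call to the Azure API can narrow down the cause. The mapping below is a sketch based on general HTTP semantics (401 unauthenticated, 403 forbidden, 404 not found). It is an assumption for illustration, not the logic OvalEdge uses, and the function name is hypothetical.

```python
def explain_validation_failure(status_code: int) -> str:
    """Map an HTTP status from a test API call to a likely cause."""
    hints = {
        401: "Authentication failed: check the Client Id, Client Secret, and Tenant Id.",
        403: "Authorization failed: the app registration may lack Read access on the Data Factory.",
        404: "Resource not found: check the Subscriber Id and Resource Group Name.",
    }
    # Any other status is reported as-is rather than guessed at.
    return hints.get(status_code, f"Unexpected HTTP status {status_code}.")


if __name__ == "__main__":
    for code in (401, 403, 404, 500):
        print(code, explain_validation_failure(code))
```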

    Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.

    Connection Settings

    Lineage

    For the Azure Data Factory Connector, only the Lineage setting option is available, and it is enabled when the Auto Lineage License option is selected.

    The lineage setting lets you change the server/source connection used to build lineage. You can configure multiple servers at the same time under Selecting Source Server Type for lineage, and set the connection priority under Connections Priority to pick the source table for lineage building.
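
To make the idea of lineage concrete, the sketch below derives source-to-target edges from the JSON definition of an ADF pipeline, where a Copy activity lists its input and output datasets by referenceName. This is a simplified illustration and not the algorithm OvalEdge uses; the function name and the sample pipeline are hypothetical.

```python
from typing import Iterator, Tuple


def copy_lineage_edges(pipeline: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (source_dataset, activity_name, target_dataset) per Copy activity."""
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        if activity.get("type") != "Copy":
            continue  # only Copy activities are handled in this sketch
        sources = [ref["referenceName"] for ref in activity.get("inputs", [])]
        targets = [ref["referenceName"] for ref in activity.get("outputs", [])]
        for source in sources:
            for target in targets:
                yield (source, activity["name"], target)


if __name__ == "__main__":
    # Hypothetical pipeline definition, trimmed to the fields used above.
    sample = {
        "name": "CopyOrders",
        "properties": {
            "activities": [
                {
                    "name": "CopyToLake",
                    "type": "Copy",
                    "inputs": [{"referenceName": "SqlOrders", "type": "DatasetReference"}],
                    "outputs": [{"referenceName": "LakeOrders", "type": "DatasetReference"}],
                }
            ]
        },
    }
    print(list(copy_lineage_edges(sample)))
```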

    The Crawling of Schema(s)

    You can use the Crawl/Profile option to select specific schemas for the Crawl and Schedule operations. For any scheduled crawlers, the defined run date and time are displayed.

    1. Navigate to the Connectors page, and click on the Crawl/Profile option.
    2. By default, all related objects (pipelines, dataflows, etc.) are shown under a single schema, the ADF Schema.
    3. Click the Run button to gather all metadata from the connected source into the OvalEdge Data Catalog.
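
Gathering all metadata from the source typically means reading list responses page by page: Azure Resource Manager list calls return items in a value array and, when more pages exist, a nextLink URL. The sketch below follows that pattern with an injected fetch function so it can be run without network access. The names list_all and fetch_page are hypothetical, and this is not a description of how OvalEdge crawls.

```python
from typing import Callable, Dict, List, Optional


def list_all(first_url: str, fetch_page: Callable[[str], Dict]) -> List[Dict]:
    """Collect the 'value' items of every page, following 'nextLink'."""
    items: List[Dict] = []
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        items.extend(page.get("value", []))
        url = page.get("nextLink")  # absent on the last page, which ends the loop
    return items


if __name__ == "__main__":
    # Fake two-page response standing in for real HTTP calls.
    pages = {
        "page1": {"value": [{"name": "PipelineA"}], "nextLink": "page2"},
        "page2": {"value": [{"name": "PipelineB"}]},
    }
    print([item["name"] for item in list_all("page1", pages.__getitem__)])
```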

    Note: For more information on Scheduling, refer to Scheduling Connector.