
Azure Data Lake

Azure Data Lake is a highly scalable and secure storage service that supports all types of processing and analytics across platforms. It can store structured, semi-structured, and unstructured data.

In the OvalEdge application, the Azure Data Lake connector allows you to crawl and sample-profile the files and folders that exist in the Azure Data Lake instance.


Prerequisites

The following are the prerequisites for connecting to Azure Data Lake.

The APIs/drivers used by the connector are given below:

Sl.No | Driver / API | Details
1 | API | Connectivity to Azure Data Lake is via ADL, a common library included in the platform.

Server User Permission

By default, the service account provided for the connector will be used for any user operations. The minimum privileges required are:

Operation | Access Permission
Connection Validation | Read
Crawl Files/Folders | Read
Catalog Files/Folders | Read
Profile Files/Folders | Read

Technical Specification

The connector capabilities are shown below:

Crawling

Feature | Supported Objects | Remarks
Crawling | Data Storage Containers | When root files/folders are crawled, all the folders and files that exist in that root path are cataloged by default.

Profiling

Feature | Supported Objects | Details
File Profiling | Row Count, Columns Count, View Sample Data | Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC
Sample Profiling | Supported | -
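To make the profiling outputs in the table above concrete, here is a minimal Python sketch of what Row Count, Columns Count, and View Sample Data could mean for a CSV file. It is an illustration only, not the connector's implementation. The function names `is_supported` and `profile_csv`, and the assumption that the first CSV row is a header, are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath

# File types listed as supported for profiling in this document.
SUPPORTED_EXTENSIONS = {"csv", "xls", "xlsx", "json", "avro", "parquet", "orc"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the supported types."""
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    return suffix in SUPPORTED_EXTENSIONS


def profile_csv(text: str, sample_size: int = 5) -> dict:
    """Compute row count, column count, and a small sample for CSV text.

    Assumption (hypothetical): the first row is a header row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {"row_count": 0, "column_count": 0, "sample": []}
    header, data = rows[0], rows[1:]
    return {
        "row_count": len(data),        # data rows, header excluded
        "column_count": len(header),   # columns named in the header
        "sample": data[:sample_size],  # first few data rows
    }


if __name__ == "__main__":
    content = "id,name\n1,alpha\n2,beta\n3,gamma\n"
    print(is_supported("raw/sales/2024.CSV"))
    print(profile_csv(content, sample_size=2))
```

The extension check is case-insensitive, so a file named `2024.CSV` is treated the same as `2024.csv`.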

Connection Details

To connect to the Azure Data Lake using the OvalEdge application, complete the following steps:

  1. Log in to the OvalEdge application.
  2. Navigate to Administration > Connectors.
  3. Click the + icon. The Add Connection with Search Connector pop-up window is displayed.
  4. Select Azure Data Lake as the connection type. The Add Connector pop-up window with the Azure Data Lake specific details is displayed.

Field Name

Description

Connector Type

By default, the selected connection type is displayed as Azure Data Lake.

Credential Manager

Select where you want to save your credentials from the drop-down menu:

OE Credential Manager: The Azure Data Lake connection is configured with the username and password of the service account, which are used in real time when OvalEdge establishes a connection to Azure Data Lake. If you select this option, you must add the credentials manually.

HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge.

AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server and fetched from the AWS Secrets Manager to OvalEdge.

For more information on Azure Key Vault, refer to Azure Key Vault.

For more information on Credential Manager, refer to Credential Manager.

License Add-Ons

By default, all connectors have a Base Connector License that allows you to crawl and profile a data source to obtain its metadata and statistical information.

OvalEdge supports various License Add-Ons based on the connector's functionality requirements.

  • Select the Auto Lineage Add-On license to enable the automatic construction of the lineage of data objects for a connector that supports the Lineage feature.
  • Select the Data Quality Add-On license to identify, report, and resolve data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.
  • Select the Data Access Add-On license to enforce connector access via OvalEdge with the Remote Data Access Management (RDAM) feature enabled.

Connector Environment

Select the environment configured for the connector from the drop-down list, for example PROD or STG. The available values depend on the items configured for connector.environment in the OvalEdge configuration.

The environment field helps you identify which type of system environment (Production, STG, or QA) a connector connects to.

Note: The steps to set up environment variables are explained in the prerequisite section.

Connector Name*

Enter a name for the connector in the Connector Name text box. This name is used as the reference to the Azure Data Lake connection in the OvalEdge application.

Authentication Type

The Authentication Type drop-down list allows you to select either ADL String or ADL Service Principal.

ADL String:

  • ADL Connection String*: Enter the connection string that was generated in the Azure storage account.
    Example: DefaultEndpointsProtocol=https;AccountName=ovaledgefileaccess;AccountKey=...
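Before pasting a connection string into the field, it can help to confirm that it is well formed. The following Python sketch splits a string of the form shown above into its key/value pairs and reports missing parts. The helper names and the choice of `AccountName` and `AccountKey` as required keys are assumptions made for illustration, and the account key in the example is a placeholder, not a real value.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split a storage connection string into a dict of key/value pairs."""
    parts = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so '=' padding in a key value survives.
        key, sep, value = segment.partition("=")
        if not sep or not key:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key] = value
    return parts


def missing_keys(parts: dict, required=("AccountName", "AccountKey")) -> list:
    """Return the required keys that are absent or empty (assumed key set)."""
    return [key for key in required if not parts.get(key)]


if __name__ == "__main__":
    # PLACEHOLDER== stands in for a real account key.
    example = (
        "DefaultEndpointsProtocol=https;"
        "AccountName=ovaledgefileaccess;"
        "AccountKey=PLACEHOLDER=="
    )
    parsed = parse_connection_string(example)
    print(sorted(parsed))
    print(missing_keys(parsed))
```

Splitting on the first `=` only matters because account keys are Base64 text and often end in `=` characters.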

ADL Service Principal:

  • Client Id*:
    After you register your application, find its application ID (also called the client ID) as follows:
    1. Log in to your Azure account.
    2. Select Microsoft Entra ID (previously Azure Active Directory) in the left sidebar.
    3. Click Enterprise applications.
    4. Click All applications.
    5. Select the application that you created.
    6. Click Properties.
    7. Copy the Application ID.
  • Client Secret*:
    The application needs a client secret to prove its identity when requesting a token. For security reasons, Microsoft does not allow client secrets with a lifetime longer than 24 months and strongly recommends a lifetime of less than 12 months.
    1. Log in to your Azure account.
    2. Select Microsoft Entra ID (previously Azure Active Directory) in the left sidebar.
    3. Click App registrations.
    4. Select the application that you created.
    5. Click All settings.
    6. Click Keys.
    7. Type a key description and select the duration.
    8. Click Save.
    9. Copy and store the key value. You won't be able to retrieve it after you leave this page.
  • Tenant Id*:
    The tenant ID identifies the Microsoft Entra ID (previously Azure Active Directory) tenant to use for authentication. It is also referred to as the directory ID.
    1. Log in to your Azure account.
    2. Select Microsoft Entra ID (previously Azure Active Directory) in the left sidebar.
    3. Click Properties.
    4. Copy the directory ID.
  • ADL Endpoint*: The URL used to interact with the ADL storage account.
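The three service principal values above fit together in the OAuth 2.0 client credentials flow: the tenant ID selects the token endpoint, and the client ID and client secret identify the application. The Python sketch below only builds the token request (URL and form body) and does not send it. How the OvalEdge connector requests tokens internally is not documented here, so treat this as an assumption-laden illustration. `build_token_request` and `is_guid` are hypothetical helper names, and the IDs and secret in the example are placeholders.

```python
import uuid
from urllib.parse import urlencode

# Microsoft identity platform v2.0 token endpoint template.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
# Scope for Azure Storage when using the client credentials flow.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def is_guid(value: str) -> bool:
    """Return True if the value parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body) for a client credentials token request."""
    for name, value in (("tenant_id", tenant_id), ("client_id", client_id)):
        if not is_guid(value):
            raise ValueError(f"{name} does not look like a GUID")
    if not client_secret:
        raise ValueError("client_secret must not be empty")
    url = TOKEN_URL.format(tenant_id=tenant_id)
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": STORAGE_SCOPE,
    })
    return url, body


if __name__ == "__main__":
    # Placeholder values only; substitute the IDs copied from the Azure portal.
    url, _body = build_token_request(
        "11111111-1111-1111-1111-111111111111",
        "22222222-2222-2222-2222-222222222222",
        "placeholder-secret",
    )
    print(url)
```

Validating the IDs as GUIDs catches a common mistake: pasting the secret's ID or the application name into the Client Id field.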

Default Governance Roles*

Select a specific user or a team for each governance role (Steward, Custodian, Owner) assigned to manage the data asset.

Note: The drop-down list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security | Governance Roles section.

Admin Roles*

Select the required admin roles for this connector.

  • To add Integration Admin roles, search for or select one or more roles from the Integration Admin options, and then click the Apply button.
    The Integration Admin is responsible for configuring crawling and profiling settings for the connector, and for deleting connectors, schemas, or data objects.
  • To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click the Apply button.
    The Security and Governance Admin is responsible for:
    • Configuring role permissions for the connector and its associated data objects.
    • Adding admins to set permissions for roles on the connector and its associated data objects.
    • Updating governance roles.
    • Creating custom fields.
    • Developing Service Request templates for the connector.
    • Creating Approval workflows for the templates.

No of Archive Objects*

The number of archive objects indicates the number of recent metadata modifications made to a dataset at a remote/source location. By default, the archive objects feature is deactivated. To enable it, click the Archive toggle button and specify the number of objects to archive.

Select Bridge

Select the NO Bridge option if no bridge is available for the connector.

Note:

Connection String

The connection string is generated in the Azure Storage account under the Access Key module and is displayed automatically in the Connection String field.

Copy the string from the storage account and paste it into the Manage Connection - ADL Connection String field.

Connection Settings

Crawler

Sl.No | Property | Description
1 | Crawler Options | FileFolders/Buckets is enabled by default.
2 | Crawler Rules | Include and exclude regular expressions apply only to FileFolders and Buckets, not to individual files.
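The following Python sketch illustrates how include and exclude regular expressions might select folders for crawling. The exact matching semantics of the OvalEdge crawler are not specified in this document, so the sketch makes two assumptions: a folder must match at least one include pattern when any are given, and an exclude match always wins. The function name `select_folders` is hypothetical.

```python
import re


def select_folders(folders, include=None, exclude=None):
    """Filter folder paths with include/exclude regular expressions.

    Assumed semantics (not confirmed by the source):
    - with include patterns, a folder must match at least one of them;
    - a folder matching any exclude pattern is dropped.
    """
    include_patterns = [re.compile(p) for p in include or []]
    exclude_patterns = [re.compile(p) for p in exclude or []]
    selected = []
    for folder in folders:
        if include_patterns and not any(p.search(folder) for p in include_patterns):
            continue  # not covered by any include rule
        if any(p.search(folder) for p in exclude_patterns):
            continue  # explicitly excluded
        selected.append(folder)
    return selected


if __name__ == "__main__":
    folders = ["raw/sales", "raw/tmp", "curated/sales"]
    print(select_folders(folders, include=[r"^raw/"], exclude=[r"tmp$"]))
```

Because the rules apply to folders and buckets only, individual file names are never passed through these patterns in this sketch.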

Profiler

Sl.No | Property | Description
1 | Profile Options | No profile options are available.
2 | Profile Rules | No profile rules exist.

Points to note:

1. Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC.
2. File Manager shows details only for the files and folders that the user has access to.