Azure Data Lake

Azure Data Lake is a highly scalable and secure storage service that supports processing and analytics across platforms. It can store structured, semi-structured, and unstructured data.

In the OvalEdge application, the Azure Data Lake connector allows you to crawl the files and folders in an Azure Data Lake instance and run sample profiling on them.


Prerequisites

The following are prerequisites for connecting to the Azure Data Lake. 

The APIs/drivers used by the connector are given below:

| Sl.No | Driver / API | Details |
|-------|--------------|---------|
| 1 | API | The connectivity to Azure Data Lake is via ADL, a common library included in the platform. |

User Permission

By default, the service account provided for the connector will be used for any user operations. The minimum privileges required are:

| Operation | Access Permission |
|-----------|-------------------|
| Connection Validation | Read |
| Crawl Files/Folders | Read |
| Catalog Files/Folders | Read |
| Profile Files/Folders | Read |

 

Technical Specification

The connector capabilities are shown below:

Crawling

| Feature | Supported Objects | Remarks |
|---------|-------------------|---------|
| Crawling | Data Storage Containers | When root files/folders are crawled, all folders and files in that root path are cataloged by default. |

Profiling

| Feature | Supported Objects | Details |
|---------|-------------------|---------|
| File Profiling | Row Count, Column Count, View Sample Data | Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC |
| Sample Profiling | Supported | - |
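As an illustration of what the File Profiling metrics mean for a CSV file, the following is a minimal Python sketch. It is not OvalEdge's implementation; the function name `profile_csv`, the `sample_size` parameter, and the assumption that the first line is a header (and is not counted as a data row) are all hypothetical.

```python
import csv
import io


def profile_csv(text, sample_size=5):
    """Compute row count, column count, and a data sample for CSV text.

    Assumption: the first line is a header row and is excluded from the
    row count. This is an illustrative choice, not documented behavior.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])  # empty list if the file has no lines
    rows = list(reader)
    return {
        "row_count": len(rows),
        "column_count": len(header),
        "sample": rows[:sample_size],
    }


if __name__ == "__main__":
    demo = "id,name\n1,Ann\n2,Bob\n3,Cy\n"
    print(profile_csv(demo, sample_size=2))
```

The sample is taken from the first rows only, which keeps the sketch simple; a real profiler may sample differently.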

Connection Details

To connect to the Azure Data Lake using the OvalEdge application, complete the following steps.

  1. Log in to the OvalEdge application.
  2. Navigate to Administration > Crawler module.
  3. Click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
  4. Select the connection type as Azure Data Lake. The Manage Connection pop-up window with Azure Data Lake specific details is displayed.


| Field Name | Mandatory/Optional | Description |
|------------|--------------------|-------------|
| Connection Type | Mandatory | By default, the selected connection type is displayed as Azure Data Lake. |
| License Type | Mandatory | Select the License Type as Standard. |
| Connection Name | Mandatory | Enter a connection name. The name specified in the Connection Name textbox is used as a reference to the Azure Data Lake connection in the OvalEdge application. |
| ADL Connection String | Mandatory | Enter the connection string that was generated in the Azure storage account. Ex: DefaultEndpointsProtocol=https;AccountName=ovaledgefileaccess;AccountKey=... |
| Default Governance Roles | Mandatory | From the dropdown list, select the Steward, Custodian, and Owner. |
| Select Bridge | Optional | Select the NO Bridge option if no bridge is available for the connector. |

 

Connection String

The connection string is generated in the Azure Storage account under the Access Key module. By default, the string is automatically generated and displayed in the Connection String field.

Copy the string from the storage account and paste it into the Manage Connection - ADL Connection String field.
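Before pasting the string, it can help to check that it was copied completely. The following is a minimal Python sketch that splits a connection string into its key/value parts and reports missing keys. The helper names and the `REQUIRED_KEYS` set are hypothetical; the set is an assumption based only on the example format shown in this article, and the account values in the demo are placeholders.

```python
# Assumption: these keys are taken from the example string in this
# article; other valid connection string shapes may exist.
REQUIRED_KEYS = {"DefaultEndpointsProtocol", "AccountName", "AccountKey"}


def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' text into a dictionary."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 text
        # and may themselves end with '=' padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Segment without '=': {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def missing_keys(conn_str):
    """Return the expected keys that are absent from the string."""
    return REQUIRED_KEYS - set(parse_connection_string(conn_str))


if __name__ == "__main__":
    demo = "DefaultEndpointsProtocol=https;AccountName=demoaccount;AccountKey=abc123=="
    print(missing_keys(demo))
```

An empty result means all three expected keys are present; it does not confirm that the key itself is valid, which only the connection validation in OvalEdge can establish.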


Connection Settings

Crawler

| Sl.No | Property | Description |
|-------|----------|-------------|
| 1 | Crawler Options | FileFolders/Buckets is enabled by default. |
| 2 | Crawler Rules | Include and exclude regular expressions apply to FileFolders and Buckets only, not to individual files. |
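To show how include and exclude patterns can narrow a set of folders, here is a minimal Python sketch. It is not OvalEdge's matching logic: the function `select_folders`, the use of partial matching via `re.search`, and the rule that an exclude match overrides an include match are all assumptions made for illustration.

```python
import re


def select_folders(folders, include=None, exclude=None):
    """Return the folder paths that pass the include/exclude patterns.

    Assumptions: a missing include pattern admits everything, and an
    exclude match always wins over an include match.
    """
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for path in folders:
        if include_re and not include_re.search(path):
            continue  # does not match the include pattern
        if exclude_re and exclude_re.search(path):
            continue  # explicitly excluded
        selected.append(path)
    return selected


if __name__ == "__main__":
    demo = ["sales/2022", "sales/archive", "hr/payroll", "tmp"]
    print(select_folders(demo, include=r"^sales/", exclude=r"archive$"))
```

Testing patterns against a short list of real folder names before saving the crawler rules is a quick way to catch an expression that is too broad or too narrow.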

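Profiler

| Sl.No | Property | Description |
|-------|----------|-------------|
| 1 | Profile Options | No profile options exist for this connector. |
| 2 | Profile Rules | No profile rules exist for this connector. |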

Points to note:

  1. Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC
  2. File Manager shows details only for the files and folders that the user has access to.
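The supported file types listed above can be used to pre-check which files in a folder are candidates for profiling. The following is a minimal Python sketch; the function `is_profilable` is hypothetical, and identifying the file type from the file extension is an assumption, since this article does not state how OvalEdge detects file types.

```python
from pathlib import PurePosixPath

# File types listed in this article, written as lowercase extensions.
SUPPORTED_TYPES = {"csv", "xls", "xlsx", "json", "avro", "parquet", "orc"}


def is_profilable(path):
    """Return True if the path's extension is a supported file type."""
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    return suffix in SUPPORTED_TYPES


if __name__ == "__main__":
    for name in ["raw/orders.CSV", "raw/readme.txt", "data/events.parquet"]:
        print(name, is_profilable(name))
```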