Azure Data Lake is a highly scalable, secure storage service that supports all types of processing and analytics across platforms. It can store structured, semi-structured, and unstructured data.
In the OvalEdge application, the Azure Data Lake connector allows you to crawl and sample-profile the files and folders in the Azure Data Lake instance.
Prerequisites
The following are the prerequisites for connecting to Azure Data Lake.
The APIs/drivers used by the connector are listed below:
Sl.No | Driver / API | Details |
---|---|---|
1 | API | The connectivity to Azure Data Lake is via ADL, a common library included in the platform. |
Server User Permission
By default, the service account provided for the connector will be used for any user operations. The minimum privileges required are:
Operation | Access Permission |
---|---|
Connection Validation | Read |
Crawl File/Folders | Read |
Catalog Files/Folders | Read |
Profile Files/Folders | Read |
Technical Specification
The connector capabilities are shown below:
Crawling
Feature | Supported Objects | Remarks |
---|---|---|
Crawling | Data Storage Containers | When crawling a root file or folder path, all folders and files under that root path are cataloged by default. |
Profiling
Features | Supported Objects | Details |
---|---|---|
File Profiling | Row Count, Columns Count, View Sample Data | Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC |
Sample Profiling | Supported | - |
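The file profiling outputs listed above (row count, column count, and sample data) can be illustrated with a minimal sketch. It is not OvalEdge's implementation: the function name `profile_csv`, the `sample_size` parameter, and the assumption that the first line is a header are all hypothetical, and it covers CSV only.

```python
import csv
import io


def profile_csv(text, sample_size=5):
    """Return row count, column count, and a few sample rows for CSV text.

    Hypothetical helper: assumes the first line of the file is a header.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to profile.
        return {"row_count": 0, "column_count": 0, "sample": []}
    rows = list(reader)
    return {
        "row_count": len(rows),        # data rows only, header excluded
        "column_count": len(header),   # taken from the header line
        "sample": rows[:sample_size],  # first few data rows
    }


csv_text = "id,name,city\n1,Ana,Lima\n2,Bo,Oslo\n3,Cy,Pune\n"
stats = profile_csv(csv_text, sample_size=2)
print(stats)
```

Reading the whole file into memory keeps the sketch short; a profiler working on large files would more likely stream rows and stop sampling early.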
Connection Details
To connect to the Azure Data Lake using the OvalEdge application, complete the following steps:
- Log in to the OvalEdge application.
- Navigate to the Administration > Connectors module.
- Click the + icon. The Add Connection with Search Connector pop-up window is displayed.
- Select the connection type as Azure Data Lake. The Add Connector with Azure Data Lake specific details pop-up window is displayed.
Field Name | Description |
---|---|
Connector Type | By default, the selected connection type is displayed as Azure Data Lake. |
Credential Manager | Select from the drop-down menu where you want to save your credentials. OE Credential Manager: The Azure Data Lake connection is configured with the basic username and password of the service account, which are used in real time when OvalEdge establishes a connection to Azure Data Lake. Users must add the credentials manually if this option is selected. HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp by OvalEdge. AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server and fetched from AWS Secrets Manager by OvalEdge. For more information on Azure Key Vault, refer to Azure Key Vault. For more information on Credential Manager, refer to Credential Manager. |
License Add Ons | All connectors have a Base Connector License by default, which allows you to crawl and profile to obtain the metadata and statistical information from a data source. OvalEdge supports various License Add-Ons based on the connector's functionality requirements. |
Connector Environment | Select the environment configured for the connector from the drop-down list, for example PROD or STG (based on the items configured in the OvalEdge configuration for connector.environment). The environment field helps you identify which type of system environment (Production, STG, or QA) the connector connects to. Note: The steps to set up environment variables are explained in the prerequisite section. |
Connector Name* | Enter a connector name in the Connector Name text box. This name is used as the reference to the Azure Data Lake connection in the OvalEdge application. |
Authentication Type | Select either ADL String or ADL Service Principal from the Authentication Type drop-down list. |
Default Governance Roles* | Select a specific user or a team for the governance roles (Steward, Custodian, Owner) that are assigned to manage the data asset. Note: The drop-down list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security \| Governance Roles section. |
Admin Roles* | Select the required admin roles for this connector. |
No of Archive Objects* | The number of archive objects indicates the number of recent metadata modifications made to a dataset at a remote/source location. By default, the archive objects feature is deactivated. Users can enable it by clicking the Archive toggle button and specifying the number of objects they wish to archive. |
Select Bridge | Select the NO Bridge option if no bridge is available for the connector. |
Note:
Connection String
The connection string is generated in the Azure Storage account under the Access Key module. By default, the string is generated automatically and displayed in the Connection String field.
Copy the string from the storage account and paste it into the Manage Connection - ADL Connection String field.
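Before pasting the string, it can help to confirm that it was copied in full. The sketch below is an illustration rather than part of OvalEdge: it assumes the usual Azure Storage connection string layout of semicolon-separated `Key=Value` settings (such as `AccountName` and `AccountKey`), and the function names and sample values are hypothetical.

```python
def parse_connection_string(conn_str):
    """Split a semicolon-separated Key=Value connection string into a dict."""
    settings = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 account keys often end in "=".
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError("Malformed segment in connection string")
        settings[key] = value
    return settings


def missing_settings(settings, required=("AccountName", "AccountKey")):
    """Return the required setting names that are absent or empty."""
    return [name for name in required if not settings.get(name)]


# Sample values only; the key below is a placeholder, not a real secret.
sample = (
    "DefaultEndpointsProtocol=https;AccountName=examplestore;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(sample)
print(sorted(parsed))
print(missing_settings(parsed))
```

The check is deliberately local and does not contact Azure, so it only catches truncated or malformed strings, not invalid keys.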
Connection Settings
Crawler
Sl.No | Property | Description |
---|---|---|
1 | Crawler Options | FileFolders/Buckets is enabled by default. |
2 | Crawler Rules | Include and exclude regex rules apply to FileFolders and Buckets only, not to individual files. |
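The effect of include and exclude regex rules on folder paths can be sketched as follows. How OvalEdge evaluates its rules is not described here, so the precedence (exclude wins over include), the use of substring-style `re.search` matching, and the name `select_folders` are assumptions made for illustration.

```python
import re


def select_folders(paths, include=None, exclude=None):
    """Keep folder paths that match `include` (if given) and not `exclude`.

    Assumption: an exclude match always removes a path, even if it is included.
    """
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for path in paths:
        if include_re and not include_re.search(path):
            continue  # not covered by the include rule
        if exclude_re and exclude_re.search(path):
            continue  # explicitly excluded
        selected.append(path)
    return selected


folders = ["sales/2023", "sales/2024", "sales/tmp", "hr/2024"]
picked = select_folders(folders, include=r"^sales/", exclude=r"/tmp$")
print(picked)
```

Anchoring the patterns with `^` and `$` keeps a rule from matching in the middle of an unrelated path.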
Profiler
Sl.No | Property | Description |
---|---|---|
1 | Profile Options | No profile options exist. |
2 | Profile Rules | No profile rules exist. |
Points to note:
- Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC.
- File Manager shows details only for the files and folders that the user has access to.
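As a quick pre-check, the supported file types can be turned into a simple filter. This sketch assumes the file type is identified by its extension, which may not be how OvalEdge detects it; `SUPPORTED_EXTENSIONS` and `is_profilable` are hypothetical names.

```python
from pathlib import PurePosixPath

# Extensions derived from the supported file types listed above.
SUPPORTED_EXTENSIONS = {".csv", ".xls", ".xlsx", ".json", ".avro", ".parquet", ".orc"}


def is_profilable(path):
    """Return True if the path's extension is a supported type (case-insensitive)."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


candidates = ["raw/orders.CSV", "raw/readme.txt", "raw/events.parquet"]
profilable = [p for p in candidates if is_profilable(p)]
print(profilable)
```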