Azure Data Lake is a highly scalable and secure storage service that supports processing and analytics across platforms. It can store structured, semi-structured, and unstructured data.
In the OvalEdge application, the Azure Data Lake connector allows you to crawl and sample-profile the files and folders in an Azure Data Lake instance.
Prerequisites
The following are prerequisites for connecting to the Azure Data Lake.
The APIs/drivers used by the connector are given below:
Sl.No | Driver / API | Details |
---|---|---|
1 | API | The connectivity to Azure Data Lake is via ADL, a common library included in the platform. |
User Permission
By default, the service account provided for the connector will be used for any user operations. The minimum privileges required are:
Operation | Access Permission |
---|---|
Connection Validation | Read |
Crawl Files/Folders | Read |
Catalog Files/Folders | Read |
Profile Files/Folders | Read |
Technical Specification
The connector capabilities are shown below:
Crawling
Feature | Supported Objects | Remarks |
---|---|---|
Crawling | Data Storage Containers | When a root file or folder is crawled, all folders and files under that root path are cataloged by default. |
Profiling
Feature | Supported Objects | Details |
---|---|---|
File Profiling | Row Count, Column Count, View Sample Data | Supported file types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC |
Sample Profiling | Supported | - |
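The supported file types listed above can be used to pre-check which files in a folder are candidates for profiling. The sketch below is illustrative only and is not part of OvalEdge: it assumes that a file's type can be inferred from its extension (case-insensitive), which the connector documentation does not state, and the names `SUPPORTED_EXTENSIONS` and `is_profilable` are hypothetical.

```python
from pathlib import PurePosixPath

# Extensions derived from the supported file types listed in this document.
# Assumption: the file type is inferred from the extension alone.
SUPPORTED_EXTENSIONS = {".csv", ".xls", ".xlsx", ".json", ".avro", ".parquet", ".orc"}


def is_profilable(path: str) -> bool:
    """Return True if the path ends in one of the supported extensions."""
    # PurePosixPath treats '/' as the separator, matching data lake paths.
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


def profilable_files(paths):
    """Keep only the paths whose extension is in the supported set."""
    return [p for p in paths if is_profilable(p)]
```

For example, `profilable_files(["raw/orders.CSV", "raw/readme.txt"])` keeps only `raw/orders.CSV`.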
Connection Details
To connect to the Azure Data Lake using the OvalEdge application, complete the following steps.
- Log in to the OvalEdge application.
- Navigate to the Administration > Crawler module.
- Click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
- Select the connection type as Azure Data Lake. The Manage Connection pop-up window with Azure Data Lake specific details is displayed.
Field Name | Mandatory/Optional | Description |
---|---|---|
Connection Type | Mandatory | By default, the selected connection type is displayed as Azure Data Lake. |
License Type | Mandatory | Select the license type as Standard. |
Connection Name | Mandatory | Enter a connection name. The name specified in the Connection Name textbox is used as a reference to the Azure Data Lake connection in the OvalEdge application. |
ADL Connection String | Mandatory | Enter the connection string that was generated at the Azure storage account. Ex: DefaultEndpointsProtocol=https;AccountName=ovaledgefileaccess;AccountKey=... |
Default Governance Roles | Mandatory | From the dropdown list, select the Steward, Custodian, and Owner. |
Select Bridge | Optional | Select the NO Bridge option if no bridge is available for the connector. |
Connection String
The connection string is generated at the Azure Storage account under the Access Key module. By default, the string is automatically generated and displayed in the Connection String field.
Copy the string from the storage account and paste it into the ADL Connection String field of the Manage Connection window.
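Because the connection string is a semicolon-separated list of Key=Value pairs, it can be sanity-checked before it is pasted into OvalEdge. The following is a minimal sketch and not part of OvalEdge: the function names are hypothetical, and the list of required keys is an assumption based only on the example shown for the ADL Connection String field (a real connection string may contain additional keys).

```python
# Assumption: these three keys are required, based on the example in this
# document. Real connection strings may carry more keys than these.
REQUIRED_KEYS = ("DefaultEndpointsProtocol", "AccountName", "AccountKey")


def parse_connection_string(conn_str):
    """Split a 'Key=Value;Key=Value' string into a dictionary."""
    settings = {}
    for index, segment in enumerate(conn_str.split(";")):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 text
        # and may themselves end in '=' padding.
        key, sep, value = segment.partition("=")
        if not sep or not key:
            # Report the position only, so no secret is echoed back.
            raise ValueError(f"Segment {index} is not in Key=Value form")
        settings[key] = value
    return settings


def missing_keys(conn_str):
    """Return the assumed-required keys that are absent or empty."""
    settings = parse_connection_string(conn_str)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]
```

An empty list from `missing_keys` means that all three assumed keys are present. It does not confirm that the account key is valid; only the connection validation in OvalEdge can do that.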
Connection Settings
Crawler
Sl.No | Property | Description |
---|---|---|
1 | Crawler Options | FileFolders/Buckets is enabled by default. |
2 | Crawler Rules | Include and exclude regular expressions apply to FileFolders and Buckets only, not to individual files. |
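To show how include and exclude regular expressions can narrow a crawl to specific folders, here is a small sketch. It is not OvalEdge code and makes several assumptions that the documentation does not confirm: patterns are matched with search semantics, an empty include list means "include everything", and an exclude match takes precedence over an include match. The function name `select_folders` is hypothetical.

```python
import re


def select_folders(folders, include=None, exclude=None):
    """Filter folder names with include/exclude regex lists.

    Assumptions (not confirmed by the connector documentation):
    - no include patterns means every folder is a candidate;
    - a folder matching any exclude pattern is always dropped.
    """
    include_res = [re.compile(p) for p in (include or [])]
    exclude_res = [re.compile(p) for p in (exclude or [])]

    selected = []
    for name in folders:
        # Skip folders that match none of the include patterns.
        if include_res and not any(r.search(name) for r in include_res):
            continue
        # Drop folders that match any exclude pattern.
        if any(r.search(name) for r in exclude_res):
            continue
        selected.append(name)
    return selected
```

For example, `select_folders(["sales/2023", "sales/tmp", "hr/2023"], include=[r"^sales/"], exclude=[r"tmp$"])` keeps only `sales/2023`.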
Profiler
Sl.No | Property | Description |
---|---|---|
1 | Profile Options | No profile options exist. |
2 | Profile Rules | No profile rules exist. |
Points to note:
- Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC
- File Manager shows only the details of the files and folders that the user has access to.