Connectivity Summary
An out-of-the-box connector is available for Azure Databricks (ADB). It supports crawling datasets (i.e., ADB Notebooks) and building lineage.
Connectivity to ADB is via the REST API, which is included in the platform.
Technical Specifications
The connector capabilities are shown below:
Crawling
| Feature | Supported Objects | Remarks |
| --- | --- | --- |
| Crawling | Jobs | It fetches all ADB Notebooks from the ADB workspace. |
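As an illustration of this crawl, a minimal sketch using the Databricks Workspace REST API (/api/2.0/workspace/list) is shown below. The host URL and token are placeholders, and the snippet is an assumption-based example, not the connector's actual implementation.

```python
import requests

# Placeholder values for illustration only
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def list_notebooks(path="/"):
    """Recursively walk the workspace tree and collect NOTEBOOK paths."""
    resp = requests.get(
        f"{HOST}/api/2.0/workspace/list",
        headers=HEADERS,
        params={"path": path},
    )
    resp.raise_for_status()
    notebooks = []
    for obj in resp.json().get("objects", []):
        if obj["object_type"] == "NOTEBOOK":
            notebooks.append(obj["path"])
        elif obj["object_type"] == "DIRECTORY":
            # Directories (Users, Shared, folders, ...) are walked recursively
            notebooks.extend(list_notebooks(obj["path"]))
        # LIBRARY and other object types are skipped
    return notebooks

print(list_notebooks())
```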
Lineage Building
| Lineage entities | Details |
| --- | --- |
| Table - Table | Supported |
| Table - File Lineage | Supported |
| File - Table Lineage | Supported |
| Column Lineage - File Column Lineage | Supported |
Querying
| Operation | Details |
| --- | --- |
| Select | Supported |
| Insert | Not supported by default |
| Update | Not supported by default |
| Delete | Not supported by default |
| Joins within database | Supported |
| Joins outside database | Not supported |
| Aggregations | Supported |
| Group By | Supported |
Pre-Requisites
To use the connector, the following must be available:
- Connection details, as specified in the following section.
- An admin/service account for crawling. The minimum privileges required are:
| Operation | Access Permission |
| --- | --- |
| Connection validate | Should have read permission on the ADB workspace. |
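As a hedged sketch of how this read permission could be verified outside OvalEdge, the snippet below calls the Workspace get-status endpoint on the root path; a 200 response indicates the account can read the workspace. The host and token values are placeholders.

```python
import requests

# Placeholder values for illustration only
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

def can_read_workspace():
    """Return True if the account can read the workspace root."""
    resp = requests.get(
        f"{HOST}/api/2.0/workspace/get-status",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"path": "/"},
    )
    return resp.status_code == 200

print("Workspace readable:", can_read_workspace())
```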
Connection Details
The following connection settings are required to connect to ADB:
| Property | Details |
| --- | --- |
| Database Type | DATABRICKSDB |
| License Type | Standard or Auto Lineage |
| Connection Name | Select a connection name for the ADB. The name you specify is a reference name for easy identification of the ADB connection in OvalEdge. Example: ADB1 |
| Server | URL for ADB |
| Database | Provide a valid database name |
| Driver | - |
| Username | Provide a valid username |
| Password/Access Token | Provide a valid password or access token |
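For reference, the snippet below summarizes these properties as they might be filled in; every value shown (workspace URL, database name, username, token) is a placeholder, not a recommended setting.

```python
# Illustrative connection settings; substitute your own workspace details.
adb_connection = {
    "database_type": "DATABRICKSDB",
    "license_type": "Standard",          # or "Auto Lineage"
    "connection_name": "ADB1",
    "server": "https://adb-1234567890123456.7.azuredatabricks.net",
    "database": "default",
    "driver": None,                      # left empty, as in the table above
    "username": "user@example.com",      # optional when an access token is used
    "password_or_access_token": "<password-or-personal-access-token>",
}
```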
Configure a New ADB connection
- Azure Databricks takes the Databricks URL, Username, and Password/Access Token to connect and crawl. You can connect using a Username and Password, or with just an Access Token.
- The Username and Password are the Databricks account credentials. To connect with an Access Token instead, click on User Settings.
- In User Settings, click the Generate New Token button.
- The REST APIs are used to authenticate and retrieve the Notebooks from Databricks. For information on the REST API, refer to https://docs.databricks.com/dev-tools/api/latest/index.html
- The Databricks Workspace contains different object types: DIRECTORY, NOTEBOOK, and LIBRARY. Anything other than a NOTEBOOK or LIBRARY is considered a DIRECTORY; this includes Users, Shared, folders, etc., all of which can contain NOTEBOOKS.
- As part of crawling, we get the objects from the root (/) of the Workspace, iterate over Users, Shared, folders, etc., and collect all the NOTEBOOKS and their content.
- The Notebook content is exported in DBC format to the temp directory (Administration -> Configurations), unzipped, and the JSON content inside the DBC file is extracted (see the sketch after this list).
- For each Notebook, we create a dataset entry and corresponding source code entry with the JSON data.
- These entries are used to build the Lineage.
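The sketch below illustrates the export-and-parse flow described above, assuming the standard Workspace export endpoint (/api/2.0/workspace/export) with format=DBC, which returns the archive as base64-encoded content. The host, token, and notebook path are placeholders, and the printed fields (name, commands) reflect typical DBC notebook JSON rather than a guaranteed schema.

```python
import base64
import io
import json
import zipfile

import requests

# Placeholder values for illustration only
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def export_notebook_json(notebook_path):
    """Export a notebook as a DBC archive and return the JSON entries inside it."""
    resp = requests.get(
        f"{HOST}/api/2.0/workspace/export",
        headers=HEADERS,
        params={"path": notebook_path, "format": "DBC"},
    )
    resp.raise_for_status()
    # The export API returns the DBC archive as base64-encoded content.
    archive_bytes = base64.b64decode(resp.json()["content"])
    entries = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as dbc:
        for name in dbc.namelist():
            raw = dbc.read(name)
            if not raw:
                continue  # skip directory entries
            try:
                entries.append(json.loads(raw))
            except json.JSONDecodeError:
                continue  # skip any non-JSON entries in the archive
    return entries

# Hypothetical notebook path for demonstration
for entry in export_notebook_json("/Users/someone@example.com/my_notebook"):
    print(entry.get("name"), len(entry.get("commands", [])), "commands")
```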
Steps to Test the Azure Databricks Connection
- Before crawling the ADB connection, we need to set a path to store the remote Databricks data.
- Path setup: in Configuration, set ovaledge.tempath to a local folder path (e.g., E:\databricks).
- After crawling, the user can see the data in the Queries tab; these entries are the Notebooks.
- For Notebooks, we have the option to build the Lineage.
- From the crawler page, select the connection and click the Build Lineage button to build lineage automatically; the user then sees the same data as in the Queries tab. The Nine Dots icon here provides two options: Build Lineage for Source Code and Build Lineage for Unprocessed Source Code.
- After building lineage, we can view it. The notebook's associations with other commands are shown in the Association tab.
Azure Databricks supports only SQL object types for building lineage.