Azure Databricks

Connectivity Summary

An out-of-the-box connector is available for Azure Databricks (ADB). It supports crawling datasets (i.e., ADB Notebooks) and lineage building.

Connectivity to ADB is via REST API, which is included in the platform.

Technical Specifications

The connector capabilities are shown below:

Crawling

| Feature | Supported Objects | Remarks |
| --- | --- | --- |
| Crawling | Jobs | It fetches all ADB Notebooks from the ADB workspace. |

Lineage Building

| Lineage entities | Details |
| --- | --- |
| Table - Table | Supported |
| Table - File Lineage | Supported |
| File - Table Lineage | Supported |
| Column Lineage - File Column Lineage | Supported |

Querying 

| Operation | Details |
| --- | --- |
| Select | Supported |
| Insert | Not supported, by default. |
| Update | Not supported, by default. |
| Delete | Not supported, by default. |
| Joins within database | Supported |
| Joins outside database | Not supported |
| Aggregations | Supported |
| Group By | Supported |

Pre-Requisites

To use the connector, the following need to be available:

  • The connection details specified in the following section.
  • An admin/service account for crawling. The minimum privileges required are:

    | Operation | Access Permission |
    | --- | --- |
    | Connection validate | Should have read permission on the ADB workspace. |

Connection Details

The following connection settings should be added to connect to ADB:

| Property | Details |
| --- | --- |
| Database Type | DATABRICKSDB |
| License Type | Standard or Auto Lineage |
| Connection Name | Enter a connection name for the ADB. The name you specify is a reference name for easy identification of the ADB connection in OvalEdge. Example: ADB1 |
| Server | URL for ADB |
| Database | Provide the valid database name |
| Driver | - |
| Username | Provide the valid username |
| Password/Access Token | Provide the valid password or access token |

Configure a New ADB connection

  1. The Azure Databricks connector takes the Databricks URL, Username, and Password/Access Token to connect and crawl. We can connect using either a Username and Password, or just an Access Token.
  2. The Username and Password are the Databricks account credentials. To connect with an Access Token instead, open User Settings in Databricks.
  3. In User Settings, click the Generate New Token button.
  4. The REST APIs are used to authenticate and get the Notebooks from Databricks.

      For information on the REST API, please refer to https://docs.databricks.com/dev-tools/api/latest/index.html
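As a minimal sketch of the two connection modes described above, the following builds the Authorization header for either an Access Token or a Username and Password, and prepares (without sending) a request to the Workspace API list endpoint. The function names and the example host are hypothetical and are not OvalEdge code; whether basic credentials are accepted depends on how the workspace is configured.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_auth_header(token=None, username=None, password=None):
    """Return an Authorization header for an access token or for basic credentials."""
    if token:
        # Access Token mode: the token alone is enough.
        return {"Authorization": f"Bearer {token}"}
    if username and password:
        # Username and Password mode: standard HTTP basic authentication.
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError("Provide an access token, or both a username and a password")


def workspace_list_request(server_url, path="/", **credentials):
    """Build (but do not send) a GET request that lists workspace objects under `path`."""
    query = urlencode({"path": path})
    url = f"{server_url.rstrip('/')}/api/2.0/workspace/list?{query}"
    return urllib.request.Request(url, headers=build_auth_header(**credentials), method="GET")


# Hypothetical host and token, for illustration only.
request = workspace_list_request("https://adb-example.azuredatabricks.net/", token="dapiEXAMPLE")
print(request.full_url)
```

Sending the prepared request with `urllib.request.urlopen(request)` and receiving a successful response is one way to check that the account can read the workspace.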

    1. The Databricks Workspace contains different types of objects: DIRECTORY, NOTEBOOK, and LIBRARY. Everything other than NOTEBOOK and LIBRARY is considered a DIRECTORY. Directories can be Users, Shared, Folders, etc., and all of these contain NOTEBOOKS.

    2. As part of crawling, we get the objects from the root (/) of the Workspace, iterate over Users, Folders, Shared, etc., collect all the NOTEBOOKS, and get their content.
    3. We export the Notebook content in DBC format to the temp directory (Administration -> Configurations), unzip it, and read the JSON content from the files inside the DBC archive.
    4. For each Notebook, we create a dataset entry and a corresponding source code entry with the JSON data.
    5. These entries are used to build the Lineage.
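The traversal in the steps above can be sketched as follows. This is an illustration, not the connector's implementation: `list_objects` is a hypothetical callable standing in for a call to the Workspace API list endpoint, assumed to return dictionaries with `path` and `object_type` keys, and the sample workspace is made up.

```python
def collect_notebooks(list_objects, root="/"):
    """Walk the workspace tree from `root` and return the paths of all NOTEBOOK objects.

    `list_objects(path)` is a hypothetical callable that returns the objects
    directly under `path` as dicts with 'path' and 'object_type' keys.
    """
    notebooks = []
    pending = [root]  # directories still to be listed
    while pending:
        current = pending.pop()
        for obj in list_objects(current):
            kind = obj.get("object_type")
            if kind == "NOTEBOOK":
                notebooks.append(obj["path"])
            elif kind == "DIRECTORY":
                pending.append(obj["path"])
            # LIBRARY objects are neither collected nor descended into.
    return sorted(notebooks)


# Made-up workspace layout used in place of real API responses.
fake_workspace = {
    "/": [
        {"path": "/Users", "object_type": "DIRECTORY"},
        {"path": "/Shared", "object_type": "DIRECTORY"},
    ],
    "/Users": [{"path": "/Users/a@example.com", "object_type": "DIRECTORY"}],
    "/Users/a@example.com": [
        {"path": "/Users/a@example.com/etl", "object_type": "NOTEBOOK"},
    ],
    "/Shared": [
        {"path": "/Shared/report", "object_type": "NOTEBOOK"},
        {"path": "/Shared/lib.jar", "object_type": "LIBRARY"},
    ],
}

print(collect_notebooks(lambda path: fake_workspace.get(path, [])))
```

An explicit stack is used instead of recursion so that deeply nested folder structures do not hit Python's recursion limit.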

    Steps to test the Azure Databricks connection

    1. Before crawling the ADB connection, set a path to store the remote data from Databricks.
    2. Path setup: in Configuration, set ovaledge.tempath to a local folder path, e.g., E:\databricks.
    3. After crawling, the Notebooks appear in the Queries tab.
    4. For Notebooks, there is an option to build the Lineage.
    5. From the Crawler page, select the connection and click the Build Lineage button to build lineage automatically; the user then sees the same data as in the Queries tab. The Nine dots icon there offers two options: Build lineage for source code and Build lineage for unprocessed source code.
    6. After building lineage, we can view the lineage. The notebook's associations with other commands are shown in the Association tab.
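The temp path set in steps 1 and 2 is where exported notebooks are unpacked during crawling. A rough sketch of that unpacking, assuming the DBC export is a zip archive whose members hold JSON, is shown below. The function name and the sample archive contents are hypothetical, and members that are not valid JSON are simply skipped.

```python
import io
import json
import tempfile
import zipfile
from pathlib import Path


def unpack_dbc(dbc_bytes, temp_dir):
    """Unzip a DBC archive into `temp_dir` and return {member name: parsed JSON}."""
    target = Path(temp_dir)
    target.mkdir(parents=True, exist_ok=True)
    parsed = {}
    with zipfile.ZipFile(io.BytesIO(dbc_bytes)) as archive:
        archive.extractall(target)
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to parse
            try:
                parsed[name] = json.loads((target / name).read_text(encoding="utf-8"))
            except ValueError:
                # Not valid UTF-8 or not valid JSON: leave it on disk but do not parse it.
                continue
    return parsed


# Build a small stand-in archive in memory; the payload is illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo/etl.sql", json.dumps({"name": "etl"}))

with tempfile.TemporaryDirectory() as temp_path:
    notebooks = unpack_dbc(buffer.getvalue(), temp_path)

print(sorted(notebooks))
```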

    In Azure Databricks, lineage can be built only for SQL object types.