RDBMS

Delta Lake Unity Catalog

Delta Lake is an open-source storage layer that combines features of traditional data lakes and data warehouses. It integrates seamlessly with existing data lake storage solutions such as S3, ADLS, GCS, or HDFS, offering strong capabilities for dependable data management and streamlined data processing.

OvalEdge utilizes a JDBC Driver to connect to Delta Lake, enabling functionalities like data crawling, profiling, query execution, and lineage building. Within the OvalEdge Delta Lake integration, two distinct Database Types are supported:

  1. Delta Lake Regular: This type retrieves all schemas directly from the remote data source.
  2. Delta Lake Unity Catalog: This type retrieves tables from nested schemas within the remote data source.

These options cater to different organizational needs, providing flexibility in how data schemas and structures are accessed and managed within the Delta Lake environment.

Important: This document focuses on specific information relevant to this particular connector. Details about Establishing Connections, Connector Parameters, and Connector Settings will be found in the "Generic Features of Connectors" document.

Connector Characteristics

Connector Category

RDBMS

Connectivity

JDBC

Connector Version

[OvalEdge Connector Version Information]

7.0

Delta Lake Source Versions Supported 

[Data Source Version from which OvalEdge started providing support]

All Versions

OvalEdge Releases Supported (Available from)

[OvalEdge Release Version in which the connector was included in the OvalEdge Application]

4.x Onwards

Supported Features

Crawling of Metadata Objects 

Tables, Table Columns, Views, Stored Procedures, 

Functions, Roles, Users, Permissions

Supported Datatypes:

int, tinyint, smallint, medium, bigint, float, double, decimal, date, datetime, timestamp, time, year, char, varchar

Metadata Source

Information Schema and Delta Lake Commands

Profiling

Table Profiling: Row count, Column count, and View sample data

View Profiling: Row count,   Columns count, View sample data

Column Profiling: Min, Max, Null count, distinct, top 50 values

Types of Profiling: 

  • Sample Profiling 
  • Full Profiling

Crawling of Query Logs

Supported

Data Preview

Supported

Data Lineage

  • Auto Lineage via parsing Stored Procedures, Functions & Views
  • Auto Lineage via parsing Query Logs
  • Lineage Levels
    • Table Lineage
    • Column Lineage

Supported

Data Quality Module

More info: Data Quality

Supported

DAM (Data Access Management) Support

More info: RDAM

Not Supported 

Query Sheet Support

Supported

Bridge Support

[The Bridge component is used as a Tunnel to Transfer Data from the Client-Hosted Cloud or Client-on-Premises Data Center. The handshake between the Bridge Server and Client is established with a Security Tool Kit (these differ from client to client and are generated based on the IP address/DNS, etc.).]

Supported

Crawl of Usage Statistics (Source System)

[This involves systematically collecting, analyzing, and reporting on data related to user activities, including details on which users are accessing the system, their actions, and the frequency of their access.]

Not Supported 

Certifications at Source (Source System)
[This refers to endorsing metadata objects applied within the source system. These certifications are then fetched and displayed in OvalEdge, enabling more in-depth analysis.]

Not Supported

Prerequisites 

The following are the prerequisites required for establishing a connection:  

  1. API Details
    Add Delta Lake Connector drivers into the OvalEdge Jar Path (/home/ovaledge/jarpath) to communicate with the Delta Lake database.

    API

    Version

    Details

    JDBC 

    V4.0.0 and above

    The OvalEdge Dependencies automatically include the  Microsoft JDBC Driver, ensuring that the Connector comes equipped with it by default.


  2. Service Account with Minimum Permissions

    Create a Delta Lake service account to connect to OvalEdge with the following permissions to crawl metadata into OvalEdge. 

    Operation

    Minimum Access Permission 

    Connection Validation

    SELECT

    Crawl Schemas, Tables

    SELECT

    Profile Schemas, Tables

    SELECT

    Lineage Building 

    SELECT

  3. Configure Environment Variables (Optional).
This section describes the settings or instructions you should know before establishing a connection. If your environments have been configured, skip this step.
For more information, refer to the "Generic Features of Connectors" document.

Establish Connection

In the OvalEdge application, the Delta Lake connector allows you to crawl the buckets and file data objects using Credential Manager Authentication.

The "Generic Features of Connectors" document covers general settings and parameters common to most OvalEdge connectors. Here, we'll focus on the specific details required to configure the Delta Lake Connector.

Delta Lake-Specific Parameters

Field Name

Description

Server*

Provide the server address where the database instance is hosted. This address is essential for connecting to the database.

Example: adb-123456789.11.azuredatalake.net

Port*

Enter the port number. The default port for Delta Lake when connecting to a server is usually "443"

Database_Type*

Select the database type as Delta Lake_Unity_Catalog. 

  • Delta Lake_Regular: Includes the default catalog folder containing schemas.
  • Delta Lake_Unity_Catalog: Includes multiple catalog folders with nested schemas.

Database

Enter the name of the database associated with the selected Database Type.

Driver*

A JDBC driver is a Java library file with the extension .jar that connects to a database. The driver details associated with the Delta Lake database will be auto-populated by default.

Example: com.simba.spark.jdbc.Driver

HTTP Path

Enter the HTTP Path associated with Delta Lake, and it helps in connecting with the legacy-specific cluster or with the warehouse

Example: sql/protocolv1/o/781181XXXXXXX/0717-094118-bathe927

Lineage Fetching Mode

Choose the mode for retrieving and displaying lineage details in OvalEdge by selecting either Query or API. 

If the user selects the Query option, then they need to give SELECT access to the system.access schema.

Username* 

Provide the service account username required to connect to the Delta Lake server.

Note: The web browser may auto-fill this field with the current OvalEdge user login. Please enter the Delta Lake Service Account name if necessary.

Password*

Enter the service account password to gain access to the Delta Lake Server. In general, security measures involve token-based authentication for enhanced protection. 

Important: * (asterisk) indicates the mandatory field to create a connection. 

After entering the required parameters, you can either save the connection details first or validate the connection and then save it.

Errors & Resolution

You may encounter the following errors if the correct parameters are not provided while establishing the connection. Below are the most common error messages and their resolutions for your reference. 

If you continue to experience issues with establishing the connection, please contact your assigned GCS team.

S.No.

Error Message(s)

Error Description/Resolution

1

Connection timed out.

Error Description: 

The client could not reach the server within the specified time.

Resolution:

  • Verify the server address and port number.
  • Ensure network connectivity between the client and server.
  • Check if the server is running and accepting connections.
  • Increase the timeout settings in the connection configuration.

2

Authentication failed. Invalid username or password.

Error Description: 

Incorrect authentication credentials.

Resolution:

  • Verify the username and password.
  • Ensure the credentials have the necessary permissions to access the Delta Lake instance.
  • Check for any updates or changes in the authentication method (e.g., OAuth, Kerberos).

3

Permission denied.

Error Description: 

The user does not have the required permissions to access the database or perform the requested operation.

Resolution:

  • Verify user permissions and roles.
  • Ensure the user has the privileges to access the required schemas and tables.
  • Check the permission settings in Delta Lake and adjust them as needed.

4

Invalid JDBC URL.

Error Description: 

The JDBC URL is incorrectly formatted or contains invalid parameters.

Resolution:

  • Verify the format of the JDBC URL.
  • Ensure all required parameters (server address, port, database name) are correctly specified.
  • Refer to the documentation for the correct JDBC URL syntax.

5

JDBC driver not found.

Error Description:

The JDBC driver for Delta Lake is not available in the client environment.

Resolution:

  • Ensure the JDBC driver is correctly installed and included in the client’s classpath.
  • Verify that the correct version of the JDBC driver is being used.
  • Download and install the appropriate JDBC driver if it is missing.

6

Resource limits exceeded.

Error Description: 

The operation exceeds the resource limits (e.g., memory, CPU).

Resolution:

  • Optimize the query or operation to use fewer resources.
  • Increase the resource limits on the server if possible.
  • Check for any configuration settings that limit resource usage.

7

Version incompatibility.

Error Description: 

The client and server versions are not compatible.

Resolution:

  • Ensure that the client and server are running compatible versions of Delta Lake.
  • Update the client or server to a compatible version if needed.

Connector Settings

After successfully validating the connection, you can access various settings to retrieve specific information from the data source.

The following are settings specific to Delta Lake

  • Crawler
  • Profiler
  • Query Policies
  • Access Instruction
  • Business Glossary Settings
  • Anomaly Detection Settings
  • Others

Note: In the Crawler Setting tab, the user need to check "Procedures, Functions, Triggers & Views Source Code" for fetching Source code for lineage.

For more information, refer to the "Generic Features of Connectors" document.

Limitations

S.No.

Description

1

Firewalls may block the JDBC connection to the Delta Lake server.

2

Incorrect permissions can prevent access to the Delta Lake tables.

3

Older versions of Delta Lake may not support all features or optimizations.

4

Incorrect JDBC URL format or parameters can prevent establishing a connection.

5

Complex or unoptimized queries can result in high resource usage and long execution times.

6

The connection might time out if the server takes too long to respond.

7

Not all BI tools or data platforms may fully support Delta Lake via JDBC.

FAQs

  • What is the default port for Delta Lake connections using JDBC?

Answer: The default port for Delta Lake connections using JDBC is typically "443" when using SSL/TLS.

  • What should I include in the JDBC connection string?

Answer: The JDBC connection string should include the server address, port, and database name. It may also include additional parameters for authentication and SSL configuration. 

Example: jdbc:spark://adb-123456789.11.azuredatabricks.net:443/default.

  • How do I handle authentication for a JDBC connection to Delta Lake?

Answer: Authentication can be handled using username and password, tokens, or other methods supported by your Delta Lake deployment. Ensure you provide the correct credentials in the connection properties.

  • Why am I getting a connection timeout error?

Answer: A connection timeout error can occur due to network issues, incorrect server address or port, firewall restrictions, or server unavailability. Verify the connection details and ensure network connectivity.

  • What can I do if I encounter an SSL/TLS handshake failure?

Answer: Ensure that the server's SSL/TLS certificates are valid and trusted by the client. Verify the SSL/TLS configuration on both the client and server sides. If using self-signed certificates, add them to the client's trusted certificate store.

  • How can I resolve an authentication failure?

Answer: Double-check the username and password or other authentication credentials. Ensure the user has the necessary permissions to access the Delta Lake instance. Also, verify the authentication method being used.

  • What should I do if I receive a "permission denied" error?

Answer: Ensure that your user account has the appropriate permissions to access the required schemas and tables in Delta Lake. Check the access control settings and roles assigned to your user.

  • Why is my JDBC URL invalid?

Answer: Ensure the JDBC URL is correctly formatted and includes all necessary parameters. For the correct JDBC URL syntax, refer to the Delta Lake documentation.

  • How do I ensure compatibility between the JDBC driver and Delta Lake?

Answer: Use a JDBC driver version that is compatible with your Delta Lake version. For compatibility information, refer to the documentation or release notes. Updating the JDBC driver and Delta Lake to their latest versions can help resolve compatibility issues.

  • How do I troubleshoot limited or unclear error messages?

Answer: Enable detailed logging in your JDBC driver and Delta Lake environment. Review the logs for more information on the error. Additionally, consult the documentation and community forums for similar issues and solutions.

  • Can I connect to Delta Lake from BI tools from JDBC?

Answer: Yes, many BI tools support JDBC connections to Delta Lake. Ensure that the tool is configured correctly with the JDBC connection string, credentials, and any necessary drivers.

  • What do I do if my connection is frequently dropped?

Answer: Frequent connection drops can be due to network instability, server resource limitations, or configuration issues. Ensure stable network connectivity adequate server resources, and review any connection timeout settings.