Delta Lake is an open-source storage layer that combines features of traditional data lakes and data warehouses. It integrates seamlessly with existing data lake storage solutions such as S3, ADLS, GCS, or HDFS, offering strong capabilities for dependable data management and streamlined data processing.
OvalEdge utilizes a JDBC Driver to connect to Delta Lake, enabling functionalities like data crawling, profiling, query execution, and lineage building. Within the OvalEdge Delta Lake integration, two distinct Database Types are supported:
- Delta Lake Regular: This type retrieves all schemas directly from the remote data source.
- Delta Lake Unity Catalog: This type retrieves tables from nested schemas within the remote data source.
These options cater to different organizational needs, providing flexibility in how data schemas and structures are accessed and managed within the Delta Lake environment.
Important: This document focuses on specific information relevant to this particular connector. Details about Establishing Connections, Connector Parameters, and Connector Settings will be found in the "Generic Features of Connectors" document.
Connector Characteristics
Connector Category |
RDBMS |
Connectivity |
JDBC |
Connector Version [OvalEdge Connector Version Information] |
7.0 |
Delta Lake Source Versions Supported [Data Source Version from which OvalEdge started providing support] |
All Versions |
OvalEdge Releases Supported (Available from) [OvalEdge Release Version in which the connector was included in the OvalEdge Application] |
4.x Onwards |
Supported Features
Crawling of Metadata Objects |
Tables, Table Columns, Views, Stored Procedures, Functions, Roles, Users, Permissions Supported Datatypes: int, tinyint, smallint, medium, bigint, float, double, decimal, date, datetime, timestamp, time, year, char, varchar |
Metadata Source |
Information Schema and Delta Lake Commands |
Profiling |
Table Profiling: Row count, Column count, and View sample data View Profiling: Row count, Columns count, View sample data Column Profiling: Min, Max, Null count, distinct, top 50 values Types of Profiling:
|
Crawling of Query Logs |
Supported |
Data Preview |
Supported |
Data Lineage
|
Supported |
Data Quality Module More info: Data Quality |
Supported |
DAM (Data Access Management) Support More info: RDAM |
Not Supported |
Query Sheet Support |
Supported |
Bridge Support [The Bridge component is used as a Tunnel to Transfer Data from the Client-Hosted Cloud or Client-on-Premises Data Center. The handshake between the Bridge Server and Client is established with a Security Tool Kit (these differ from client to client and are generated based on the IP address/DNS, etc.).] |
Supported |
Crawl of Usage Statistics (Source System) [This involves systematically collecting, analyzing, and reporting on data related to user activities, including details on which users are accessing the system, their actions, and the frequency of their access.] |
Not Supported |
Certifications at Source (Source System) |
Not Supported |
Prerequisites
The following are the prerequisites required for establishing a connection:
- API Details
Add Delta Lake Connector drivers into the OvalEdge Jar Path (/home/ovaledge/jarpath) to communicate with the Delta Lake database.
API
Version
Details
JDBC
V4.0.0 and above
The OvalEdge Dependencies automatically include the Microsoft JDBC Driver, ensuring that the Connector comes equipped with it by default.
- Service Account with Minimum Permissions
Create a Delta Lake service account to connect to OvalEdge with the following permissions to crawl metadata into OvalEdge.
Operation
Minimum Access Permission
Connection Validation
SELECT
Crawl Schemas, Tables
SELECT
Profile Schemas, Tables
SELECT
Lineage Building
SELECT
- Configure Environment Variables (Optional).
For more information, refer to the "Generic Features of Connectors" document.
Establish Connection
In the OvalEdge application, the Delta Lake connector allows you to crawl the buckets and file data objects using Credential Manager Authentication.
The "Generic Features of Connectors" document covers general settings and parameters common to most OvalEdge connectors. Here, we'll focus on the specific details required to configure the Delta Lake Connector.
Delta Lake-Specific Parameters
Field Name |
Description |
Server* |
Provide the server address where the database instance is hosted. This address is essential for connecting to the database. Example: adb-123456789.11.azuredatalake.net |
Port* |
Enter the port number. The default port for Delta Lake when connecting to a server is usually "443" |
Database_Type* |
Select the database type as Delta Lake_Unity_Catalog.
|
Database |
Enter the name of the database associated with the selected Database Type. |
Driver* |
A JDBC driver is a Java library file with the extension .jar that connects to a database. The driver details associated with the Delta Lake database will be auto-populated by default. Example: com.simba.spark.jdbc.Driver |
HTTP Path |
Enter the HTTP Path associated with Delta Lake, and it helps in connecting with the legacy-specific cluster or with the warehouse Example: sql/protocolv1/o/781181XXXXXXX/0717-094118-bathe927 |
Lineage Fetching Mode |
Choose the mode for retrieving and displaying lineage details in OvalEdge by selecting either Query or API. If the user selects the Query option, then they need to give SELECT access to the system.access schema. |
Username* |
Provide the service account username required to connect to the Delta Lake server. Note: The web browser may auto-fill this field with the current OvalEdge user login. Please enter the Delta Lake Service Account name if necessary. |
Password* |
Enter the service account password to gain access to the Delta Lake Server. In general, security measures involve token-based authentication for enhanced protection. |
Important: * (asterisk) indicates the mandatory field to create a connection.
After entering the required parameters, you can either save the connection details first or validate the connection and then save it.
Errors & Resolution
You may encounter the following errors if the correct parameters are not provided while establishing the connection. Below are the most common error messages and their resolutions for your reference.
If you continue to experience issues with establishing the connection, please contact your assigned GCS team.
S.No. |
Error Message(s) |
Error Description/Resolution |
1 |
Connection timed out. |
Error Description: The client could not reach the server within the specified time. Resolution:
|
2 |
Authentication failed. Invalid username or password. |
Error Description: Incorrect authentication credentials. Resolution:
|
3 |
Permission denied. |
Error Description: The user does not have the required permissions to access the database or perform the requested operation. Resolution:
|
4 |
Invalid JDBC URL. |
Error Description: The JDBC URL is incorrectly formatted or contains invalid parameters. Resolution:
|
5 |
JDBC driver not found. |
Error Description: The JDBC driver for Delta Lake is not available in the client environment. Resolution:
|
6 |
Resource limits exceeded. |
Error Description: The operation exceeds the resource limits (e.g., memory, CPU). Resolution:
|
7 |
Version incompatibility. |
Error Description: The client and server versions are not compatible. Resolution:
|
Connector Settings
After successfully validating the connection, you can access various settings to retrieve specific information from the data source.
The following are settings specific to Delta Lake
- Crawler
- Profiler
- Query Policies
- Access Instruction
- Business Glossary Settings
- Anomaly Detection Settings
- Others
Note: In the Crawler Setting tab, the user need to check "Procedures, Functions, Triggers & Views Source Code" for fetching Source code for lineage.
For more information, refer to the "Generic Features of Connectors" document.
Limitations
S.No. |
Description |
1 |
Firewalls may block the JDBC connection to the Delta Lake server. |
2 |
Incorrect permissions can prevent access to the Delta Lake tables. |
3 |
Older versions of Delta Lake may not support all features or optimizations. |
4 |
Incorrect JDBC URL format or parameters can prevent establishing a connection. |
5 |
Complex or unoptimized queries can result in high resource usage and long execution times. |
6 |
The connection might time out if the server takes too long to respond. |
7 |
Not all BI tools or data platforms may fully support Delta Lake via JDBC. |
FAQs
- What is the default port for Delta Lake connections using JDBC?
Answer: The default port for Delta Lake connections using JDBC is typically "443" when using SSL/TLS.
- What should I include in the JDBC connection string?
Answer: The JDBC connection string should include the server address, port, and database name. It may also include additional parameters for authentication and SSL configuration.
Example: jdbc:spark://adb-123456789.11.azuredatabricks.net:443/default.
- How do I handle authentication for a JDBC connection to Delta Lake?
Answer: Authentication can be handled using username and password, tokens, or other methods supported by your Delta Lake deployment. Ensure you provide the correct credentials in the connection properties.
- Why am I getting a connection timeout error?
Answer: A connection timeout error can occur due to network issues, incorrect server address or port, firewall restrictions, or server unavailability. Verify the connection details and ensure network connectivity.
- What can I do if I encounter an SSL/TLS handshake failure?
Answer: Ensure that the server's SSL/TLS certificates are valid and trusted by the client. Verify the SSL/TLS configuration on both the client and server sides. If using self-signed certificates, add them to the client's trusted certificate store.
- How can I resolve an authentication failure?
Answer: Double-check the username and password or other authentication credentials. Ensure the user has the necessary permissions to access the Delta Lake instance. Also, verify the authentication method being used.
- What should I do if I receive a "permission denied" error?
Answer: Ensure that your user account has the appropriate permissions to access the required schemas and tables in Delta Lake. Check the access control settings and roles assigned to your user.
- Why is my JDBC URL invalid?
Answer: Ensure the JDBC URL is correctly formatted and includes all necessary parameters. For the correct JDBC URL syntax, refer to the Delta Lake documentation.
- How do I ensure compatibility between the JDBC driver and Delta Lake?
Answer: Use a JDBC driver version that is compatible with your Delta Lake version. For compatibility information, refer to the documentation or release notes. Updating the JDBC driver and Delta Lake to their latest versions can help resolve compatibility issues.
- How do I troubleshoot limited or unclear error messages?
Answer: Enable detailed logging in your JDBC driver and Delta Lake environment. Review the logs for more information on the error. Additionally, consult the documentation and community forums for similar issues and solutions.
- Can I connect to Delta Lake from BI tools from JDBC?
Answer: Yes, many BI tools support JDBC connections to Delta Lake. Ensure that the tool is configured correctly with the JDBC connection string, credentials, and any necessary drivers.
- What do I do if my connection is frequently dropped?
Answer: Frequent connection drops can be due to network instability, server resource limitations, or configuration issues. Ensure stable network connectivity adequate server resources, and review any connection timeout settings.