Connectivity Summary
An out of the box connector is available for <Hive Kerberos Connector> databases. It provides support for crawling database objects, profiling of sample data and lineage building.
The connectivity to Hive Kerberos Connector is via JDBC driver, which is included in the platform.
The connector currently supports the following versions of SQL Server:
Edition: CDH
Version: 5.0, 6.0, 7.0
The drivers used by the connector are given below:
Driver / API: Hive JDBC uber jar
Version: 7.4
Details: org.apache.hive.jdbc.HiveDriver - Hive JDBC "uber" jar
Technical Specifications
The connector capabilities are shown below:
Crawling
Feature | Supported Objects | Remarks |
Crawling | Tables | |
Table Columns |
Supported Data types: Supporting all types of Data types including nested data type arrays. |
Profiling
Please see Profiling Data for more details on profiling.
Feature |
Support |
Remarks |
Table Profiling |
Row count, Columns count, View sample data |
|
View Profiling |
Row count, Columns count, View sample data |
View is treated as a table for profiling purposes |
Column Profiling |
Min, Max, Null count, distinct, top 50 values |
|
Full Profiling |
Supported |
|
Sample Profiling |
Not Supported |
Lineage Building
Lineage entities |
Details |
Table lineage |
Supported |
Column lineage |
Supported |
Lineage Sources |
Stored procedures, functions, triggers, views, SQL queries (from Query Sheet), query logs and HQL files |
Querying
Operation |
Details |
Select |
Supported |
Insert |
Supported. |
Update |
Supported. |
Delete |
Supported. |
Joins within database |
Supported. |
Joins outside database |
Supported. |
Aggregations |
Supported |
Group By |
Supported |
Order By |
Supported |
By default the service account provided for the connector will be used for any query operations. If the service account has write privileges, then Insert / Update / Delete queries can be executed.
Pre-requisites
To use the connector, the following need to be available:
- Connection details as specified in the following section should be available.
- An admin / service account, for crawling and profiling. The minimum privileges required are:
Operation |
Access Permission |
Connection validate |
Yes |
Crawl schemas |
Yes |
Crawl tables |
Yes |
Profile schemas, tables |
Yes |
Query logs |
Yes |
Get views, procedures, function code |
Yes |
- JDBC driver is provided by default. In case it needs to be changed, add Hive drivers into the OvalEdge Jar path to communicate to SQL Server database.
Check the Configuration section for further details on how to add the drivers to the jar path.
- For Kerberos authentication need krb5.conf and ovaledge.keytab file
- Please find the below document for Krb5.conf file setup
https://docs.google.com/document/d/1VWOazfDAtoeyyDPU47o2Uws9QyvQaNeog1qX36sfrpY/edit
Connection Details
The following connection settings should be added for connecting to a Hive Kerberos database:
- Database Type: Hive
- Authentication: Kerberos Authentication
- Connection Name: Select a Connection name for the Hive Kerberos database. The name that you specify is a reference name to easily identify your Hive Kerberos database connection in OvalEdge. Example: Hive Kerberos
- Server: Hive Kerberos Server IP
- Port number: 10000
- Database: Name of the database to connect.
- Driver: org.apache.hive.jdbc.HiveDriver
- Driver Name: JDBC driver name for org.apache.hive.jdbc.HiveDriver. It will be auto-populated.
Example: org.apache.hive.jdbc.HiveDriver - Connection String: Hive Kerberos connection string. Set the Connection string toggle button to automatic, to get the details automatically from the credentials provided. Alternatively, you can manually enter the string.
Format: jdbc:hive2://{server}:{port}/{sid};principal=hive/undefined
Example: jdbc:hive2://18.220.154.229:10000/default;principal=hive/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL - Keytab: Provide the Key tab path
Once connectivity is established, additional configurations for Crawling and Profiling can be specified.