The Hive connector enables querying data housed within an Apache Hive data warehouse. Hive comprises three primary components: data files, often stored in diverse formats commonly housed in the Hadoop Distributed File System, or object storage systems like Amazon S3.
The connectivity to Hive Connector is via JDBC driver, which is included in the platform.
Connector Capabilities
The connector capabilities are shown below:
Crawling
Features |
Supported Objects |
Remarks |
---|---|---|
Crawling |
Tables |
- |
Crawling |
Table columns |
Supported Data types: Supporting all types of Data Types Including nested datatypes arrays |
Profiling
Features | Details |
Remarks |
Table Profiling |
Row count, Columns count, View sample data |
- |
View Profiling |
Row count, Columns count, View sample data |
- |
Column Profiling |
Min, Max, Null count, distinct, top 50 values |
- |
Full Profiling |
Supported |
- |
Lineage Building
Lineage entities |
Details |
---|---|
Table Lineage |
Supported |
Column Lineage |
Supported |
Lineage Sources |
Stored procedures, views, SQL queries (from Query Sheet), query logs, and HQL files |
Querying
Operation |
Details |
---|---|
Select |
Supported |
Insert |
Not supported, by default. |
Update |
Not supported, by default. |
Delete |
Not supported, by default. |
Joins within database |
Supported |
Joins outside database |
Not supported |
Aggregations |
Supported |
Group By |
Supported |
Order By |
Supported |
Prerequisites
The following are prerequisites for connecting to the Hive Connector:
Drivers
Driver/API |
Version |
Details |
7.4 |
org.apache.hive.jdbc.HiveDriver |
Configuring Environment Variables
Configuring environment names enables you to select the appropriate environment from the drop-down list when adding a connector. This allows for consistent crawling of schemas across different environments, such as production (PROD), staging (STG), or temporary environments. It also facilitates schema comparisons and assists in application upgrades by providing a temporary environment that can later be deleted.
Before establishing a connection, it is important to configure the environment names for the specific connector. If your environments have been configured, skip this step.
Steps to Configure the Environment
- Log into the OvalEdge application.
- Navigate to Administration > System Settings.
- Select the Connector tab.
- Find the key name “connector.environment”.
- Enter the desired environment values (PROD, STG) in the Value column.
- Click ✔ to Save.
Service Account Permissions
A service account is required for crawling and profiling. By default, the service account provided for the connector will be used for any query operations. If the service account has a write privilege, insert, update, and delete queries can be executed. The minimum privileges required are listed below.
Operation |
Access Permission |
---|---|
Connection Validation |
SELECT |
Crawling |
Select, Usage |
Profiling |
Read, Select |
Query execution |
Select |
Establish a Connection
To connect to Hive using the OvalEdge application, complete the following steps:
- Log in to the OvalEdge application.
- Navigate to Administration > Connectors.
- Click on the + (New Connector) icon.
- The Add Connector pop-up window is displayed, and you can search for the Hive connector.
- The Add Connector with Connector Type specific details pop-up window is displayed. Enter the relevant information to configure the Hive connection.
Note: An asterisk (*) denotes a mandatory field for establishing a connection.
Hive (Non-Kerberos Authentication)
Field Name
Description
Connector Type
This field allows you to select the connector from the drop-down list provided. By default, 'Hive' is displayed as the selected connector type.
Credential Manager*
Select the option from the drop-down menu where you want to save your credentials:
OE Credential Manager: The Hive connection is configured with the basic Username and Password of the service account in real-time when OvalEdge establishes a connection to the Hive database. Users must manually add the credentials if the OE Credential Manager option is selected.
HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge.
AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server and fetched from the AWS Secrets Manager to OvalEdge.
For more information on Azure Key Vault, refer to Azure Key Vault.
For more information on Credential Manager, refer to Credential Manager.
License Add Ons
All the connectors will have a Base Connector License by default, which allows you to crawl and profile to obtain metadata and statistical information from a data source.
OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.
- Select the Auto Lineage Add-On license that enables the automatic construction of the Lineage of data objects for a connector with the Lineage feature.
- Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality using DQ Rules/functions, Anomaly detection, Reports, and more.
Connector Name*
The connection name refers to the Hive database connection in the OvalEdge application.
Connector Environment
The Connector Environment drop-down list allows you to select the environment configured for the connector from the drop-down list.
For example, you can select PROD or STG (based on the items configured in the OvalEdge configuration for the connector environment).
The purpose of the environment field is to help you identify which connector is connecting what type of system environment (Production, STG, or QA).
Note: The Configuring Environment Variables section explains setting up environment variables.
Server*
Hive Plugin Server is a mediator project between OvalEdge and Hive Server for communication.
OvalEdge uses Hive Plugin APIs to communicate with the Hive server.
Port*
It is a TCP/IP port to use if the server is not localhost. Port of .NET application (i.e swagger)
Database*
IP address of Hive Server
Driver*
JSON file path and the file extension should be .asdatabase
Username*
Enter the server name if you are running this as a plugin.
Password*
The port number on which the plugin is running.
Connection String
Ex: jdbc:hive2://18.x20.1x4.2xx:10xx0/default;principal=hive/ec2-18-2x0-1x4-2x9.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL
Default Governance Roles
Steward*
Select the Steward from the drop-down list options.
Custodian*
Select the Custodian from the drop-down list options.
Owner*
Select the Owner from the drop-down list options.
Governance Roles 4, 5, 6*
Select the respective user from the drop-down options.
Note: The drop-down list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security > Governance Roles section.
Admin Roles
Integration Admins*
To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, then click the Apply button.
The Integration Admin's responsibilities include configuring crawling and profiling settings for the connector and deleting connectors, schemas, or data objects.Security and Governance Admins*
To add Security and Governance Admin roles, search for or select one or more roles from the list and then click on the Apply button.
The Security and Governance Admin is responsible for:- Configuring role permissions for the connector and its associated data objects.
- Adding admins to set permissions for the connector's roles and associated data objects.
- Updating governance roles.
- Creating custom fields.
- Developing Service Request templates for the connector.
- Creating approval workflows for Service Request templates.
No. of Archive Objects*
The number of archive objects indicates the number of recent metadata modifications made to a dataset at a remote/source location. By default, the archive objects feature is deactivated. However, users may enable it by clicking the Archive toggle button and specifying the number of objects they wish to archive.
Select Bridge
With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data source(s) without modifying firewall rules. A bridge provides real-time control, making data movement between source and destination easy. For more information, refer to
Hive (Kerberos Authentication)
Field Name
Description
Principal
A Principal is a unique identity. It can be a user or a service.
Ex: host/corp.example.com@CORP.EXAMPLE.COM
Keytab
Keytab is a file that contains key value pair, where key is the principle and value is a encrypted value of Kerberos principal.
Kbr5-Configuration File*
This file contains the Kerberos configuration information.
- After entering all the required connection details, select the appropriate option based on your preferences:
- Validate: Click the Validate button to verify the connection details. This ensures that the provided information is accurate and enables successful connection establishment.
- Save: Click on the Save button to store the connection details. Once saved, the connection will be added to the Connectors home page for easy access.
- Save & Configure: For certain Connectors requiring additional configuration settings, click the Save & Configure button. This will open the Connection Settings pop-up window, allowing you to configure the necessary settings before saving the connection.
- Once the connection is validated and saved, it will be displayed on the Connectors home page.
Connection Validation Details
S.No |
Error Message(s) |
Description |
1 |
Connection Time Out |
The browser could not establish a connection to the server in time. |
2 |
The file path does not exist |
When the JSON file path is wrongly entered. |
Note: If you have issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.
Connector Settings
Once the connection is successfully established, various settings are provided to fetch and analyze the information from the data source.
The connection settings include Crawler, Profiler, Query Policies, Access Instruction, Business Glossary Settings, and Notification.
To view the Connector Settings page,
- Go to the Connectors page.
- From the 9- dots, select the Settings option.
- This will display the Connector Settings page, where you can view all the connector settings.
- When you have finished making your desired changes, click on Save Changes. All setting changes will be applied to the metadata.
The following is a list of connection settings and their corresponding descriptions.
Connection Settings |
Description |
Crawler |
Crawler settings are configured to connect to a data source and collect and catalog all the data elements in metadata. |
Profiler |
Profiler settings govern the gathering of statistics and informative summaries about the connected data source(s). These statistics can help assess the quality of data sources before using them for analysis. Profiling is always optional; crawling can be run without profiling. |
Query Policies |
A query policy enforces security by preventing users with specific roles from performing certain query functions on the data source. |
Access Instruction |
Access Instruction allows the data owner to instruct others on using the objects in the application. |
Business Glossary Settings |
The Business Glossary Settings provide flexibility and control over how users view and manage term association within a business glossary at the connector level. |
Others |
The Enable/Disable Metadata Change Notifications option sets the change notification about metadata changes of the data objects.
|
Note: For more information, refer to the Connector Settings.
Crawling of Schema(s)
The Crawl/Profile option allows you to select the schema for the following operations:
Crawl, Crawl & Profile, Profile, or Profile Unprofiled. Under the Action section, the defined run date and time are displayed for any scheduled crawlers and profilers.
- Navigate to the Connectors page and click on the Crawl/Profile button. Select Important Schema For Crawling and Profiling pop-up window is displayed.
- Select the schema.
- The list of actions below is displayed in the Action section.
- Crawl: This allows the selected schema(s) metadata to be crawled.
- Crawl & Profile: This allows the metadata of the selected schema(s) and profiles of the sample data to be crawled.
- Profile: This allows the collection of table column statistics.
- Profile Unprofiled: This allows data that has not been profiled to be profiled.
- Schedule: Connectors can also be scheduled in advance to run crawling and/or profiling at prescribed times and selected intervals.
Note: For more information on Scheduling, refer to Scheduling Connector.
- Click on the Run button. This gathers all metadata from the connected source and puts it into the OvalEdge Data Catalog.
Additional Information
Parameters |
Description |
---|---|
Security |
Krb5.conf file should be configured in the JVM incase of the Kerberos authentication. |
Copyright © 2024, OvalEdge LLC, Peachtree Corners, GA, USA.