Connectivity Summary
A data warehouse connector for Impala databases. It supports crawling database objects and profiling sample data. Lineage building is not supported for this connector (see Points to note).
Connectivity to Impala is established through the Impala JDBC driver, which is included in the platform. The connector currently supports the following versions of Impala:
Edition: Apache Impala
Version: 1.1.x and above
The drivers used by the connector are given below:
Driver / API: Impala JDBC Driver
Version: 1.1.x and above (latest version is 2.6.15)
Details: Download Impala JDBC Connector 2.6.15 (Kerberos Authentication)
Technical Specifications
The connector capabilities are shown below:
Crawling
Supported objects for Crawling are: Tables and Table Columns.
All data types in Impala are supported.
Profiling
Please see Profiling Data for more details on profiling.
Feature | Supported Objects |
Table Profiling | Row count, column count, view sample data |
Column Profiling | Min, max, null count, distinct count, top 50 values |
Full Profiling | Supported |
Sample Profiling | Supported |
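To make the column profiling metrics listed above concrete, the following is a minimal sketch that computes them over an in-memory sample. It does not reflect how the connector computes these values internally. The function name `profile_column` and the sample data are hypothetical, and `None` is assumed to stand in for a SQL NULL.

```python
from collections import Counter


def profile_column(values, top_n=50):
    """Compute min, max, null count, distinct count and top values.

    Hypothetical helper for illustration only. None represents NULL,
    and nulls are ignored when computing min, max and distinct values.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # most_common returns (value, frequency) pairs, most frequent first
        "top_values": counts.most_common(top_n),
    }


# Made-up sample column with two NULLs
sample = [3, 1, None, 3, 2, 3, None, 1]
stats = profile_column(sample)
print(stats)
```

The `top_n` default of 50 mirrors the "top 50 values" metric in the table.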
Querying
Operation | Details |
Select | Supported |
Insert | Not supported by default |
Update | Not supported by default |
Delete | Not supported by default |
Joins within database | Supported |
Joins outside database | Not supported |
Aggregations | Supported |
Group By | Supported |
Order By | Supported |
By default, the service account provided for the connector is used for all query operations. If that service account has write privileges, Insert / Update / Delete queries can be executed.
Pre-requisites
To use the connector, the following need to be available:
- Connection details, as specified in the following section.
- An admin / service account, for crawling and profiling. The minimum privileges required are:
Operation | Access Permission |
Connection validate | READ |
Crawl schemas | READ |
Crawl tables | READ |
Profile schemas, tables | READ |
Connection Details
The following connection settings should be added for connecting to an Impala database:
Kerberos Connection Details
Non-Kerberos Connection Details
- Database Type: IMPALA
- Authentication:
  - Kerberos Authentication: Authentication is performed through Kerberos, using the keytab and Kerberos principal provided.
  - Non-Kerberos Authentication: Service account credentials (Username / Password) must be provided.
- Connection Name: Select a connection name for the Impala database. The name that you specify is a reference name to easily identify your Impala database connection in OvalEdge.
Example: Impala Connection DB1
- Hostname / IP Address: Database instance IP address
Example: 18.220.154.229
- Port number: 21050
- Sid / Database: Name of the database to connect to.
- Username: User account login credential (only for Non-Kerberos Authentication)
- Password: Password (only for Non-Kerberos Authentication)
- Driver Name: JDBC driver name for Impala. It will be auto-populated.
Example: org.apache.hive.jdbc.HiveDriver (Kerberos Authentication)
- Connection String:
  - Kerberos connection string: If you are using a service principal, set the Connection string toggle button to automatic to build the string from the credentials provided. If you are using a user principal, enter the string manually.
Format: jdbc:hive2://{server}:21050/{sid};principal=impala/{principal}
Example: jdbc:hive2://18.220.154.229:21050/default;principal=impala/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL
  - Non-Kerberos connection string: Set the Connection string toggle button to automatic to build the string from the credentials provided. Alternatively, you can enter the string manually.
Format: jdbc:impala://{server}:21050/{sid}
Example: jdbc:impala://18.220.154.229:21050/default
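When entering a connection string manually, the two formats above can be assembled from their parts. The sketch below only performs string formatting and does not open a connection. The function name `build_impala_url` and its parameters are hypothetical; the formats and the default port 21050 are taken from the settings listed above.

```python
def build_impala_url(server, sid, principal=None, port=21050):
    """Build a JDBC connection string in one of the two documented formats.

    Hypothetical helper for illustration only. If a Kerberos principal is
    given, the Kerberos format is used; otherwise the non-Kerberos format.
    """
    if principal:
        return f"jdbc:hive2://{server}:{port}/{sid};principal=impala/{principal}"
    return f"jdbc:impala://{server}:{port}/{sid}"


# Values taken from the examples in this section
kerberos_url = build_impala_url(
    "18.220.154.229",
    "default",
    principal="ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL",
)
plain_url = build_impala_url("18.220.154.229", "default")
print(kerberos_url)
print(plain_url)
```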
Once connectivity is established, additional configurations for Crawling and Profiling can be specified.
Property | Details |
Crawler configurations | |
Tables, views and Columns | If the checkbox is selected, the tables and columns in Impala are crawled. |
Include Table Regex | Catalogs the tables that match the regex search pattern added. |
Exclude Table Regex | Does not catalog the tables that match the regex search pattern added. |
Profiler Settings | |
Profile Type | Auto: Full profiling is performed if the row count of the table is less than the value in the row count field. Sample: Sample profiling is performed based on the sample profile size. Disabled: Profiling is disabled completely. |
No. of threads | Number of threads used for profiling. |
Query TimeOut | Wait time for a query response. |
Rowcount constraint | If checked, profiling is performed based on the row count limit. |
Profile rules | Includes or excludes tables from profiling based on the regex search pattern added. |
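The effect of the Include Table Regex and Exclude Table Regex settings can be illustrated with a short sketch. This is not the connector's implementation: the function name `select_tables` and the table names are hypothetical, and it assumes that patterns match anywhere in the table name and that an exclude match takes precedence over an include match.

```python
import re


def select_tables(table_names, include_regex=None, exclude_regex=None):
    """Return the table names that pass the include and exclude patterns.

    Hypothetical helper for illustration only. Assumptions: a pattern may
    match anywhere in the name, and exclusion wins over inclusion.
    """
    include = re.compile(include_regex) if include_regex else None
    exclude = re.compile(exclude_regex) if exclude_regex else None
    selected = []
    for name in table_names:
        if include and not include.search(name):
            continue  # an include pattern is set and this name does not match it
        if exclude and exclude.search(name):
            continue  # the name matches the exclude pattern
        selected.append(name)
    return selected


# Made-up table names
tables = ["sales_2023", "sales_tmp", "customers", "tmp_load"]
picked = select_tables(tables, include_regex=r"^sales", exclude_regex=r"tmp")
print(picked)
```

With no patterns set, every table is returned unchanged.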
Points to note
- Procedures, functions, views and triggers are not available for the Impala connector.
- Lineage is not supported for the Impala connector.
- Set up the Kerberos configuration in Tomcat if you use Kerberos authentication for Impala.
In the Tomcat bin folder, create or edit setenv.bat (setenv.sh on Linux) and add the line below so that it points to the krb5.conf file of the respective connection:
Windows:
set CATALINA_OPTS=-Djava.security.krb5.conf="<path to krb5.conf file where application is running>\krb5.conf"
Linux:
export CATALINA_OPTS=-Djava.security.krb5.conf="<Path to krb5.conf file>/krb5.conf"
FAQs
1. How much does the driver cost?
The Impala JDBC Driver is available at no additional charge.
2. Can I use the driver to access Impala from a Linux computer?
Yes! You can use the driver to access Impala from Linux, Unix, and other non-Windows platforms.
3. Which authentication types are supported by the Impala JDBC Driver?
The table below lists available authentication options.
Platform | Authentication |
Non-Windows | Kerberos and non-Kerberos authentication |
Windows | Kerberos and non-Kerberos authentication |