HBase

Connectivity Summary

An out-of-the-box connector is available for HBase databases to support crawling database objects and profiling sample data.

The connector uses the following drivers:

  1. HBase Driver:
    • Version: 2.2.3 (Latest version is 2.4.4)
    • Details: https://mvnrepository.com/artifact/org.apache.hbase/hbase/

  2. HBase Client Driver:
    • Version: 2.2.3 (Latest version is 2.4.4)
    • Details: https://mvnrepository.com/artifact/org.apache.hbase/hbase-client

Technical Specifications

The connector capabilities are listed below:

Crawling

Feature     Supported Objects    Remarks
Crawling    Tables
            Table columns        Supported data types: Date_Time, Number, and String

Profiling

See the article Profile Data to learn more about profiling.

Feature             Support
Table Profiling     Row count, column count, view sample data
Column Profiling    Min, max, null count, distinct count, top 50 values
Full Profiling      Supported
Sample Profiling    Supported
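
The profiling statistics listed above can be illustrated with a minimal Python sketch. It is not the connector's implementation; the function names `profile_column` and `profile_table` are hypothetical, rows are assumed to be plain dictionaries, and a missing or `None` value is treated as null.

```python
from collections import Counter


def profile_column(values, top_n=50):
    """Compute column statistics; None is treated as a null value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct": len(counts),
        # most_common returns (value, frequency) pairs, most frequent first
        "top_values": counts.most_common(top_n),
    }


def profile_table(rows):
    """Profile a table given as a list of {column name: value} dictionaries."""
    columns = sorted({col for row in rows for col in row})
    return {
        "row_count": len(rows),
        "column_count": len(columns),
        # A column absent from a row counts as null for that row
        "columns": {c: profile_column([r.get(c) for r in rows]) for c in columns},
    }
```

Calling `profile_table` on a small sample of rows returns the row count, the column count, and one statistics dictionary per column. The sketch assumes that all values within a column are mutually comparable.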
 

Pre-requisites

To use the connector, the following must be available:

  • The connection details specified in the Connection Details section.
  • An admin/service account for crawling and profiling with the following minimum privileges:
    Operation                               Access Permission
    Validate connection                     R (Read access)
    Crawl schemas                           R (Read access)
    Crawl tables                            R (Read access)
    Profile schemas, tables                 R (Read access)
    Query logs                              N/A
    Get views, procedures, function code    N/A

Connection Details

The following connection settings are required to connect to an HBase database:

[Screenshot HbaseC1: Kerberos Authentication]

[Screenshot HbaseC2: Non-Kerberos Authentication (REST API)]

  • Database Type: HBase
  • Authentication:
    Kerberos Authentication: The client is authenticated using a Kerberos keytab file and the principal provided.
    Non-Kerberos Authentication: No authentication is needed. If the REST server is up and running, only the server name and port need to be provided.

Kerberos

  • Connection Name: Enter a name for the HBase connection. The name that you specify is a reference name to easily identify your HBase database connection in OvalEdge.
    Example: Hbase Connection DB1
  • Zookeeper Host Quorum: Zookeeper Cluster URL (on-premises/cloud-based)
    Example: 18.220.154.229
  • Zookeeper port: 2181 (default; may differ in your environment)
  • HBase Master: Master server IP with port
    Example: 18.220.154.229:60000
  • Kerberos keytab: Keytab file along with its path
    Example: D://hbase_configs//chakri.keytab
  • Kerberos principal:
    Example: chakri/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL
  • Zookeeper parent node: /hbase (default; may differ in your environment)
  • Master server principal:
    Example: hbase/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL
  • Region server principal:
    Example: hbase/ec2-18-220-154-229.us-east-2.compute.amazonaws.com@US-EAST-2.COMPUTE.INTERNAL
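
For orientation, the Kerberos settings above correspond to standard HBase client configuration properties. The Python sketch below shows one plausible mapping. The property keys are standard HBase and Hadoop names, but how OvalEdge maps its form fields to them is an assumption, and the helper names `build_hbase_kerberos_properties` and `parse_principal` are hypothetical. The keytab and client principal are not configuration properties; a Java client would typically pass them to Hadoop's `UserGroupInformation.loginUserFromKeytab`.

```python
def parse_principal(principal):
    """Split 'primary/instance@REALM' into (primary, instance, realm)."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"missing realm in principal: {principal!r}")
    primary, _, instance = name.partition("/")
    return primary, instance, realm


def build_hbase_kerberos_properties(quorum, zk_port, znode_parent,
                                    master_principal, regionserver_principal):
    """Map the connection fields to HBase client property names (assumed mapping)."""
    # Fail early on principals that lack a realm
    parse_principal(master_principal)
    parse_principal(regionserver_principal)
    return {
        "hbase.zookeeper.quorum": quorum,
        "hbase.zookeeper.property.clientPort": str(zk_port),
        "zookeeper.znode.parent": znode_parent,
        "hbase.security.authentication": "kerberos",
        "hadoop.security.authentication": "kerberos",
        "hbase.master.kerberos.principal": master_principal,
        "hbase.regionserver.kerberos.principal": regionserver_principal,
    }
```

Parsing each principal before building the properties catches a malformed value, such as one with a missing realm, before a connection attempt is made.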

Non-Kerberos (REST API)

  • Hbase Rest Server: Host name or IP address of the server on which the HBase REST server is running
  • Hbase Rest Server Port: Port number on which the REST server is listening
    Example: 20550
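
As a rough illustration of the non-Kerberos route, the Python sketch below builds the URLs that a client could call on an HBase REST server to check the cluster version, list tables, and read a table schema. The paths `/version/cluster`, `/`, and `/<table>/schema` are standard HBase REST endpoints; the host name `hbase-rest.example.com`, the table name, and the helper names are hypothetical, and whether the connector calls these exact endpoints is an assumption.

```python
import json
import urllib.request
from urllib.parse import quote


def rest_endpoints(host, port, table=None, scheme="http"):
    """Build HBase REST URLs for a version check, table listing, and schema."""
    base = f"{scheme}://{host}:{port}"
    urls = {
        "version": f"{base}/version/cluster",  # cluster version
        "tables": f"{base}/",                  # list of non-system tables
    }
    if table:
        # Keep ':' unescaped so namespaced names such as 'ns:table' pass through
        urls["schema"] = f"{base}/{quote(table, safe=':')}/schema"
    return urls


def fetch_json(url, timeout=10):
    """GET a REST resource as JSON; requires a reachable REST server."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(rest_endpoints("hbase-rest.example.com", 20550)["tables"])` would request the table list from a REST server listening on port 20550.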

Once connectivity is established, additional configurations for Crawling and Profiling can be specified.

Property                     Details
Crawler configurations
  Order                      Priority of the rule
  Start time and End time    Used when crawling/profiling is to be scheduled
  No. of threads             Number of threads used to perform profiling
Profiler configurations
  Profile Type               Disabled/Auto/Sample
  Row count Constraint       Number of rows to be fetched
  Sample profile size        Sample profile row limit
  Sample data count
  Query Timeout              Time to wait for the response
Crawler Options              Only tables can be crawled
Crawler rules                Only the include and exclude regex for table columns can be used
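
The effect of include and exclude regex rules on table columns can be sketched in Python as follows. This is an illustration only: the function name `filter_columns` is hypothetical, and the semantics (full-match patterns, with exclude taking precedence over include) are assumptions rather than documented connector behavior.

```python
import re


def filter_columns(columns, include=None, exclude=None):
    """Keep columns that match `include` (if given) and do not match `exclude`."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    kept = []
    for name in columns:
        if include_re and not include_re.fullmatch(name):
            continue  # not selected by the include rule
        if exclude_re and exclude_re.fullmatch(name):
            continue  # explicitly excluded; exclude wins over include
        kept.append(name)
    return kept
```

With columns named in the `family:qualifier` style, an include pattern such as `(cf|meta):.*` combined with an exclude pattern such as `.*:ts` would keep the `cf` columns and drop any column whose qualifier is `ts`.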

Points to note

  • Queries are not supported for HBase.
  • In crawler rules, the include and exclude regex options for views, functions, and procedures are not used because these objects do not exist in HBase.

FAQs

  1. Where can we find the driver?
    The drivers can be found at the following links:
    a.  HBase Driver: https://mvnrepository.com/artifact/org.apache.hbase/hbase/
    b.  HBase Client Driver: https://mvnrepository.com/artifact/org.apache.hbase/hbase-client