Integrations

Kafka

Apache Kafka is an open-source platform that works as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.

OvalEdge uses the Apache Kafka client JAR to connect to the data source, which allows you to crawl and profile topics and messages.


Connector Capabilities

The connector capabilities are shown below:

Crawling

Feature | Supported Objects | Remarks
Crawling | Tables | -

Profiling

Feature | Details | Remarks
Table Profiling | Min, Max, Null count, Distinct, and Top 50 values | -
Column Profiling | Min, Max, Null count, Distinct, and Top 50 values | -
Sample Profiling | Supported | Profiling is supported for Avro and JSON schema types, but not for Protobuf.
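The split between Avro/JSON and Protobuf depends on the schema type registered for a topic. When messages are produced with Confluent Schema Registry serializers, each message value starts with a magic byte (0) followed by a 4-byte big-endian schema ID, and the schema's type can then be looked up in the registry by that ID (for example, through `GET /schemas/ids/{id}`, where a missing `schemaType` means Avro). The sketch below assumes this Confluent wire format and only illustrates how a profiler could decide whether a message is profilable; it is not OvalEdge's implementation, and the function names are hypothetical.

```python
import struct
from typing import Optional

# Schema types that, per the capability table, support sample profiling.
PROFILABLE_SCHEMA_TYPES = {"AVRO", "JSON"}


def schema_id_from_message(message: bytes) -> int:
    """Hypothetical helper: read the schema ID from a Confluent wire-format message.

    Assumed layout: byte 0 is the magic byte 0, bytes 1-4 hold the schema ID
    as a big-endian unsigned 32-bit integer, and the serialized payload follows.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not in Confluent wire format (missing magic byte 0)")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id


def is_profilable(schema_type: Optional[str]) -> bool:
    """Hypothetical helper: Schema Registry treats a missing schemaType as AVRO."""
    return (schema_type or "AVRO").upper() in PROFILABLE_SCHEMA_TYPES
```

For example, a message whose first five bytes are `00 00 00 00 2a` refers to schema ID 42, and `is_profilable("PROTOBUF")` returns `False`.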

 

Notes: 

     Topics in Kafka are represented as tables.

     Messages in Kafka are represented as columns and crawled during sample profiling.
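To make this mapping concrete, the following sketch treats a sample of JSON messages from one topic as the rows of a table, treats each top-level message field as a column, and computes the statistics listed in the Profiling table (min, max, null count, distinct count, and top 50 values). It is an illustrative assumption, not OvalEdge's profiler: it assumes flat JSON messages whose field values are scalars of one comparable type per field, and `profile_messages` is a hypothetical name.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def profile_messages(raw_messages: List[str], top_n: int = 50) -> Dict[str, Dict[str, Any]]:
    """Hypothetical sketch: profile each top-level JSON field as a column.

    Assumes flat messages with scalar (hashable, mutually comparable) values per field.
    """
    rows = [json.loads(message) for message in raw_messages]
    columns = sorted({key for row in rows for key in row})
    profile: Dict[str, Dict[str, Any]] = {}
    for column in columns:
        # A field missing from a message is counted as null for that row.
        values = [row.get(column) for row in rows]
        non_null = [value for value in values if value is not None]
        counts = Counter(non_null)
        profile[column] = {
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
            "null_count": len(values) - len(non_null),
            "distinct": len(counts),
            "top_values": [value for value, _ in counts.most_common(top_n)],
        }
    return profile
```

For three messages `{"id": 1, "city": "Austin"}`, `{"id": 3, "city": "Austin"}`, and `{"id": 2}`, the `city` column has a null count of 1 and a single distinct value.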

Prerequisite(s)

The following are the prerequisites to establish a connection to Apache Kafka:

  1. Kafka Client Jar
  2. Service Account Permissions
  3. Configure environment variables (Optional)

Kafka Client Jar

OvalEdge uses the Kafka client JAR to connect to Apache Kafka.

Jars | Version | Details
Apache Kafka Jars | 5.5.1-ccs | org.apache.kafka.clients
Apache Kafka Jars | 3.2.0 | org.apache.kafka
Apache Kafka Jars | 5.5.1 | io.confluent.kafka-schema-registry-client

Service Account with Minimum Read Permissions

The following are the minimum privileges required for a service account to crawl and profile the data.

Operation | Minimum Access Permissions
Connection Validation | If a Schema Registry URL is provided, the Schema Registry user and password must be valid. The bootstrap server must be valid, and the JAAS config file must be valid and available at the configured path.
Crawling | Read access to the given cluster.
Profiling | Read access to the topic messages (tables).
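On a self-managed cluster that enforces Kafka ACLs, read-only access for the service account can be granted with the standard `kafka-acls` tool. The sketch below only builds the command lines. It assumes that "read access" means the Read operation on topics (Read implies Describe, which lets a client list topics and fetch their metadata) plus Read on consumer groups in case the client consumes with a `group.id`. The principal name, admin config file, and wildcard resource names are hypothetical; narrow them to the topics you want to crawl. Confluent Cloud manages access through its own API key permissions instead (see App Key and Secret Key in the Additional Information section).

```python
import shlex
from typing import List


def read_only_acl_commands(bootstrap: str, principal: str, command_config: str,
                           topic: str = "*", group: str = "*") -> List[List[str]]:
    """Hypothetical helper: build kafka-acls commands granting read-only access."""
    base = [
        "kafka-acls.sh",
        "--bootstrap-server", bootstrap,
        "--command-config", command_config,  # admin client settings used to apply the ACLs
        "--add",
        "--allow-principal", f"User:{principal}",
    ]
    return [
        # Read on topics (implies Describe): list topics (crawl) and read messages (profile).
        base + ["--operation", "Read", "--topic", topic],
        # Read on consumer groups: only needed if the client reads with a group.id.
        base + ["--operation", "Read", "--group", group],
    ]


if __name__ == "__main__":
    for command in read_only_acl_commands("broker1.example.com:9092", "ovaledge-svc", "admin.properties"):
        print(shlex.join(command))  # shell-quoted, e.g. the wildcard becomes '*'
```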

Configure environment variables (Optional)

This section describes the settings or instructions that you should be aware of prior to establishing a connection. If your environments have been configured, skip this step.

Configure Environment Names

The Environment Names allow you to select the environment configured for the specific connector from the dropdown list in the Add Connector pop-up window.
You might want to crawl the same schema in both staging and production environments for consistency. Typical environments for crawling are PROD, STG, or Temporary, and may also include QA or other environments. Crawling a temporary environment can be useful for schema comparisons, for example during application upgrades, and the temporary environment can be deleted afterwards.

Steps to Configure the Environment 

  1. Navigate to Administration > System Settings.
  2. Select the Connector tab.
  3. Find the Key name “connector.environment”.
  4. Enter the desired environment values (PROD, STG) in the value column.
  5. Click ✔ to save. 

Establish a Connection

To connect to Apache Kafka using the OvalEdge application, complete the following steps:

  1. Log in to the OvalEdge application.
  2. Navigate to the Administration > Connectors module.
  3. Click on the + icon, and the Add Connector with Search Connector pop-up window is displayed.
  4. Select the connection type as Apache Kafka. The Add Connector with Apache Kafka details pop-up window is displayed.

Fields

Details

Connector Type

This field allows you to select the connector from the drop-down list provided. By default, 'Kafka' is displayed as the selected connector type.

Credential Manager*

Select the option from the drop-down menu where you want to save your credentials:

OE Credential Manager: The Kafka connection is configured with the basic username and password of the service account in real time when OvalEdge establishes a connection to the Kafka cluster. Users need to add the credentials manually if the OE Credential Manager option is selected.

HashiCorp: The credentials are stored in HashiCorp Vault and fetched from HashiCorp Vault into OvalEdge.

AWS Secrets Manager: The credentials are stored in AWS Secrets Manager and fetched from AWS Secrets Manager into OvalEdge.

For more information on Azure Key Vault, refer to Azure Key Vault.

For more information on Credential Manager, refer to Credential Manager.

License Add-Ons

All the connectors will have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a datasource. 

OvalEdge supports various License Add-Ons based on the connector’s functionality requirements. 

  • Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.

Connector Environment

The Connector Environment drop-down list allows you to select the environment configured for the connector.

For example, you can select PROD or STG (based on the items configured in the OvalEdge configuration for the connector environment).

The environment field helps you identify which type of system environment (Production, STG, or QA) the connector connects to.

Note: The Configure environment variables (Optional) section explains how to set up environment variables.

Connection Name*

Enter the name of the connection. The name specified in the Connection Name textbox is used to reference the Apache Kafka connection in the OvalEdge application.

Example: Apache Kafka Connection1

Broker URL*

Enter the bootstrap server (broker) address of the Kafka cluster (on-premises or cloud-based) in host:port format. To find the bootstrap server URL, refer to the Additional Information section.

Example: broker1.example.com:9092
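Kafka clients expect bootstrap servers as one or more comma-separated host:port pairs (the client's `bootstrap.servers` setting). Assuming the Broker URL field takes the same format, a quick pre-check like the sketch below catches a missing port before you click Validate. The function name is hypothetical, and stripping a listener prefix such as `SASL_SSL://` is a convenience assumption for values copied from a cluster's endpoint settings.

```python
from typing import List, Tuple


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: split 'host1:port1,host2:port2' into (host, port) pairs."""
    servers: List[Tuple[str, int]] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Drop an optional listener prefix such as "SASL_SSL://".
        if "://" in entry:
            entry = entry.split("://", 1)[1]
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {entry!r}")
        servers.append((host, port_number))
    if not servers:
        raise ValueError("no bootstrap servers given")
    return servers
```

For example, `parse_bootstrap_servers("broker1.example.com")` raises `ValueError` because the port is missing.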

Cluster Name*

Enter the cluster name.

Example: demo-kafka-cluster-12938

For more information, refer to the Additional Information section.

Cluster Authentication Type

It supports the following types of authentication:

  • Confluent Without Auth
  • JAAS Config path
  • App/Secret Key Credential

If you select Confluent Without Auth, enter the Registry URL.

The other authentication types are explained in the Additional Information on Authentication Types section.

Registry URL

Enter the Schema Registry service URL with port.

Example: http://schema-registry.example.com:8081
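Because this field expects the Schema Registry URL with its port, a simple check can confirm that the value has an http or https scheme, a host, and an explicit port before validation. A minimal sketch using only the standard library; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def check_registry_url(url: str) -> str:
    """Hypothetical helper: return scheme://host:port or raise ValueError."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError("Registry URL must start with http:// or https://")
    if not parts.hostname:
        raise ValueError("Registry URL has no host")
    # .port raises ValueError itself if the port is not a valid number.
    if parts.port is None:
        raise ValueError("Registry URL must include a port, for example :8081")
    return f"{parts.scheme}://{parts.hostname}:{parts.port}"
```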

Default Governance Roles*

You can select a specific user or a team from the governance roles (Steward, Custodian, Owner) to be assigned to manage the data asset.

Note: The dropdown list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security | Governance Roles section. 

Admin Roles

Select the required admin roles for this connector.
  • To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, and then click on the Apply button. 
    The responsibility of the Integration Admin includes configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.
  • To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click on the Apply button. 
    The Security and Governance Admin is responsible for:
    • Configuring role permissions for the connector and its associated data objects.
    • Adding admins to set permissions for roles on the connector and its associated data objects.
    • Updating governance roles.
    • Creating custom fields.
    • Developing Service Request templates for the connector.
    • Creating approval workflows for the templates.

Select Bridge

With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. A bridge provides real-time control that makes it easy to manage data movement between any source and destination. For more information, refer to Bridge Overview.


Additional Information on Authentication Types:

Below are the additional fields required to establish the connection for ‘JAAS Config Path’ and ‘App Key/Secret Key Credentials’.

Cluster Authentication Type as “JAAS Config Path”:

The Java Authentication and Authorization Service (JAAS) login configuration file contains one or more entries that specify the authentication mechanism to be used by applications. 

Enter the following details to establish a Kafka connection using the JAAS Config path authentication.

Field Name

Details

Cluster Authentication Type*

Select Cluster Authentication Type as “JAAS Config Path” from the drop-down list.

JAAS Config Path*

Enter the path to the JAAS configuration file. This file includes the login module and the credentials (username and password) used to authenticate to the cluster.

Security Protocol

SASL_SSL

SASL Mechanism

PLAIN

Registry URL

Enter the Schema Registry service URL with port.


Example: http://schema-registry.example.com:8081

Schema Registry User

User account or API key that has access permissions to interact with the Schema Registry service.

Schema Registry Password

Secret key of the above user account or API key.

KRB5 Config Path

The location of the Kerberos configuration file (typically krb5.conf) on the server where the OvalEdge instance runs.
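For the SASL_SSL/PLAIN settings listed above, a Java Kafka client JAAS file typically contains a single `KafkaClient` entry that uses `PlainLoginModule`. The sketch below generates such a file; it assumes PLAIN authentication (not Kerberos), the function names, path, and credentials are hypothetical, and it rejects values containing quotes, backslashes, or newlines instead of escaping them.

```python
from pathlib import Path


def jaas_plain_config(username: str, password: str) -> str:
    """Hypothetical helper: build a KafkaClient JAAS entry for SASL/PLAIN."""
    for value in (username, password):
        # Reject characters that would need escaping inside a quoted JAAS value.
        if not value or any(ch in value for ch in '"\\\n'):
            raise ValueError("values must be non-empty and contain no quotes, backslashes, or newlines")
    return (
        "KafkaClient {\n"
        "  org.apache.kafka.common.security.plain.PlainLoginModule required\n"
        f'  username="{username}"\n'
        f'  password="{password}";\n'
        "};\n"
    )


def write_jaas_file(path: str, username: str, password: str) -> Path:
    """Hypothetical helper: write the JAAS file and restrict it to its owner."""
    jaas_path = Path(path)
    jaas_path.write_text(jaas_plain_config(username, password), encoding="utf-8")
    jaas_path.chmod(0o600)  # the file contains a password
    return jaas_path
```

The resulting file path is what goes into the JAAS Config Path field. Outside OvalEdge, a Java process can be pointed at the same file with the standard JVM option `-Djava.security.auth.login.config=<path>`.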

Cluster Authentication Type as “App Key/Secret Key Credentials”

API keys are used to control access to a data source. Each API key consists of a key and a secret.

Enter the following details to establish a Kafka connection using the App/Secret Key Credentials authentication.

Field Name

Details

Cluster Authentication Type*

Select Cluster Authentication Type as “App Key/Secret Key Credentials” from the drop-down list.

App Key

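Enter the cluster API key.

For detailed steps, refer to App Key and Secret Key in the Additional Information section.

Secret Key

Enter the cluster API secret.

For detailed steps, refer to App Key and Secret Key in the Additional Information section.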

Security Protocol

SASL_SSL

SASL Mechanism

PLAIN

Registry URL

Enter the Schema Registry service URL with port.


Example: http://schema-registry.example.com:8081

Schema Registry User

User account or API key that has access permissions to interact with the Schema Registry service.

Schema Registry Password

Secret key of the above user account or API key.

KRB5 Config Path

The location of the Kerberos configuration file (typically krb5.conf) on the server where the OvalEdge instance runs.
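For reference, the App Key/Secret Key fields correspond to standard Java Kafka client and Confluent Schema Registry client settings. The sketch below builds an equivalent `client.properties` content, assuming Confluent-style API key authentication over SASL_SSL with the PLAIN mechanism; it is not necessarily how OvalEdge builds its connection internally, and the function names and sample values are hypothetical. It fails early on missing required values, which would otherwise surface later as a client construction error.

```python
from typing import Dict, Optional


def app_key_client_properties(
    bootstrap_servers: str,
    api_key: str,
    api_secret: str,
    registry_url: Optional[str] = None,
    registry_user: Optional[str] = None,
    registry_password: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: map App Key/Secret Key fields to client settings."""
    required = (("bootstrap_servers", bootstrap_servers), ("api_key", api_key), ("api_secret", api_secret))
    missing = [name for name, value in required if not value]
    if missing:
        raise ValueError(f"missing required values: {', '.join(missing)}")
    props = {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # Inline JAAS entry: the API key is the username, the API secret the password.
        "sasl.jaas.config": (
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            f'username="{api_key}" password="{api_secret}";'
        ),
    }
    if registry_url:
        props["schema.registry.url"] = registry_url
        if registry_user and registry_password:
            props["basic.auth.credentials.source"] = "USER_INFO"
            props["basic.auth.user.info"] = f"{registry_user}:{registry_password}"
    return props


def to_properties_text(props: Dict[str, str]) -> str:
    """Render key=value lines (assumes values contain no backslashes or newlines)."""
    return "".join(f"{key}={value}\n" for key, value in props.items())
```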

5. Click on the Validate button to validate the connection details.

6. Click on the Save button to save the connection. Alternatively, you can click the Save & Configure button, which displays the Connection Settings pop-up window, to configure the settings for the selected connector. The Save & Configure button is displayed only for connectors that require settings configuration.

Note: * (asterisk) indicates the mandatory field required to create a connection. Once the connection is validated and saved, it will be displayed on the Connectors home page. 

Note: You can either save the connection details first, or you can validate the connection first and then save it. 

Connection Validation Errors

S.No | Error Message(s) | Description
1 | Failed to construct Kafka consumer | This error occurs if any of the connector parameters is missing or invalid.
2 | Unable to obtain the krb5 or keytab | This error occurs if the keytab is not configured in the VM arguments or if the details in the keytab are incorrect.
3 | Timeout error | The connection timed out while connecting to the Schema Registry.
4 | 401 error | Unauthorized error caused by invalid Schema Registry details.
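The Timeout and 401 errors come from the Schema Registry REST endpoint, so the registry can be checked on its own when troubleshooting. The sketch below sends `GET /subjects` (a standard Schema Registry endpoint that lists registered subjects) with optional basic authentication and maps failures onto the error names in this table. It uses only the standard library; the function names are hypothetical, and it is not OvalEdge's validation logic.

```python
import base64
import socket
import urllib.error
import urllib.request
from typing import Optional


def subjects_request(registry_url: str, user: Optional[str] = None,
                     password: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build GET {registry}/subjects with optional basic auth."""
    request = urllib.request.Request(registry_url.rstrip("/") + "/subjects", method="GET")
    if user is not None and password is not None:
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def classify_registry_error(exc: BaseException) -> str:
    """Map a failure onto the error names used in the table above."""
    if isinstance(exc, urllib.error.HTTPError):  # check first: HTTPError is a URLError
        return "401 error" if exc.code == 401 else f"HTTP {exc.code}"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Timeout error"
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, (socket.timeout, TimeoutError)):
        return "Timeout error"
    return type(exc).__name__


def check_registry(registry_url: str, user: Optional[str] = None,
                   password: Optional[str] = None, timeout: float = 10.0) -> str:
    """Return "OK" if the registry answers, otherwise the classified error."""
    try:
        with urllib.request.urlopen(subjects_request(registry_url, user, password), timeout=timeout):
            return "OK"
    except OSError as exc:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return classify_registry_error(exc)
```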

Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.

Connector Settings 

Once the connection is established successfully, various settings are provided to fetch and analyze the information from the data source. 

The connection settings include Crawler, Profiler, Access Instruction, Business Glossary Settings, and Others.

To view the Connector Settings page,

  1. Go to the Connectors page.
  2. From the 9-dots menu, select the Settings option.
  3. This will display the Connector Settings page, where you can view all the connector settings.
  4. When you have finished making your desired changes, click on Save Changes. All setting changes will be applied to the metadata.
    The following is a list of connection settings and their corresponding descriptions.

Connection Settings

Description

Crawler

Crawler settings are configured to connect to a data source and collect and catalog all the data elements in metadata.

Profiler




Profiler settings govern the gathering of statistics and informative summaries about the connected data source(s). These statistics can help assess the quality of data sources before using them for analysis. Profiling is always optional; crawling can be run without profiling. 

Access Instruction

Access Instruction allows the data owner to instruct others on using the objects in the application.

Business Glossary Settings

The Business Glossary Settings provide flexibility and control over how users view and manage term association within a business glossary at the connector level.

Others

The Send Metadata Changes Notifications option is used to set the change notification about the metadata changes of the data objects.

  • You can use the toggle button to set the Default Governance Roles (Steward, Owner, Custodian, etc.)
  • From the drop-down menu, you can select the role and team to receive the notification of metadata changes.

For more information, refer to the Connector Settings.

Crawling of Schema(s)

You can use the Crawl/Profile option, which allows you to select specific schemas for the following operations: crawl, profile, crawl & profile, or profile unprofiled. For scheduled crawlers and profilers, the defined run date and time are displayed.
  1. Navigate to the Connectors page, and click on the Crawl/Profile option.
    The Select Schemas to Crawl and Profile pop-up window is displayed.
  2. Select the required schema(s).
  3. The list of actions below is displayed in the Action section.
    1. Crawl: Crawls the metadata of the selected schema(s).
    2. Crawl & Profile: Crawls the metadata of the selected schema(s) and profiles the sample data.
    3. Profile: Collects table and column statistics.
    4. Profile Unprofiled: Profiles data that has not yet been profiled.
    5. Schedule: Connectors can also be scheduled in advance to run crawling and/or profiling at prescribed times and selected intervals.
      Note: For more information on Scheduling, refer to Scheduling Connector.
  4. Click on the Run button that gathers all metadata from the connected source into OvalEdge Data Catalog.


Additional Information

This section provides details about generating a few connection parameters and the available authentication details.

Cluster Name

A Kafka cluster consists of several brokers, topics, and partitions. Connecting to any broker means connecting to the entire cluster.
Step(s): 

  1. Log in to Apache Kafka.
  2. Click on Environment.
  3. Select the cluster name (for example, demo-Kafka-cluster-12933).


Broker URL (Bootstrap Server)

In Apache Kafka, the server is referred to as a broker, and it is responsible for managing the storage and exchange of messages between producers and consumers. Each broker in the cluster stores a subset of the topic partitions. 

Step(s): 

  1. You can navigate to Cluster Overview and click on Cluster Settings. 
  2. In the Endpoints section, copy the Bootstrap server URL.


App Key and Secret Key

You can navigate to the respective Cluster name, then click on API Keys. 
You can select an existing API Key or generate a new service account API Key. 

Steps to generate a new API key

  1. Navigate to Cluster Overview > API Keys.
  2. Click on + Add Key to generate a new API key.
  3. Select Granular access as the scope for the API key to manage it as a service account.
  4. Click on the Next button.
  5. Enter the new service account name and description.
  6. Click on the Next button.
  7. Select the appropriate cluster-level and topic-level permissions.
  8. Click on the Next button.
  9. Copy the key and secret information.

Configuring Cluster Authentication Type

JAAS Config path: The Java Authentication and Authorization Service (JAAS) login configuration file contains one or more entries that specify authentication technologies to be used by applications. 


App/Secret Key Credential: 

API keys are used to control access to a data source. Each API key consists of a key and a secret.




Copyright © 2024, OvalEdge LLC, Peachtree Corners, GA, USA.