Integrations

Apache Kafka Connector

Apache Kafka is an open-source component that acts as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.

OvalEdge uses an Apache Kafka Client Jar to connect to the data source, which allows you to crawl and profile the Topics and Messages.


Connector Capabilities

| Functionality | Supported Objects |
| --- | --- |
| Crawling | Tables, Table Columns |
| Profiling | Table Profiling (Min, Max, Null Count, Distinct Count, Top 50 values), Column Profiling, Sample Profiling |

- Topics in Kafka are represented as tables.
- Messages in Kafka are represented as columns and are crawled during Sample Profiling.
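As a hypothetical illustration of this mapping (the topic name, message fields, and values below are invented for the example), the field names of a sampled message become the column names of the topic's table:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: a Kafka topic is cataloged as a table; the fields of a
// sampled message become the table's columns.
public class TopicAsTable {
    // Derive "column" names from one sampled message's fields.
    static List<String> columnsFromMessage(Map<String, Object> message) {
        return new ArrayList<>(message.keySet());
    }

    public static void main(String[] args) {
        // A sampled message from a hypothetical "orders" topic.
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("order_id", 1001);
        message.put("customer", "acme");
        message.put("amount", 25.50);

        String tableName = "orders"; // topic name -> table name
        List<String> columns = columnsFromMessage(message); // fields -> columns
        System.out.println(tableName + " -> " + columns);
        // prints: orders -> [order_id, customer, amount]
    }
}
```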

Prerequisite(s)

The following are the prerequisites to establish a connection to Apache Kafka.

  1. Kafka Client Jar
  2. Service Account Permissions
  3. Configure environment variables (Optional)

Kafka Client Jar

OvalEdge uses the Kafka Client Jar to connect to Apache Kafka.

| Jars | Version | Details |
| --- | --- | --- |
| Apache Kafka Jars | 5.5.1-css | org.apache.kafka.clients |
| Apache Kafka Jars | 3.2.0 | org.apache.kafka |
| Apache Kafka Jars | 5.5.1 | io.confluent.kafka-schema-registry-client |
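For reference, the client libraries above correspond to Maven coordinates along these lines (a sketch; the exact versions are taken from the table, and resolving the Confluent artifact typically requires adding Confluent's Maven repository to your build):

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>3.2.0</version>
</dependency>
<dependency>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-client</artifactId>
  <version>5.5.1</version>
</dependency>
```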

Service Account with Minimum Read Permissions

The following are the minimum privileges required for a service account to crawl and profile the data.

| Operation | Minimum Access Permissions |
| --- | --- |
| Connection Validation | When a Schema Registry URL is given, the Schema Registry user and password must be valid. The Bootstrap server must be reachable, and the JAAS config file and its path must be valid. |
| Crawling | Read access to the given cluster. |
| Profiling | Read access to the topic messages (tables). |

Configure environment variables (Optional)

This section describes the settings or instructions that you should be aware of prior to establishing a connection. If your environments have been configured, skip this step.

Configure Environment Names

The Environment Names allow you to select the environment configured for the specific connector from the dropdown list in the Add Connector pop-up window.

You might want to crawl the same schema in both stage and production environments for consistency. The typical environments for crawling are PROD, STG, or Temporary, and may also include QA or other environments. Crawling a temporary environment can be useful for schema comparisons, especially during application upgrade assistance; the temporary environment can later be deleted.

Steps to Configure the Environment 

1. Navigate to Administration > Configuration.

2. Select the Connector tab.

3. Find the key name "connector.environment".

4. Enter the desired environment values (for example, PROD, STG) in the Value column.

5. Click ✔ to save.

Establish a Connection

To connect to Apache Kafka using the OvalEdge application, complete the following steps:

  1. Log in to the OvalEdge application.
  2. Navigate to Administration > Connectors.
  3. Click on the + icon; the Add Connector with Search Connector pop-up window is displayed.
  4. Select the connection type as Apache Kafka. The Add Connector with Apache Kafka details pop-up window is displayed.

Fields

Details

Connection Type*

The selected connection type Kafka is displayed by default.

If required, the dropdown menu allows you to change the connector type. Based on the selected connection type, the associated fields are displayed.

License Add-Ons*

All connectors have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a data source.

OvalEdge supports various License Add-Ons based on the connector's functionality requirements.

  • Select the Auto Lineage Add-On license to enable the automatic construction of the lineage of data objects for a connector with the Lineage feature.
  • Select the Data Quality Add-On license to identify, report, and resolve data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.
  • Select the Data Access Add-On license to enforce connector access via OvalEdge with the Remote Data Access Management (RDAM) feature enabled.

Connection Name*

Enter the name of the connection. The connection name specified in the Connection Name textbox will be used to refer to the Apache Kafka connection in the OvalEdge application.

Example: Apache Kafka Connection1

Broker URL

To get the Broker URL from the Kafka server, refer to the section Additional Information.

Database instance URL (on-premises/cloud-based).
Example: oval-Kafka.csklygkwz3dx.us-east-1.rds.amazonaws.com

Cluster Name

Example: demo-kafka-cluster-12938

For more information, refer to the section Additional Information.

Cluster Authentication Type

The connector supports the following authentication types, as explained in the section Additional Information > Configuring Cluster Authentication Type.

1. JAAS Config Path: The Java Authentication and Authorization Service (JAAS) login configuration file contains a username (APP Key) and password (Secret Key) for authentication.

Note: You are provided with a JAAS config file (plain authentication). Open this file to edit the username and password, and save it to your local machine. Enter this local path in the JAAS Config Path field.

2. App/Secret Key Credential: Alternatively, you can skip the JAAS Config Path and use the APP Key and Secret Key parameters.

Note: Please refer to the section Additional Information.
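For reference, a minimal JAAS login configuration file for plain authentication typically looks like the following, with the placeholders replaced by your APP Key and Secret Key:

```
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="<APP Key>"
  password="<Secret Key>";
};
```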

APP Key

Enter the APP Key generated in the section APP Key and Secret Key.

Example: 42pq6weov14sv4bwnoprop

Secret Key 

Enter the Secret Key generated in the section APP Key and Secret Key.

Example: wJalrXUtnFEMI/K7OyuUWUIG/bPYRfiCYEXAMPLEKEY

KRB5 Config

This optional parameter is used for Kerberos authentication; if used, provide the path URL to the krb5.conf file. The krb5.conf file contains Kerberos configuration information, including the location of KDCs, the Kerberos realms, and the mapping of hostnames onto Kerberos realms.
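A minimal krb5.conf might look like the following (the realm and hostnames are hypothetical examples):

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
```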

Security Protocol 

The encryption mechanism to use, such as SASL_SSL.

Enter the parameter as "SASL_SSL".

SASL Mechanism

Simple Authentication and Security Layer (SASL) adds authentication support to connection-based protocols.

Enter the parameter as "PLAIN".

Environment*

The environment dropdown menu allows you to select the environment configured for the connector. For example, PROD or STG.

The purpose of the Environment field is to help you understand the environment in which the new connector is established, such as Production, STG, or QA.

Note: The steps to set up environment variables are explained in the prerequisite section.

Default Governance Roles*

You can select a specific user or a team from the governance roles (Steward, Custodian, Owner) who will be assigned to manage the data asset.

Note: The dropdown list displays all the configurable roles (single user or team) as per the configurations made in the OvalEdge Security | Governance Roles section.

Admin Roles

Select the required admin roles for this connector.
  • To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, and then click on the Apply button.
    The responsibilities of the Integration Admin include configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.
  • To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click on the Apply button.
    The Security and Governance Admin is responsible for:
    • Configuring role permissions for the connector and its associated data objects.
    • Adding admins to set permissions for roles on the connector and its associated data objects.
    • Updating governance roles.
    • Creating custom fields.
    • Developing Service Request templates for the connector.
    • Creating Approval workflows for the templates.

Select Bridge

With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. A bridge provides real-time control that makes it easy to manage data movement between any source and destination. For more information, refer to Bridge Overview.


5. Click on the Validate button to validate the connection details.

6. Click on the Save button to save the connection. Alternatively, you can click on the Save & Configure button, which displays the Connection Settings pop-up window to configure the settings for the selected connector. The Save & Configure button is displayed only for the connectors for which settings configuration is required.

Note: * (asterisk) indicates a mandatory field required to create a connection. Once the connection is validated and saved, it is displayed on the Connectors home page.

Note: You can either save the connection details first, or you can validate the connection first and then save it. 

Connection Validation Errors 

| S.No. | Error Message(s) | Description |
| --- | --- | --- |
| 1 | Failed to construct kafka consumer | If any of the connector argument parameters is missing or invalid, this error is thrown. |
| 2 | Unable to obtain the krb5 or keytab | If the keytab is not configured in the VM arguments, or the details in the keytab are wrong, this error is thrown. |

Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.

Connector Settings 

Once the connection is established successfully, various settings are provided to fetch and analyze the information from the data source.  

The connection settings include Crawler, Profiler, Access Instruction, and Others.

| Connection Settings | Description |
| --- | --- |
| Crawler | Crawler settings are configured to connect to a data source and collect and catalog all the data elements in the form of metadata. Check out the crawler options to set the crawler's behavior in the Crawler & Profiler Settings. |
| Profiler | Data profiling typically involves collecting statistics about data sources, such as: Minimum value, Maximum value, Top 50 values, Distinct count, and Null count. |
| Access Instruction | Access Instruction allows the data owner to instruct others on using the objects in the application. |
| Others | The Send Metadata Changes Notifications option is used to set the change notification about the metadata changes of the data objects. You can use the toggle button to set the Default Governance Roles (Steward, Owner, Custodian, etc.). From the drop-down menu, you can select the role and team to receive the notification of metadata changes. |

For more information, refer to the Connector Settings.
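As an illustration of the column statistics the profiler collects (minimum, maximum, null count, distinct count), the following sketch computes them over a small in-memory sample; the values are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Sketch: the kind of statistics a profiler gathers for one column.
public class ColumnProfile {
    static Map<String, Object> profile(List<Integer> values) {
        // Null values are excluded from min/max/distinct but counted separately.
        List<Integer> nonNull = values.stream()
                .filter(Objects::nonNull).collect(Collectors.toList());
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("min", Collections.min(nonNull));
        stats.put("max", Collections.max(nonNull));
        stats.put("nullCount", values.size() - nonNull.size());
        stats.put("distinctCount", new HashSet<>(nonNull).size());
        return stats;
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(5, 3, null, 8, 3);
        System.out.println(profile(sample));
        // prints: {min=3, max=8, nullCount=1, distinctCount=3}
    }
}
```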

Crawling of Schema(s)

You can use the Crawl/Profile option to select specific schemas for the following operations: crawl, profile, crawl & profile, or profile unprofiled. For any scheduled crawlers and profilers, the defined run date and time are displayed.
  1. Navigate to the Connectors page, and click on the Crawl/Profile option.
  2. Select the required Schema(s).
  3. Click on the Run button that gathers all metadata from the connected source into OvalEdge Data Catalog.

Note: For more information on Scheduling, refer to Scheduling Connector

Additional Information

This section provides details about generating a few connection parameters and the available authentication details.

Cluster Name

A Kafka cluster is a system that consists of several brokers, topics, and their partitions. Connecting to any broker means connecting to the entire cluster.
Step(s): 

  1. Log in to Apache Kafka.
  2. Click on 'Environment'.
  3. Select the cluster name (demo-Kafka-cluster-12933).


Broker URL (Bootstrap Server)

In Apache Kafka, the server is referred to as a broker, and it is responsible for managing the storage and exchange of messages between producers and consumers. Each broker in the cluster stores a subset of the topic partitions. 

Step(s): 

  1. Navigate to Cluster Overview and click on Cluster Settings.
  2. In the Endpoints section, select the Bootstrap server URL information.


APP Key and Secret Key

You can navigate to the respective Cluster name, then click on API Keys. 
You can select an existing API Key or generate a new service account API Key. 

Steps to generate a new API key

  1. Navigate to Cluster Overview > API Keys.
  2. Click on + Add Key to generate a new API Key.
  3. Select Granular access as the scope for the API Key to manage it as a Service Account.
  4. Click on the Next button.
  5. Enter the new service account name and description.
  6. Click on the Next button.
  7. Select the appropriate permissions at the Cluster level and Topic level.
  8. Click on the Next button.
  9. Copy the Key and Secret information.

Configuring Cluster Authentication Type

JAAS Config path: The Java Authentication and Authorization Service (JAAS) login configuration file contains one or more entries that specify authentication technologies to be used by applications. 


App/Secret Key Credential:

API Keys are used to control access to a data source. Each API Key consists of a key and a secret.

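Putting the authentication parameters together, the following is a hypothetical sketch of the Kafka client properties corresponding to the App/Secret Key option; the broker URL, key, and secret are the example values used elsewhere in this document, not real credentials.

```java
import java.util.Properties;

// Sketch: client properties a Kafka consumer would need for
// SASL_SSL/PLAIN authentication with an APP Key and Secret Key.
public class KafkaClientConfig {
    static Properties buildConfig(String bootstrapServers,
                                  String appKey, String secretKey) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        // Inline JAAS configuration, equivalent to a JAAS Config file entry.
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"" + appKey + "\" password=\"" + secretKey + "\";");
        return props;
    }

    public static void main(String[] args) {
        Properties p = buildConfig(
                "oval-Kafka.csklygkwz3dx.us-east-1.rds.amazonaws.com:9092",
                "42pq6weov14sv4bwnoprop",
                "wJalrXUtnFEMI/K7OyuUWUIG/bPYRfiCYEXAMPLEKEY");
        System.out.println(p.getProperty("security.protocol"));
        // prints: SASL_SSL
    }
}
```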



Copyright © 2023, OvalEdge LLC, Peachtree Corners GA USA.