Kafka is an open-source component that works as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.
OvalEdge uses an Apache Kafka Client Jar to connect to the data source, which allows you to crawl and profile the Topics and Messages.
Connector Capabilities
| Functionality | Supported Objects |
|---|---|
| Crawling | Tables, Table Columns |
| Profiling | Table Profiling (Min, Max, Null Count, Distinct Count, Top 50 values), Column Profiling, Sample Profiling |
- Topics in Kafka are represented as tables.
- Messages in Kafka are represented as columns and are crawled during Sample Profiling.
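For illustration, a JSON message such as the following (a hypothetical order topic; the field names are examples, not part of the product) would surface as the columns order_id, amount, and status during sample profiling:

```json
{
  "order_id": "ORD-1001",
  "amount": 49.95,
  "status": "SHIPPED"
}
```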
Prerequisite(s)
The following are the prerequisites to establish a connection to Apache Kafka:
- Kafka Client Jar
- Service Account Permissions
- Configure environment variables (Optional)
Kafka Client Jar
OvalEdge uses the Kafka Client Jar to connect to Apache Kafka.
| Jars | Version | Details |
|---|---|---|
| Apache Kafka Jars | 5.5.1-css | org.apache.kafka.clients |
| Apache Kafka Jars | 3.2.0 | org.apache.kafka |
| Apache Kafka Jars | 5.5.1 | io.confluent.kafka-schema-registry-client |
Service Account with Minimum Read Permissions
The following are the minimum privileges required for a service account to crawl and profile the data.
| Operation | Minimum Access Permissions |
|---|---|
| Connection Validation | When a Schema Registry URL is provided, the Schema Registry username and password must be valid. The Bootstrap server must be reachable, and the JAAS config file and its path must be valid. |
| Crawling | Read access to the given cluster. |
| Profiling | Read access to the topic messages (Tables). |
Configure environment variables (Optional)
This section describes the settings or instructions that you should be aware of prior to establishing a connection. If your environments have been configured, skip this step.
Configure Environment Names
The Environment Names allow you to select the environment configured for the specific connector from the dropdown list in the Add Connector pop-up window.
You may want to crawl the same schema in both stage and production environments for consistency. Typical environments for crawling are PROD, STG, or Temporary, and may also include QA or other environments. Crawling a temporary environment can be useful for schema comparisons, particularly during application upgrade assistance, and the temporary environment can be deleted later.
Steps to Configure the Environment
1. Navigate to Administration > Configuration.
2. Select the Connector tab.
3. Find the Key name “connector.environment”.
4. Enter the desired environment values (PROD, STG) in the value column.
5. Click ✔ to save.
Establish a Connection
To connect to Apache Kafka using the OvalEdge application, complete the following steps:
- Log in to the OvalEdge application.
- Navigate to the Administration > Connectors module.
- Click on the + icon; the Add Connector pop-up window with Search Connector is displayed.
- Select the connection type as Apache Kafka. The Add Connector with Apache Kafka details pop-up window is displayed.
| Fields | Details |
|---|---|
| Connection Type* | The selected connection type Kafka is displayed by default. If required, the dropdown menu allows you to change the connector type; based on the selection, the fields associated with the selected connection type are displayed. |
| License Add-Ons* | All connectors have a Base Connector License by default that allows you to crawl and profile a data source to obtain its metadata and statistical information. OvalEdge supports various License Add-Ons based on the connector's functionality requirements. |
| Connection Name* | Enter the name of the connection. The name specified in the Connection Name textbox is the reference to the Apache Kafka connection in the OvalEdge application. Example: Apache Kafka Connection1 |
| Broker URL | Enter the database instance URL (on-premises/cloud-based). To get the Broker URL from the Kafka server, refer to the Section: Additional Information. |
| Cluster Name | Enter the Kafka cluster name. Example: demo-kafka-cluster-12938. For more information, refer to the Section: Additional Information. |
| Cluster Authentication Type | The following authentication types are supported, as explained in the section Additional Information > Configuring Cluster Authentication Type. 1. JAAS Config Path: The Java Authentication and Authorization Service (JAAS) login configuration file contains a username (APP Key) and password (Secret Key) for authentication. Note: You are provided with a JAAS config file (plain authentication). Open this file, edit the username and password, and save it to your local machine; enter this local path in the "JAAS Config Path" field. 2. App/Secret Key Credential: Alternatively, you can skip the JAAS Config Path and use the APP Key and Secret Key parameters. Note: Refer to the Section: Additional Information. |
| APP Key | Enter the APP Key generated as described in the Section: APP Key and Secret Key. Example: 42pq6weov14sv4bwnoprop |
| Secret Key | Enter the Secret Key generated as described in the Section: APP Key and Secret Key. Example: wJalrXUtnFEMI/K7OyuUWUIG/bPYRfiCYEXAMPLEKEY |
| KRB5 Config | An optional parameter for Kerberos authentication; if used, provide the path URL. The krb5.conf file contains Kerberos configuration information, including the location of KDCs, the Kerberos realms, and the mapping of hostnames onto Kerberos realms. |
| Security Protocol | The encryption mechanism, such as SASL_SSL. Enter the parameter as "SASL_SSL". |
| SASL Mechanism | Simple Authentication and Security Layer (SASL) adds authentication support to connection-based protocols. Enter the parameter as "PLAIN". |
| Environment* | The Environment dropdown menu allows you to select the environment configured for the connector, for example, PROD or STG. The purpose of the Environment field is to indicate which environment (Production, STG, QA) the new connector is established in. Note: The steps to set up environment variables are explained in the Prerequisites section. |
| Default Governance Roles* | You can select a specific user or a team from the governance roles (Steward, Custodian, Owner) to be assigned to manage the data asset. Note: The dropdown list displays all the configurable roles (a single user or a team) as per the configurations made in the OvalEdge Security > Governance Roles section. |
| Admin Roles | Select the required admin roles for this connector. |
| Select Bridge | With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data source without modifying firewall rules. A bridge provides real-time control that makes it easy to manage data movement between any source and destination. For more information, refer to Bridge Overview. |
6. Click on the Save button to save the connection. Alternatively, you can click Save & Configure, which displays the Connection Settings pop-up window to configure the settings for the selected connector. The Save & Configure button is displayed only for the connectors that require settings configuration.
Note: * (asterisk) indicates the mandatory field required to create a connection. Once the connection is validated and saved, it will be displayed on the Connectors home page.
Note: You can either save the connection details first, or you can validate the connection first and then save it.
Connection Validation Errors
| S.No. | Error Message(s) | Description |
|---|---|---|
| 1 | Failed to Construct kafka consumer | If any of the connector argument parameters is missing or invalid, this error is thrown. |
| 2 | Unable to obtain the krb5 or keytab | If the keytab is not configured in the VM arguments, or the details in the keytab are wrong, this message is thrown. |
Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.
Connector Settings
Once the connection is established successfully, various settings are provided to fetch and analyze the information from the data source.
The connection settings include Crawler, Profiler, Access Instruction, and Others.
| Connection Settings | Description |
|---|---|
| Crawler | Crawler settings are configured to connect to a data source and collect and catalog all the data elements in the form of metadata. Check out the crawler options to set the crawler's behavior in the Crawler & Profiler Settings. |
| Profiler | Data profiling typically involves collecting statistics about data sources, such as minimum and maximum values, null counts, and distinct counts. |
| Access Instruction | Access Instruction allows the data owner to instruct others on using the objects in the application. |
| Others | The Send Metadata Changes Notifications option is used to set the change notification about the metadata changes of the data objects. |
For more information, refer to the Connector Settings.
Crawling of Schema(s)
You can use the Crawl/Profile option to select specific schemas for the following operations: crawl, profile, crawl & profile, or profile unprofiled. For any scheduled crawlers and profilers, the defined run date and time are displayed.
- Navigate to the Connectors page and click on the Crawl/Profile option.
- Select the required Schema(s).
- Click on the Run button that gathers all metadata from the connected source into OvalEdge Data Catalog.
Note: For more information on Scheduling, refer to Scheduling Connector
Additional Information
This section provides details about generating a few connection parameters and the available authentication details.
Cluster Name
A Kafka cluster is a system that consists of several brokers, topics, and the partitions of those topics. Connecting to any one broker means connecting to the entire cluster.
Step(s):
- Log in to Apache Kafka.
- Click on 'Environment'.
- Select the cluster name (example: demo-Kafka-cluster-12933).
Broker URL (Bootstrap Server)
In Apache Kafka, the server is referred to as a broker, and it is responsible for managing the storage and exchange of messages between producers and consumers. Each broker in the cluster stores a subset of the topic partitions.
Step(s):
- You can navigate to Cluster Overview and click on Cluster Settings.
- In the Endpoints section, select Bootstrap server URL information.
APP Key and Secret Key
You can navigate to the respective Cluster name, then click on API Keys.
You can select an existing API Key or generate a new service account API Key.
Steps to generate a new API key
- Navigate to Cluster Overview > API Keys.
- Click on + Add Key to generate a new API Key.
- Select Granular access as the scope for the API Key to manage it as a Service Account.
- Click on the Next button.
- Enter the New service account name and Description.
- Click on the Next button.
- Select the appropriate Cluster-level and Topic-level permissions.
- Click on the Next button.
- Copy the Key and Secret information.
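Once the Key and Secret are copied, they plug into the SASL/PLAIN login string that Kafka clients use. A minimal sketch of building that string, assuming plain SASL authentication; `MY_APP_KEY` and `MY_SECRET_KEY` are placeholders, not real credentials:

```python
def build_jaas_config(app_key: str, secret_key: str) -> str:
    """Build the sasl.jaas.config property value for SASL/PLAIN
    authentication using an API key and secret as the credentials."""
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{app_key}" password="{secret_key}";'
    )

# Example with placeholder credentials (not real keys)
print(build_jaas_config("MY_APP_KEY", "MY_SECRET_KEY"))
```

The same string can be written into a JAAS config file and referenced via the "JAAS Config Path" field instead, as described above.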
Configuring Cluster Authentication Type
JAAS Config path: The Java Authentication and Authorization Service (JAAS) login configuration file contains one or more entries that specify authentication technologies to be used by applications.
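A typical plain-authentication JAAS login configuration file has the following shape (the username and password values are placeholders for your APP Key and Secret Key):

```
KafkaClient {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="MY_APP_KEY"
    password="MY_SECRET_KEY";
};
```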
App/Secret Key Credential:
API Keys are used to control access to a data source. Each API Key consists of a key and a secret.
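Taken together, the connection parameters described in this document map onto standard Kafka client properties. The following is a hedged sketch, not a definitive configuration: all host names, keys, and credentials are placeholders, and the Schema Registry lines apply only when a Schema Registry URL is configured.

```properties
# Broker URL (Bootstrap Server) -- placeholder host
bootstrap.servers=my-kafka-broker.example.com:9092
# Security Protocol and SASL Mechanism fields
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
# APP Key / Secret Key credentials (placeholders)
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="MY_APP_KEY" \
  password="MY_SECRET_KEY";
# Schema Registry settings (only when a Schema Registry URL is given)
schema.registry.url=https://my-schema-registry.example.com
basic.auth.credentials.source=USER_INFO
basic.auth.user.info=SR_USER:SR_PASSWORD
```

When Kerberos is used instead, the krb5.conf path from the KRB5 Config field is typically supplied to the JVM, for example via `-Djava.security.krb5.conf=/path/to/krb5.conf`.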
Copyright © 2023, OvalEdge LLC, Peachtree Corners GA USA.