Knowledge Base

Connectors

RDBMS

EventHub

EventHub is one type of message broker available on Azure. It is useful for ingesting high volumes of messages reliably. It provides support for crawling streams and profiling of messages.

EventHub

Connector Capabilities

Technical Specification

A technical specification for the Amazon Redshift connector contains information about the Crawler, Profiler, and Query Execution and also information about supported objects, supported data types, and user permissions.

Crawler

	Supported Objects Description
Tables	Topics in EventHub will be represented as tables
Table columns	Messages in EventHub will be represented as Columns, and they will be crawled while Sample profiling.

Profiling

	Supported Objects Description
Table Profiling	Row count, Columns count, View sample data
Views Profiling	Not Supported
Column Profiling	Min, Max, Null count, distinct, top 50 values
Full Profiling	Not Supported
Sample Profiling	Supported

Note: Lineage and Query are not supported.

Connection Details

To use the EventHub connector, the details specified in the following section should be available.

Prerequisites

The APIs/ drivers used by the connector are given below:

Item List	Versions	Description
Driver(s)	-	No external driver Required
Internal jars	5.3.0	org.apache.kafka.clients
Internal jars	5.3.0	io.confluent.kafka-schema-registry-client
Service Account User	-	Service Account user with Read Privileges

User Permission

By default, the service account provided for the connector will be used for any query operations. If the service account with write privileges, then Insert/Update/Delete queries can be executed.
The minimum privileges required are:

Operation	Minimum Access Permission For Service Account
Connection Validation	SELECT, and USAGE
Crawling	Select, Usage, Reference, and Execution
Profiling	No permission is required to profile

Note: Navigate to Configuration > Users & Roles for roles and permission.

Add Connection

To connect to the EventHub using the OvalEdge application, complete the following steps.

Login to the OvalEdge application
Navigate to Administration > Connectors module.
To add a new connection, click on the +AddNewConnector icon. A manage connection pop-up is displayed to select a connector.
Click on the +AddNewConnector icon, and the Manage Connection with Search Connector pop-up window is displayed. Select the connection type as EventHub. The Manage Connection with EventHub specific details pop-up window is displayed.

manageconnection-1

5. The following are the field attributes required for the connection.

Field Name	Mandatory/Optional	Description
Connection Type	Mandatory	Select Amazon Redshift connector. By default, the selected connection type is displayed as Amazon Redshift. If required, the connection type can be changed, and depending on the connector selected, fields are displayed accordingly.
License Type	Mandatory	By default License type is Auto Lineage. License Type: In a license type, the permissions are specified based on the customer's requirements. The user has the option to select the license type as Standard or Auto Lineage. The connector license is categorized into (i) Standard: The standard connectors may not have Auto Lineage functionality. It will not build the lineage for the selected database. (ii) Auto Lineage: Additionally, auto lineage connectors have Auto Lineage functionality. It will build the lineage for the selected database. See, License Types for more information.
Connection Name	Mandatory	Select a Connection name for the EventHub database. The name that you specify is a reference name to identify your EventHub database connection in OvalEdge. Example: EventHub Connection1
Broken URL*	Mandatory	Database instance URL (on-premises/cloud-based) Example: oval-EventHub.csklygkwz3dx.us-east-1.rds.amazonaws.com
Cluster Name *	Mandatory	(Default)
Consumer Group	Optional	Provide the Consumer Group
JAAS Config Path*	Mandatory	Provide the path for EventHub secret key for validation
Registry URL	Optional	Enter the Registry URL
Directory ID (Tenant ID)	Optional	It is optional to provide the Directory Id
Application ID (Client ID)	Optional	It is optional to provide the Directory Id
Client Secret	Optional	It is optional to enter the Client Secret
Default Governance Roles	Mandatory	From the dropdown list, select Stewards, Custodian, and Owner.

6. Once after entering the connection details in the required fields,
7. Click on the Save button or Save & Configure to establish and configure the connection settings. When you click the Save & Configure button, the Connection Settings pop-up window is displayed, where you can configure the connection settings for the selected Connector.

Note: The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.

8. click on the Validate button the entered connection details are validated.

Note: It is up to the user's choice, you can save the connections details first, or you can validate the connection and then save it.

Error Validation Details

The following are the possible error messages encountered during the validation.

Sl.No	Error Messages	Description
1	/volume/OE_DATA/Connections/EventHub/con-config(No Such File or Directory)	Incorrect JAAS path
2	Invalid credentials passed!	Incorrect password

Connection Settings

Once connectivity is established, additional configurations for crawling and profiling can be specified:

Crawler

Crawler has various settings tabs for crawling and profiling options. The crawler options are available for all the connections. Based on the connection selected, the options will differ. You need to provide the mandatory options for the crawler setting one of them is mandatory.

Crawler Options

Crawler options	Tables and Columns: This crawling will discover the tables and Columns and bring them into OvalEdge. This is the Default option for crawling.
Crawler Rule	In the Crawler Rules, when setting up the regex rules, the user will be able to write rules that will either include and/or exclude schema, tables, views, columns, procedures, and functions that start with, end with, or have middle characters as defined in the rule.

Profiler

Profiling a data source also helps identify relationships between the tables at the entity level and patterns between them. Many attributes can be specified in the profile settings. Once the setting for profiling has been configured, go back to the Crawler screen and click “Crawl/Profile” to begin the profiling.

Note: Profiling will be successful if “All” and “Current Day” are matched.

The attributes are as follows,

Parameters	Description
Order	Order number is a sequence in which the profiling is done.
Day	The day of a week in which profiling is set to run.
Start/End Time	The start and end time which profiling is set to perform.
Number of Threads	A thread is a process where a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries that are executed on the data source.
Profile Type	There are four main types of data profiling. Sample - For Sample profiling, The profiling is performed based on a given Sample Profile Size. The data on columns (like Min, Max, Distinct, Null Count, etc.) will be different compared to full profiles as we calculate them only on sample size. To execute a sample profile, select the profile type as “Sample” and enter a sample profile size(count of records to be profiled). Auto - For Auto profiling, If the Row Constraint checkbox is selected (Set as True) and if the configured Rowcount Limit (100) is less than the total Table Row Count (1000) then the sample profiling is performed by considering the count mentioned in the Sample Profile Size. If the Row Constraint checkbox is selected (Set as True) and if the configured Rowcount Limit (1000) is greater than the total Table Row Count (100) then the query is executed without considering the count mentioned in the Rowcount Limit. Note: A profile type set to “Auto” will always depend on the Rowcount limit when the row count constraint must be set as “True”. Query - For Query profiling, If the entered table row count is less than the Rowcount Limit, then the profiling is executed on the entire table. If the input table row count exceeds the Rowcount Limit, then the profiling skips execution for those tables to avoid performance issues. Disabled profile type prevents profiling on the selected data source
Row Count Constraint	The Row Count Constraint option is applicable only when the Profile Type is selected as Auto. If the Row Constraint checkbox is selected (Set as True) and if the configured Rowcount Limit (100) is less than the total Table Row Count (1000) then the sample profiling is performed by considering the count mentioned in the Sample Profile Size. If the Row Constraint checkbox is selected (Set as True) and if the configured Rowcount Limit (1000) is greater than the total Table Row Count (100) then the query is executed without considering the count mentioned in the Rowcount Limit.
Row Count Limit	Enter the maximum number of rows that should be considered for profiling.
Sample Data Count	Enter the total number of rows to see within the table data page in the Catalog.
Sample Profile Size	Select this option for considering the total number of rows in profiling.
Query Timeout	Enter the number of seconds for the query to run on the remote database before it times out.

Access Instruction

It allows the admin user to write the instructions and guide the business user to crawl the data source. Instructions are nothing, but some related information about the data source connection can be provided here. It could have the information resources like links, Images, or Videos that help business users of that particular data source connection.

For example: When an admin user saves the access instructions and crawls the schema, the saved instruction will appear in the data catalog after clicking on the schema. It's just providing info or guidelines on a specific schema.

You can provide the instruction in Connectors > Setting page
Click on the Access Instruction tab
Enter the instructions
Click the Save Changes button. Once you add the access instruction for a specific connection in the crawler settings, it will appear in the connection hierarchy like a database.

Other

When you navigate to the Others tab, the Send Metadata Changes Notifications to and Context URL sections are displayed.

Send Metadata Notification to:

Select whether the notifications for the Data Owner and Data Steward under the specific Roles need to be sent or not.
Select the desired role from the Roles dropdown menu and click on the Save Changes button. The notifications will be successfully sent to select Data Owner and Data Steward.

Context URL:

Enter the browser URL for the selected data source.

Example: https://azure.microsoft.com/en-in/services/event-hubs/#overview is entered in for the tableau connection.

Note: To add multiple URLs, click on the + Add New URL option. The new textbox to provide the URL details is displayed in the Context URL section.