RDBMS

Elastic Search On-Prem Connector

Elasticsearch is an open-source, distributed search and analytics engine designed for horizontal scalability, meaning it can easily handle and process large amounts of data across multiple nodes or servers. It is part of the Elastic Stack, which also includes Logstash for data collection and processing, and Kibana for data visualization and exploration.

Elasticsearch is built on top of Apache Lucene, a full-text search engine library. It is commonly used for real-time search scenarios, where organizations need to quickly retrieve and analyze data from large volumes of unstructured or semi-structured data. Elasticsearch is often employed in various applications, such as log and event data analysis, text and document search, and business intelligence.

OvalEdge on the other hand, offers a user-friendly interface that enables connectivity to Elasticsearch. It integrates with Elasticsearch using Rest APIs and crawls the mapping templates.

Connector Capabilities

The connector capabilities are shown below:

Crawling

Feature

Supported Objects

Crawling

Index Mapping Template

Profiling

 

Feature

Supported Objects

Remarks

Table Profiling

Row Count, Columns Count, View Sample data

-

View Profiling

Row Count, Columns Count, View sample data

View is treated as a table for profiling purposes.

Column Profiling

Min, Max, Null Count, Distinct, Top 50 values

-

Full Profiling

Supported

-

Sample Profiling

Supported

-

Prerequisites

The following are prerequisites for connecting to the Elasticsearch On-Prem:

Drivers

The APIs/drivers used by the connector are given below:

S.No.

Driver / API

1

Rest API

Service Account Permissions

An admin/service account for crawling and building lineage. The minimum privileges required are:

Operation

Access Permission

Connection validate

Read

Crawl objects

Read

Profile indexes

Read

Configuring Environment Variables

Configuring environment names enables you to select the appropriate environment from the drop-down list when adding a connector. This allows for consistent crawling of schemas across different environments, such as production (PROD), staging (STG), or temporary environments. It also facilitates schema comparisons and assists in application upgrades by providing a temporary environment that can be later deleted if needed.

Before establishing a connection, it is important to configure the environment names for the specific connector. If your environments have been configured, skip this step. 

Steps to Configure the Environment

  1. Log into the OvalEdge application.
  2. Navigate to AdministrationSystem Settings.
  3. Select the Connector tab.
  4. Find the key name “connector.environment”.
  5. Enter the desired environment values (PROD, STG) in the Value column.
  6. Click ✔ to Save.

Establish a Connection

To connect to the Elasticsearch On-Prem using the OvalEdge application, complete the following steps:

1. Log in to the OvalEdge application.
2. Navigate to Administration >  Connectors.
3. Click on the + (New Connector) icon, and the Add Connection with Search Connector pop-up window is displayed.

4. Add Connector pop-up window is displayed where you can search for the Elasticsearch On-Prem connector.

5. The Add Connector with Connector Type specific details pop-up window is displayed. Enter the relevant information to configure the Elasticsearch On-Prem connection.

Note: The asterisk (*) denotes mandatory fields required for establishing a connection.

          

Field Name

Description

Connector Type

By default, 'Elastic Search On Premise' is displayed as the selected connector type. If you want you can select the connector from the drop-down list.

Credential Manager

Select the option from the drop-down list, where you want to save your credentials.

OE Credential Manager: The connection is configured with the basic Username and Password of the service account in real-time when OvalEdge establishes a connection to the Elasticsearch On-Prem. 

HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge.  

AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server and fetched from the AWS Secrets Manager to OvalEdge.

AzureKeyVault: The credentials are stored in the Azure Key Vault database server and fetched from the Azure Key Vault to OvalEdge. Click here to know more.


For more information on Credential Manager, refer to Credential Manager.

License Add Ons

All the connectors will have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a data source. 

OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.

  • Auto Lineage: Select the Auto Lineage Add-On license that enables the automatic construction of the Lineage of data objects for a connector with the Lineage feature.
  • Data Quality: Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.
  • Data Access: Select the Data Access Add-On license that will enforce connector access via OvalEdge with the Remote Data Access Management (RDAM) feature enabled.

Connector Name*

Provide a connector name for the Elasticsearch On-Prem data source in OvalEdge. This name will serve as a reference to identify the specific Elasticsearch On-Prem database connection.

 Example: "ESOnPrem_Connection_test"

Connector Environment

The Connector Environment drop-down list allows you to select the environment configured for the connector from the drop-down list. 

For example, PROD, or STG (based on the configured items in the OvalEdge configuration for the connector.environment).

The purpose of the environment field is to help you identify which connector is connecting what type of system environment (Production, STG, or QA).

 Note: The steps to set up environment variables are explained in the Configuring Environment Variables section.

Base URL*

Specify the name of the database instance server IP/URL, which is accessible by the OvalEdge application.

Format: http://{ip address}:{port}

Username*

Enter the Service Account Username of the Elasticsearch On-Prem.

Password*

Enter the password of the Elasticsearch On-Prem data source.

Steward*

Select the Steward from the drop-down list options.

Custodian*

Select the Custodian from the drop-down list options.

Owner*

Select the Owner from the drop-down list options.

Governance Roles 4, 5, 6*

Select the respective user from the drop-down options.


Note: The drop-down list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security > Governance Roles section.

Integration Admins*

To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, and then click on the Apply button.
The responsibility of the Integration Admin includes configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.

Security and Governance Admins*

To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click on the Apply button.
The security and Governance Admin is responsible for:

  • Configure role permissions for the connector and its associated data objects.
  • Add admins to set permissions for roles on the connector and its associated data objects.
  • Update governance roles.
  • Create custom fields.
  • Develop Service Request templates for the connector.
  • Create Approval workflows for the templates.

No. of Archived Objects*

The number of archive objects indicates the number of recent metadata modifications made to a dataset at a remote/source location. By default, the archive objects feature is deactivated. However, users may enable it by clicking the Archive toggle button and specifying the number of objects they wish to archive. 

Select Bridge

With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. A bridge provides real-time control that makes it easy to manage data movement between any source and destination. For more information, refer to Bridge Overview.

For more information, refer to Bridge Overview

6. After filling in all the connection details, select the appropriate button based on your preferences. 

    1. Validate: Click on the Validate button to verify the connection details. This ensures that the provided information is accurate and enables successful connection establishment.
    2. Save: Click on the Save button to store the connection details. Once saved, the connection will be added to the Connectors home page for easy access.
    3. Save & Configure: For certain Connectors that require additional configuration settings. Click on the Save & Configure button. This will open the Connection Settings pop-up window, allowing you to configure the necessary settings before saving the connection.

7. Once the connection is validated and saved, it will be displayed on the Connectors home page.

 Note: You can either save the connection details first, or you can validate the connection first and then save it.

Connection Validation Details

S.No

Error Message(s)

Description

1

Failed to establish a connection, please check the credentials.

In case of an invalid username and password.

 Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.

Connector Settings

Once the connection is established successfully, various settings are provided to fetch and analyze the information from the data source.  The connection settings include Crawler, Profiler, Access Instruction, Business Glossary Settings, and Notification.

To view the Connector Settings page,

  1. Go to the Connectors page.
  2. From the nine dots select the Settings option.
  3. The Connector Settings page is displayed where you can view all the connector setting options.
  4. Click on Save Changes. All the settings will be applied to the metadata.

Connection Settings

Description

Crawler

Crawler settings are configured to connect to a data source and collect and catalog all the data elements in the form of metadata. 

Profiler




The process of gathering statistics and informative summaries about the connected data source(s). Statistics can help assess the quality of data sources before using them for analysis. Profiling is always optional; crawling can be run without profiling. 

Access Instruction

Access Instruction allows the data owner to instruct others on using the objects in the application.

Business Glossary Settings

The Business Glossary Setting provides flexibility and control over how they view and manage term association within the context of a business glossary at the connector level.

Notification

The Enable/Disable Metadata Change Notifications option is used to set the change notification about the metadata changes of the data objects.

  • You can use the toggle button to set the Default Governance Roles (Steward, Owner Custodian, etc.) 
  • Using the Roles and Teams, you can select the role and team to receive the notification of metadata changes.

Note: For more information on Connector settings, check the Connector Settings article.

Crawling of Schema(s)

A Crawl/Profile option allows you to select the specific schemas for the following operations:       Crawl, Crawl & Profile, Profile, or Profile Unprofiled. For any scheduled crawlers and profilers, the defined run date and time are displayed to set.

1. Navigate to the Connectors page, and click on the Crawl/Profile button.
Select Schema For Crawling and Profiling pop-up window is displayed.

Note: By default the schema name is shown as Default schema for every Elasticsearch on-prem connector.

2. The below list of actions is displayed.
    1. Crawl: It allows the crawling of the metadata of the selected schemas.
    2. Crawl & Profile: It allows crawling the metadata of the selected schemas and profiles the sample data.
    3. Profile: It allows the collection of table column statistics.
    4. Profile Unprofiled: It allows the profiling of data that has not been profiled.

Schedule: Connectors can also be scheduled for crawling and/or profiling in advance to run at prescribed times and selected intervals.
Note: For more information on Scheduling, refer to Scheduling Connector.