Document DB Connector

Document DB stores data in flexible, JSON-like documents: fields can vary from document to document, and the data structure can change over time.

NOTE: You need an SSL certificate in your Java environment to connect to Document DB.

Refer to the following document before creating a connection in OvalEdge.

https://docs.google.com/document/d/1nxgdbdyncCtrZFcbyGKzk2BoiWbBFUoaTvdexzz2xC8/

OvalEdge uses the Document DB (Mongo) Client API to connect to a running Document DB instance.

OvalEdge Crawling: Crawling is the process of collecting information about data from various data sources such as on-premise and cloud databases, Hadoop, visualization software, and file systems.

When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository. The crawler creates an index for every stored data element, which can later be used for data exploration through the smart search in the OvalEdge Data Catalog.

The OvalEdge crawlers can be scheduled to scan the databases regularly, so they always have an up-to-date index of the data elements.

Data Sources: The OvalEdge crawler integrates with various data sources to help users extract metadata and build a data catalog.

This document provides information about how to make a connection to your Document DB instance and crawl the data from various workspaces.

Connect to the Data: Before you can crawl and build a data catalog, you must first connect to your data. OvalEdge requires users to configure a separate connection for each type of data source. Users must enter the source credentials and database information for each type of connectivity. Once a data connection is made, a single click of the Crawl button starts the crawling process.

Connector Capabilities

Connectivity to Document DB is performed via the Document DB (Mongo) Client API. The connector uses the following drivers:

Driver/API	Version	Details
MongoDB driver (DocumentDB)	3.12.5	https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.12.5 Note: Latest version is 3.12.8
sql-to-mongo-db-query-converter	1.11	https://mvnrepository.com/artifact/com.github.vincentrussell/sql-to-mongo-db-query-converter/1.1 Note: Latest version is 1.18

Technical Specifications

Crawling

Feature	Supported Objects	Remarks
Crawling	Tables, Table Columns	Supported Data Types: String, Integer, Boolean, Double, Timestamp, Object, Date, Object ID, Binary Data

Profiling

Feature	Support	Remarks
Table Profiling	Row count, Column count, View sample data	Supports all data types
Column Profiling	Min, Max, Null count, Distinct count, Top 50 values
Full Profiling	Supported
Sample Profiling	Supported
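
To make the column profiling statistics listed above concrete, the following is a minimal sketch that computes them over a handful of in-memory documents. It is not OvalEdge's implementation: the function name `profile_column`, the sample documents, and the choice to count a missing field as null are all assumptions made for illustration.

```python
from collections import Counter


def profile_column(documents, field, top_n=50):
    """Compute basic column statistics over a list of dict documents.

    Assumption: a field that is missing or set to None counts as null,
    and the non-null values are hashable and mutually comparable.
    """
    values = [doc.get(field) for doc in documents]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct": len(counts),
        # most_common sorts by frequency, highest first
        "top_values": counts.most_common(top_n),
    }


# Hypothetical sample documents; fields vary from document to document.
sample = [
    {"_id": 1, "age": 31},
    {"_id": 2, "age": 25},
    {"_id": 3},  # "age" is missing here, so it is counted as null
    {"_id": 4, "age": 31},
]
stats = profile_column(sample, "age")
print(stats)
```

Because documents in the same collection need not share the same fields, a null count of this kind can be non-zero even when no document stores an explicit null.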

Lineage Building

Feature	Remarks
Table Lineage	Not Supported
Column Lineage	Not Supported

Querying

Operation	Remarks
Select	Supported
Insert	Not supported, by default.
Update	Not supported, by default.
Delete	Not supported, by default.
Joins within Database	Not Supported
Joins outside Database	Not supported
Aggregations	Supported
Group By	Supported
Order By	Supported

Connection Details

Pre-requisites

To use the Amazon Document DB Connector, the details specified in the following section should be available.

An admin/service account for Crawling and Profiling.
The minimum privileges required for a cluster user are:

Operation	Access Permission
Connection Validation	Read Any Database
Crawl Schemas	Read Any Database
Crawl Tables	Read Any Database
Profile Schemas, Tables	Read Any Database
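
As an illustration of the privileges above, the sketch below builds a `createUser` command document for a read-only service account. It assumes that "Read Any Database" corresponds to the built-in `readAnyDatabase` role granted on the `admin` database; confirm this mapping with your cluster administrator. The user name, password, and helper name `build_read_only_user_command` are hypothetical.

```python
def build_read_only_user_command(username, password):
    """Return a createUser command document for a read-only account.

    Assumption: the "Read Any Database" privilege maps to the built-in
    readAnyDatabase role on the admin database.
    """
    return {
        "createUser": username,  # the command name must be the first key
        "pwd": password,
        "roles": [{"role": "readAnyDatabase", "db": "admin"}],
    }


# Hypothetical credentials, for illustration only.
command = build_read_only_user_command("ovaledge_svc", "change-me")
print(command)

# With a MongoDB-compatible client library such as pymongo (not imported
# here), an administrator could send this document to the admin database,
# for example: client["admin"].command(command)
```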

To connect to the Amazon Document DB database using the OvalEdge application, complete the following steps.

1. Log in to the OvalEdge application.
2. In the left menu, click the Administration module name. The sub-modules associated with Administration are displayed.
3. Click the Crawler sub-module name. The Crawler Information page is displayed.
4. On the Crawler Information page, click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
5. In the Manage Connection pop-up window, select the connection type as Amazon Document DB. The Manage Connection pop-up window with Amazon Document DB specific details is displayed.

6. Enter the following field attributes, which are required for the Amazon Document DB connection.

Property	Details
Connection Type	Amazon Document DB
License Type	Standard
Connection Name	Enter a connection name for the Amazon Document DB database. The name that you specify is a reference name to easily identify the Amazon Document DB database connection in OvalEdge. Example: Amazon Document DB1
Cluster Endpoint	Document DB cluster URL. Example: 13.59.52.223:27017
Port	27017. Note: This value may differ in your environment.
Database	Admin. Note: This value may differ in your environment.
Username	User account login name
Password	User's password
JAVA Home Path	Enter the Java home path. Example: C:\Program Files\Java\jdk1.8.0_333\ If there is no JDK, use the Java home path C:\Program Files\Java\
Connection String	Enter the cluster URL. Example: 13.59.52.223:27017,18.223.24.104:27017,18.222.24.19:27017/?readPreference=secondaryPreferred&replicaSet=TestRS-0&authSource=admin
Plugin Server	Optional
Plugin Port	Optional
Default Governance Roles	Select the required governance roles for the Steward, Custodian, and Owner
No of Archive Objects	Enter the count for the archive objects.
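
The Connection String value in the table above is a comma-separated host list followed by query options. The sketch below shows one way to assemble such a value, using the example hosts and options from the table. The helper names are hypothetical. The second helper, which prepends the `mongodb://` scheme and URL-encoded credentials, illustrates the standard MongoDB URI shape; whether OvalEdge builds the final URI this way internally is an assumption.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_string(hosts, options=None):
    """Join host:port pairs and append options as a query string."""
    base = ",".join(hosts) + "/"
    query = urlencode(options or {})
    return f"{base}?{query}" if query else base


def build_mongodb_uri(hosts, username, password, options=None):
    """Assumed full URI form: scheme plus percent-encoded credentials."""
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{build_connection_string(hosts, options)}"


hosts = ["13.59.52.223:27017", "18.223.24.104:27017", "18.222.24.19:27017"]
options = {
    "readPreference": "secondaryPreferred",
    "replicaSet": "TestRS-0",
    "authSource": "admin",
}
connection_string = build_connection_string(hosts, options)
print(connection_string)

# Hypothetical credentials; special characters must be percent-encoded.
print(build_mongodb_uri(["13.59.52.223:27017"], "user", "p@ss/word"))
```

Encoding the user name and password matters because characters such as `@` and `/` would otherwise be read as URI delimiters.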

7. After entering the connection details in the required fields, click the Validate button. Once the connection details are validated, the Save and Save & Configure buttons are enabled.

8. Click the Save button to establish the connection, or click the Save & Configure button to establish the connection and configure its settings. Clicking Save & Configure displays the Connection Settings pop-up window, where you can configure the connection settings for the selected connector. The Save & Configure button is displayed only for connectors that require settings configuration.

Crawler/Profiler Settings

Once connectivity is established, additional configurations for crawling and profiling can be specified:

Settings

Property	Details
Order	Priority of the rule
Start Time and End Time	Used when crawling/profiling is to be scheduled
No. of Threads	No. of threads used to perform profiling
Profile Type	Disabled/Auto/Sample
Row Count Constraint	No. of rows to be fetched
Row Count Limit	The maximum limit of rows to be fetched
Sample Profile Size	Sample profile row limit
Sample Data Count	Sample count of the data
Query Timeout	Time to wait for response
Crawler Options	Only Tables can be crawled
Crawler Rules	Include and exclude regex is available only for tables and columns. Note: The include and exclude regex options for functions and procedures are not used, as these objects are not present in Document DB.
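
To illustrate how include and exclude regex rules of this kind can narrow the set of crawled tables, here is a minimal sketch. The matching semantics are assumptions, not documented OvalEdge behavior: a pattern must match the whole table name, an empty include list selects everything, and an exclude match overrides an include match. The table names and the function name `select_tables` are hypothetical.

```python
import re


def select_tables(table_names, include_patterns, exclude_patterns):
    """Filter table names with include and exclude regex lists.

    Assumptions: patterns match the full name, an empty include list
    means "include all", and exclusion takes precedence over inclusion.
    """
    included = [re.compile(p) for p in include_patterns]
    excluded = [re.compile(p) for p in exclude_patterns]
    selected = []
    for name in table_names:
        is_included = not included or any(r.fullmatch(name) for r in included)
        is_excluded = any(r.fullmatch(name) for r in excluded)
        if is_included and not is_excluded:
            selected.append(name)
    return selected


# Hypothetical table (collection) names.
tables = ["orders", "orders_archive", "customers", "tmp_load"]
result = select_tables(tables, [r"orders.*", r"customers"], [r".*_archive"])
print(result)
```

In this example `orders_archive` matches an include pattern but is dropped by the exclude pattern, and `tmp_load` is dropped because it matches no include pattern.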