Document DB Connector

Document DB stores data in flexible, JSON-like documents: fields can vary from document to document, and the data structure can change over time.

NOTE: You need an SSL certificate in your Java environment to connect to Document DB.

Please refer to the following document before creating a connection in OvalEdge.

https://docs.google.com/document/d/1nxgdbdyncCtrZFcbyGKzk2BoiWbBFUoaTvdexzz2xC8/

OvalEdge uses the Document DB (Mongo) Client API to connect to a running Document DB instance.

OvalEdge Crawling: Crawling is the process of collecting information about data from various data sources, such as on-premise and cloud databases, Hadoop, visualization software, and file systems.

When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository. The crawler creates an index for every stored data element, which can later be used for data exploration through the smart search in the OvalEdge Data Catalog.

The OvalEdge crawlers can be scheduled to scan the databases regularly, so they always have an up-to-date index of the data elements.

Data Sources: Data sources are the systems that the OvalEdge crawler integrates with to help users extract metadata and build a data catalog.

This document provides information about how to make a connection to your Document DB instance and crawl the data from various workspaces.

Connect to the Data: Before crawling, you must first connect to your data. OvalEdge requires users to configure a separate connection for each type of data source. Users must enter the source credentials and database information for each type of connectivity. Once a data connection is made, a click of the Crawl button starts the crawling process.

Connector Capabilities

The connector connects to Document DB via the Document DB (Mongo) Client API. The drivers used by the connector and their currently supported versions are given below:

| Driver/API | Version | Details |
|---|---|---|
| MongoDB driver (DocumentDB) | 3.12.5 | https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.12.5 (Note: Latest version is 3.12.8) |
| sql-to-mongo-db-query-converter | 1.11 | https://mvnrepository.com/artifact/com.github.vincentrussell/sql-to-mongo-db-query-converter/1.1 (Note: Latest version is 1.18) |

Technical Specifications

Crawling

| Feature | Supported Objects | Remarks |
|---|---|---|
| Crawling | Tables | |
| | Table Columns | Supported data types: String, Integer, Boolean, Double, Timestamp, Object, Date, Object ID, Binary Data |


Profiling

| Feature | Support | Remarks |
|---|---|---|
| Table Profiling | Row count, Columns count, View sample data | Supports all data types |
| Column Profiling | Min, Max, Null count, distinct, top 50 values | |
| Full Profiling | Supported | |
| Sample Profiling | Supported | |
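
The column profiling statistics listed above (min, max, null count, distinct, top 50 values) can be illustrated with a minimal sketch over in-memory documents. This is not the OvalEdge profiler; the function name `profile_column`, the sample documents, and the choice to count a missing field as null are assumptions made for illustration.

```python
from collections import Counter

def profile_column(documents, field, top_n=50):
    """Compute simple column statistics over a list of document dicts."""
    # Assumption: a field that is absent or explicitly None counts as null.
    values = [doc.get(field) for doc in documents]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),  # most frequent values first
    }

# Hypothetical sample documents; fields vary from document to document.
sample_docs = [
    {"_id": 1, "city": "Austin", "age": 31},
    {"_id": 2, "city": "Boston", "age": 25},
    {"_id": 3, "city": "Austin"},  # "age" is missing in this document
    {"_id": 4, "city": "Austin", "age": 25},
]

age_profile = profile_column(sample_docs, "age")
city_profile = profile_column(sample_docs, "city")
print(age_profile)
print(city_profile)
```

Because documents in the same collection need not share fields, a profiler has to decide how to treat a missing field. The sketch counts it as null, which may differ from what OvalEdge does.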

 

Lineage Building

| Feature | Remarks |
|---|---|
| Table Lineage | Not supported |
| Column Lineage | Not supported |


Querying 

| Operation | Remarks |
|---|---|
| Select | Supported |
| Insert | Not supported, by default |
| Update | Not supported, by default |
| Delete | Not supported, by default |
| Joins within Database | Not supported |
| Joins outside Database | Not supported |
| Aggregations | Supported |
| Group By | Supported |
| Order By | Supported |
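
As a rough illustration of the default querying behavior above, the sketch below accepts a statement only when its leading keyword is SELECT. The names `ALLOWED_OPERATIONS`, `leading_keyword`, and `is_query_allowed` are hypothetical, and this is not how OvalEdge itself enforces the restriction.

```python
import re

# Assumption: only SELECT is allowed by default, as listed in the Querying table.
ALLOWED_OPERATIONS = {"SELECT"}

def leading_keyword(sql):
    """Return the first keyword of a SQL statement in upper case, or '' if none."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return match.group(1).upper() if match else ""

def is_query_allowed(sql):
    """True when the statement starts with an allowed operation."""
    return leading_keyword(sql) in ALLOWED_OPERATIONS

select_ok = is_query_allowed("select city, count(*) from users group by city order by city")
insert_ok = is_query_allowed("INSERT INTO users (name) VALUES ('a')")
delete_ok = is_query_allowed("  delete from users")
print(select_ok, insert_ok, delete_ok)
```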


Connection Details

Pre-requisites

To use the Amazon Document DB Connector, the details specified in the following section should be available.

  • An admin/service account for Crawling and Profiling. 
  • The minimum privileges required for a cluster user are:

| Operation | Access Permission |
|---|---|
| Connection Validation | Read Any Database |
| Crawl Schemas | Read Any Database |
| Crawl Tables | Read Any Database |
| Profile Schemas, Tables | Read Any Database |
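
A minimal sketch of a service account definition that matches these privileges is shown below. It builds a MongoDB-style `createUser` command document. It assumes that "Read Any Database" corresponds to the built-in `readAnyDatabase` role and that the account is created in the `admin` database; the user name and password are placeholders.

```python
import json

def build_create_user_command(username, password, auth_db="admin"):
    """Build a MongoDB-style createUser command document for a read-only account."""
    return {
        "createUser": username,
        "pwd": password,
        # Assumption: "Read Any Database" maps to the built-in readAnyDatabase role.
        "roles": [{"role": "readAnyDatabase", "db": auth_db}],
    }

# Placeholder credentials for illustration only.
command = build_create_user_command("ovaledge_svc", "change-me")
print(json.dumps(command, indent=2))
```

With a MongoDB-compatible driver, a document of this shape would typically be run as a database command against the `admin` database by a user who is allowed to create users.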


To connect to the Amazon Document DB database using the OvalEdge application, complete the following steps.

  1. Log in to the OvalEdge application.
  2. In the left menu, click the Administration module. The sub-modules associated with Administration are displayed.
  3. Click the Crawler sub-module. The Crawler Information page is displayed.
  4. On the Crawler Information page, click the + icon. The Manage Connection pop-up window with Search Connector is displayed.
  5. In the Manage Connection pop-up window, select the connection type as Amazon Document DB. The Manage Connection pop-up window with the Amazon Document DB specific details is displayed.
  6. Enter the following field attributes, which are required for the Amazon Document DB connection.

| Property | Details |
|---|---|
| Connection Type | Amazon Document DB |
| License Type | Standard |
| Connection Name | Enter a connection name for the Amazon Document DB database. The name you specify is a reference name that identifies the Amazon Document DB database connection in OvalEdge. Example: Amazon Document DB1 |
| Cluster Endpoint | Document DB cluster URL. Example: 13.59.52.223:27017 |
| Port | 27017. Note: This value may differ for your cluster. |
| Database | Admin. Note: This value may differ for your cluster. |
| Username | Login name of the user account |
| Password | Password of the user account |
| JAVA Home Path | Enter the Java Home Path. Example: C:\Program Files\Java\jdk1.8.0_333\ If no JDK is installed, the Java Home Path can be C:\Program Files\Java\ |
| Connection String | Enter the cluster URL. Example: 13.59.52.223:27017,18.223.24.104:27017,18.222.24.19:27017/?readPreference=secondaryPreferred&replicaSet=TestRS-0&authSource=admin |
| Plugin Server | Optional |
| Plugin Port | Optional |
| Default Governance Roles | Select the required governance roles for the Steward, Custodian, and Owner. |
| No. of Archive Objects | Enter the number of archive objects. |

7. After entering the connection details in the required fields, click the Validate button. Once the entered connection details are validated, the Save and Save & Configure buttons are enabled.

8. Click the Save button to establish the connection, or click the Save & Configure button to establish the connection and configure the connection settings. When you click Save & Configure, the Connection Settings pop-up window is displayed, where you can configure the connection settings for the selected connector. The Save & Configure button is displayed only for connectors that require settings configuration.
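
The Connection String value shown in the field table can be assembled from a host list and a set of options. The sketch below reproduces the shape of that example and also wraps it in a standard `mongodb://` URI with percent-encoded credentials. The function names and the credentials are hypothetical, and how OvalEdge combines the individual fields internally is not stated here, so the full URI form is an assumption.

```python
from urllib.parse import quote_plus, urlencode

def build_cluster_url(hosts, options):
    """Join host:port pairs and options in the shape of the Connection String example."""
    return ",".join(hosts) + "/?" + urlencode(options)

def build_mongodb_uri(username, password, hosts, options):
    """Wrap the cluster URL in a mongodb:// URI with percent-encoded credentials."""
    # Special characters such as '@' and '/' in credentials must be percent-encoded.
    credentials = quote_plus(username) + ":" + quote_plus(password)
    return "mongodb://" + credentials + "@" + build_cluster_url(hosts, options)

# Hosts and options taken from the Connection String example.
hosts = ["13.59.52.223:27017", "18.223.24.104:27017", "18.222.24.19:27017"]
options = {
    "readPreference": "secondaryPreferred",
    "replicaSet": "TestRS-0",
    "authSource": "admin",
}

cluster_url = build_cluster_url(hosts, options)
uri = build_mongodb_uri("svc_user", "p@ss/word", hosts, options)  # placeholder credentials
print(cluster_url)
print(uri)
```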

Crawler/Profiler Settings 

Once connectivity is established, additional configurations for crawling and profiling can be specified:

Settings

| Property | Details |
|---|---|
| Order | Priority of the rule |
| Start Time and End Time | Used when crawling/profiling is to be scheduled |
| No. of Threads | No. of threads used to perform profiling |
| Profile Type | Disabled/Auto/Sample |
| Row Count Constraint | No. of rows to be fetched |
| Row Count Limit | The maximum limit of rows to be fetched |
| Sample Profile Size | Sample profile row limit |
| Sample Data Count | Sample count of the data |
| Query Timeout | Time to wait for response |
| Crawler Options | Only Tables can be crawled |
| Crawler Rules | Only Table and Columns Include and Exclude Regex |

Note: In the Crawler Rules, the include and exclude regex functionality is not used for functions and procedures, because they are not present in Document DB.
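
The effect of include and exclude regex rules on table names can be sketched as follows. The helper `filter_names` and the sample names are hypothetical, and matching each pattern against the full name is an assumption; OvalEdge may apply its patterns differently.

```python
import re

def filter_names(names, include=None, exclude=None):
    """Keep names that match the include regex (if given) and do not match the exclude regex."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    kept = []
    for name in names:
        # Assumption: a pattern must match the whole name, not just part of it.
        if include_re and not include_re.fullmatch(name):
            continue
        if exclude_re and exclude_re.fullmatch(name):
            continue
        kept.append(name)
    return kept

# Hypothetical table names.
tables = ["orders", "orders_archive", "customers", "tmp_load"]
selected = filter_names(tables, include=r"(orders|customers).*", exclude=r".*_archive")
print(selected)
```

In this sketch the exclude rule takes precedence: a name that matches both patterns is dropped.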