Document DB stores data in flexible, JSON-like documents: fields can vary from document to document, and the data structure can change over time.
Note: An SSL certificate must be installed in your Java environment to connect to Document DB.
Refer to the following document before creating a connection in OvalEdge:
https://docs.google.com/document/d/1nxgdbdyncCtrZFcbyGKzk2BoiWbBFUoaTvdexzz2xC8/
OvalEdge uses the Document DB (Mongo) Client API to connect to a running Document DB instance.
OvalEdge Crawling: Crawling is the process of collecting information about data from various data sources such as on-premise and cloud databases, Hadoop, visualization software, and file systems.
When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository. The crawler creates an index for every stored data element, which can later be used for data exploration within the OvalEdge Data Catalog, which provides a smart search.
The OvalEdge crawlers can be scheduled to scan the databases regularly, so they always have an up-to-date index of the data elements.
Data Sources: OvalEdge crawlers integrate with various data sources to help users extract metadata and build a data catalog.
This document provides information about how to make a connection to your Document DB instance and crawl the data from various workspaces.
Connect to the Data: Before crawling, you must first connect to your data. OvalEdge requires users to configure a separate connection for each type of data source. Users must enter the source credentials and database information for each type of connectivity. Once a data connection is made, clicking the Crawl button starts the crawling process.
Connector Capabilities
The connector connects to Document DB through the Document DB (Mongo) Client API. It currently uses the following driver and API versions:
Driver/API | Version | Details |
MongoDB driver (DocumentDB) | 3.12.5 | https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.12.5 Note: Latest version is 3.12.8 |
sql-to-mongo-db-query-converter | 1.11 | https://mvnrepository.com/artifact/com.github.vincentrussell/sql-to-mongo-db-query-converter/1.1 Note: Latest version is 1.18 |
Technical Specifications
Crawling
Feature | Supported Objects | Remarks |
Crawling | Tables | |
 | Table Columns | Supported Data Types: String, Integer, Boolean, Double, Timestamp, Object, Date, Object ID, Binary Data. |
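Because documents in the same collection need not share the same fields, table columns have to be derived by scanning documents. The following is a minimal sketch of that idea in plain Python. It is not OvalEdge's implementation: the function name `infer_columns` and the sample documents are hypothetical, and plain Python values stand in for BSON types.

```python
from datetime import datetime

# Map Python stand-in types to data type names from the crawling table.
# bool is listed before int because bool is a subclass of int in Python.
TYPE_NAMES = [
    (bool, "Boolean"),
    (int, "Integer"),
    (float, "Double"),
    (str, "String"),
    (datetime, "Date"),
    (bytes, "Binary Data"),
    (dict, "Object"),
]


def type_name(value):
    """Return the first matching type name, or "Unknown" if none matches."""
    for py_type, name in TYPE_NAMES:
        if isinstance(value, py_type):
            return name
    return "Unknown"


def infer_columns(documents):
    """Return {field: sorted list of type names seen} across all documents."""
    columns = {}
    for doc in documents:
        for field, value in doc.items():
            seen = columns.setdefault(field, set())
            if value is not None:  # nulls add the column but no type
                seen.add(type_name(value))
    return {field: sorted(types) for field, types in columns.items()}


# Hypothetical sample documents with differing fields.
sample_docs = [
    {"name": "Ada", "age": 36, "active": True},
    {"name": "Lin", "age": 41.5, "address": {"city": "Pune"}},
]
columns = infer_columns(sample_docs)
```

Here a field such as `age` ends up with more than one observed type, which is the kind of variation a document store allows and a relational table does not.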
Profiling
Feature | Support | Remarks |
Table Profiling | Row count, Columns count, View sample data | Supports all data types |
Column Profiling | Min, Max, Null count, distinct, top 50 values | |
Full Profiling | Supported | |
Sample Profiling | Supported | |
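The column profiling statistics listed above can be illustrated with a short sketch. This is an assumption-laden example rather than the profiler's actual logic: the function `profile_column` is hypothetical, a field missing from a document is counted as null, and the values of a field are assumed to be mutually comparable.

```python
from collections import Counter


def profile_column(documents, field, top_n=50):
    """Compute min, max, null count, distinct count, and top values for a field."""
    # Assumption: a field absent from a document counts as null.
    values = [doc.get(field) for doc in documents]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),
    }


# Hypothetical sample data.
docs = [{"age": 30}, {"age": 25}, {"age": 30}, {"name": "no age"}, {"age": None}]
stats = profile_column(docs, "age")
```

Sample profiling would apply the same calculation to a limited subset of documents instead of the full collection.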
Lineage Building
Feature | Remarks |
Table Lineage | Not Supported |
Column Lineage | Not Supported |
Querying
Operation | Remarks |
Select | Supported |
Insert | Not supported, by default. |
Update | Not supported, by default. |
Delete | Not supported, by default. |
Joins within Database | Not Supported |
Joins outside Database | Not Supported |
Aggregations | Supported |
Group By | Supported |
Order By | Supported |
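Document DB is queried with MongoDB-style commands rather than SQL, which is why the connector relies on a SQL-to-Mongo conversion library. The sketch below shows roughly how a SELECT with Group By and Order By corresponds to an aggregation pipeline. It is a hand-written illustration, not the output of sql-to-mongo-db-query-converter, and the helper `build_group_count_pipeline` and the field name `status` are hypothetical.

```python
def build_group_count_pipeline(group_field, descending=True):
    """Build a pipeline roughly equivalent to:

    SELECT <group_field>, COUNT(*) AS total
    FROM <collection>
    GROUP BY <group_field>
    ORDER BY total DESC
    """
    return [
        # $group with $sum: 1 counts the documents in each group.
        {"$group": {"_id": "$" + group_field, "total": {"$sum": 1}}},
        # $sort orders the grouped results by the computed count.
        {"$sort": {"total": -1 if descending else 1}},
    ]


pipeline = build_group_count_pipeline("status")
```

With a MongoDB-compatible driver, a pipeline of this shape would be passed to the collection's aggregate operation.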
Connection Details
Pre-requisites
To use the Amazon Document DB Connector, the details specified in the following section should be available.
- An admin/service account for Crawling and Profiling.
- The minimum privileges required for a cluster user are:
Operation | Access Permission |
Connection Validation | Read Any Database |
Crawl Schemas | Read Any Database |
Crawl Tables | Read Any Database |
Profile Schemas, Tables | Read Any Database |
To connect to the Amazon Document DB database using the OvalEdge application, complete the following steps.
1. Log in to the OvalEdge application.
2. In the left menu, click the Administration module. The sub-modules associated with Administration are displayed.
3. Click the Crawler sub-module. The Crawler Information page is displayed.
4. On the Crawler Information page, click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
5. In the Manage Connection pop-up window, select Amazon Document DB as the connection type. The Manage Connection pop-up window with Amazon Document DB specific details is displayed.
6. The following field attributes are required for the Amazon Document DB connection.
Property | Details |
Connection Type | Amazon Document DB |
License Type | Standard |
Connection Name | Enter a connection name for the Amazon Document DB database. The name that you specify is a reference name to easily identify the Amazon Document DB database connection in OvalEdge. Example: Amazon Document DB1 |
Cluster Endpoint | Document DB Cluster URL |
Port | 27017 |
Database | Admin |
Username | User account login credentials |
Password | User’s Password |
JAVA Home Path | Enter the Java home path. Example: C:\Program Files\Java\jdk1.8.0_333\ If no JDK is installed, the Java home path can be C:\Program Files\Java\ |
Connection String | Enter the cluster URL. Example: 13.59.52.223:27017,18.223.24.104:27017,18.222.24.19:27017/?readPreference=secondaryPreferred&replicaSet=TestRS-0&authSource=admin |
Plugin Server | Optional |
Plugin Port | Optional |
Default Governance Roles | Select the required governance roles for the Steward, Custodian, and Owner |
No of Archive Objects | Enter the count for the archive objects. |
7. After entering the connection details in the required fields, click the Validate button. Once the connection details are validated, the Save and Save & Configure buttons are enabled.
8. Click the Save button to establish the connection, or click the Save & Configure button to establish the connection and configure its settings. Clicking Save & Configure displays the Connection Settings pop-up window, where you can configure the connection settings for the selected connector. The Save & Configure button is displayed only for connectors that require settings configuration.
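The Connection String value in step 6 is a comma-separated list of host:port pairs followed by URL-style options. As a minimal sketch, the string could be assembled as shown below. The helper `build_connection_string` is hypothetical, and the hosts and options are the example values from the Connection String field.

```python
from urllib.parse import urlencode


def build_connection_string(hosts, port=27017, **options):
    """Join host:port pairs and append URL-encoded options after '/?'."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    query = urlencode(options)  # keeps the keyword arguments in the order given
    return f"{host_list}/?{query}" if query else host_list


conn = build_connection_string(
    ["13.59.52.223", "18.223.24.104", "18.222.24.19"],
    readPreference="secondaryPreferred",
    replicaSet="TestRS-0",
    authSource="admin",
)
```

Building the string from its parts makes it harder to mistype a separator or leave out an option such as `authSource`.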
Crawler/Profiler Settings
Once connectivity is established, additional configurations for crawling and profiling can be specified:
Settings
Property | Details |
Order | Priority of the rule |
Start Time and End Time | Used when crawling/profiling is to be scheduled |
No. of Threads | No. of threads used to perform profiling |
Profile Type | Disabled/Auto/Sample |
Row Count Constraint | No. of rows to be fetched |
Row Count Limit | The maximum limit of rows to be fetched |
Sample Profile Size | Sample profile row limit |
Sample Data Count | Sample count of the data |
Query Timeout | Time to wait for response |
Crawler Options | Only Tables can be crawled |
Crawler Rules | Include and Exclude Regex rules apply only to Tables and Columns. Note: Include and exclude regex rules for functions and procedures are not used, because functions and procedures are not present in Document DB. |
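To illustrate how include and exclude regex rules narrow down the tables to crawl, here is a minimal sketch. It assumes full-match semantics and that an exclude rule takes precedence over an include rule. The document does not specify how OvalEdge evaluates the rules, and the function `filter_names` and the table names are hypothetical.

```python
import re


def filter_names(names, include=None, exclude=None):
    """Keep names that match include (if given) and do not match exclude."""
    selected = []
    for name in names:
        if include and not re.fullmatch(include, name):
            continue  # not covered by the include rule
        if exclude and re.fullmatch(exclude, name):
            continue  # explicitly excluded (assumed to take precedence)
        selected.append(name)
    return selected


# Hypothetical table (collection) names.
tables = ["orders", "orders_archive", "customers", "tmp_load"]
crawled = filter_names(
    tables,
    include=r"(orders|customers).*",
    exclude=r".*_archive",
)
```

The same filtering could be applied to column names, since the rules cover both tables and columns.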