AWS Dynamo DB Connector

OvalEdge provides an out-of-the-box connector for AWS Dynamo DB. The connector crawls the metadata that exists in the AWS Dynamo DB database, profiles the sample data, and builds lineage that shows the movement of the crawled data, the relationships between the objects, and their profile statistics.

Crawling: Crawling is the process of collecting information about data from various data sources such as on-premise and cloud databases, Hadoop, visualization software, and file systems. When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository. OvalEdge crawlers can be scheduled to scan the databases regularly.

Data Sources: The OvalEdge crawler integrates with various data sources to help users extract metadata and build a data catalog. This document explains how to connect to AWS Dynamo DB and crawl its metadata.

Connect to the Data: Before you can crawl and build a data catalog, you must first connect to your data. OvalEdge requires users to configure a separate connection for each data source type. The users must enter the source credentials for each type of connectivity. Once a data connection is made, a simple click of the Crawl button starts the crawling process.

Prerequisites

The following are prerequisites for connecting to the AWS Dynamo DB Connector.

The APIs/ drivers used by the connector are given below:

Driver / API	Version	Details
Drivers	AWS SDK for Dynamo DB	SDK provided by AWS to communicate with DynamoDB

User Permission

An admin or service account is required for crawling and building lineage. The minimum privileges required are:

Operation	Access Permission
Connection validate	Read
Crawl datasets	Read
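
As an illustration of what read-only access could look like on the AWS side, the sketch below builds an IAM policy document that allows only read actions on DynamoDB. The action names are real DynamoDB IAM actions, but the choice of which ones a crawler needs is an assumption; the table above only states that Read access is required. The function name build_read_only_policy is hypothetical.

```python
import json

# Assumed set of read-only actions. The source only says "Read" is required,
# so treat this list as a starting point, not as OvalEdge's documented list.
READ_ONLY_ACTIONS = [
    "dynamodb:ListTables",
    "dynamodb:DescribeTable",
    "dynamodb:Scan",
    "dynamodb:ListTagsOfResource",
]


def build_read_only_policy(resource="*"):
    """Return an IAM policy document that allows only the read actions above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(READ_ONLY_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be reviewed before it is attached.
    print(json.dumps(build_read_only_policy(), indent=2))
```

The policy contains no write actions, which matches the minimum privileges listed in the table.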

Technical Specification

The connector capabilities are shown below:

Crawling

Feature	Supported Objects	Remarks
Crawling	Tables	-
	Table Columns	Supported Data Types: All the standard data types of AWS DynamoDB.
	Views	-
	Stored Procedures	-
	Functions	-
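
DynamoDB tables declare only their key attributes, so column metadata generally has to be inferred from the items themselves. How the OvalEdge crawler does this is not described in this document; the following is a minimal sketch, assuming items arrive in DynamoDB's low-level format, where each value is wrapped in a type descriptor such as S or N. The function name infer_columns and the sample items are hypothetical.

```python
from collections import defaultdict

# Type descriptors used by DynamoDB's low-level item format.
TYPE_NAMES = {
    "S": "String", "N": "Number", "B": "Binary", "BOOL": "Boolean",
    "NULL": "Null", "M": "Map", "L": "List",
    "SS": "String Set", "NS": "Number Set", "BS": "Binary Set",
}


def infer_columns(items):
    """Map each attribute name to the sorted list of type names seen for it."""
    seen = defaultdict(set)
    for item in items:
        for name, typed_value in item.items():
            # A typed value looks like {"S": "abc"}; its key is the descriptor.
            for descriptor in typed_value:
                seen[name].add(TYPE_NAMES.get(descriptor, descriptor))
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical sample items, not taken from any real table.
sample_items = [
    {"order_id": {"S": "A-1"}, "total": {"N": "19.99"}},
    {"order_id": {"S": "A-2"}, "tags": {"SS": ["gift"]}, "total": {"NULL": True}},
]
columns = infer_columns(sample_items)
```

Because items in the same table can carry different attributes, an attribute may appear with more than one type, which is why the sketch records a list of types per column.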

Profiling

Feature	Supported Objects	Remarks
Table Profiling	Row Count, Columns Count, View Sample data	-
View Profiling	Row Count, Columns Count, View sample data	View is treated as a table for profiling purposes.
Column Profiling	Min, Max, Null Count, Distinct, Top 50 values	-
Full Profiling	Supported	-
Sample Profiling	Supported	-
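
The statistics in the table above can be illustrated with a small, self-contained sketch. It is not OvalEdge's implementation: the function names are hypothetical, a missing attribute is treated as null, and a sample profile is approximated by taking the first sample_size rows, all of which are assumptions.

```python
from collections import Counter


def profile_column(values, top_n=50):
    """Compute Min, Max, Null Count, Distinct, and Top N values for one column."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),
    }


def profile_table(rows, sample_size=None):
    """Profile a list of row dicts; sample_size=None means a full profile."""
    profiled = rows if sample_size is None else rows[:sample_size]
    columns = sorted({name for row in profiled for name in row})
    return {
        "row_count": len(profiled),
        "column_count": len(columns),
        # row.get(c) yields None for a missing attribute, counted as null.
        "columns": {c: profile_column([r.get(c) for r in profiled]) for c in columns},
    }


# Hypothetical rows; the last one has no "city" attribute at all.
rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": None},
    {"id": 3, "city": "Pune"},
    {"id": 4},
]
full = profile_table(rows)
sample = profile_table(rows, sample_size=2)
```

Comparing full and sample shows why sample statistics can differ from full-profile statistics: they are computed over fewer rows.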

Lineage Building

Lineage entities	Details
Table Lineage	Not Supported
Column Lineage	Not Supported
Lineage Sources	Not Supported

Querying

Operation	Details
Select	Not Supported
Insert	Not Supported, by default.
Update	Not Supported, by default.
Delete	Not Supported, by default.
Joins within database	Not Supported
Joins outside database	Not Supported
Aggregations	Not Supported
Group By	Not Supported
Order By	Not Supported

By default, the service account provided for the connector will be used for any query operations. If the service account has write privileges, then Insert / Update / Delete queries can be executed.

Connection Details

To connect to the AWS Dynamo DB Connector using the OvalEdge application, complete the following steps.

Log in to the OvalEdge application.
Navigate to the Administration > Connectors module.
Click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
Select the connection type as AWS Dynamo DB Connector. The Manage Connection pop-up window with AWS Dynamo DB Connector-specific details is displayed.

Field Name	Mandatory/Optional	Description
Connection Type	-	Select AWS Dynamo DB Connector.
Authentication		If IAM User Authentication is selected, the Access Key and Secret Key fields are displayed. If Role Based Authentication is selected, Cr
License Type	Mandatory	You can choose the License Type.
Connection Name	Mandatory	Enter the name of the connection. The name specified in the Connection Name textbox is used as the reference to the AWS Dynamo DB Connector in the OvalEdge application.
Environment		From the dropdown, select the desired environment, such as Production, QA, or Stage. The environment field helps users understand which environment (development, production, stage, or QA) the new connector is established in.
IAM User Authentication
Access Key	Mandatory	It is an access key of an IAM user.
Secret Key	Mandatory	It is a secret key of an IAM user.
Role Based Authentication
Cross-Account Role ARN	-	Enter the ARN of the role.
Database Region	-	Enter the AWS region of the Dynamo DB database.
Filter by Tags	-	Enter the tags of the DynamoDB tables.
Governance Roles	Mandatory	From the dropdown list, select Stewards, Custodian and Owner.
Select Bridge	Optional	Select the NO Bridge option if no bridge is available for the connector.
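
To show how the two authentication modes differ, here is a hedged sketch that maps the form fields onto the keyword arguments accepted by boto3.client("dynamodb", ...) in the AWS SDK for Python. The dictionary keys (authentication, access_key, and so on) and the function name are hypothetical stand-ins for the form fields, and the sketch does not call AWS. It only validates the fields and prepares the arguments.

```python
def build_client_kwargs(settings):
    """Return (client_kwargs, role_arn) for the chosen authentication mode.

    client_kwargs could be passed to boto3.client("dynamodb", **client_kwargs).
    role_arn is None unless Role Based Authentication supplies one.
    """
    kwargs = {"region_name": settings["database_region"]}
    auth = settings["authentication"]

    if auth == "IAM User Authentication":
        # Access Key and Secret Key are marked Mandatory in the table above.
        for field in ("access_key", "secret_key"):
            if not settings.get(field):
                raise ValueError(f"{field} is mandatory for {auth}")
        kwargs["aws_access_key_id"] = settings["access_key"]
        kwargs["aws_secret_access_key"] = settings["secret_key"]
        return kwargs, None

    if auth == "Role Based Authentication":
        # Assumption: the role would be assumed through AWS STS (assume_role
        # with RoleArn and RoleSessionName) and the temporary credentials
        # passed to the client. That call is not made in this sketch.
        return kwargs, settings.get("cross_account_role_arn")

    raise ValueError(f"unknown authentication type: {auth}")


# Placeholder values only; never hard-code real credentials.
iam_kwargs, _ = build_client_kwargs({
    "authentication": "IAM User Authentication",
    "database_region": "us-east-1",
    "access_key": "EXAMPLE_ACCESS_KEY",
    "secret_key": "EXAMPLE_SECRET_KEY",
})
```

Keeping validation separate from the SDK call makes it possible to check a connection definition without network access.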

Connection Settings

Crawler Settings

Crawler Configurations Settings	Details
Tables, Views, and Columns	Select the checkbox to crawl the tables, views, and columns that exist in the Dynamo DB database into OvalEdge. Note: By default, the checkbox for Tables, Views, and Columns is selected.
Crawler Rules: Include Regex	Enter a pattern for the schema, table, view, and column names to include in crawling. The pattern can match names that start with, end with, or contain the given characters.
Crawler Rules: Exclude Regex	Enter a pattern for the schema, table, view, and column names to exclude from crawling. The pattern can match names that start with, end with, or contain the given characters.
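
The "start with, end with, or contain" behavior of the include and exclude rules can be sketched with ordinary regular expressions. This example uses Python's re module purely for illustration. The regex dialect OvalEdge applies and the rule that exclude takes precedence over include are assumptions, and the function and table names are hypothetical.

```python
import re


def select_for_crawl(names, include=None, exclude=None):
    """Keep names that match `include` (if given) and do not match `exclude`."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for name in names:
        if include_re and not include_re.search(name):
            continue  # an include rule is set and the name does not match it
        if exclude_re and exclude_re.search(name):
            continue  # assumed precedence: exclude wins over include
        selected.append(name)
    return selected


tables = ["orders", "orders_archive", "customers", "tmp_orders"]

starts_with = select_for_crawl(tables, include=r"^orders")   # names starting with "orders"
ends_with = select_for_crawl(tables, include=r"orders$")     # names ending with "orders"
contains = select_for_crawl(tables, include=r"order", exclude=r"^tmp_")
```

Anchoring with ^ or $ gives the starts-with and ends-with cases; an unanchored pattern matches anywhere in the name.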

Profiler Settings

Profiler Configurations Settings	Details
Tables and Columns	Select the checkbox to profile the tables and columns that exist in the Dynamo DB database. Note: By default, the checkbox for Tables and Columns is selected.
Views and Columns	Select the checkbox to profile the views and columns that exist in the Dynamo DB database. Note: By default, the checkbox for Views and Columns is selected.

To configure the profiler settings, click the Edit icon, which allows the Admin user to configure the profiler settings for the selected data source. Several attributes can be specified in the profiler settings.

The attributes are as follows:

Columns	Description
Order	Order is the sequence in which the profiling is done.
Day	Enter the day of the week on which profiling is set to run.
Start/End Time	Enter the start and end times between which profiling is set to run.
Number of Threads	A thread executes a query on the database to perform one or more tasks. The number of threads determines the number of parallel queries executed on the data source.
Profile Type	There are four main types of data profiling. Sample Profiling runs the profile on a given sample size. The column statistics (such as Min, Max, Distinct, and Null Count) will differ from those of a full profile because they are calculated only on the sample. The sample profile is based on two main values. To execute a sample profile, select the profile type as "Sample" and enter a sample profile size (the count of records to be profiled).
Row Count Constraint	If set to true, it enables the data rule profiling.
Row Count Limit	Enter the number of rows of data to be profiled.
Sample Profile Size	Enter the total number of rows to be included in profiling.
Query Timeout	Enter the length of time in seconds to allow the query to run on a remote database before timing out.
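
The effect of the Number of Threads setting can be sketched with a thread pool: each worker runs one profiling task at a time, so the pool size caps the number of parallel queries. This is a minimal illustration using Python's standard library, not OvalEdge's scheduler. The function names are hypothetical, and fake_profile stands in for a real profiling query.

```python
from concurrent.futures import ThreadPoolExecutor


def profile_tables(table_names, profile_one, number_of_threads):
    """Run profile_one(table) for every table, with at most
    number_of_threads calls in flight at the same time."""
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        # pool.map yields results in the same order as table_names.
        return dict(zip(table_names, pool.map(profile_one, table_names)))


def fake_profile(table_name):
    # Stand-in for a real profiling query; returns a made-up row count.
    return {"table": table_name, "row_count": len(table_name)}


results = profile_tables(
    ["orders", "customers", "items"], fake_profile, number_of_threads=2
)
```

With number_of_threads set to 2, the third table is profiled only after one of the first two finishes.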

Access Instruction

The Access Instruction option allows the Crawler admin to write instructions that guide the user in crawling the data source.

You can provide the instructions on the Crawler > Setting page.
Click the Access Instruction tab.
Enter the instructions.
Click the Save Changes button. Once you add the access instructions for a specific connection in the crawler settings, they appear in the connection hierarchy, like a database.