The ArangoDB connector lets users crawl and profile datasets such as Tables, Columns, Views, Procedures, and Synonyms, and helps build the data catalog and lineage.
Crawling: Crawling is the process of collecting information about data from data sources such as on-premises and cloud databases, Hadoop, visualization software, and file systems. When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository. OvalEdge crawlers can be scheduled to scan the databases regularly.
Data Sources: The OvalEdge crawler integrates with various data sources to help users extract metadata and build a data catalog. This document describes how to connect to your ArangoDB instance and crawl its Databases, Collections, Graphs, and Views.
Connect to the Data: Before you can crawl and build a data catalog, you must first connect to your data. OvalEdge requires users to configure a separate connection for each data source type. The users must enter the source credentials for each type of connectivity. Once a data connection is made, a simple click of the Crawl button starts the crawling process.
Prerequisites
The following are prerequisites for connecting to the ArangoDB Connector.
The APIs/drivers used by the connector are given below:
| Sl.No | Driver / API | Version | Details |
|---|---|---|---|
| 1 | ArangoDB Java Drivers | 6.13.0 | Download - https://mvnrepository.com/artifact/com.arangodb/arangodb-java-driver/6.13.0 Note: Latest version is 6.14.0. |
| | Community | 3.x | Supported |
| | Enterprise | 3.x | Supported |
User Permission
An admin or service account is required for crawling and building lineage. The minimum privileges required are:
| Operation | Access Permission |
|---|---|
| Connection validation | Select, Usage |
| Crawling | Select, Usage, Reference, and Execution |
| Profiling | No permission |
Technical Specification
The connector capabilities are shown below:
Crawling
| Feature | Supported Objects | Remarks |
|---|---|---|
| Crawling | Databases | - |
| | Collections | - |
| | Collection Attributes | All data types in ArangoDB |
| | Graphs | - |
| | Views | - |
| | Edge Collections | |
Profiling
| Feature | Supported Objects | Remarks |
|---|---|---|
| Collections Profiling | Row Count, Columns Count, View Sample data | - |
| Collections Attributes Profiling | Min, Max, Null Count, Distinct Count, Top 50 values | |
| Full Profiling | Supported | - |
| Sample Profiling | Supported | - |
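The attribute-level statistics listed above (Min, Max, Null Count, Distinct Count, Top 50 values) can be illustrated with a small sketch. This is not OvalEdge's profiler; it is a minimal example over in-memory documents. The function name `profile_attribute`, the sample documents, and the choice to count a missing attribute as null are all assumptions made for illustration.

```python
from collections import Counter


def profile_attribute(documents, attribute, top_n=50):
    """Compute simple profile statistics for one attribute.

    Assumption: a document that lacks the attribute, or stores None for it,
    counts as a null. Values must be hashable and mutually comparable.
    """
    values = [doc.get(attribute) for doc in documents]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # most_common returns (value, frequency) pairs, most frequent first
        "top_values": counts.most_common(top_n),
    }


# Hypothetical sample of documents from a collection
docs = [{"age": 30}, {"age": 25}, {"age": 30}, {"name": "x"}]
stats = profile_attribute(docs, "age")
```

Here `stats["null_count"]` is 1 because the last document has no `age` attribute, and `stats["distinct_count"]` is 2.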
Lineage Building
| Lineage entities | Details |
|---|---|
| Collection Lineage | Supported |
| Collection Attributes Lineage | Supported |
| Lineage Sources | Graphs, Views, Edge Collections |
Connection Details
To connect to the ArangoDB connector using the OvalEdge application, complete the following steps.
- Log into the OvalEdge application. Click on the Administration > Connectors module. The Connectors Information page is displayed.
- To add a new connection, click on the +AddNewConnector icon. A manage connection pop-up is displayed to select a connector. Select ArangoDB connector.
- The Manage Connection with ArangoDB connector-specific details pop-up window is displayed.
- Specify the following field parameters to configure the connector.
| Field Name | Mandatory/Optional | Description |
|---|---|---|
| Connection Type | - | Select ArangoDB connector. |
| License Type | Mandatory | The license type determines which features are available, based on the customer's requirements. The user can select Standard or Auto Lineage. (i) Standard: Standard connectors have the Crawler and Profiler features and may not have Auto Lineage functionality. Lineage is not built for the selected database. (ii) Auto Lineage: Auto Lineage connectors have Auto Lineage functionality in addition to the Crawling and Profiling features. Lineage is built for the selected database. See License Types for more information. By default, the License Type is displayed as ‘Auto Lineage’. |
| Connection Name | Mandatory | Enter the name of the connection. The name specified in the Connection Name textbox is used as the reference to the ArangoDB connector in the OvalEdge application. |
| Environment | | The environment identifies which configured environment (development, production, or QA) the new connection is established for. From the dropdown, select the desired environment, such as Development, QA, or Production. |
| Server | Mandatory | The hostname of the ArangoDB server (TCP/IP server). |
| Port | Mandatory | The TCP/IP port to use if the server is not the local host. By default, port number 8529, the ArangoDB default port, is provided. If needed, a different port number can be entered. |
| Username | Mandatory | This credential authenticates access to the database. By default, the username used to authenticate against the database is displayed. If required, users can manually enter the credential needed to access the required ArangoDB database. |
| Password | | The password is mandatory for users other than Root users. |
| Plug-in Server | | Enter the server name if you are running this as a plugin. |
| Plug-in Port | | Enter the port number on which the plugin is running. |
| Default Governance Roles | | The admin selects a specific user or team for each governance role (Steward, Custodian, Owner) assigned to the data asset. The dropdown list displays all the configured users (single user or team) as per the configurations made on the Security > Governance Roles page. |
| No of Archive objects | | The number of most recent metadata modifications at the remote/source to keep. By default, archiving is disabled. Click the Archive toggle button and enter the number of objects you wish to archive. For example, if you set the count to 4 in the ‘No. of archive object’ field and then crawl the connection, the last 4 changes that occurred in the remote/source of the connector are provided. You can observe these changes in the ‘version’ column of the ‘Metadata Changes’ module. |
| Select Bridge | | The bridge enables users to import schemas from physical databases using JDBC. The client workstation that runs the bridge must be configured with the correct parameter values. When bridges are configured and added, the Bridge ID is shown in the Bridge dropdown menu; otherwise, "NO BRIDGE" is displayed. |
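Before entering the Server, Port, Username, and Password values, it can help to confirm that the ArangoDB server is reachable with those credentials. The sketch below is independent of OvalEdge: it builds a request to ArangoDB's `/_api/version` HTTP endpoint using HTTP Basic authentication. The helper names, the `localhost` host, and the `root`/`secret` credentials are placeholders assumed for illustration, and the sketch assumes the account is allowed to call that endpoint.

```python
import base64
import json
import urllib.request


def build_version_request(server, port, username, password, use_tls=False):
    """Build (but do not send) a GET request for the server version."""
    scheme = "https" if use_tls else "http"
    url = f"{scheme}://{server}:{port}/_api/version"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_server_version(request, timeout=10):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder values; 8529 is the ArangoDB default port
request = build_version_request("localhost", 8529, "root", "secret")
```

Calling `fetch_server_version(request)` against a running server should return a JSON object describing the server, including its version. A connection error or an HTTP 401 response suggests that the host, port, or credentials need to be corrected before configuring the connector.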
Connection Settings
Crawler Settings
| Crawler Configurations Settings | Details |
|---|---|
| Tables, Views, and Columns | Select the checkbox to crawl the tables, views, and columns that exist in the ArangoDB data source into OvalEdge. Note: By default, the checkbox for Tables, Views, and Columns is selected. |
| Procedures, Functions, Triggers, Views Source Codes | |
| Crawler Rules: Include Regex | Enter a pattern for the schema, table, view, and column names to include in crawling. The pattern can match names that start with, end with, or contain specific characters. |
| Crawler Rules: Exclude Regex | Enter a pattern for the schema, table, view, and column names to exclude from crawling. The pattern can match names that start with, end with, or contain specific characters. |
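The effect of the Include Regex and Exclude Regex rules can be illustrated with a short sketch. How OvalEdge evaluates these rules internally is not described here, so the logic below is an assumption: a name is crawled if it matches the include pattern (when one is given) and does not match the exclude pattern. The function name `should_crawl` and the sample names are hypothetical.

```python
import re


def should_crawl(name, include_regex=None, exclude_regex=None):
    """Return True if the object name passes the include/exclude rules.

    Assumption: the exclude rule takes precedence over the include rule.
    """
    if include_regex and not re.search(include_regex, name):
        return False
    if exclude_regex and re.search(exclude_regex, name):
        return False
    return True


# Hypothetical collection names
names = ["sales_orders", "sales_tmp", "hr_people", "orders_archive"]

# Include names that start with "sales_", exclude names that end with "_tmp"
selected = [
    n for n in names
    if should_crawl(n, include_regex=r"^sales_", exclude_regex=r"_tmp$")
]
```

In this sketch, `^sales_` expresses "starts with", `_tmp$` expresses "ends with", and a pattern without anchors, such as `orders`, matches the characters anywhere in the name.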
Profiler Settings
| Profiler Configurations Settings | Details |
|---|---|
| Tables and Columns | Select the checkbox to profile the tables and columns that exist in the ArangoDB data source in OvalEdge. Note: By default, the checkbox for Tables and Columns is selected. |
| Views and Columns | Select the checkbox to profile the views and columns that exist in the ArangoDB data source in OvalEdge. Note: By default, the checkbox for Views and Columns is selected. |
To configure the profiler settings, click the Edit icon, which allows the Admin user to configure the profiler settings for the selected data source. Several attributes can be specified in the profiler settings.
The attributes are as follows:
| Attribute | Description |
|---|---|
| Order | Order is the sequence in which the profiling is done. |
| Day | Enter the day of the week profiling is set to run. |
| Start/End Time | Enter the start and end time at which profiling is set to perform. |
| Number of Threads | Thread is a process where a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries executed on the data source. |
| Profile Type | There are four main types of data profiling. |
| Row Count Constraint | If set to true, it enables the data rule profiling. |
| Row Count Limit | Enter the number of rows of data to be profiled. |
| Sample Profile Size | Enter the total number of rows to be included in profiling. |
| Query Timeout | Enter the length of time in seconds to allow the query to run on a remote database before timing out. |
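To make the Number of Threads and Sample Profile Size attributes more concrete, the sketch below profiles several collections in parallel, reading at most a fixed number of rows from each. It is only an illustration of the idea and not how OvalEdge schedules its queries. The function names and the in-memory `collections` dictionary, which stands in for real data source queries, are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def profile_collection(name, rows, sample_size):
    """Profile at most sample_size rows and return (name, rows profiled)."""
    sample = rows[:sample_size]
    return name, len(sample)


def run_profiling(collections, number_of_threads, sample_size):
    """Profile collections in parallel using number_of_threads workers."""
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        futures = [
            pool.submit(profile_collection, name, rows, sample_size)
            for name, rows in collections.items()
        ]
        # Collect results in submission order
        return dict(future.result() for future in futures)


# Hypothetical in-memory stand-in for two collections
collections = {"orders": [1, 2, 3], "customers": [1]}
results = run_profiling(collections, number_of_threads=2, sample_size=2)
```

With two worker threads, both collections are processed concurrently. `results` records that 2 rows of `orders` and 1 row of `customers` were profiled, since the sample size caps the rows read per collection.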
Access Instruction
It allows the Crawler admin to write the instructions and guide the user to crawl the data source.
- You can provide the instruction in Crawler > Setting page
- Click the Access Instruction tab
- Enter the instructions
- Click the Save Changes button. Once you add the access instruction for a specific connection in the crawler settings, it will appear in the connection hierarchy like a database.
Copyright © 2025, OvalEdge LLC, Peachtree Corners GA USA