Connectors

Understanding Connector Setup

In this article, we'll review the prerequisites and permissions required for building a data catalog in OvalEdge through crawling and profiling data connectors. 

Introduction

As we covered in the Overview of Connectors article, for OvalEdge to connect to a data source, there needs to be an active service user account created at the data source with at least read-only permissions.

You can read a more in-depth explanation of these roles in the dedicated How to Build Data Literacy article. 

Procedures, Functions, Triggers and Views Source Code - provides information on specific code, this code is further used to build lineage and ER diagrams

Note: A profile type set to Auto will always depend on the Rowcount limit.

Select the Other tab to set notifications. By selecting Yes or No, you can send Data Owners or Data Stewards alerts when metadata changes are made. Or assign specific roles to notify about these changes. 

Managing connectors 

Once you have a user account with the correct permissions in place, you must connect with the data source. Anyone with the correct permissions can choose from a series of logos and icons representing existing out-of-the-box connectors.

When you open a connector from the menu, there are various options. The specific details for each connector are available in the corresponding Connector article, but there are multiple parameters to choose from for each one.

Beyond this, you can manage the Default Governance Roles, which include:

  • Steward - manages data quality and other critical processes
  • Custodian - manages data access and storage
  • Owner - responsible for data in a specific domain

You can read a more in-depth explanation of these roles in the dedicated How to Build Data Literacy article. 

What is OvalEdge Bridge?

Because OvalEdge doesn't have direct access to a company's database framework when deployed as a SaaS application, a solution is required to circumnavigate the customer firewall. That solution is OvalEdge Bridge.

Every firewall enables entry via HTTPS, so the Bridge utilizes this gateway to gain access to customer applications. OvalEdge installs the Bridge within a company's server, which then connects with the relevant database and creates an HTTPS port. It's through this port, installed as a plugin on the customer's machine and machines at OvalEdge HQ, that OvalEdge can communicate remotely with a customer database.

When OvalEdge runs a customer job, the Bridge will collect the metadata from the database. It's possible to run up to 10 bridges consecutively on different databases. 

Managing Settings

You can manage the crawler, profiler, and other processes from the Settings page. 

Crawler settings

In Crawler Options, users can choose what to crawl in each connector. There are three options:

  • Tables, Views, and Columns - collects proper metadata
  • Procedures, Functions, Triggers & Views Source Code - provides information on specific code, this code is further used to build lineage and ER diagrams 
  • Relationships - helps users to create the ER diagram, this crawls the primary key and foreign key from the connector database and imports the ER diagram. 

Defining Crawler Rules enables users to filter the crawl through Regular Expression (Regex)—a specific programming language—commands. These commands can define schema, table and views, columns, procedures and functions to include and exclude.

There are two ways to crawl data. One is through procedures, and the other is via the query log. Queries help you to build data lineage and relationships. 

Users can query directly from databases in the associated programming language with Query Log Settings. You can select the query type and the lookback period, but the query text is out of the box, depending on the connector. However, you can change the query for your own custom requirements. You can also specify the schema you want to retrieve the Query from and exclude specific users who may already be running the same Query. 

Profiler settings

In Profile Options, users can profile Tables and Columns of Views and Columns. Again, you can use Regex to choose what you do and what you don't want to profile. 

In Profile Settings, users can set their base scheme. This enables you to select specific times and other parameters for jobs. Order refers to where the scheme is run, always after the pre-set scheme.

Day refers to the specific day that the scheme is run. Users can also set a Start Time and End Time and specify the Number of Threads—the number of threads OvalEdge will run to profile the data.

This flexibility enables users to use more threads to profile busy databases, such as transactional databases, at times like the weekend, when fewer users will access its processing power. 

Users can also select the Profile Type: Query, sample, or auto. Query profiles run SQL to underline the database and rely on source system processing power. Sample profiling takes a sample set and profiles it in OvalEdge, which doesn't use source system processing power.

Auto profiling requires users to set a Row Count Limit, which will automatically revert to a fixed Sample Profiling Size when it exceeds this limit. The Sample Profiling Count is the number of data assets visible in OvalEdge, and Query Timeout is a limit you can set for the amount of time it takes to run a query. 

Example: 

  • If the Row Constraint checkbox is selected (Set as True) and if the Table Row Count (1000) is more significant than configured Rowcount Limit (100) then the sample profiling is performed by considering the count mentioned in the Sample Profile Size.
  • If the Row Constraint checkbox is selected (Set as True) and if the Table Row Count (100) is less than configured Rowcount Limit (1000) then all the rows of the table will be profiled without considering the count mentioned in the Rowcount Limit.

Note: A profile type set to “Auto” will always depend on the Rowcount limit 

Query Policies

When querying data, there are two options. You can query the data directly or go through OvalEdge.  Query Policies are used when you query through OvalEdge. However, when you directly query in the database, these policies do not apply. 

You can select which roles have access, the Query Type, and whether to DENY access. The Query Policies are default set to ALLOW; you can configure this to your liking. For example, you may prohibit OE_Public from running heavy queries to limit the processing power used. 

Access Instructions 

Access Instructions enable you to write instructions that appear whenever a user attempts to access specific data from the remote system. For example, explain how a user can connect to the data source directly. 

Other

Select the Other tab to set notifications. By selecting Yes or No, you can send Data Owners or Data Stewards alerts when metadata changes are changed. Or assign specific roles to notify. 

Deleting connectors

When you delete a connector or schema, all objects and augmented human knowledge associated with it are also deleted. For this reason, you will receive a series of warning messages before you complete the deletion process. 

Additional resources

OvalEdge Academy

Connector Setup and Configuration:


Copyright © 2023, OvalEdge LLC, Peachtree Corners GA USA