The Amazon Redshift connector is used to pull the metadata existing in the Amazon Redshift connector database and helps the users to crawl the metadata and profile the sample data and build lineage to view the movement of the crawled data displaying the relationship between the objects and its profile statistics.
Data Sources
OvalEdge crawler integrates with various data sources to help the users to extract metadata and build a data catalog. In this document, you can see how to make a connection to your Amazon Redshift and crawl the Views, Stored Procedures, and Functions.
Crawling
Crawling is a process of collecting information about data from various data sources like on-premise and cloud databases, Hadoop, visualization software, and file systems. When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores it in OvalEdge. OvalEdge crawlers can be scheduled to scan the databases regularly.
Connect to the Data
Before you can crawl and build a data catalog, you must first connect to your data. OvalEdge requires users to configure a separate connection for each data source type. The users must enter the source credentials for each type of connectivity. Once a data connection is made, a simple click of the Crawl button starts the crawling process.
Connector Capabilities
Technical Specification
A technical specification for the Amazon Redshift connector contains information about the Crawler, Profiler, and Query Execution and also information about supported objects, supported data types, and user permissions.
Crawling
Feature |
Supported Objects |
Supported Data Types |
---|---|---|
Crawling |
Tables |
- |
Table Columns |
Bigint, Boolean, Char, Date, Datetime, Datetime2, Time Stamp, Decimal (P, S), Float, Int, Real, Smallint, Time, Varchar |
|
Views, Stored Procedures, Functions |
- |
Profiling
Feature |
Supported Objects |
Remarks |
---|---|---|
Table Profiling |
Row Count, Columns Count, View Sample data |
- |
View Profiling |
Row Count, Columns Count, View sample data |
The view is treated as a table for profiling purposes. |
Column Profiling |
Min, Max, Null Count, Distinct, Top 50 values |
- |
Full Profiling |
Supported |
- |
Sample Profiling |
Supported |
- |
Lineage Building
Lineage entities |
Details |
---|---|
Table Lineage |
Supported |
Column Lineage |
Supported |
Lineage Sources |
Stored Procedures, functions, triggers, views, SQL queries (from Query Sheet) |
Querying
Operation |
Details |
---|---|
Select |
Supported |
Insert |
Not supported by default. |
Update |
Not supported by default. |
Delete |
Not supported by default. |
Joins within database |
Supported |
Joins outside database |
Not supported |
Aggregations |
Supported |
Group By |
Supported |
Order By |
Supported |
By default, the service account provided for the connector will be used for any query operations. If the service account has to write privileges, then Insert / Update / Delete queries can be executed.
Connection Details
To use the Amazon Redshift connector, the details specified in the following section should be available.
Pre-requisites
- By default, the service account provided for the connector will be used for any query operations. If the service account has write privileges, then Insert/Update/Delete queries can be executed.
- The minimum privileges required are
Operation |
Minimum Access Permission For Service Account |
---|---|
Connection Validation |
Select and Usage |
Crawling |
Select, Usage , Reference, and Execution |
Profiling |
Select |
Query execution |
Select |
Add a connection
To connect to the Amazon Redshift connector using the OvalEdge application, complete the following steps.
- Log in to the OvalEdge application.
- Navigate to Administration > Connectors module.
- To add a new connection, click on the +AddNewConnector icon. A manage connection pop-up is displayed to select a connector. Search for, or select an Amazon Redshift connector. The Manage Connection form is displayed, where information for setting up a connection is provided.
Field Name |
Description |
---|---|
Connection Type |
Select Amazon Redshift connector. By default, the selected connection type is displayed as Amazon Redshift. If required, the connection type can be changed, and depending on the connector selected; fields are displayed accordingly. |
License Type |
By default, License type is Auto Lineage. License Type: In a license type, the permissions are specified based on the requirements of the customer. Users have the option to select the license type as Standard or Auto Lineage. The connector license is categorized into the: (i) Standard: The standard connectors may not have Auto Lineage functionality. It will not build the lineage for the selected database. (ii) Auto Lineage: Additionally, auto lineage connectors have Auto Lineage functionality. It will build the lineage for the selected database. See, License Types for more information. |
Connection Name |
Enter the name of the connection, the connection name specified in the Connection Name textbox will be a reference to the Amazon Redshift connection in the OvalEdge application. |
Environment |
By default, there is no selection of environment from the dropdown list when there is no environment is configured for Redshift. Environment: The purpose of the environment field is to help users to understand that the new connector is established in an environment available at the development, production, and QA. |
Server |
Enter the fully qualified server name or IP address related to the Redshift database instance URL (on-premises/cloud-based) Example: 34.54.23.43 |
Port |
By default, port number 5439, related to the Redshift database, is displayed. If needed, a new port number can be provided. |
Database |
Enter the name of the database. It is a user-specified database name to crawl the metadata of selected connectors. |
Driver |
By default, the driver details associated with the connector database are displayed, and it is non-editable. |
Username |
By default, the username of the OvalEdge application is displayed. If required, users can manually enter the user's credential of the OvalEdge application. |
Password |
By default, the OvalEdge application password is displayed. If required, users can manually enter the user's credential of the OvalEdge application. |
Connection String |
A connection string is used to configure the Amazon Redshift connection object. Amazon Redshift connections are specified by key-value pairs in a connection string. The connection string has auto and manual modes. By default, it is set to Auto mode, where connection string details are fetched automatically and displayed in the connection string textbox. Alternatively, the user can manually enter the string by setting the toggle button to manual mode. Example: jdbc:amazon redshift://{server}:5439/{sid} |
Plugin Server |
Provide the server name if you are running this as a plugin. |
Plugin Port |
Provide the port number on which the plugin is running. |
Governance Roles |
The admin will select a specific user or a team from the governance roles (Steward, Custodian, Owner) that get assigned to the data asset. The dropdown list displays all the configured users (single user or a team) as per the configurations made in the Security > Governance Roles page. |
No. of Archived Objects |
By default, the number of archive objects is set to disable mode. Click on the Archive toggle button and enter the number of objects you wish to archive. No. of archive objects: It is the count of the number of last modifications made in the metadata data of a Remote/source. For example, if you update the count as 4 in the ‘No. of archive object’ field, and then the connection is crawled. It will provide the last 4 changes that occurred in the remote/source of the connector. You can observe these changes in the ‘version’ column of the ‘Metadata Changes’ module. |
Select Bridge |
Bridge: To collect data from both on-premises and cloud sources, data administrators need to configure Cloud Bridge by specifying a data source connection and unique parameters. Note: If Redshift Connector is configured with a bridge, the Bridge ID will be shown in the Bridge dropdown menu when bridges are configured and added, or it will be displayed as "NO BRIDGE". |
Error Validation Details
Error Message(s) |
Description |
Connection Timeout |
The server is not Up and Running. |
BadSQLgrammar |
A SQL query may fail with a bad SQL grammar exception |
The table or view does not exist |
Do not have permission or any syntax error in the query. |
Connection Settings
Once the connection is established successfully, there are various settings provided to fetch and analyze the information from the data source. The connection setting includes Crawler, Profiler, Remote Access, Access Instruction, Lineage, and Others. Based on the connector selected the option on the setting page will differ.
To configure the connector settings, select the Redshift Connection Name from the Crawler Information page and click on the Nine dots option and select the Settings options. The Connection Settings pop-up window is displayed.
Crawler Settings
The crawler setting will help to get the information of the data source based on the crawler setting provided. Crawler setting has various settings options, including Crawler options, Crawler Rules, Query Log Setting.
Crawler Settings |
Details |
---|---|
Tables, Views, and Columns |
Select the checkbox to crawl the tables, views, and columns existing in the Oracle Service Cloud database into the OvalEdge. Note: By default, the checkbox for Tables, Views, and Columns is selected. |
Procedure Functions & Relationship |
Select the checkbox to determine relationships between the tables for the selected schemas existing in the Amazon Redshift Connector database into the OvalEdge. |
Crawler Rules: Include Regex |
Enter the specific schema, table, views, and column names that start with, end with, or have middle characters that are included for crawling. |
Crawler Rules: Exclude Regex |
Enter the specific schema, table, views, and column names that start with, end with, or have middle characters that are excluded for crawling. |
Profiler Settings
Profiler Settings |
Details |
Tables and Columns |
Select the checkbox to profile the tables and columns that exist in the Oracle Service Cloud database in the OvalEdge. Note: By default, the checkbox for Tables and Columns is selected. |
Views and Columns |
Select the checkbox to profile the views and columns that exist in the Oracle Service Cloud database in the OvalEdge. Note: By default, the checkbox for Views and Columns is selected. |
Profile Rules: Include Regex |
Enter the specific table, view, and column names that start with, end with or have middle characters that are included for profiling. |
Profile Rules: Exclude Regex |
Enter the specific table, view, and column names that start with, end with or have middle characters that are excluded for profiling. |
To configure the Profile Setting,
Click on the Edit icon that allows the Admin user to configure the profiler setting for the selected data source. There are many attributes you can specify in the profile settings.
The attributes are as follows,
Columns |
Description |
Order |
The order number is the sequence in which the profiling is done. |
Day |
The day of the week on which profiling is set to run. |
Start/End Time |
The start and end time at which profiling is set to perform. |
Number of Threads |
Thread is a process where a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries executed on the data source. |
Profile Type |
There are four main types of data profiling.
The profiling is performed based on a given Sample Profile Size. The data on columns ( statistics such as Min, Max, Distinct, and Null Count) will be different compared to full profiles as it is calculated only on sample size.
Note: A profile type set to “Auto” will always depend on the Rowcount limit when the row count constraint must be set as “True.”
|
Row Count Constraint |
The Row Count Constraint option is applicable only when the Profile Type is selected as Auto.
If the Row Constraint checkbox is selected (Set as True) and if the configured Rowcount Limit (1000) is greater than the total Table Row Count (100) then the query is executed without considering the count mentioned in the Rowcount Limit. |
Row Count Limit |
Enter the maximum number of rows that should be considered for profiling. |
Sample Profile Size |
Select this option for considering the total number of rows in profiling. |
Query Timeout |
Enter the number of seconds for the query to run on the remote database before it times out. |
Crawler/Profiler
Once the setting for profiling has been configured, go back to the Crawler screen and click “Crawl/Profile” to begin the profiling.
Remote Access
This Remote Access tab lists the data objects and the meta and data permissions on these objects that a user is assigned access to in a remote application.
Remote Data Access Management
Remote data access management has three ways to connect a remote database
- None: When you crawl any FileFolders/Buckets, all the users and roles from the remote source will come into the Remote Users tab and Remote Roles tab in the Administration > Users & Roles.
- Remote System is a master: In the Remote Access tab, the user selects an option of a Remote system is the master, and when you crawl a remote connection, all the users and roles available in the remote source about that FileFolders/Buckets connection are displayed in the OvalEdge (Administration - > Users & Roles).
- At the time of crawling, the user permission available on that FileFolders/Buckets will also be reflected in the Users & Roles | Remote users and Remote roles tab. You will be able to log in with that user's default password; then, you can change it on the first login.
- When this option is selected, the admin users cannot create, update or delete the users or roles will also be reflected in the Security, FileFolders/Buckets tab.
- OvalEdge is a master: When OvalEdge is the master, users can assign Roles and User-based permissions to Objects. For that admin, users can use the existing Users and Roles, or it can create new Users and Roles and then assign them.
- At the time of Crawling, users and roles assigned to the FileFolders/Buckets are displayed.
- When this option is selected, the admin users can create, update or delete the users or roles. This will get reflected or added in remote sources as well. It also considers the roles permissions and FileFolders/Buckets permissions. Security FileFolders/Buckets level permission can be updated from OvalEdge.
Note: Remote is master, or OE is master in the Remote Access will not work unless Users, Roles, Policies & Permissions are not checked.
Data Access Authorization (Data Permissions)
The Data Access Authorization is about the data permissions is to ensure that the right user is accessing the query sheet and queries in the data catalog. Here, the system validates the user credentials and allows that user to access the query sheet and queries in the data catalog.
- OvalEdge follows OvalEdge data permissions: When this option is selected, the OvalEdge application uses a service account used to create the connection in the
- Query Sheet page
- Data Catalog > Table > Data page and
- Data Catalog > Code page to fetch the required data
- OvalEdge follows Remote system permissions: When this option is selected, then the users are prompted to enter the remote data source credentials in the
- Query Sheet page,
- Data Catalog > Table > Data page and
- Data Catalog > Query page
The submitted credentials are used to establish the connection to fetch the required data. Here, these credentials will be cached and cleared after seven days of inactivity.
Note: Remote is master, or OE is master in the Remote Access will not work unless Users, Roles, Policies & Permissions are not checked.
Remote Policy
Sync OvalEdge policy with Remote: You can select the check box to assign the OvalEdge policy with the remote. When selected, this option enables various predefined OvalEdge policy schemes to be applied on the remote connection.
Query Policy
The Query Policies in the Crawler setting are used to provide the right access to the Query sheet functions (Join, Union, SUM, or aggregate functions). You can specify the desired roles and permission to deny the usage of the query sheet function. A role that has been denied policy permission will not have access to those functions in the Query Sheet.
- Click on the Query Policies tab, and the data grid with existing Policies details with Role, Query Type, Access Type, and Delete columns are displayed.
- To include new query policies, click on the +Add Query Policy button.
- Select the desired Role, Query Type, and Access Type from the respective dropdowns and click on the Save Changes button
Example: If the user selects the Role as “OE_HRADMIN,” Query Type as “JOIN” and the Access Type as “DENY,” then the users associated with the OE_HRADMIN privileges are restricted from using the JOIN function in the Query Sheet page.
Access Instruction
It allows the admin user to write the instructions and guide the business user to crawl the data source. Instructions are nothing, but some related information about the data source connection can be provided here. it could have the information resources like links, Images, or Videos that help business users of that particular data source connection.
For example: When an admin user saves the access instructions and crawls the schema, the saved instruction will appear in the data catalog after clicking on the schema. It's just providing info or guidelines on a specific schema.
- You can provide the instruction in Crawler > Setting page
- Click on the Access Instruction tab
- Enter the instructions
Click the Save Changes button. Once you add the access instruction for a specific connection in the crawler settings, it will appear in the connection hierarchy like a database.
Other
The Send Metadata Changes Notifications option is provided in the Others tab of the settings page. The purpose is to send notifications on any metadata changes to data owners or stewards.
Send Metadata Notification to:
- Select whether the notifications for the Data Owner and Data Steward under the specific Roles need to be sent or not.
- Select the desired role from the Roles dropdown menu and click on the Save Changes button. The notifications will be successfully sent to select Data Owner and Data Steward.