Amazon S3 Connector

Amazon Simple Storage Service (S3) is a data storage service provided by AWS that enables users to store and retrieve any amount of data, at any time, from anywhere on the web.

OvalEdge uses the AWS S3 SDK to connect to the data source, which allows users to crawl and profile data objects (buckets, file objects, etc.).

Connector Capabilities

The following is the list of objects and data types the Amazon S3 connector supports.

| Functionality | Supported Data Objects |
|---|---|
| Crawler | Buckets; File Objects |
| Profiler | File Profiling (row count, column count, and sample data view); Sample Profiling |

Note: Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC, GZ
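
As a rough illustration of the supported file types listed in the note above, the following sketch checks whether an S3 object key ends in one of those extensions. It assumes that file types are recognized by the final extension of the key; this page does not say how the connector actually detects file types, and the function name `is_supported_file` is hypothetical.

```python
from pathlib import PurePosixPath

# File types taken from the note above, lower-cased for comparison.
SUPPORTED_EXTENSIONS = {"csv", "xls", "xlsx", "json", "avro", "parquet", "orc", "gz"}


def is_supported_file(object_key: str) -> bool:
    """Return True if the key's final extension is in the supported list.

    Assumption: the type is inferred from the last extension only,
    so "export.json.gz" is treated as a GZ file.
    """
    suffix = PurePosixPath(object_key).suffix  # ".csv", or "" when there is none
    return suffix.lstrip(".").lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_file("sales/2023/orders.CSV")` is true, while `is_supported_file("logs/app.txt")` is false.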

Prerequisites

The following are the prerequisites required for establishing a connection between the connector and the OvalEdge application. 

  1. API details
  2. Service Account with Minimum Permissions
  3. Configure environment variables (Optional)

API details

| API | Version | Details |
|---|---|---|
| AWS S3 SDK | 1.12.232 | https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3/1.12.232 |

Note: Latest version is 1.12.244.

Service Account with Minimum Permissions 

The following are the minimum privileges required for a service account user to crawl data objects.

| Operation | Minimum Access Permission |
|---|---|
| Connection Validation | LIST and GET permissions on the buckets to be crawled |
| Crawling | LIST and GET permissions on the file objects to be crawled |
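
The LIST and GET permissions above could be expressed as an IAM policy along the following lines. This is a minimal sketch, not an official OvalEdge policy: it assumes that LIST maps to the `s3:ListBucket` action and GET to `s3:GetObject`, and the helper name and bucket name are hypothetical. Depending on your setup, listing all buckets in the account may additionally require `s3:ListAllMyBuckets`.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Build a least-privilege policy document for one bucket.

    Assumption: LIST -> s3:ListBucket (granted on the bucket ARN),
    GET -> s3:GetObject (granted on the objects inside the bucket).
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucketContents",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


def policy_as_json(bucket_name: str) -> str:
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(build_read_only_policy(bucket_name), indent=2)
```

Calling `policy_as_json("example-crawl-bucket")` returns a JSON document that can be attached to the service account user or role after review.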

Establish Environment Variables (Optional)

This section describes the settings or instructions that you should be aware of prior to establishing a connection. If your environments have been configured, skip this step.

Configure Environment Names

Environment names let you select the environment configured for a specific connector from the dropdown list in the Manage Connector pop-up window, so you can identify at a glance which environment the connector is connecting to.

Consider crawling the same schema in both the stage and production environments for consistency. The typical environments for crawling are PROD, STG, or Temporary, and may also include QA or other environments.

Additionally, crawling a temporary environment can be useful for schema comparisons, especially during application upgrade assistance. The temporary environment can be deleted afterwards.

Steps to Configure the Environment in OvalEdge:

  1. Navigate to Administration | Configuration. 
  2. Select the Connector tab.
  3. Find the Key name “connector.environment”.
  4. Enter the desired environment values (PROD, STG) in the value column. 
  5. Click ✔ to save. 

Establish Connection - IAM 

AWS Identity and Access Management (IAM) authentication is used to grant access permissions to the bucket and the objects in it. You can create and configure IAM user policies to control user access to Amazon S3. Each IAM user is associated with one particular user.

To connect to Amazon S3 using IAM User Authentication, complete the following steps.

  1. Log in to the OvalEdge application.
  2. In the left menu, click the Administration module, and then click the Connectors sub-module. The Add Connectors Information page is displayed.
  3. Click on + Add Connector and select Amazon S3 as the connection type. The Add Connector pop-up with Amazon S3-specific details is displayed.

Field Name

Description

Connection Type

The selected connection type is displayed as ‘S3’ by default. 

If required, use the dropdown menu to change the connector type. The fields displayed depend on the selected connection type.

Authentication 

IAM User Authentication

License Add-Ons*

All the connectors will have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a datasource. 

OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.

  • Select the Auto Lineage Add-On license that enables the automatic construction of the Lineage of data objects for a connector with the Lineage feature. 
  • Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.
  • Select the Data Access Add-On license to enforce connector access via OvalEdge with the Remote Data Access Management (RDAM) feature enabled.

Environment

The Environment dropdown menu allows you to select the environment configured for the connector, for example, PROD or STG (based on the values configured for connector.environment in the OvalEdge configuration).
The purpose of the Environment field is to help you identify which type of system environment (Production, STG, or QA) the connector is connecting to.
Note: The steps to set up environment variables are explained in the Prerequisites section.

Connection Name

Enter a connection name that identifies the Amazon S3 connection in OvalEdge. Example: AmazonS3_db

Access key

The access key of the IAM user.

Secret key

The secret key of the IAM user.

Filter by tags

Tags of the buckets or objects to filter by.

Region

The AWS Region of S3.

SSO Connection Id

Connection Id of the identity provider’s connection [Azure, Okta, AVM … etc]

SSO Application Id

Application Id crawled from the identity provider’s connection [Azure, Okta, AVM … etc]

SSO Role Prefix

Role name from the crawled roles of the identity provider’s connection [Azure, Okta, AVM … etc] 

RDAM Policy Folder Path

Bucket or folder path in S3 where the policies are written.

Default Governance Roles

Select a specific user or team for each governance role (Steward, Custodian, Owner) assigned to manage the data asset.

Note: The dropdown list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security | Governance Roles section.  

Admin Roles

Select the required admin roles for this connector.

  • To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, and then click on the Apply button.
    The responsibility of the Integration Admin includes configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.
  • To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click on the Apply button.
    The Security and Governance Admin is responsible for:
    • Configuring role permissions for the connector and its associated data objects
    • Adding admins to set permissions for roles on the connector and its associated data objects
    • Updating governance roles
    • Creating custom fields
    • Developing Service Request templates for the connector
    • Creating Approval workflows for the templates

No. of Archive Objects

By default, archiving is disabled. Click on the Archive toggle button and enter the number of objects you wish to archive.

No. of archive objects: The number of most recent modifications made to the metadata of the remote source.

For example, if you set the count to 4 in the ‘No. of archive objects’ field and then crawl the connection, OvalEdge provides the last 4 changes that occurred in the remote source of the connector. You can observe these changes in the ‘version’ column of the ‘Metadata Changes’ module.

Select Bridge 

With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. A bridge provides real-time control that makes managing data movement between any source and destination easy.

For more information, refer to Bridge Overview.

When a bridge is configured and added, the Bridge ID is displayed in the dropdown menu. Otherwise, the dropdown displays "NO BRIDGE."

4. Click on the Validate button to validate the connection details.  

5. Click on the Save button to save the connection.  Alternatively, the user can also directly click on the Save & Configure button that displays the Connection Settings pop-up window to configure the settings for the selected Connector. The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.

Note: * (asterisk) indicates the mandatory field required to create a connection. Once the connection is validated and saved, it will be displayed on the Connectors home page. 

Note: You can either save the connection details first, or you can validate the connection first and then save it. 

Error Validation Details 

| S.No. | Error Message(s) | Description |
|---|---|---|
| 1 | Failed to establish a connection, Please check the credentials | Invalid credentials are provided, or the user or role does not have access. |
| 2 | Configured RDAM Policy Bucket: X doesn't exist | The configured bucket is not valid or does not exist. |
| 3 | Errors while downloading the File. | 403: Access denied (provide appropriate access to the user or role used in the connection). 404: No such key (the object does not exist in the remote source). |

Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.
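
When triaging download errors, the 403 and 404 cases from the table above can be mapped to their suggested actions. The sketch below only restates the table as a lookup. The function name and the fallback message are hypothetical and are not part of the OvalEdge product.

```python
# Suggested actions for the file download errors listed in the table above.
REMEDIATION = {
    403: "Access denied: provide appropriate access to the user or role used in the connection.",
    404: "No such key: the object does not exist in the remote source.",
}


def describe_download_error(status_code: int) -> str:
    """Return the suggested action for a download error status code."""
    # Fallback text is an assumption; the table documents only 403 and 404.
    return REMEDIATION.get(
        status_code,
        f"Unexpected status {status_code}: contact your OvalEdge CSM team.",
    )
```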

Establish Connection - Role-Based

To connect to Amazon S3 using Role-Based Authentication, complete the following steps.

  1. Log in to the OvalEdge application.
  2. In the left menu, click the Administration module, and then click the Connectors sub-module. The Add Connectors Information page is displayed.
  3. Click on + Add Connector and select Amazon S3 as the connection type. The Add Connector pop-up with Amazon S3-specific details is displayed.

Field Name

Description

Connection Type

The selected connection type is displayed as ‘S3’ by default. 

If required, use the dropdown menu to change the connector type. The fields displayed depend on the selected connection type.

Authentication 

Role-Based Authentication

License Add-Ons*

All the connectors will have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a datasource. 

OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.

  • Select the Auto Lineage Add-On license that enables the automatic construction of the Lineage of data objects for a connector with the Lineage feature. 
  • Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality, using DQ Rules/functions, Anomaly detection, Reports, and more.
  • Select the Data Access Add-On license to enforce connector access via OvalEdge with the Remote Data Access Management (RDAM) feature enabled.

Environment

The Environment dropdown menu allows you to select the environment configured for the connector, for example, PROD or STG (based on the values configured for connector.environment in the OvalEdge configuration).
The purpose of the Environment field is to help you identify which type of system environment (Production, STG, or QA) the connector is connecting to.
Note: The steps to set up environment variables are explained in the Prerequisites section.

Connection Name

Enter a name for the connection. The name specified in the Connection Name textbox identifies the Amazon S3 connection in the OvalEdge application.

Example: Amazon S3 Connection

Cross Account Role ARN

ARN of the AWS role.

Filter by tags

Tags of the buckets or objects to filter by.

Region

The AWS Region of S3.

SSO Connection Id

Connection Id of the identity provider’s connection [Azure, Okta, AVM … etc]

SSO Application Id

Application Id crawled from the identity provider’s connection [Azure, Okta, AVM … etc]

SSO Role Prefix

Role name from the crawled roles of the identity provider’s connection [Azure, Okta, AVM … etc] 

RDAM Policy Folder Path

Bucket or folder path in S3 where the policies are written.

Default Governance Roles

Select the required governance roles for the Steward, Custodian, and Owner

No. of Archive objects

By default, archiving is disabled. Click on the Archive toggle button and enter the number of objects you wish to archive.

Select Bridge 

With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. A bridge provides real-time control that makes managing data movement between any source and destination easy.

For more information, refer to Bridge Overview.

When a bridge is configured and added, the Bridge ID is displayed in the dropdown menu. Otherwise, the dropdown displays "NO BRIDGE."

4. Click on the Validate button to validate the connection details.

5. Click on the Save button to save the connection. Alternatively, you can click on the Save & Configure button, which displays the Connection Settings pop-up window to configure the settings for the selected Connector. The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.

Error Validation Details 

| S.No. | Error Message(s) | Description |
|---|---|---|
| 1 | Failed to establish a connection, Please check the credentials | Invalid credentials are provided, or the user or role does not have access. |
| 2 | Configured RDAM Policy Bucket: X doesn't exist | The configured bucket is not valid or does not exist. |
| 3 | Errors while downloading the File. | 403: Access denied (provide appropriate access to the user or role used in the connection). 404: No such key (the object does not exist in the remote source). |

Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.

Connector Settings

Once the connection is validated successfully, various settings are provided to retrieve and display the information from the data source. 

Connection Settings

Description

Crawler

Crawler settings are configured to connect to a data source and collect and catalog all the data elements in the form of metadata. To set the crawler's behavior, see the crawler options in the Crawler & Profiler Settings.

Data Access

Data Access, or RDAM (Remote Data Access Management), makes it possible to access data objects from remote systems. It covers the data objects, and the metadata and data permissions on those objects, that a user has access to in the remote data source.

For more information, refer to Remote Data Access Management.

Access Instruction

Access Instruction allows the data owner to instruct other users on using the objects in the application. It ensures that users can effectively use the data.

The Crawling of File(s)

To crawl a File Connector:

  1. Click the Crawl/Profile button to initiate the crawling process.
  2. A message appears confirming that the catalog buckets job was submitted successfully.
  3. Navigate to the Jobs module to monitor the job. Find the job named CATALOG_FILESERVER_BUCKET, which is associated with the File Connector crawling job.
  4. Once the crawling job has completed successfully, navigate to the Data Catalog | Files tab.
  5. Locate and select the File connector and view the relevant files.

Additional Information

S3 User Authentication Types

In the OvalEdge application, the S3 connector allows you to crawl buckets and file data objects using either IAM User Authentication or Role-Based Authentication.

IAM User Authentication: 

AWS Identity and Access Management (IAM) authentication is used to crawl objects and to read access permissions on the bucket and the objects in it. You can create and configure IAM user policies to control user access to Amazon S3. Each IAM user is associated with one particular user. This method requires an access key and a secret key to establish a connection.

Role-Based Authentication: 

An Amazon Resource Name (ARN) uniquely identifies an AWS resource, such as a bucket, folder, user, or role. In AWS, roles are identified by their ARN, so no secret key or access key is required. Resource ARNs can include a path. For example, in Amazon S3, the resource identifier is an object name that can include slashes (/) to form a path. This helps to access multiple applications within S3.
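
To make the ARN structure described above concrete, the following sketch splits an ARN into its colon-separated parts using the general AWS layout `arn:partition:service:region:account-id:resource`. The helper `parse_arn`, the account number, and the role name are hypothetical and serve only as illustration. OvalEdge may validate the Cross Account Role ARN differently.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its five fields after the literal "arn" prefix.

    The resource part may itself contain ":" or "/", so the split is
    limited to the first five separators.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])
```

For an IAM role such as `arn:aws:iam::123456789012:role/ExampleCrawlerRole`, the region field is empty and the resource is `role/ExampleCrawlerRole`. For an S3 object such as `arn:aws:s3:::example-bucket/reports/2023/data.csv`, both region and account ID are empty and the resource carries the path.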

Remote Data Access Management (RDAM)

Remote Access

This Remote Access tab lists the data objects and the meta and data permissions on these objects that a user is assigned access to in a remote application.

Remote Data Access Management

Remote Data Access Management offers three options for managing users and roles on a remote connection:

None: When you crawl any FileFolders/Buckets, all the users and roles from the remote source are listed in the Remote Users and Remote Roles tabs under Administration > Users & Roles.

Remote System is the master: When you select this option in the Remote Access tab and crawl the remote connection, all the users and roles available in the remote source for that FileFolders/Buckets connection are displayed in OvalEdge (Administration > Users & Roles).

      • At the time of crawling, the user permissions available on the FileFolders/Buckets are also reflected in the Users & Roles | Remote Users and Remote Roles tabs. These users can log in with the default password and then change it at first login.
      • When this option is selected, admin users cannot create, update, or delete users or roles. This is also reflected in the Security | FileFolders/Buckets tab.

OvalEdge is the master: When OvalEdge is the master, users can assign role-based and user-based permissions to objects. To do so, admin users can use existing users and roles, or create new users and roles and then assign them.

      • At the time of crawling, the users and roles assigned to the FileFolders/Buckets are displayed.
      • When this option is selected, admin users can create, update, or delete users and roles, and these changes are applied to the remote source as well. Role permissions and FileFolders/Buckets permissions are also taken into account, and Security | FileFolders/Buckets level permissions can be updated from OvalEdge.

Note: The "Remote System is the master" and "OvalEdge is the master" options in Remote Access work only when Users, Roles, Policies & Permissions is checked.

Remote Policy

Sync OvalEdge policy with Remote: Select this check box to sync the OvalEdge policy with the remote source. When selected, this option enables various predefined OvalEdge policy schemes to be applied to the remote connection.


Copyright © 2023, OvalEdge LLC, Peachtree Corners GA USA