Simple Storage Service (S3) is a data storage service provided by AWS that lets users store and access any amount of data, at any time, from anywhere on the web.
OvalEdge uses the AWS S3 SDK to connect to the data source, which allows the user to crawl and profile data objects (Tables, Table Columns, etc.).
Connector Capabilities
The following is the list of objects and data types the Amazon S3 connector supports.
Functionality | Supported Data Objects |
---|---|
Crawler | |
Profiler | |
Note: Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC, GZ
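As a rough illustration of the supported file types in the note above, the following sketch filters a list of object keys by extension. It assumes that a file's type can be recognized from the final extension of its key, which this article does not state; the example keys and the function name `is_supported` are hypothetical.

```python
from pathlib import PurePosixPath

# File types from the note above, lower-cased for comparison.
SUPPORTED_EXTENSIONS = {"csv", "xls", "xlsx", "json", "avro", "parquet", "orc", "gz"}


def is_supported(object_key: str) -> bool:
    """Return True if the key's final extension is in the supported list.

    Assumption: the file type is inferred from the extension only.
    """
    suffix = PurePosixPath(object_key).suffix  # ".csv", or "" if there is none
    return suffix.lstrip(".").lower() in SUPPORTED_EXTENSIONS


# Hypothetical object keys, used only for illustration.
keys = ["sales/2023/orders.csv", "logs/app.log", "raw/events.json.gz", "README"]
supported = [key for key in keys if is_supported(key)]
```

Here `supported` keeps only the `.csv` and `.gz` keys; the `.log` file and the key without an extension are skipped.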
Prerequisites
The following are the prerequisites required for establishing a connection between the connector and the OvalEdge application.
- API details
- Service Account with Minimum Permissions.
- Configure environment variables (Optional).
API details
API | Version | Details |
---|---|---|
AWS S3 SDK | 1.12.232 | https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3/1.12.232 Note: Latest version is 1.12.244. |
Service Account with Minimum Permissions
The following are the minimum privileges required for a service account user to crawl data objects.
Operation | Minimum Access Permission |
---|---|
Connection Validation | LIST, GET permission on Crawling Buckets |
Crawling | LIST, GET permission on Crawling File Objects |
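The minimum permissions above could be expressed as an IAM policy document along the following lines. This is a sketch under stated assumptions: it maps LIST to the `s3:ListBucket` action and GET to the `s3:GetObject` action, which this article does not confirm, and the bucket name `example-crawl-bucket` and the `Sid` values are hypothetical.

```python
import json


def build_crawl_policy(bucket_name: str) -> dict:
    """Build a read-only IAM policy document for one bucket.

    Assumption: LIST maps to s3:ListBucket (on the bucket) and
    GET maps to s3:GetObject (on the objects inside it).
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListCrawlBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Sid": "GetCrawlObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy = build_crawl_policy("example-crawl-bucket")
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string could be attached to the service account user or role in the AWS console. Review it with your AWS administrator before use, since the exact actions OvalEdge requires may differ.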
Establish Environment Variables (Optional)
This section describes the settings or instructions that you should be aware of prior to establishing a connection. If your environments have been configured, skip this step.
Configure Environment Names
The Environment Names allow you to select the environment configured for the specific connector from the dropdown list in the Manage Connector pop-up window. This is done to identify which environment your connector is connecting to at a glance.
You might want to consider crawling the same schema in both stage and production environments for consistency. The typical environments for crawling are PROD, STG, or Temporary, and may also include QA or other environments.
Additionally, crawling a temporary environment, which can later be deleted, can be useful for schema comparisons, especially during application upgrade assistance.
Steps to Configure the Environment in OvalEdge:
- Navigate to Administration > System Settings.
- Select the Connector tab.
- Find the Key name “connector.environment”.
- Enter the desired environment values (PROD, STG) in the value column.
- Click ✔ to save.
Establish Connection - IAM
AWS Identity and Access Management (IAM) authentication is used to grant access permission to the bucket and the objects in it. You can create and configure IAM user policies to control user access to Amazon S3. Each IAM user belongs to one particular user.
To connect to Amazon S3 using IAM User Authentication, complete the following steps.
1. Log into the OvalEdge application.
2. In the left menu, click the Administration module, and then click the Connectors sub-module. The Add Connectors Information page is displayed.
3. Click + Add Connector. Select the connection type as Amazon S3. The Add Connector pop-up with Amazon S3-specific details is displayed.
Field Name | Description |
---|---|
Connector Type | The selected connection type is displayed as ‘S3’ by default. If required, the dropdown menu allows you to change the connector type, and the fields associated with the selected connection type are displayed. |
Authentication* | IAM User Authentication |
Credential Manager* | Select the option from the drop-down list to indicate where you want to save your credentials. OE Credential Manager: The Amazon S3 connection is configured with the credentials of the service account in real time when OvalEdge establishes a connection to Amazon S3. Users need to add the credentials manually if the OE Credential Manager option is selected. HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge. AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server; OvalEdge fetches the credentials from the AWS Secrets Manager. Azure Key Vault: For more information, click Azure Key Vault. For more information on Credential Manager, refer to Credential Manager. |
License Add-Ons | All the connectors have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a data source. OvalEdge supports various License Add-Ons based on the connector’s functionality requirements. |
Credential Manager ConnId | Enter the Credential Manager ConnId. |
Connector Environment | The environment dropdown menu allows you to select the environment configured for the connector. For example, PROD or STG (based on the items configured in OvalEdge for connector.environment). |
Connection Name* | Enter a connection name for Amazon S3. Users can specify a connection name to identify the Amazon S3 connection in OvalEdge. Example: AmazonS3_db |
Access key* | The access key of an IAM user. |
Secret key* | The secret key of an IAM user. |
Filter by tags | Tags of a Bucket/Object |
Region | Region of S3 |
SSO Connection Id | Connection Id of the identity provider’s connection [Azure, Okta, AVM … etc] |
SSO Application Id | Application Id crawled from the identity provider’s connection [Azure, Okta, AVM … etc] |
SSO Role Prefix | Role name from the crawled roles of the identity provider’s connection [Azure, Okta, AVM … etc] |
RDAM Policy Folder Path | Bucket/Folder path in S3 to write the policies. |
Default Governance Roles* | You can select a specific user or a team from the governance roles (Steward, Custodian, Owner) that get assigned for managing the data asset. Note: The dropdown list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security \| Governance Roles section. |
Admin Roles* | Select the required admin roles for this connector. |
No. of Archive Objects* | By default, the number of archive objects is set to disable mode. Click the Archive toggle button and enter the number of objects you wish to archive. No. of archive objects: It is the count of the most recent modifications made to the metadata of a remote/source. For example, if you set the count to 4 in the ‘No. of Archive Objects’ field and then crawl the connection, it provides the last 4 changes that occurred in the remote/source of the connector. You can observe these changes in the ‘version’ column of the ‘Metadata Changes’ module. |
Select Bridge* | With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. A bridge provides real-time control that makes managing data movement between any source and destination easy. When the bridge is configured and added, the Bridge ID is displayed in the dropdown menu; otherwise, it is displayed as "NO BRIDGE." For more information, refer to Bridge Overview. |
4. Click on the Validate button to validate the connection details.
5. Click on the Save button to save the connection. Alternatively, the user can also directly click on the Save & Configure button that displays the Connection Settings pop-up window to configure the settings for the selected Connector. The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.
Note: * (asterisk) indicates the mandatory field required to create a connection. Once the connection is validated and saved, it will be displayed on the Connectors home page.
Note: You can either save the connection details first, or you can validate the connection first and then save it.
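Before clicking Validate, it can help to confirm that every mandatory field is filled in. The following is a minimal sketch of such a check for the IAM User Authentication form. The field labels are copied from the table above, but the dictionary layout, the function name `missing_mandatory_fields`, and the sample values (including the role name `OE_ADMIN`) are hypothetical and are not part of any OvalEdge API.

```python
# Fields marked with * in the IAM User Authentication table.
# These are display labels, not OvalEdge API parameter names.
MANDATORY_IAM_FIELDS = (
    "Authentication",
    "Credential Manager",
    "Connection Name",
    "Access key",
    "Secret key",
    "Default Governance Roles",
    "Admin Roles",
    "No. of Archive Objects",
    "Select Bridge",
)


def missing_mandatory_fields(form: dict) -> list:
    """Return the mandatory field labels that are absent or blank in the form."""
    missing = []
    for name in MANDATORY_IAM_FIELDS:
        value = form.get(name)
        # Treat a missing key, None, or a whitespace-only string as blank.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Hypothetical form values, used only for illustration.
form = {
    "Authentication": "IAM User Authentication",
    "Credential Manager": "OE Credential Manager",
    "Connection Name": "AmazonS3_db",
    "Access key": "   ",
    "Default Governance Roles": "Steward",
    "Admin Roles": "OE_ADMIN",
    "No. of Archive Objects": 0,
    "Select Bridge": "NO BRIDGE",
}
missing = missing_mandatory_fields(form)
```

In this example the access key is blank and the secret key is absent, so both labels are reported as missing.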
Error Validation Details
S.No. | Error Message(s) | Description |
---|---|---|
1 | Failed to establish a connection, Please check the credentials | Invalid credentials are provided, or the user or role does not have access. |
2 | Configured RDAM Policy Bucket: X doesn't exist | The configured bucket is not valid or doesn’t exist. |
3 | Errors while downloading the File. | 403: Access denied [Provide appropriate access to the user or role used in the connection]. 404: No such key [The object does not exist in the remote]. |
Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.
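The two download error codes in the table above can be summarized in a small lookup, sketched below. The hint text paraphrases the table; the function name `describe_download_error` and the fallback message are hypothetical and do not reflect how OvalEdge reports errors internally.

```python
# Status codes and hints taken from the "Errors while downloading the File" row.
DOWNLOAD_ERROR_HINTS = {
    403: "Access denied: provide appropriate access to the user or role used in the connection.",
    404: "No such key: the object does not exist in the remote.",
}


def describe_download_error(status_code: int) -> str:
    """Return a troubleshooting hint for a file download error code."""
    return DOWNLOAD_ERROR_HINTS.get(
        status_code,
        f"Status {status_code} is not covered in the error table.",
    )


hint_403 = describe_download_error(403)
hint_500 = describe_download_error(500)
```

Any code other than 403 or 404 falls through to the generic message, in which case contacting the OvalEdge CSM team is the documented next step.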
Establish Connection - Role-Based
To connect to Amazon S3 using Role-Based Authentication, complete the following steps.
1. Log into the OvalEdge application.
2. In the left menu, click the Administration module, and then click the Connectors sub-module. The Add Connectors Information page is displayed.
3. Click + Add Connector. Select the connection type as Amazon S3. The Add Connector pop-up with Amazon S3-specific details is displayed.
Field Name | Description |
---|---|
Connector Type | The selected connection type is displayed as ‘S3’ by default. If required, the dropdown menu allows you to change the connector type, and the fields associated with the selected connection type are displayed. |
Authentication* | Role-Based Authentication |
Credential Manager* | Select the option from the drop-down list to indicate where you want to save your credentials. OE Credential Manager: The Amazon S3 connection is configured with the credentials of the service account in real time when OvalEdge establishes a connection to Amazon S3. Users need to add the credentials manually if the OE Credential Manager option is selected. HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge. AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server; OvalEdge fetches the credentials from the AWS Secrets Manager. Azure Key Vault: For more information, click Azure Key Vault. For more information on Credential Manager, refer to Credential Manager. |
License Add-Ons | All the connectors have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a data source. OvalEdge supports various License Add-Ons based on the connector’s functionality requirements. |
Credential Manager ConnId | Enter the Credential Manager ConnId. |
Connector Environment | The environment dropdown menu allows you to select the environment configured for the connector. For example, PROD or STG (based on the items configured in OvalEdge for connector.environment). |
Connection Name* | Enter the name of the connection. The name specified in the Connection Name textbox is used to reference the Amazon S3 connection in the OvalEdge application. Example: Amazon S3 Connection |
Cross Account Role ARN | ARN of the AWS role. |
Filter by tags | Tags of a Bucket/Object |
Region | Region of S3 |
SSO Connection Id | Connection Id of the identity provider’s connection [Azure, Okta, AVM … etc] |
SSO Application Id | Application Id crawled from the identity provider’s connection [Azure, Okta, AVM … etc] |
SSO Role Prefix | Role name from the crawled roles of the identity provider’s connection [Azure, Okta, AVM … etc] |
RDAM Policy Folder Path | Bucket/Folder path in S3 to write the policies. |
Default Governance Roles* | Select the required governance roles for the Steward, Custodian, and Owner. |
Admin Roles* | Select the required admin roles for this connector. |
No. of Archive Objects* | By default, the number of archive objects is set to disable mode. Click the Archive toggle button and enter the number of objects you wish to archive. |
Select Bridge* | With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. A bridge provides real-time control that makes managing data movement between any source and destination easy. When the bridge is configured and added, the Bridge ID is displayed in the dropdown menu; otherwise, it is displayed as "NO BRIDGE." For more information, refer to Bridge Overview. |
4. Click on the Validate button to validate the connection details.
5. Click on the Save button to save the connection. Alternatively, the user can also directly click on the Save & Configure button that displays the Connection Settings pop-up window to configure the settings for the selected Connector. The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.
Error Validation Details
S.No. | Error Message(s) | Description |
---|---|---|
1 | Failed to establish a connection, Please check the credentials | Invalid credentials are provided, or the user or role does not have access. |
2 | Configured RDAM Policy Bucket: X doesn't exist | The configured bucket is not valid or doesn’t exist. |
3 | Errors while downloading the File. | 403: Access denied [Provide appropriate access to the user or role used in the connection]. 404: No such key [The object does not exist in the remote]. |
Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.
Connector Settings
Once the connection is validated successfully, various settings are provided to retrieve and display the information from the data source.
Connection Settings | Description |
---|---|
Crawler | Crawler settings are configured to connect to a data source and collect and catalog all the data elements in the form of metadata. Check out the crawler options to set the crawler's behavior in the Crawler & Profiler Settings. |
Data Access | It is possible to access data objects from remote systems through Data Access or RDAM (Remote Data Access Management). It refers to the data objects and the meta and data permissions on these objects that a user has access to in the remote data source. For information, refer to Remote Data Access Management. |
Access Instruction | Access Instruction allows the data owner to instruct other users on using the objects in the application. It ensures that users can effectively use the data. |
The Crawling of File(s)
To crawl a File Connector:
- Click the Crawl/Profile button to initiate the crawling process.
- A message appears confirming that the catalog buckets job was submitted successfully.
- Navigate to the Jobs module to monitor the job. Find the job named CATALOG_FILESERVER_BUCKET, which is associated with the File Connector crawling job.
- Once the crawling job has completed successfully, navigate to the Data Catalog | Files tab.
- Locate and select the File connector and view the relevant files.
Additional Information
S3 User Authentication Types
In the OvalEdge application, the S3 connector allows you to crawl the buckets and file data objects using IAM User Authentication and Role-Based Authentication.
IAM User Authentication:
AWS Identity and Access Management (IAM) authentication is used to crawl objects and the access permissions on the bucket and the objects in it. You can create and configure IAM user policies to control user access to Amazon S3. Each IAM user belongs to one particular user. A Secret key and an Access key are required to build the connection successfully.
Role-Based Authentication:
Amazon Resource Name (ARN) is a unique identifier for an AWS resource such as a bucket, folder, user, or role. In AWS, roles are identified by their ARN, and no Secret Key or Access Key is required. Resource ARNs can include a path. For example, in Amazon S3, the resource identifier is an object name that can include slashes (/) to form a path. This helps to access multiple applications within S3.
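To make the ARN structure concrete, the sketch below splits an ARN into its colon-separated parts, following the general AWS layout `arn:partition:service:region:account-id:resource`. The helper name `parse_arn`, the account ID, the role name, and the bucket name are hypothetical examples, not values used by OvalEdge.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its components.

    Only the first five colons are treated as separators, so a resource
    that itself contains ':' or '/' is kept intact.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


# Hypothetical role ARN with a path, as might be entered in Cross Account Role ARN.
role = parse_arn("arn:aws:iam::123456789012:role/crawlers/ExampleCrawlRole")

# Hypothetical S3 object ARN; S3 ARNs leave region and account ID empty.
obj = parse_arn("arn:aws:s3:::example-bucket/reports/2023/summary.csv")
```

Here `role.resource` is `role/crawlers/ExampleCrawlRole`, which shows the path inside a role ARN, and `obj.resource` is the bucket name followed by the slash-separated object key.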
Remote Data Access Management (RDAM)
Remote Access
The Remote Access tab lists the data objects, and the meta and data permissions on these objects, that a user is assigned access to in a remote application.
Remote Data Access Management
Remote Data Access Management offers three options for connecting to a remote source:
None: When you crawl any FileFolders/Buckets, all the users and roles from the remote source appear in the Remote Users and Remote Roles tabs under Administration > Users & Roles.
Remote System is a master: In the Remote Access tab, when the user selects the Remote system is the master option and crawls a remote connection, all the users and roles available in the remote source for that FileFolders/Buckets connection are displayed in OvalEdge (Administration > Users & Roles).
- At the time of crawling, the user permissions available on that FileFolders/Buckets are also reflected in the Users & Roles | Remote Users and Remote Roles tabs. You can log in with that user's default password and then change it on the first login.
- When this option is selected, admin users cannot create, update, or delete users or roles. This is also reflected in the Security | FileFolders/Buckets tab.
OvalEdge is a master: When OvalEdge is the master, users can assign role-based and user-based permissions to objects. To do so, admin users can use the existing users and roles, or create new users and roles and then assign them.
- At the time of crawling, the users and roles assigned to the FileFolders/Buckets are displayed.
- When this option is selected, admin users can create, update, or delete users or roles. These changes are reflected or added in the remote source as well. This option also considers the role permissions and FileFolders/Buckets permissions. Security FileFolders/Buckets-level permissions can be updated from OvalEdge.
Note: The Remote is master and OE is master options in Remote Access work only when Users, Roles, Policies & Permissions is checked.
Remote Policy
Sync OvalEdge policy with Remote: Select this check box to sync the OvalEdge policy with the remote. When selected, this option enables various predefined OvalEdge policy schemes to be applied to the remote connection.
Copyright © 2023, OvalEdge LLC, Peachtree Corners GA USA