An out-of-the-box connector is available for AWS S3. It supports crawling and profiling S3 buckets and objects and cataloging them in OvalEdge.
Connectivity Summary
The connector connects to AWS S3 through the AWS S3 SDK, which is included in the platform.
The details of the S3 SDK used by the connector are given below:
Driver / API | Version | Details |
---|---|---|
AWS S3 SDK | 1.12.232 | https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3/1.12.232 Note: The latest version is 1.12.244. |
Technical Specifications
The AWS S3 entities are buckets and file objects. OvalEdge crawls and profiles both.
The connector capabilities are shown below:
Crawling
Feature | Supported Objects | Remarks |
---|---|---|
Crawling | Buckets, File Objects | |
Profiling
Feature | Supported Objects | Remarks |
---|---|---|
File Profiling | Row count, Column count, View sample data | Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC |
Sample Profiling | Supported | |
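The file profiling metrics listed above (row count, column count, and sample data) can be illustrated with a minimal Python sketch for a CSV file. This is not OvalEdge's implementation: the function name `profile_csv`, the `sample_size` parameter, and the assumption that the first row is a header are all hypothetical choices made for illustration.

```python
import csv
import io


def profile_csv(text, sample_size=5):
    """Return row count, column count, and a small sample for CSV text.

    Assumption: the first row is a header. This is an illustrative choice,
    not documented connector behavior.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])  # empty input yields an empty header
    row_count = 0
    sample = []
    for row in reader:
        row_count += 1
        # Keep only the first `sample_size` data rows as the sample.
        if len(sample) < sample_size:
            sample.append(dict(zip(header, row)))
    return {
        "row_count": row_count,
        "column_count": len(header),
        "sample": sample,
    }


if __name__ == "__main__":
    data = "id,name\n1,alpha\n2,beta\n3,gamma\n"
    print(profile_csv(data, sample_size=2))
```

The other supported file types (XLS, XLSX, JSON, AVRO, PARQUET, ORC) would need format-specific readers, which this sketch does not cover.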
By default, the service account provided for the connector is used for all user operations. If the service account has privileges such as LIST and GET, the corresponding operations can be executed.
Prerequisites
To use the connector, the following must be available:
- The connection details specified in the Connection Details section.
- An admin / service account for crawling. The minimum privileges required are:
Operation | Access Permission |
---|---|
Crawl Buckets | LIST, GET permission on Bucket |
Crawl File Objects | LIST, GET permission on Objects |
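As a rough illustration of the minimum privileges above, the following Python sketch builds a read-only IAM policy document for one bucket. The mapping of LIST and GET to the IAM actions `s3:ListAllMyBuckets`, `s3:ListBucket`, `s3:GetBucketLocation`, and `s3:GetObject` is an assumption, since the source does not name specific IAM actions. The function name `build_read_only_policy` and the bucket name `example-bucket` are hypothetical.

```python
import json


def build_read_only_policy(bucket_name):
    """Build an IAM policy document (as a dict) for read-only crawling.

    Assumption: LIST/GET in the table correspond to the IAM actions below.
    Confirm the exact action set against your own security requirements.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the account enumerate the buckets it owns.
                "Sid": "ListAllBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # LIST on the bucket: enumerate the keys inside it.
                "Sid": "ListBucketContents",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # GET on objects: read object contents for profiling.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_read_only_policy("example-bucket"), indent=2))
```

If tag-based filtering is used, the account may also need tag-read actions such as `s3:GetBucketTagging` and `s3:GetObjectTagging`. That is likewise an assumption, not something the source states.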
Connection Details
The following connection settings should be added for connecting to S3:
Property | Details |
---|---|
Database Type | S3 |
Connection Name | Enter a name for the S3 connection. The name you specify is a reference name used to identify your AWS S3 connection in OvalEdge. Example: AWS S3 Connection. |
Authentication | Select the authentication type: Role Based Authentication or Basic Authentication. |
Access key | Access key |
Secret key | Secret key |
Filter by tags | Tags of a bucket / object |
Region | Region of S3 |
[Screenshot: General Authentication connection fields]
[Screenshot: Role Based Authentication connection fields]
Note: S3 tags are used to filter buckets or objects, so that only those with matching tags are crawled into OvalEdge.
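The tag-based filtering described in the note can be sketched in Python as follows. The source does not document the filter syntax or matching rules, so this sketch assumes a comma-separated `key=value` list and requires an exact match on every listed tag. The names `parse_tag_filter`, `matches_tags`, and `filter_by_tags` are hypothetical, as is the sample data.

```python
def parse_tag_filter(filter_text):
    """Parse 'key1=value1,key2=value2' into a dict (assumed syntax)."""
    tags = {}
    for pair in filter_text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing comma
        key, _, value = pair.partition("=")
        tags[key.strip()] = value.strip()
    return tags


def matches_tags(resource_tags, required_tags):
    """True if the resource carries every required tag with the same value."""
    return all(resource_tags.get(k) == v for k, v in required_tags.items())


def filter_by_tags(resources, filter_text):
    """Return names of resources whose tags satisfy the filter.

    `resources` maps a bucket or object name to its tag dict.
    An empty filter matches every resource.
    """
    required = parse_tag_filter(filter_text)
    return [name for name, tags in resources.items()
            if matches_tags(tags, required)]


if __name__ == "__main__":
    buckets = {
        "sales-data": {"env": "prod", "team": "sales"},
        "scratch": {"env": "dev"},
        "finance-data": {"env": "prod", "team": "finance"},
    }
    print(filter_by_tags(buckets, "env=prod"))
```

In practice the tag dictionaries would come from the S3 tagging APIs. They are hard-coded here so that the sketch runs without AWS credentials.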