S3 Connector

An out-of-the-box connector is available for S3 entities. It supports crawling and profiling S3 buckets and objects and cataloging them in OvalEdge.

Connectivity Summary

OvalEdge connects to AWS S3 through the AWS S3 SDK, which is included in the platform.

The S3 SDK used by the connector is given below:

| Driver / API | Version | Details |
| --- | --- | --- |
| AWS S3 SDK | 1.12.232 | https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3/1.12.232 |

Note: The latest version at the time of writing is 1.12.244.

Technical Specifications

The AWS S3 entities are buckets and file objects. OvalEdge crawls and profiles both. The connector capabilities are shown below.

Crawling

| Feature | Supported Objects | Remarks |
| --- | --- | --- |
| Crawling | Buckets, File Objects | |
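
As a rough illustration of what crawling buckets and file objects involves, the sketch below walks every bucket and collects its object keys. It is written in Python against a boto3-style client supplied by the caller. OvalEdge itself uses the AWS SDK for Java, so the function name `crawl_buckets` and the shape of the returned dictionary are assumptions made for this example, not part of the product.

```python
def crawl_buckets(s3_client):
    """Return {bucket_name: [object_key, ...]} for every visible bucket.

    `s3_client` is expected to behave like a boto3 S3 client, for example
    boto3.client("s3"). This is an illustrative sketch only.
    """
    catalog = {}
    # One paginator can be reused for every bucket.
    paginator = s3_client.get_paginator("list_objects_v2")
    for bucket in s3_client.list_buckets()["Buckets"]:
        name = bucket["Name"]
        keys = []
        # list_objects_v2 returns a limited number of keys per call,
        # so all pages are read.
        for page in paginator.paginate(Bucket=name):
            # Empty buckets return pages without a "Contents" entry.
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        catalog[name] = keys
    return catalog
```

Listing buckets and listing objects correspond to the LIST permission mentioned in this document. Reading object contents (GET) is only needed later, for profiling.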

 

Profiling

| Feature | Supported Objects | Remarks |
| --- | --- | --- |
| File Profiling | Row count, Columns count, View sample data | Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC |
| Sample Profiling | Supported | |
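
To make the file profiling metrics concrete (row count, column count, and sample data), here is a minimal Python sketch for the CSV case only. It assumes the first row is a header and that the file content has already been read into a string. The function name `profile_csv` and the result keys are hypothetical and do not reflect how OvalEdge implements profiling.

```python
import csv
import io


def profile_csv(text, sample_size=5):
    """Compute simple profile metrics for CSV content held in a string."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])  # assumption: the first row holds column names
    row_count = 0
    sample = []
    for row in reader:
        row_count += 1
        if len(sample) < sample_size:
            sample.append(row)  # keep only the first few rows as sample data
    return {
        "columns": header,
        "column_count": len(header),
        "row_count": row_count,
        "sample": sample,
    }


result = profile_csv("id,name\n1,alpha\n2,beta\n3,gamma\n", sample_size=2)
print(result["row_count"], result["column_count"], result["sample"])
```

The rows are counted while streaming, so only the sample is kept in memory. Other supported file types such as PARQUET or ORC would need format-specific readers.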

 

By default, the service account provided for the connector is used for all user operations. If the service account has the required privileges, such as LIST and GET, the corresponding operations can be executed.

Prerequisites

To use the connector, the following need to be available:

  • Connection details, as specified in the Connection Details section.
  • An admin or service account for crawling. The minimum privileges required are:

| Operation | Access Permission |
| --- | --- |
| Crawl Buckets | LIST, GET permission on Bucket |
| Crawl File Objects | LIST, GET permission on Objects |
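
The document names the permissions only as LIST and GET. One possible way to express them as an AWS IAM policy is sketched below in Python. The mapping to specific IAM actions (`s3:ListAllMyBuckets`, `s3:ListBucket`, `s3:GetBucketLocation`, `s3:GetBucketTagging`, `s3:GetObject`, `s3:GetObjectTagging`) is an assumption, as are the helper name `read_only_policy` and the bucket name `example-bucket`. Confirm the exact action list with your administrator before using it.

```python
import json


def read_only_policy(bucket_names):
    """Build an IAM policy document granting list and read access."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # LIST on the account: enumerate bucket names
                "Sid": "ListAllBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {   # LIST and GET on buckets (assumed action set)
                "Sid": "ListAndDescribeBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation",
                           "s3:GetBucketTagging"],
                "Resource": bucket_arns,
            },
            {   # GET on objects (assumed action set)
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectTagging"],
                "Resource": object_arns,
            },
        ],
    }


print(json.dumps(read_only_policy(["example-bucket"]), indent=2))
```

The tagging actions are included only because the connector can filter by tags. They can be dropped if tag filtering is not used.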

Connection Details

The following connection settings are required to connect to S3:

| Property | Details |
| --- | --- |
| Database Type | S3 |
| Connection Name | Enter a connection name for S3. The name you specify is a reference name that makes it easy to identify your AWS S3 connection in OvalEdge. Example: AWS S3 Connection. |
| Authentication | Select the authentication type: Role Based Authentication or Basic Authentication. |
| Access key | The AWS access key for the service account. |
| Secret key | The AWS secret key for the service account. |
| Filter by tags | Tags of the buckets or objects to crawl. |
| Region | AWS Region of the S3 buckets. |

General Authentication connection fields: (image: S3_2)

Role Based Authentication connection fields: (image: S3_3)
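
The connection settings can be pictured as a small configuration object. The sketch below is in Python for illustration only, since the connector itself uses the AWS SDK for Java. The class `S3ConnectionSettings`, the function `client_kwargs`, and the values `"basic"` and `"role"` are hypothetical names. It also assumes that role-based authentication supplies no keys and relies on the SDK's default credential lookup, which the source does not confirm.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class S3ConnectionSettings:
    connection_name: str               # reference name, e.g. "AWS S3 Connection"
    authentication: str                # "basic" or "role" (illustrative values)
    region: str                        # e.g. "us-east-1"
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    filter_tags: Dict[str, str] = field(default_factory=dict)


def client_kwargs(settings):
    """Translate the settings into keyword arguments for boto3.client("s3")."""
    kwargs = {"region_name": settings.region}
    if settings.authentication == "basic":
        if not (settings.access_key and settings.secret_key):
            raise ValueError("Basic authentication needs an access key and a secret key")
        kwargs["aws_access_key_id"] = settings.access_key
        kwargs["aws_secret_access_key"] = settings.secret_key
    elif settings.authentication != "role":
        raise ValueError(f"Unknown authentication type: {settings.authentication!r}")
    # For "role", no keys are passed (assumption: credentials come from the
    # environment, such as an attached IAM role).
    return kwargs


settings = S3ConnectionSettings("AWS S3 Connection", "role", "us-east-1")
print(client_kwargs(settings))
# With boto3 installed: s3 = boto3.client("s3", **client_kwargs(settings))
```

Validating the settings before creating a client surfaces a missing key as a clear configuration error rather than a failed AWS request.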

Note: S3 tags are used to filter buckets or objects, so that only those matching the specified tags are crawled into OvalEdge.
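
A minimal Python sketch of tag-based filtering follows. It assumes tags arrive in the `TagSet` shape that S3 tagging APIs return (a list of `Key`/`Value` pairs) and that a resource qualifies only when every requested tag matches. Whether OvalEdge requires all tags or any tag to match is not stated in the source, and the function names are hypothetical.

```python
def tagset_to_dict(tag_set):
    """Convert [{"Key": k, "Value": v}, ...] into a plain dictionary."""
    return {tag["Key"]: tag["Value"] for tag in tag_set}


def matches_tags(tag_set, wanted):
    """True when every wanted key/value pair is present (assumed rule)."""
    tags = tagset_to_dict(tag_set)
    return all(tags.get(key) == value for key, value in wanted.items())


def filter_by_tags(resources, wanted):
    """Keep the names of buckets or objects whose tags match.

    `resources` maps a bucket or object name to its TagSet list.
    """
    return [name for name, tag_set in resources.items()
            if matches_tags(tag_set, wanted)]


resources = {
    "sales-data": [{"Key": "env", "Value": "prod"}, {"Key": "team", "Value": "bi"}],
    "scratch": [{"Key": "env", "Value": "dev"}],
    "untagged": [],
}
print(filter_by_tags(resources, {"env": "prod"}))
```

With an empty `wanted` dictionary every resource passes, which corresponds to crawling without a tag filter.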