DSEFS Connector

An out-of-the-box connector is available for DSEFS (the DataStax Enterprise file system). It supports crawling files and folders and profiling sample data.

Connectivity Summary

| Driver / API | Version | Details |
|---|---|---|
| DSEFS REST API | 2.10.1 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/analytics/dsefsRestInterface.html |
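
To make the REST-based access concrete, here is a minimal sketch of how a directory listing might be requested and parsed. It assumes the DSEFS REST interface follows WebHDFS conventions (a `/webhdfs/v1` path prefix, an `op=LISTSTATUS` query parameter, and a `FileStatuses` JSON response); the helper names, host, and port are hypothetical, and none of this is taken from the connector itself. Check the DataStax page linked above before relying on it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def liststatus_url(host: str, port: int, path: str) -> str:
    """Build a WebHDFS-style LISTSTATUS URL (assumed endpoint layout)."""
    # quote() leaves "/" intact and escapes spaces and other unsafe characters.
    query = urlencode({"op": "LISTSTATUS"})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def parse_liststatus(payload: str) -> list[tuple[str, str]]:
    """Return (name, type) pairs from a WebHDFS-style LISTSTATUS response."""
    statuses = json.loads(payload)["FileStatuses"]["FileStatus"]
    return [(entry["pathSuffix"], entry["type"]) for entry in statuses]


def list_directory(host: str, port: int, path: str) -> list[tuple[str, str]]:
    """Fetch and parse a listing; needs a reachable server, so it is not called here."""
    with urlopen(liststatus_url(host, port, path), timeout=10) as response:
        return parse_liststatus(response.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call makes both pieces easy to check without a running cluster.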

Technical Specifications

The connector capabilities are shown below:

Crawling

| Feature | Supported Objects | Remarks |
|---|---|---|
| Crawling | Files/Folders | Buckets and file folders are cataloged by default during crawling. Buckets are crawled along with their tags. |

Profiling

| Feature | Supported Objects | Remarks |
|---|---|---|
| File Profiling | Row count, column count, view sample data | |
| Sample Profiling | Supported | |

By default, the service account provided for the connector will be used for any user operations. If the service account has write privileges, then Insert / Update / Delete operations can be executed.
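
As an illustration of the file profiling metrics listed above (row count, column count, and sample data), the following sketch computes them for CSV text. It is not the connector's implementation; the function name `profile_csv`, the `sample_size` parameter, and the assumption that the first row is a header are all hypothetical.

```python
import csv
import io


def profile_csv(text: str, sample_size: int = 5) -> dict:
    """Compute row count, column count, and a small sample for CSV text.

    Assumes the first row is a header; it is not counted as a data row.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    row_count = 0
    sample = []
    for row in reader:
        row_count += 1
        # Keep only the first few rows as the viewable sample.
        if len(sample) < sample_size:
            sample.append(dict(zip(header, row)))
    return {
        "row_count": row_count,
        "column_count": len(header),
        "columns": header,
        "sample": sample,
    }
```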

Pre-requisites

To use the connector, the following need to be available:

  • Connection details, as specified in the following section.
  • An admin or service account for crawling and profiling. The minimum privileges required are:

| Operation | Access Permission |
|---|---|
| Connection validate | R |
| Crawl Buckets | R |
| Catalog files/folders | R |
| Profile files/folders | R |

Connection Details

The following connection settings should be added for connecting to a DSEFS Server:

| Property | Details |
|---|---|
| Database Type | DSEFS |
| Connection Name | Enter a name for the DSEFS connection. The name you specify is a reference that helps you identify this DSEFS connection in OvalEdge. Example: DSEFS Server Connection |
| DSEFS URL | IP address of the server, with the port on which HDFS is running. Example: hdfs://3.140.32.52:8020. Validation succeeds only when the URL is entered correctly. |
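
Before saving the connection, it can help to check that the URL has the expected shape. The sketch below is a hypothetical pre-check, not the validation OvalEdge performs: it assumes the URL must use the `hdfs` scheme shown in the example and must carry both a host and a port.

```python
from urllib.parse import urlsplit


def check_dsefs_url(url: str) -> tuple[str, int]:
    """Return (host, port) if the URL looks like hdfs://host:port, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got: {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in URL: {url!r}")
    # Accessing .port raises ValueError itself if the port is not a valid number.
    port = parts.port
    if port is None:
        raise ValueError(f"missing port in URL: {url!r}")
    return parts.hostname, port
```

A shape check like this only catches typing mistakes; whether the server is reachable is still determined when the connection is validated.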

Once connectivity is established, additional configurations for crawling and profiling can be specified:

Crawler configurations

| Property | Details |
|---|---|
| Crawler Options | FileFolders/Buckets is enabled by default. |
| Crawler Rules | Include and exclude regular expressions apply to file folders and buckets only, not to individual files. |

Profiler Settings

| Property | Details |
|---|---|
| Profile Options | No profile options exist. |
| Profile Rules | No profile rules exist. |
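
The crawler rule behavior described above (regex rules filter folders and buckets but leave files alone) can be sketched as follows. This is an illustration under assumptions: the function name `select_entries`, the use of `re.search` rather than a full match, and exclude rules taking precedence over include rules are all hypothetical choices, not documented connector behavior.

```python
import re


def select_entries(entries, include=None, exclude=None):
    """Filter (name, is_folder) pairs; regex rules apply to folders only."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    selected = []
    for name, is_folder in entries:
        if is_folder:
            # A folder must match the include rule (if any) ...
            if include_re and not include_re.search(name):
                continue
            # ... and must not match the exclude rule (if any).
            if exclude_re and exclude_re.search(name):
                continue
        # Files are never filtered by these rules.
        selected.append(name)
    return selected
```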

Points to note:

  • Supported File Types: CSV, XLS, XLSX, JSON, AVRO, PARQUET, ORC
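
To check ahead of time which files fall within the supported types, a simple extension test can be used. The sketch below assumes that each supported type corresponds to the obvious file extension; the connector may detect file types differently, and the name `is_supported` is hypothetical.

```python
from pathlib import PurePosixPath

# Extensions assumed to correspond to the supported file types listed above.
SUPPORTED_EXTENSIONS = {".csv", ".xls", ".xlsx", ".json", ".avro", ".parquet", ".orc"}


def is_supported(path: str) -> bool:
    """Return True if the path's final extension is one of the supported types."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Only the final extension is considered, so a compressed file such as `archive.csv.gz` is treated as unsupported in this sketch.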