GitHub Connector

An out-of-the-box connector is available for GitHub repositories. OvalEdge supports crawling files and datasets, i.e., GitHub folders and files.

OvalEdge Crawling: Crawling is the process of collecting information about data from various data sources, such as on-premise and cloud databases, Hadoop, visualization software, and file systems.

When an OvalEdge crawler connects to a data source, it collects and catalogs all the data elements (i.e., metadata) and stores them in the OvalEdge data repository. The crawler creates an index for every stored data element, which can later be used for data exploration through the smart search in the OvalEdge Data Catalog.

OvalEdge crawlers can be scheduled to scan the databases regularly, so the index of data elements is always up to date.

Data Sources: Data sources are the systems the OvalEdge crawler integrates with to help users extract metadata and build a data catalog.

This document explains how to connect to your GitHub repositories and crawl data from them.

Connect to the Data: Before crawling and building a data catalog, you must first connect to your data. OvalEdge requires a separate connection for each type of data source, and users must enter the source credentials and database information for each connection. Once a data connection is made, clicking the Crawl button starts the crawling process.

Connector Capabilities

The GitHub Connector connects to GitHub through the REST API. The user must be a collaborator in the organization with read access to the repository, and must generate a personal access token under the profile settings on the GitHub page.
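As an illustration of REST access with a personal access token, the following minimal sketch builds (but does not send) a request to GitHub's public "repository contents" endpoint. It is not OvalEdge's implementation; the function name `build_contents_request` and the sample values are assumptions made for this example.

```python
import urllib.request
from urllib.parse import quote

GITHUB_API = "https://api.github.com"


def build_contents_request(owner: str, repo: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request that lists one folder of a repository.

    Hypothetical helper for illustration; the request is only
    constructed here, not sent.
    """
    clean_path = path.strip("/")  # "Test/" -> "Test"
    url = f"{GITHUB_API}/repos/{quote(owner)}/{quote(repo)}/contents/{quote(clean_path)}"
    headers = {
        "Accept": "application/vnd.github+json",
        # The personal access token is passed as a bearer token.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values; substitute a real owner, repository, and token.
request = build_contents_request("ovaledgeindia", "ovaledgesuperset", "Test/", "ghp_example")
```

Passing `request` to `urllib.request.urlopen` would send it; a token without read access to the repository would be expected to produce an HTTP error response instead of a folder listing.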

Technical Specifications

Crawling

| Feature | Supported Objects | Remarks |
| --- | --- | --- |
| Crawling | Jobs | Fetches all folders and files from the GitHub instance. |
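Fetching all folders and files amounts to a recursive walk of the repository tree. The sketch below shows one way such a walk could look, assuming each entry has a `type` of `file` or `dir` and a `path`, as items returned by GitHub's contents endpoint do. The `walk_repository` function and the in-memory `FAKE_REPO` data are hypothetical and stand in for real API calls; this is not the connector's actual code.

```python
from typing import Callable, Dict, Iterator, List

# A fetcher takes a folder path and returns the entries in that folder.
Fetcher = Callable[[str], List[Dict[str, str]]]


def walk_repository(fetch: Fetcher, start_path: str = "") -> Iterator[Dict[str, str]]:
    """Yield every folder and file under start_path, depth first."""
    for entry in fetch(start_path):
        yield entry
        if entry["type"] == "dir":
            # Descend into sub-folders before moving to the next sibling.
            yield from walk_repository(fetch, entry["path"])


# In-memory stand-in for the API, used only for this illustration.
FAKE_REPO = {
    "": [{"type": "dir", "path": "Test"}, {"type": "file", "path": "README.md"}],
    "Test": [{"type": "file", "path": "Test/job.json"}],
}


def fake_fetch(path: str) -> List[Dict[str, str]]:
    return FAKE_REPO.get(path, [])


paths = [entry["path"] for entry in walk_repository(fake_fetch)]
print(paths)  # ['Test', 'Test/job.json', 'README.md']
```

Keeping the fetcher as a parameter separates traversal from network access, so the same walk can be exercised against test data.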

Connection Details

Pre-requisites

To use the GitHub Connector, the details specified in the following section should be available.

  • An admin/service account for Crawling. 
  • The minimum privileges required for users are:

| Operation | Access Permission |
| --- | --- |
| Connection Validation | The user should be associated with the organization for the given repository, or the user should be the owner of the repository. |

  • Configuration Setting: The configuration key ovaledge.extauth.authtype needs to be set to HYBRID for the OAuth authentication setup.
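The connection validation requirement above (the user owns the repository or is associated with its organization) can be approximated with a check on the JSON that GitHub's "get a repository" endpoint returns for an authenticated user. The sketch below assumes that payload carries an `owner.login` field and a `permissions` object with a `pull` flag. Treating `pull` as a proxy for organization membership is an assumption of this example, and `can_validate_connection` is a hypothetical name, not part of OvalEdge.

```python
from typing import Any, Dict


def can_validate_connection(repo_payload: Dict[str, Any], user_login: str) -> bool:
    """Return True if the user owns the repository or has read (pull) access.

    Hypothetical check for illustration only.
    """
    owner_login = repo_payload.get("owner", {}).get("login", "")
    # GitHub logins are case-insensitive, so compare in lower case.
    if owner_login.lower() == user_login.lower():
        return True
    permissions = repo_payload.get("permissions", {})
    return bool(permissions.get("pull", False))


# Example payloads trimmed to the fields this sketch reads.
owned = {"owner": {"login": "JohnDavid"}}
shared = {"owner": {"login": "ovaledgeindia"}, "permissions": {"pull": True}}
```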

To connect to the GitHub instance using the OvalEdge application, complete the following steps.

  1. Log in to the OvalEdge application.
  2. In the left menu, click the Administration module. The sub-modules associated with Administration are displayed.
  3. Click the Crawler sub-module. The Crawler Information page is displayed.
  4. On the Crawler Information page, click the + icon. The Manage Connection pop-up window with the Search Connector option is displayed.
  5. In the Manage Connection pop-up window, select GitHub as the connection type. The Manage Connection pop-up window with GitHub-specific details is displayed.

  6. Enter the following field attributes, which are required for the GitHub connection.

| Property | Details |
| --- | --- |
| Connection Type | GitHub |
| License Type | Standard |
| Name | Enter a connection name for GitHub. The name you specify is a reference name that makes the GitHub connection easy to identify in OvalEdge. Example: GitHub1 |
| GitHub Organization | Enter the name of the organization. Example: ovaledgeindia |
| GitHub Owner | Enter the owner name of the repository. Example: John David |
| Repo Name | Enter the name of the repository. Example: ovaledgesuperset |
| Personal Access Token | Enter the personal access token of the user. Note: This token is generated by the GitHub owner in the GitHub instance. Example: ghp_z0HvXmn1vnz6wes1naOhEKZ8CYygJo0ixtew |
| GitHub Path | Enter the path of the specific repository folder. Example: Test/ |
| Default Governance Roles | Select the required governance roles for the Steward, Custodian, and Owner. |

  7. After entering the connection details in the required fields, click the Validate button. Once the connection details are validated, the Save and Save & Configure buttons are enabled.
  8. Click the Save button to establish the connection, or click the Save & Configure button to establish the connection and configure its settings. Clicking Save & Configure displays the Connection Settings pop-up window, where you can configure the connection settings for the selected connector. The Save & Configure button is displayed only for connectors that require settings configuration.
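The connection fields from step 6 can be pictured as a small record that is checked for blank values before validation. The sketch below is only an illustration: the `GitHubConnection` class and its `missing_fields` method are hypothetical, and the choice of which fields are mandatory (everything except the path) is an assumption, since the field list above does not say which are optional.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GitHubConnection:
    """Hypothetical model of the GitHub connection form fields."""
    name: str
    organization: str
    owner: str
    repo_name: str
    personal_access_token: str
    path: str = ""  # assumed optional; blank would mean the repository root

    def missing_fields(self) -> List[str]:
        """Return the labels of fields that were left blank."""
        # Assumption: these five fields are all mandatory.
        required = {
            "Name": self.name,
            "GitHub Organization": self.organization,
            "GitHub Owner": self.owner,
            "Repo Name": self.repo_name,
            "Personal Access Token": self.personal_access_token,
        }
        return [label for label, value in required.items() if not value.strip()]


# Sample values taken from the field examples; the token is left blank on purpose.
conn = GitHubConnection("GitHub1", "ovaledgeindia", "John David", "ovaledgesuperset", "", "Test/")
print(conn.missing_fields())  # ['Personal Access Token']
```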

Crawling/Profiling 

Once connectivity is established, select the GitHub connection on the Crawler Information page and click the Crawl/Profile button. The Crawling and Profiling pop-up window is displayed.

Select the schema to be crawled, select the Crawl option, and click the Run button. The job associated with the GitHub connection is triggered, and the data in the specified GitHub repository is fetched and displayed on the Data Catalog Queries page.

Security Information

OvalEdge does not extract any secured data from the source system (version control).

Any security information in the configuration (JSON) files is filtered out.

[Screenshot: raw data from the source, which contains environment variables]

[Screenshot: the same data after the environment variables are filtered out and the result is saved in OvalEdge]
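To make the filtering idea concrete, the sketch below removes entries whose keys look sensitive from a parsed JSON configuration. The exact rules OvalEdge applies are not described in this document, so the `SENSITIVE_FRAGMENTS` list, the `filter_config` function, and the sample JSON are all assumptions made for illustration.

```python
import json
from typing import Any

# Hypothetical key fragments treated as sensitive in this example.
SENSITIVE_FRAGMENTS = ("env", "password", "secret", "token")


def filter_config(node: Any) -> Any:
    """Return a copy of parsed JSON with sensitive-looking entries removed."""
    if isinstance(node, dict):
        return {
            key: filter_config(value)
            for key, value in node.items()
            if not any(fragment in key.lower() for fragment in SENSITIVE_FRAGMENTS)
        }
    if isinstance(node, list):
        return [filter_config(item) for item in node]
    return node  # scalars are kept unchanged


# Made-up configuration used only to exercise the sketch.
raw = json.loads(
    '{"job": "load_sales",'
    ' "environment": {"DB_PASSWORD": "x"},'
    ' "steps": [{"name": "extract", "auth_token": "abc"}]}'
)
filtered = filter_config(raw)
print(json.dumps(filtered))  # {"job": "load_sales", "steps": [{"name": "extract"}]}
```

Matching on key fragments is deliberately broad and can drop harmless keys that happen to contain one of the fragments; a production filter would need a more precise rule set.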