
GitHub Files Connector

GitHub files are the files stored in a GitHub repository, which holds the components of a software project along with its revision history.

OvalEdge uses the GitHub REST API to connect to the GitHub Files source, which allows users to crawl files and folders.

Connector Capabilities

The following is the list of objects supported by the GitHub Files connector.

Functionality | Supported Objects
Crawling | Files, Folders
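As a rough illustration of how files and folders can be told apart through the GitHub REST API, the sketch below walks the JSON listings returned by the repository contents endpoint, in which each entry has a `type` such as `"file"` or `"dir"`. The `crawl` function and the `fetch` callable are hypothetical stand-ins, not OvalEdge code. A real crawler would also send authenticated HTTP requests and deal with rate limits.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# One listing = the JSON array that GET /repos/{owner}/{repo}/contents/{path}
# returns for a directory; this sketch only reads each entry's "path" and "type".
Listing = List[Dict[str, str]]


def crawl(fetch: Callable[[str], Listing], root: str = "") -> Tuple[List[str], List[str]]:
    """Walk a repository tree breadth-first and return (files, folders).

    `fetch` is a hypothetical callable that returns the listing for one path.
    """
    files: List[str] = []
    folders: List[str] = []
    pending = deque([root])
    while pending:
        for entry in fetch(pending.popleft()):
            if entry["type"] == "file":
                files.append(entry["path"])
            elif entry["type"] == "dir":
                folders.append(entry["path"])
                pending.append(entry["path"])  # visit the subfolder later
            # Other entry types (for example "symlink" or "submodule") are skipped.
    return files, folders


# Canned listings standing in for real API responses.
fake_tree = {
    "Test": [
        {"path": "Test/orders.csv", "type": "file"},
        {"path": "Test/archive", "type": "dir"},
    ],
    "Test/archive": [{"path": "Test/archive/2022.csv", "type": "file"}],
}
files, folders = crawl(lambda path: fake_tree.get(path, []), "Test")
print(files)    # ['Test/orders.csv', 'Test/archive/2022.csv']
print(folders)  # ['Test/archive']
```

The contents endpoint returns only one folder level per call, so this sketch makes one request per folder. For large repositories, the Git Trees endpoint with `recursive=1` can return the whole tree in a single call. The source does not say which approach OvalEdge uses.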

Prerequisites

The following are the prerequisites required for establishing a connection between the connector and the OvalEdge application.

  • API Details
  • Service account with minimum permissions
  • Configure environment variables (Optional)

API Details

API | Version | Endpoint
GitHub REST API | 2022-11-28 | https://api.github.com/repos/
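The sketch below shows how the base URL and API version above fit together in a GitHub REST API call that lists a folder in a repository. It assumes that the GitHub Owner (or Organization), Repo Name, and GitHub Path connection fields map to the `{owner}`, `{repo}`, and `{path}` segments of the contents endpoint. The `contents_request` helper is hypothetical. The headers follow GitHub's documented conventions.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.github.com/repos/"
API_VERSION = "2022-11-28"


def contents_request(owner: str, repo: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) GET /repos/{owner}/{repo}/contents/{path}."""
    # Quote each segment so special characters are URL-safe; "/" is kept in
    # the path so nested folders such as "Test/archive" still resolve.
    url = (
        API_BASE
        + urllib.parse.quote(owner, safe="")
        + "/"
        + urllib.parse.quote(repo, safe="")
        + "/contents/"
        + urllib.parse.quote(path.strip("/"), safe="/")
    )
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": API_VERSION,  # pins the 2022-11-28 version
        },
        method="GET",
    )


req = contents_request("ovaledgeindia", "ovaledgesuperset", "Test/", "<token>")
print(req.full_url)
# https://api.github.com/repos/ovaledgeindia/ovaledgesuperset/contents/Test
# urllib.request.urlopen(req) would send it; that network call is not made here.
```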

Service Account with Minimum Permissions

A service account with read-only access to the GitHub repository is required for crawling.

Operation | Minimum Access Permission
Connection validation | Read
Crawl Files and Folders | Read
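To check that a service account actually has the read access listed above, one option is to inspect the `permissions` object that `GET /repos/{owner}/{repo}` includes in its response for authenticated callers. The helper below and the labels it returns are illustrative only and are not part of OvalEdge. The object reflects the account's role on the repository, so it may not show every restriction that a fine-grained token adds.

```python
import json
from typing import Any, Dict


def access_summary(repo: Dict[str, Any]) -> str:
    """Classify the caller's access from a GET /repos/{owner}/{repo} payload."""
    # GitHub includes "permissions" only for authenticated requests.
    perms = repo.get("permissions") or {}
    if not perms.get("pull"):
        return "no read access"
    if any(perms.get(flag) for flag in ("push", "maintain", "admin")):
        return "read access, but broader than required"
    return "read-only access"


# Trimmed example payload; a real response contains many more fields.
payload = json.loads(
    '{"full_name": "ovaledgeindia/ovaledgesuperset",'
    ' "permissions": {"admin": false, "maintain": false, "push": false,'
    ' "triage": false, "pull": true}}'
)
print(access_summary(payload))  # read-only access
```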

Establish Environment Variables (Optional)

This section describes the settings or instructions that you should be aware of before establishing a connection. If your environments have been configured, skip this step.

Configure Environment Names

The Environment Names allow you to select the environment configured for the specific connector from the drop-down list in the Add Connector pop-up window.

You might want to crawl the same schema in both staging and production environments for consistency. Typical environments for crawling are PROD, STG, and Temporary, and may also include QA or other environments. Crawling a temporary environment can also be useful for schema comparisons, especially during application upgrade assistance. The temporary environment can be deleted afterward.

Steps to Configure the Environment

  1. Navigate to Administration > System Settings.
  2. Select the Connector tab.
  3. Find the key named “connector.environment”.
  4. Enter the desired environment values (for example, PROD, STG) in the Value column.
  5. Click ✔ to save.
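The source does not specify the exact format OvalEdge expects for this value. The sketch below assumes a comma-separated list, based on the "PROD, STG" example, and shows how such a value could become the options in the Connector Environment drop-down. `parse_environments` is a hypothetical helper.

```python
from typing import List


def parse_environments(value: str) -> List[str]:
    """Turn a comma-separated value such as "PROD, STG" into drop-down options.

    Assumption: the connector.environment value is a comma-separated list.
    """
    options: List[str] = []
    for item in value.split(","):
        name = item.strip()
        if name and name not in options:  # skip blanks and repeats, keep order
            options.append(name)
    return options


print(parse_environments("PROD, STG, QA"))  # ['PROD', 'STG', 'QA']
```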

Establish a Connection

To establish a connection, complete the following steps:

  1. Log in to the OvalEdge application.
  2. Navigate to Administration > Connectors.
  3. Click the + (New Connector) icon.
  4. The Add Connector pop-up window is displayed. Search for and select the GitHub Files connector.
  5. The Add Connector with Connector Type specific details pop-up window is displayed. Enter the relevant information to configure the GitHub Files connection.
    Note: An asterisk (*) denotes a mandatory field required for establishing a connection.


    Field Name

    Description

    Connector Type

    By default, the selected connector type is displayed as GitHubFiles.

    If required, use the drop-down list to change the connector type. The fields associated with the selected connector type are then displayed.

    Authentication*

    Select the authentication type from the drop-down list.

    • Github App
    • Personal Access Token

    If you select the GitHub App option, enter the following details:

    • Github App Id: Generated in the GitHub instance > Developer Settings.
    • Github App Pem File Path: The path to the PEM private key file that was downloaded from the GitHub instance and placed on the OvalEdge instance.

    If you select the Personal Access Token option, enter the following details:

    • Personal Access Token: Generated in the GitHub instance > Developer Settings.

    Credential Manager*

    A credential manager enhances security by storing API keys, passwords, certificates, and other sensitive data securely. It also helps manage access to secrets, rotate them, and audit their use.

    OE Credential Manager: The GitHub Files connection is configured with the service account credentials in real time when OvalEdge establishes a connection to the GitHub Files source. Users need to add the credentials manually if the OE Credential Manager option is selected.

    HashiCorp: The credentials are stored in a HashiCorp Vault server and fetched from HashiCorp Vault into OvalEdge.

    AWS Secrets Manager: The credentials are stored in AWS Secrets Manager and fetched from AWS Secrets Manager into OvalEdge.

    Azure Key Vault: Azure Key Vault is a cloud service provided by Microsoft Azure that allows you to securely store and manage sensitive information such as secrets, encryption keys, and certificates. 

    For more information on Credential Manager, refer to Credential Manager.

    License Add-Ons

    All connectors include a Base Connector License by default, which allows you to crawl and profile a data source to obtain its metadata and statistical information.

    Credential Manager ConnId

    When you have more than one Credential Manager ID, pick the specific ID you want in the Credential Manager ConnId field.

    Connector Environment

    Select the environment configured for the connector from the drop-down list, for example, PROD or STG (based on the values configured for the connector.environment key in OvalEdge System Settings).

    The environment field helps you identify which type of system environment (Production, STG, or QA) each connector connects to.

    Note: The steps to set up environment variables are explained in the Prerequisites section.

    Connector Name*

    Enter a connection name for GitHub Files. This name identifies the GitHub Files connection in OvalEdge.

    Example: GitHubFiles_test

    Github Organization

    Enter the name of the GitHub organization.

    Example: ovaledgeindia

    Github Owner

    Enter the name of the repository owner.

    Example: John David

    Repo Name*

    Enter the name of the repository.

    Example: ovaledgesuperset

    Personal Access Token*

    Enter the personal access token of the user.

    Note: This token is generated by the GitHub owner in the GitHub instance.

    Example: ghp_z0HvXmn1vnz6wes1naOhEKZ8CYygJo0ixtew

    GitHub Path

    Enter the path of the repository folder to crawl.

    Example: Test/

    Default Governance Roles*

    Select a specific user or team for each governance role (Steward, Custodian, Owner) assigned to manage the data asset.

    Note: The drop-down list displays all the configurable roles (a single user or a team) as configured in the OvalEdge Security | Governance Roles section.

    Admin Roles*

    Select the required admin roles for this connector.

    • To add Integration Admin roles, search for or select one or more roles from the Integration Admin options, and then click Apply.
      The Integration Admin is responsible for configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.
    • To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click Apply.
      The Security and Governance Admin is responsible for:
      • Configuring role permissions for the connector and its associated data objects.
      • Adding admins to set permissions for roles on the connector and its associated data objects.
      • Updating governance roles.
      • Creating custom fields.
      • Developing Service Request templates for the connector.
      • Creating approval workflows for the templates.

    No of Archive Objects*

    The number of archive objects indicates the number of recent metadata modifications made to a dataset at a remote/source location. By default, the archive objects feature is disabled. To enable it, click the Archive toggle button and specify the number of objects to archive.

    Select Bridge*

    When OvalEdge is deployed as a SaaS application, a solution is required to traverse the customer firewall. That solution is OvalEdge Bridge. A bridge is a type of firewall that operates at the network layer.

    • If a bridge has been set up, it is displayed in the drop-down menu, where you can select the required Bridge ID.
    • Select "NO BRIDGE" if no bridge is configured.
      For more information, refer to Bridge Overview.
  6. After entering all the required connection details explained above, select the appropriate option based on your preferences:
    1. Validate: Click Validate to verify the connection details. This confirms that the provided information is accurate and that a connection can be established.
    2. Save: Click Save to store the connection details. Once saved, the connection is added to the Connectors home page for easy access.
    3. Save & Configure: For connectors that require additional configuration settings, click Save & Configure. This opens the Connection Settings pop-up window, where you can configure the necessary settings before saving the connection.
  7. Once the connection is validated and saved, it will be displayed on the Connectors home page.
    Note: You can either save the connection details first, or you can validate the connection first and then save it.

    Connection Validation Errors

    Error Message | Description
    Failed to establish a connection, Please check the credentials. | Displayed when the repository name, personal access token, or other connection details are invalid.


    Note: If you have any issues creating a connection, please contact your assigned OvalEdge Customer Success Management (CSM) team.
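The Authentication options and the validation step above can be illustrated with the public GitHub REST API. The sketch assumes that validation amounts to a `GET /repos/{owner}/{repo}` call and that any non-200 response surfaces as the generic error message from the table. OvalEdge's internal logic may differ. For the GitHub App option, GitHub expects a JWT with the `iat`, `exp`, and `iss` claims shown here. Signing that JWT with the PEM private key (RS256) requires a third-party library such as PyJWT, so it is omitted, as is exchanging the JWT for an installation access token. All function names are hypothetical.

```python
import time
from typing import Dict, Optional

# Error text copied from the Connection Validation Errors table.
FAILURE_MESSAGE = "Failed to establish a connection, Please check the credentials."


def app_jwt_claims(app_id: str, now: Optional[int] = None) -> Dict[str, object]:
    """Claims for a GitHub App JWT (signing with RS256 is not shown)."""
    issued = int(time.time()) if now is None else now
    return {
        "iat": issued - 60,   # backdated 60 s to tolerate clock drift
        "exp": issued + 540,  # 9 minutes ahead, under GitHub's 10-minute limit
        "iss": app_id,        # value from the Github App Id field
    }


def token_header(token: str) -> Dict[str, str]:
    """Authorization header for a personal access token."""
    return {"Authorization": f"Bearer {token}"}


def validation_result(status: int) -> Optional[str]:
    """Map the status of GET /repos/{owner}/{repo} to an error message, or None."""
    if status == 200:
        return None
    # 401: invalid or expired token. 403: access forbidden or rate limited.
    # 404: unknown owner or repository, or a private repository the token cannot see.
    return FAILURE_MESSAGE


print(app_jwt_claims("123456", now=1_700_000_000))
# {'iat': 1699999940, 'exp': 1700000540, 'iss': '123456'}
print(validation_result(404))  # Failed to establish a connection, Please check the credentials.
```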

Connector Settings

Once the connection is successfully established, various settings are provided to fetch and analyze the information from the data source.  

The connection settings include Crawler, Access Instruction, Business Glossary Settings, and Anomaly Detection Settings.

To view the Connector Settings page,

  1. Go to the Connectors page.
  2. From the nine-dot menu of the connector, select Settings.
  3. The Connector Settings page is displayed, where you can view all the connector setting options.
  4. When you have finished making your changes, click Save Changes. All setting changes are applied to the metadata.

    The following is a list of connection settings along with their corresponding descriptions:

    Connection Settings

    Description

    Crawler

    Crawler settings are configured to connect to a data source and collect and catalog all the data elements in the form of metadata.

    Access Instruction

    Access Instruction allows the data owner to instruct others on using the objects in the application.

    Business Glossary Settings

    The Business Glossary Settings provide flexibility and control over how users view and manage term associations within the context of a business glossary at the connector level.

    Anomaly Detection Settings

    These settings allow users to set up anomaly detection preferences at the connector level. By default, the configuration is based on the global settings established in System Settings and cannot be modified.

    In the custom settings, users can enable or disable anomaly detection for the specific connector. They can also switch the default algorithm between Deviation and IQR and modify the associated parameters.

Note: For more information, refer to the Connector Settings.
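The source does not describe how OvalEdge parameterizes its Deviation and IQR algorithms. For reference only, the sketch below implements the textbook version of each rule: values more than k standard deviations from the mean, and Tukey's fences at Q1 - k*IQR and Q3 + k*IQR. The function names, the default multipliers, and the sample data are assumptions and are not OvalEdge defaults.

```python
import statistics
from typing import List


def iqr_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default "exclusive" method)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def deviation_outliers(values: List[float], k: float = 3.0) -> List[float]:
    """Values more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


# Hypothetical daily counts of a metric, with one spike.
data = [10, 12, 11, 13, 12, 11, 40]
print(iqr_outliers(data))               # [40]
print(deviation_outliers(data, k=2.0))  # [40]
print(deviation_outliers(data, k=3.0))  # []
```

The last line shows one practical difference between the two rules. A single extreme value inflates the standard deviation, so a 3-sigma deviation rule can miss it. The quartiles used by IQR are much less affected by that value.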

Crawling of Schemas

The Crawl/Profile option allows you to crawl files from a data source and load them into the OvalEdge application. Selecting the connector and clicking Crawl/Profile initiates a new job. Once the job succeeds, the top-level files and their columns are stored in Data Catalog > Files and Data Catalog > File Columns, and the next level of data is stored in the File Manager.
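To make the split between Data Catalog > Files and the File Manager more concrete, the sketch below assumes two things: "top-level" means files directly inside the configured GitHub Path, and "next level" means anything in a subfolder. That interpretation, the `split_by_level` helper, and the example paths are illustrative assumptions and do not describe OvalEdge's implementation.

```python
from typing import Dict, List


def split_by_level(paths: List[str], root: str = "") -> Dict[str, List[str]]:
    """Group crawled file paths into top-level and nested files under `root`."""
    prefix = root.strip("/")
    prefix = prefix + "/" if prefix else ""
    grouped: Dict[str, List[str]] = {"top_level": [], "nested": []}
    for path in paths:
        if not path.startswith(prefix):
            continue  # outside the configured GitHub Path; ignored here
        relative = path[len(prefix):]
        grouped["nested" if "/" in relative else "top_level"].append(path)
    return grouped


print(split_by_level(["Test/orders.csv", "Test/archive/2022.csv"], "Test/"))
# {'top_level': ['Test/orders.csv'], 'nested': ['Test/archive/2022.csv']}
```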