Data Warehouse

Google BigQuery

BigQuery stands as Google Cloud's fully managed, cost-effective analytics data warehouse, operating seamlessly at petabyte scale for near-real-time data analytics. By eliminating the need for infrastructure setup, it empowers users to derive meaningful insights effortlessly using standard SQL, offering flexible pricing options.

As a secured cloud-based data warehouse compatible with AWS or Azure infrastructure, BigQuery provides adaptable storage, processing, and data analysis capabilities. 

OvalEdge connects to BigQuery, leveraging the SDK for tasks such as crawling database objects, profiling sample data, executing queries, and establishing lineage.


Connector Capabilities

The connector capabilities are shown below:

Crawling

Feature

Supported Objects

Remarks


Crawling

Schemas

 

Tables

-

Table Columns

Supported Data Types: “BIGINT”, ” CHAR”, ”DATE”, “FLOAT”, “INT”, “NCHAR”, “NTEXT”, “NUMERIC”, “NVARCHAR”, “SMALLINT” , “ TINYINT”, ”VARCHAR”.

Views

-

Stored procedures

 

Functions

 

Triggers

 

Relationships

 

Query logs

 

Profiling

Feature

Support

Remarks

Table Profiling

Row Count, Column Count, Density, Null Density

-

View Profiling

Row Count, Column Count, View Sample Data

View is treated as a table for profiling purposes

Column Profiling

Min, Max, Null Count, Distinct, Top 50 Values

-

Full Profiling 

Supported

-

Sample Profiling

Supported

-


Lineage Building

Feature

Supported Entities

Remarks

Lineage Building

Table Lineage

-

Column Lineage 

-

Lineage Sources

Stored Procedures, Functions, Triggers, Views, SQL Queries (from Query Sheet), Query Logs


Querying

Feature

Operations

Details

Query Execution

Select

Supported

Insert

Not Supported, by default.

Update

Not Supported, by default.

Delete

Not Supported, by default.

Joins within database

Supported

Joins outside database

Not Supported

Aggregations

Supported

Group By

Supported

Order By

Supported


Prerequisites

The following are prerequisites for connecting to the Google BigQuery connector.

Drivers

The APIs/drivers used by the connector are given below:

Driver/API

Version

Details

SDK

1.65.0

https://mvnrepository.com/artifact/com.google.cloud/google-cloud-bigquery/1.65.0


Configuring Environment Variables

Configuring environment names enable you to select the appropriate environment from the drop-down list when adding a connector. This allows for consistent crawling of schemas across different environments, such as production (PROD), staging (STG), or temporary environments. It also facilitates schema comparisons and assists in application upgrades by providing a temporary environment that can be later deleted if needed.

Before establishing a connection, it is important to configure the environment names for the specific connector. If your environments have been configured, skip this step. 

Steps to Configure the Environment

  1. Log into the OvalEdge application.
  2. Navigate to AdministrationSystem Settings.
  3. Select the Connector tab.
  4. Find the key name “connector.environment”.
  5. Enter the desired environment values (PROD, STG) in the Value column.
  6. Click ✔ to Save.

Service Account Permissions

A service account needs to be created for the dataset project with the following permissions to be able to crawl metadata into OvalEdge. Required permissions and corresponding BigQuery commands are given below for your reference:


Operation 

Access Permission

Connection Validation

BigQuery Metadata Viewer 

Crawl Schemas, Tables

BigQuery Job User

Profile Schemas, Tables

BigQuery Data Viewer

Establish a Connection

To connect to the Google BigQuery using the OvalEdge application, complete the following steps:

  1. Log into the OvalEdge application.
  2. Navigate to Administration >  Connectors.
  3. Click on the + (New Connector) icon.
  4. Add Connector pop-up window is displayed where you can search for the Google BigQuery connector.



  5. The Add Connector with Connector Type specific details pop-up window is displayed. Enter the relevant information to configure the Google BigQuery connection.
    Note: The asterisk (*) denotes mandatory fields required for establishing a connection.


    Field Name

    Description

    Connector Type

    It allows you to select the connector from the drop-down list. By default, 'Google BigQuery' is displayed as the selected connector type.

    Credential Manager

    Select the option from the drop-down list to indicate where you want to save your credentials:

    OE Credential Manager: Google BigQuery connection is configured with the basic Username and Password of the service account in real-time when OvalEdge establishes a connection to the Google BigQuery database. Users need to add the credentials manually if the OE Credential Manager option is selected.

    HashiCorp: The credentials are stored in the HashiCorp database server and fetched from HashiCorp to OvalEdge.  

    AWS Secrets Manager: The credentials are stored in the AWS Secrets Manager database server; OvalEdge fetches the credentials from the AWS Secrets Manager. 

    Azure Key Vault: Azure Key Vault allows for secure storage and strict access mechanisms of sensitive information such as tokens, passwords, certificates, API keys, and other confidential data.

    For more information on Azure Key Vault, click here.

    For more information on Credential Manager, click here.

    License Add Ons

    All the connectors will have a Base Connector License by default that allows you to crawl and profile to obtain the metadata and statistical information from a data source. 

    OvalEdge supports various License Add-Ons based on the connector’s functionality requirements.

    • Auto Lineage: Select the Auto Lineage Add-On license that enables the automatic construction of the Lineage of data objects for a connector with the Lineage feature.
    • Data Quality: Select the Data Quality Add-On license to identify, report, and resolve the data quality issues for a connector whose data supports data quality (DQ), using DQ rules/functions, anomaly detection, Reports, and more.  

    Connector Name*

    Enter the name of the connection. The connection name specified in the Connection Name textbox is a reference to the Google BigQuery database connection in the OvalEdge application.

    Example: Google BigQuery Connection

    Billing Project ID

    Specify the Google Cloud project that is billed for the usage of BigQuery resources. Each BigQuery query, job, or operation consumes resources, and the associated costs are billed to a specific Google Cloud project.

    Connector Environment

    The Connector Environment drop-down list allows you to select the environment configured for the connector from the drop-down list. 

    For example, PROD, or STG (based on the configured items in the OvalEdge configuration for the connector.environment).

    The purpose of the environment field is to help you identify which connector is connecting what type of system environment (Production, STG, or QA).

    Note: The steps to set up environment variables are explained in the Configuring Environment Variables section.

    Validation Type

    BigQuery Connector supports two forms of authentication mechanisms. 

    (i) File Authentication

    (ii) UI Authentication

    Project Id*

    Enter the unique identifier for the Google Project associated with BigQuery. 

    Application*

    Enter the dataset ID linked to the above-specified Project Id.

    File Path

    Provide the JSON file path containing service account credentials for secure access to BigQuery. 

    Default Governance Roles*

    The admin will select a specific user or a  team from the governance roles (Steward, Custodian, Owner) that get assigned to the data asset. The drop-down list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security > Governance Roles.  

    Admin Roles*

    • To add Integration Admin Roles, search for or select one or more roles from the Integration Admin options, and then click on the Apply button.
      The responsibility of the Integration Admin includes configuring crawling and profiling settings for the connector, as well as deleting connectors, schemas, or data objects.
    • Security and Governance Admins: To add Security and Governance Admin roles, search for or select one or more roles from the list, and then click on the Apply button.
      The security and Governance Admin is responsible for:
      • Configure role permissions for the connector and its associated data objects.
      • Add admins to set permissions for roles on the connector and its associated data objects.
      • Update governance roles.
      • Create custom fields.
      • Develop Service Request templates for the connector.
      • Create Approval workflows for the templates.

    Select Bridge

    With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. The use of a bridge is to provide real-time control that makes it easy to manage the movement of data between any source and any destination.

  6. In the Validation Type field, if the validation type is selected UI Authentication then the following connection fields are displayed.

    Property

    Mandatory/ Optional

    Details

    Account Type

    Mandatory

    Enter the type of account associated with the Google BigQuery data source.


    Ex: Service_Account

    Client ID

    Mandatory

    Enter the Client Id related to the Google BigQuery data source authentication.

    Client Email

    Mandatory

    Enter the Client email related to the service account of the Google BigQuery data source.

    Private Key

    Mandatory

    Enter the key associated with the BigQuery service account that is used to authenticate the Google account.

    Private Key Id

    Mandatory

    Enter the unique id that is used for representing Private Key.

    Token Uri

    Mandatory

    Enter the Google Id token details.

    Project Id

    Mandatory

    Enter the Project Id related to the service account of the Google BigQuery data source.

    Application

    Mandatory

    Enter the Application Id related to the service account of the Google BigQuery data source.

  7. After entering all the connection details, select the appropriate button based on your preferences.
    1. Validate: Click on the Validate button to verify the connection details. This ensures that the provided information is accurate and enables successful connection establishment.
    2. Save: Click on the Save button to store the connection details. Once saved, the connection will be added to the Connectors home page for easy access.
    3. Save & Configure: For certain Connectors that require additional configuration settings. Click on the Save & Configure button. This will open the Connection Settings pop-up window, allowing you to configure the necessary settings before saving the connection.
  8. Once the connection is validated and saved, it will be displayed on the Connectors home page.
    Note: You can either save the connection details first, or you can validate the connection first and then save it.

Connector Settings

Once the connection is established successfully, various settings are provided to fetch and analyze the information from the data source.

The connection settings include Crawler, Profiler, Query Policies, Access Instruction, Business Glossary Settings, and Notification.

To view the Connector Settings page,

  1. Go to the Connectors page.
  2. From the 9- dots select the Settings option.

  3. The Connector Settings page is displayed where you can view all the connector setting options.


  4. Click on Save Changes. All the settings will be applied to the metadata.
  5. The following is a list of connection settings along with their corresponding descriptions:

Connection Settings

Description

Crawler

Crawler settings are configured to connect to a data source and collect and catalog all the data elements in the form of metadata.

Profiler




The process of gathering statistics and informative summaries about the connected data source(s). Statistics can help assess the quality of data sources before using them for analysis. Profiling is always optional; crawling can be run without profiling. 

Query Policies

It restricts the use of the selected query types based on your role.

Access Instruction

Access Instruction allows the data owner to instruct others on using the objects in the application.

Business Glossary Settings

The Business Glossary Setting provides flexibility and control over how they view and manage term association within the context of a business glossary at the connector level.

Notification

The Enable/Disable Metadata Change Notifications option is used to set the change notification about the metadata changes of the data objects.

  • You can use the toggle button to set the Default Governance Roles (Steward, Owner Custodian, etc.) 
  • Using the Roles and Teams, you can select the role and team to receive the notification of metadata changes.


Note: For more information, refer to the Connector Settings.

Crawling of Schema(s)

A Crawl/Profile option allows you to select the specific schemas for the following operations:  Crawl, Crawl & Profile, Profile, or Profile Unprofiled. The defined run date and time are displayed to set for any scheduled crawlers and profilers.

  1. Navigate to the Connectors page, and click the Crawl/Profile button.
    Select Important Schema For Crawling and Profiling pop-up window is displayed.
  2. Select the required Schema(s).
  3. The below list of actions is displayed in the Action section.
    1. Crawl: It allows the crawling of the metadata of the selected schemas.
    2. Crawl & Profile: It allows crawling the metadata of the selected schemas and profiles the sample data.
    3. Profile: It allows the collection of table column statistics.
    4. Profile Unprofiled: It allows the profiling of data that has not been profiled.
    5. Schedule: Connectors can also be scheduled for crawling and/or profiling in advance to run at prescribed times and selected intervals.
      Note: For more information on Scheduling, refer to Scheduling Connector.

Click on the Run button that gathers all metadata from the connected source into the OvalEdge Data Catalog.