Google BigQuery

BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real-time. With BigQuery, there's no infrastructure to set up or manage, letting you focus on finding meaningful insights using standard SQL and taking advantage of flexible pricing models across on-demand and flat-rate options. 

Key features

  1. BigQuery ML: BigQuery ML enables users to create and execute machine learning models in BigQuery by using standard SQL queries.
  2. BigQuery BI Engine: BigQuery BI Engine is a fast, in-memory analysis service that allows you to analyze data stored in BigQuery.
  3. BigQuery Data Transfer Service: The BigQuery Data Transfer Service automates data movement into BigQuery on a scheduled, managed basis. The BigQuery Data Transfer Service facilitates transfers from many data sources, including data warehouses. Transfers can be recurring or occur once.

OvalEdge provides an interface that can connect to Google BigQuery and perform all standard operations. The Google BigQuery data source is integrated via the API driver that supports crawling database objects, profiling sample data, query execution, and building lineage.

Goggle Big Query

Connector Capabilities

The connectivity to the Google BigQuery connector is performed via the API and the versions used by the API are given below:

Driver/API

Version

Details

API

1.65.0

https://mvnrepository.com/artifact/com.google.cloud/google-cloud-bigquery/1.65.0

Technical Specifications

Crawling

Feature

Supported Objects

Remarks

Crawling

Tables

-

Table Columns

Supported Data Types: “BIGINT”, ” CHAR”, ”DATE”, “FLOAT”, “INT”, “NCHAR”, “NTEXT”, “NUMERIC”, “NVARCHAR”, “SMALLINT” , “ TINYINT”, ”VARCHAR”.

Views

-

Stored procedures

-

Functions

-

Triggers

-

Roles

-

Users

-

Permissions

-

Triggers

-

Usage statistics

-

Profiling

Feature

Support

Remarks

Table Profiling

Row Count, Columns Count, View Sample Data

-

View Profiling

Row Count, Columns Count, View Sample Data

View is treated as a table for profiling purposes

Column Profiling

Min, Max, Null Count, Distinct, Top 50 Values

-

Full Profiling 

Supported

-

Sample Profiling

Supported

-

Lineage Building

Feature

Supported Entities

Remarks

Lineage Building

Table Lineage

-

Column Lineage 

-

Lineage Sources

Stored Procedures, Functions, Triggers, Views, SQL Queries (from Query Sheet), Query Logs

Querying

Feature

Operations

Details

Query Execution

Select

Supported

Insert

Not Supported, by default.

Update

Not Supported, by default.

Delete

Not Supported, by default.

Joins within database

Supported

Joins outside database

Not Supported

Aggregations

Supported

Group By

Supported

Order By

Supported

Connection Details

Pre-requisites

To use the Google BigQuery Connector, the details specified in the following section should be available.

  1. By default the service account provided for the connector will be used for any query operations. If the service account has write privileges, then Insert/Update/Delete queries can be executed.
  2. The minimum privileges required are

    Operation 

    Access Permission

    Connection Validation

    Read access to the Google BigQuery

    Crawling

    Read access to the Google BigQuery

    Profiling 

    Read access to the Google BigQuery

    Lineage Building

    Read access to the Google BigQuery

    Query Execution

    Read access to the Google BigQuery

To connect to the Google BigQuery using the OvalEdge application, complete the following steps.

  1. Log in to the OvalEdge application
  2. In the left menu, click on the Administration module name, and the sub-modules associated with the Administration are displayed.
  3. Click on the Connectors sub-module name, and the Connectors Information page is displayed.
  4. In the Connectors Information page, click on + New Connection. The Manage Connection with Search Connector pop-up window is displayed.
  5. In the Manage Connection pop-up window, select the connection type as Google BigQuery. The Manage Connection with  Google BigQuery specific details pop-up window is displayed.
    Goggle Big Query2
  6. The following are the field attributes required for the connection of Google BigQuery.

    Property

    Mandatory/ Optional

    Details

    Connection Type

    Mandatory

    By default the connection type is displayed as the Google BigQuery, if needed the connection type can be changed by selecting desired connection type from the Connection Type dropdown, and based on the selection of the connection type, the fields associated with the selected connection type are displayed.

    License Type

    Mandatory

    OvalEdge offers two types of licenses based on the customer's requirements. By default, the License type is displayed as ‘Auto Lineage.’ The user can select the license type as Standard or Auto Lineage. 

    (i) Standard: The standard license has crawler and profiler features and doesn’t have Auto Lineage functionality, where the users are not allowed to build lineage for the selected database and schema-level objects.

    (ii) Auto Lineage: The Auto lineage license allows to build a lineage in addition to the Crawling and Profiling feature.  

    Connection Name

    Mandatory

    Enter the name of the connection, the connection name specified in the Connection Name textbox will be a reference to the Google BigQuery database connection in the OvalEdge application.

    Example: Google BigQuery Connection

    Environment

    Optional

    It allows the user to select the environment configured for the connector from the dropdown list. The configured environments can be Microsoft Azure and Amazon Web Services.

    Validation Type

    Optional

    It allows the user to select the validation configured for the connector from the dropdown list. The configured validation types can be File Authentication and UI Authentication.

    Project ID

    Mandatory

    Enter the Project Id related to the service account of the Google BigQuery data source.

    Application

    Mandatory

    Enter the Application Id related to the service account of the Google BigQuery data source.

    File Path

    Optional

    Enter the service account .json file path which is located in your local environment.

    Default Governance Roles

    Mandatory

    The admin will select a specific user or a  team from the governance roles (Steward, Custodian, Owner) that get assigned to the data asset. The dropdown list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security>Governance Roles.  

    Select Bridge

    Optional

    With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. The use of a bridge is to provide real-time control that makes it easy to manage the movement of data between any source and any destination.

    Note: In the Validation Type field, if the validation type is selected UI Authentication then the following connection fields are displayed.

    Property

    Mandatory/ Optional

    Details

    Account Type

    Mandatory

    Enter the type of account associated with the Google BigQuery data source 

    Ex: Service_Account

    Client ID

    Mandatory

    Enter the Client Id related to the Google BigQuery data source authentication.

    Client Email

    Mandatory

    Enter the Client email related to the service account of the Google BigQuery data source.

    Private Key

    Mandatory

    Enter the key associated with the BigQuery service account that is used to authenticate the google account.

    Private Key Id

    Mandatory

    Enter the unique id which is used for representing Private Key.

    Token Uri

    Mandatory

    Enter the Google Id token details.

    Project Id

    Mandatory

    Enter the Project Id related to the service account of the Google BigQuery data source.

    Application

    Mandatory

    Enter the Application Id related to the service account of the Google BigQuery data source.

  7. Once after entering the connection details in the required fields, click on the button the entered connection details are validated the Save and Save & Configure buttons are enabled.
  8. Click on the button to establish the connection or you can also directly click on the button to establish the connection and configure the connection settings. Here when you click on the Save & Configure button, the Connection Settings pop-up window is displayed. Where you can configure the connection settings for the selected Connector. The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.

      Error Validations Details

      The following are expected errors that can be encountered while establishing the connection.

      S.No.

      Error Messages

      Description

      1.

      Project Id Incorrect

      The entered Project Id is not valid.

      2.

      Application Id Incorrect

      The entered Application Id is not valid.

      Crawl/Profile

      Once connectivity is established, additional configurations for crawling and profiling can be specified for the selected data source. The user can use the Crawl/Profile option, which allows the user to select the specific schemas that need to be crawled, profiled, or unprofiled.  For any scheduled crawlers and profilers, the defined run date and time are displayed to set.

      Connector Settings 

      Once establishing the connection successful the additional configurations for crawling need to be specified. To configure the Crawler settings for the Google BigQuery Connector, select the Google BigQuery Connection Name from the Crawler Information page and click on the 9 dots buttons and select the Settings options. The Connection Settings pop-up window is displayed. 

      Crawler Settings

      Crawler Configurations Settings

      Details

      Tables, Views, and Columns

      Select the checkbox to crawl the tables, views, and columns existing in the Google BigQuery data source into the OvalEdge.

      Note: By default, the checkbox for Tables, Views, and Columns is selected.

      Procedures, Functions, Triggers & Views Source Code

      Select the checkbox to crawl the procedures, functions, triggers & views source code existing in the Google BigQuery data source into the OvalEdge.

      Crawler Rules: Include Regex

      Enter the specific schema, table, views, and column names that start with, end with, or have middle characters that are included for crawling.

      Crawler Rules: Exclude Regex

      Enter the specific schema, table, views, and column names that start with, end with, or have middle characters that are excluded for crawling.

      Profiler Settings

      Profiler Configurations Settings

      Details

      Tables and Columns

      Select the checkbox to profile the tables and columns that are existing in the Google BigQuery data source into the OvalEdge.

      Note: By default, the checkbox for Tables and Columns is selected.

      Views and Columns

      Select the checkbox to profile the views and columns that are existing in the Google BigQuery data source into the OvalEdge.

      Note: By default, the checkbox for Views and Columns is selected.

      Profile Rules: Include Regex

      Enter the specific table, view, and column names that start with, end with, or have middle characters that are included for profiling.

      Profile Rules: Exclude Regex

      Enter the specific table, view, and column names that start with, end with, or have middle characters that are excluded for profiling.

      To configure the Profile Setting,

      Click on the Edit icon that allows the Admin user to configure the profiler setting for the selected data source. There are many attributes you can specify in the profile settings. 

      The attributes are as follows,

      Columns

      Description

      Order

      Order is the sequence in which the profiling is done. 

      Day

      Enter the day of the week profiling is set to run.

      Start/End Time

      Enter the start and end time at which profiling is set to perform.

      Number of Threads

      Thread is a process where a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries executed on the data source. 

      Profile Type

      There are four main types of data profiling. 

      1. Sample
        The profiling is based on a given Sample Profile Size. The data on columns ( statistics such as Min, Max, Distinct, and Null Count) will be different compared to full profiles as it is calculated only on sample size. 
        To execute a sample profile, select the profile type as “Sample” and enter a sample profile size(count of records to be profiled).
      2. Auto:
        1. If the Row Constraint checkbox is selected (Set as True) and if the total Table Row Count (1000)  is more significant than configured Rowcount Limit (100) then the sample profiling is performed by considering the count mentioned in the Sample Profile Size.
        2. If the Row Constraint checkbox is selected (Set as True) and if the  Table Row Count (100) is less than configured Rowcount Limit (1000)  then all the rows of the table will be profiled without considering the count mentioned in the Rowcount Limit.
          Note:  A profile type set to “Auto” will always depend on the Rowcount limit when the row count constraint must be set as “True”. 
      3. Query:
        1. If the entered table row count is less than the Rowcount Limit, then the profiling is executed on the entire table.
        2. If the input table row count exceeds the Rowcount Limit, then the profiling skips execution for those tables to avoid performance issues.
      4. Disabled profile type prevents profiling on the selected data source.

      Row Count Constraint

      The Row Count Constraint option is applicable only when the Profile Type is selected as Auto.

      1. If the Row Constraint checkbox is selected (Set as True) and if the configured Rowcount Limit (100) is less than the total Table Row Count (1000) then the sample profiling is performed by considering the count mentioned in the Sample Profile Size.
      2. If the Row Constraint checkbox is selected (Set as True) and if the configured Rowcount Limit (1000) is greater than the total Table Row Count (100) then the query is executed without considering the count mentioned in the Rowcount Limit. 

      Row Count Limit

      Enter the maximum number of rows that should be considered for profiling.

      Sample Profile Size

      Enter the total number of rows to be included in profiling.

      Query Timeout:

      Enter the length of time in seconds to allow the query to run on a remote database before timing out. 

      Query Policies

      The Query Policies in the Crawler setting provide the right access to the Query sheet functions (Join, Union, SUM, or aggregate functions). You can specify the desired roles and permission to deny the usage of the query sheet function. A role that has been denied policy permission will not have access to those functions in the Query Sheet.

      Example: If the user selects the Role as “OE_HRADMIN,” Query Type as “JOIN,” and the Access Type as “DENY,” then the users associated with the OE_HRADMIN privileges are restricted from using the JOIN function in the Query Sheet page.

      Access Instruction

      Instructions are related information about the data source connection that can be provided here. It could have information about the resources like links, Images, or Videos that help business users to get additional information about a data source connection. 

      Once you add the access instruction for a specific connection in the crawler settings, after crawling the connection, the saved instruction will appear in the connection hierarchy (Schema, table) on the Data Catalog.

      Others

      The Send Metadata Changes Notifications option is used to notify data owners or stewards of any metadata changes.

      FAQs

      1. What should I know when upgrading my driver?
        The specifications include a few JAR class libraries in the installation package as follows:

        JAR

        JAR Version

        google-cloud-bigquery-1.65.0

        1.65.0v

        google-http-client-jackson2-1.35.0

        1.35.0v

        google-auth-library-oauth2-http-0.21.0

        0.21.0

      2. How much does the driver cost?
        The google-cloud-bigquery-1.65.0  is available at no additional charge.
      3. Can I use the driver to access BigQuery  from a Linux computer?
        Yes! You can use the driver to access BigQuery  from Linux, Unix, and other non-Windows platforms. For having  Bigquery API Authentication on Linux environment.

        Copyright © 2022, OvalEdge LLC, Peachtree Corners GA USA