BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real-time. With BigQuery, there's no infrastructure to set up or manage, letting you focus on finding meaningful insights using standard SQL and taking advantage of flexible pricing models across on-demand and flat-rate options.
Key features
- BigQuery ML: BigQuery ML enables users to create and execute machine learning models in BigQuery by using standard SQL queries.
- BigQuery BI Engine: BigQuery BI Engine is a fast, in-memory analysis service that allows you to analyze data stored in BigQuery.
- BigQuery Data Transfer Service: The BigQuery Data Transfer Service automates data movement into BigQuery on a scheduled, managed basis. The BigQuery Data Transfer Service facilitates transfers from many data sources, including data warehouses. Transfers can be recurring or occur once.
OvalEdge provides an interface that can connect to Google BigQuery and perform all standard operations. The Google BigQuery data source is integrated via the API driver that supports crawling database objects, profiling sample data, query execution, and building lineage.
Connector Capabilities
The connectivity to the Google BigQuery connector is performed via the API and the versions used by the API are given below:
Driver/API |
Version |
Details |
API |
1.65.0 |
https://mvnrepository.com/artifact/com.google.cloud/google-cloud-bigquery/1.65.0 |
Technical Specifications
Crawling
Feature |
Supported Objects |
Remarks |
Crawling |
Tables |
- |
Table Columns |
Supported Data Types: “BIGINT”, ” CHAR”, ”DATE”, “FLOAT”, “INT”, “NCHAR”, “NTEXT”, “NUMERIC”, “NVARCHAR”, “SMALLINT” , “ TINYINT”, ”VARCHAR”. |
|
Views |
- |
|
Stored procedures |
- | |
Functions |
- | |
Triggers |
- | |
Roles |
- | |
Users |
- | |
Permissions |
- | |
Triggers |
- | |
Usage statistics |
- |
Profiling
Feature |
Support |
Remarks |
Table Profiling |
Row Count, Columns Count, View Sample Data |
- |
View Profiling |
Row Count, Columns Count, View Sample Data |
View is treated as a table for profiling purposes |
Column Profiling |
Min, Max, Null Count, Distinct, Top 50 Values |
- |
Full Profiling |
Supported |
- |
Sample Profiling |
Supported |
- |
Lineage Building
Feature |
Supported Entities |
Remarks |
Lineage Building |
Table Lineage |
- |
Column Lineage |
- |
|
Lineage Sources |
Stored Procedures, Functions, Triggers, Views, SQL Queries (from Query Sheet), Query Logs |
Querying
Feature |
Operations |
Details |
Query Execution |
Select |
Supported |
Insert |
Not Supported, by default. |
|
Update |
Not Supported, by default. |
|
Delete |
Not Supported, by default. |
|
Joins within database |
Supported |
|
Joins outside database |
Not Supported |
|
Aggregations |
Supported |
|
Group By |
Supported |
|
Order By |
Supported |
Connection Details
Pre-requisites
To use the Google BigQuery Connector, the details specified in the following section should be available.
- By default the service account provided for the connector will be used for any query operations. If the service account has write privileges, then Insert/Update/Delete queries can be executed.
- The minimum privileges required are
Operation
Access Permission
Connection Validation
Read access to the Google BigQuery
Crawling
Read access to the Google BigQuery
Profiling
Read access to the Google BigQuery
Lineage Building
Read access to the Google BigQuery
Query Execution
Read access to the Google BigQuery
To connect to the Google BigQuery using the OvalEdge application, complete the following steps.
- Log in to the OvalEdge application
- In the left menu, click on the Administration module name, and the sub-modules associated with the Administration are displayed.
- Click on the Connectors sub-module name, and the Connectors Information page is displayed.
- In the Connectors Information page, click on + New Connection. The Manage Connection with Search Connector pop-up window is displayed.
- In the Manage Connection pop-up window, select the connection type as Google BigQuery. The Manage Connection with Google BigQuery specific details pop-up window is displayed.
- The following are the field attributes required for the connection of Google BigQuery.
Property
Mandatory/ Optional
Details
Connection Type
Mandatory
By default the connection type is displayed as the Google BigQuery, if needed the connection type can be changed by selecting desired connection type from the Connection Type dropdown, and based on the selection of the connection type, the fields associated with the selected connection type are displayed.
License Type
Mandatory
OvalEdge offers two types of licenses based on the customer's requirements. By default, the License type is displayed as ‘Auto Lineage.’ The user can select the license type as Standard or Auto Lineage.
(i) Standard: The standard license has crawler and profiler features and doesn’t have Auto Lineage functionality, where the users are not allowed to build lineage for the selected database and schema-level objects.
(ii) Auto Lineage: The Auto lineage license allows to build a lineage in addition to the Crawling and Profiling feature.
Connection Name
Mandatory
Enter the name of the connection, the connection name specified in the Connection Name textbox will be a reference to the Google BigQuery database connection in the OvalEdge application.
Example: Google BigQuery Connection
Environment
Optional
It allows the user to select the environment configured for the connector from the dropdown list. The configured environments can be Microsoft Azure and Amazon Web Services.
Validation Type
Optional
It allows the user to select the validation configured for the connector from the dropdown list. The configured validation types can be File Authentication and UI Authentication.
Project ID
Mandatory
Enter the Project Id related to the service account of the Google BigQuery data source.
Application
Mandatory
Enter the Application Id related to the service account of the Google BigQuery data source.
File Path
Optional
Enter the service account .json file path which is located in your local environment.
Default Governance Roles
Mandatory
The admin will select a specific user or a team from the governance roles (Steward, Custodian, Owner) that get assigned to the data asset. The dropdown list displays all the configurable roles (single user or a team) as per the configurations made in the OvalEdge Security>Governance Roles.
Select Bridge
Optional
With the OvalEdge Bridge component, any cloud-hosted server can connect with any on-premise or public cloud data sources without modifying firewall rules. The use of a bridge is to provide real-time control that makes it easy to manage the movement of data between any source and any destination.
Note: In the Validation Type field, if the validation type is selected UI Authentication then the following connection fields are displayed.
Property
Mandatory/ Optional
Details
Account Type
Mandatory
Enter the type of account associated with the Google BigQuery data source
Ex: Service_Account
Client ID
Mandatory
Enter the Client Id related to the Google BigQuery data source authentication.
Client Email
Mandatory
Enter the Client email related to the service account of the Google BigQuery data source.
Private Key
Mandatory
Enter the key associated with the BigQuery service account that is used to authenticate the google account.
Private Key Id
Mandatory
Enter the unique id which is used for representing Private Key.
Token Uri
Mandatory
Enter the Google Id token details.
Project Id
Mandatory
Enter the Project Id related to the service account of the Google BigQuery data source.
Application
Mandatory
Enter the Application Id related to the service account of the Google BigQuery data source.
- Once after entering the connection details in the required fields, click on the
button the entered connection details are validated the Save and Save & Configure buttons are enabled.
- Click on the
button to establish the connection or you can also directly click on the
button to establish the connection and configure the connection settings. Here when you click on the Save & Configure button, the Connection Settings pop-up window is displayed. Where you can configure the connection settings for the selected Connector. The Save & Configure button is displayed only for the Connectors for which the settings configuration is required.
Error Validations Details
The following are expected errors that can be encountered while establishing the connection.
S.No. |
Error Messages |
Description |
1. |
Project Id Incorrect |
The entered Project Id is not valid. |
2. |
Application Id Incorrect |
The entered Application Id is not valid. |
Crawl/Profile
Once connectivity is established, additional configurations for crawling and profiling can be specified for the selected data source. The user can use the Crawl/Profile option, which allows the user to select the specific schemas that need to be crawled, profiled, or unprofiled. For any scheduled crawlers and profilers, the defined run date and time are displayed to set.
Connector Settings
Once establishing the connection successful the additional configurations for crawling need to be specified. To configure the Crawler settings for the Google BigQuery Connector, select the Google BigQuery Connection Name from the Crawler Information page and click on the 9 dots buttons and select the Settings options. The Connection Settings pop-up window is displayed.
Crawler Settings
Crawler Configurations Settings |
Details |
---|---|
Tables, Views, and Columns |
Select the checkbox to crawl the tables, views, and columns existing in the Google BigQuery data source into the OvalEdge. Note: By default, the checkbox for Tables, Views, and Columns is selected. |
Procedures, Functions, Triggers & Views Source Code |
Select the checkbox to crawl the procedures, functions, triggers & views source code existing in the Google BigQuery data source into the OvalEdge. |
Crawler Rules: Include Regex |
Enter the specific schema, table, views, and column names that start with, end with, or have middle characters that are included for crawling. |
Crawler Rules: Exclude Regex |
Enter the specific schema, table, views, and column names that start with, end with, or have middle characters that are excluded for crawling. |
Profiler Settings
Profiler Configurations Settings |
Details |
Tables and Columns |
Select the checkbox to profile the tables and columns that are existing in the Google BigQuery data source into the OvalEdge. Note: By default, the checkbox for Tables and Columns is selected. |
Views and Columns |
Select the checkbox to profile the views and columns that are existing in the Google BigQuery data source into the OvalEdge. Note: By default, the checkbox for Views and Columns is selected. |
Profile Rules: Include Regex |
Enter the specific table, view, and column names that start with, end with, or have middle characters that are included for profiling. |
Profile Rules: Exclude Regex |
Enter the specific table, view, and column names that start with, end with, or have middle characters that are excluded for profiling. |
To configure the Profile Setting,
Click on the Edit icon that allows the Admin user to configure the profiler setting for the selected data source. There are many attributes you can specify in the profile settings.
The attributes are as follows,
Columns |
Description |
Order |
Order is the sequence in which the profiling is done. |
Day |
Enter the day of the week profiling is set to run. |
Start/End Time |
Enter the start and end time at which profiling is set to perform. |
Number of Threads |
Thread is a process where a query is executed on a database to do single or multiple tasks. The number of threads determines the number of parallel queries executed on the data source. |
Profile Type |
There are four main types of data profiling.
|
Row Count Constraint |
The Row Count Constraint option is applicable only when the Profile Type is selected as Auto.
|
Row Count Limit |
Enter the maximum number of rows that should be considered for profiling. |
Sample Profile Size |
Enter the total number of rows to be included in profiling. |
Query Timeout: |
Enter the length of time in seconds to allow the query to run on a remote database before timing out. |
Query Policies
The Query Policies in the Crawler setting provide the right access to the Query sheet functions (Join, Union, SUM, or aggregate functions). You can specify the desired roles and permission to deny the usage of the query sheet function. A role that has been denied policy permission will not have access to those functions in the Query Sheet.
Example: If the user selects the Role as “OE_HRADMIN,” Query Type as “JOIN,” and the Access Type as “DENY,” then the users associated with the OE_HRADMIN privileges are restricted from using the JOIN function in the Query Sheet page.
Access Instruction
Instructions are related information about the data source connection that can be provided here. It could have information about the resources like links, Images, or Videos that help business users to get additional information about a data source connection.
Once you add the access instruction for a specific connection in the crawler settings, after crawling the connection, the saved instruction will appear in the connection hierarchy (Schema, table) on the Data Catalog.
Others
The Send Metadata Changes Notifications option is used to notify data owners or stewards of any metadata changes.
FAQs
- What should I know when upgrading my driver?
The specifications include a few JAR class libraries in the installation package as follows:
JAR
JAR Version
google-cloud-bigquery-1.65.0
1.65.0v
google-http-client-jackson2-1.35.0
1.35.0v
google-auth-library-oauth2-http-0.21.0
0.21.0
- How much does the driver cost?
The google-cloud-bigquery-1.65.0 is available at no additional charge. - Can I use the driver to access BigQuery from a Linux computer?
Yes! You can use the driver to access BigQuery from Linux, Unix, and other non-Windows platforms. For having Bigquery API Authentication on Linux environment.
Copyright © 2022, OvalEdge LLC, Peachtree Corners GA USA