Knowledge Base

Release6.1 Deep Dive Articles

Data Quality

Data Quality Functions

Data Quality Functions are advanced analytical functions designed to compute and validate numeric and alphanumeric data. OvalEdge provides a rich library of 57 predefined Data Quality Functions that facilitate a wide range of computations for data quality management. These functions can be applied to various data objects, including Tables (TB), Table Columns (TC), Files (FL), File Columns (FC), or SQL queries (SQL).

Let's explore some of the key aspects of OvalEdge's Data Quality Functions with sample examples to enhance understanding:

Supported Data Objects: OvalEdge's Data Quality Functions can be executed on the following data objects:
- Tables (TB): This includes structured data stored in database tables.
- Table Columns (TC): These are individual columns within database tables.
- Files (FL): These are files uploaded or stored in the OvalEdge application.
- File Columns (FC): These represent specific columns within files.
- SQL Queries (SQL): These are custom SQL Queries executed on supported database connectors.
Supported Connectors: OvalEdge supports various connectors to interact with different data sources. The connector support depends on the data object:
- For Table and Table Column related functions, OvalEdge supports connectors such as MySQL, SQL Server, Oracle, PostgreSQL, Snowflake, and more. These connectors enable seamless interaction with the respective databases to perform data quality operations.
- For File and File Column related functions, OvalEdge supports connectors like S3 (Amazon Simple Storage Service), NFS (Network File System), and Azure Data Lake Storage. These connectors enable reading and processing files stored in these storage systems.
Supported File Formats: OvalEdge supports multiple file formats for File and File Column related functions. Some supported file formats include:
- .csv (Comma-Separated Values): This format is commonly used for tabular data storage, where data values are separated by commas.
- .json (JavaScript Object Notation): This format is used for structured data representation in a human-readable format using key-value pairs.
- .parquet: This columnar storage file format is optimized for efficient data processing and analytics.

Example:

Consider a scenario where you have a table named "SalesData" with columns such as "ProductID", "Quantity", and "Price". You want to calculate the total revenue generated by each product. OvalEdge provides a Data Quality Function called "Sum(TC)" that can be applied to the "Price" column of the "SalesData" table.

Function: Sum (TC)
Data Object: Table Column (TC)
Connector: MySQL
Table: SalesData
Column: Price
Input: Success Range: 100 to 200.
Output: The output of the function can be one of the following:
- Success: If the computed value falls within the specified range, the rule is considered successful.
- Failure: If the computed value is outside the specified range, the rule is considered a failure.
- Undetermined: If the function encounters any issues or constraints that prevent it from executing properly, the result is undetermined. For example, if the selected table column is empty and does not contain any data.
- Not Executed: This status indicates that the rule has not been executed.
- Execution Failed: This status indicates that the execution of the rule has failed due to an error or unexpected behavior, such as encountering issues with connecting to the data source.

By executing the "Sum (TC)" function on the "Price" column of the "SalesData" table, OvalEdge will compute the sum of all the prices. This will give you the total revenue generated.

Similarly, you can leverage other Data Quality Functions provided by OvalEdge to perform a wide range of computations and validations on your data, ensuring data quality and accuracy.

OvalEdge provides a comprehensive list of 57 predefined Data Quality Functions categorized based on data object type and default dimension. These functions encompass a wide range of data quality operations, enabling users to perform calculations and validations effectively.

Object	Default Dimension	Data Quality Function
Tables	Completeness	Null Density Percent (TB)
	Uniqueness	Density Percent (TB)
	Validity	Total Row Count (TB)
Table Columns	Completeness	Null Density Percent (TC)
		Empty Or Null Count Percent (TC)
		Not Null And Not Empty Percent (TC)
	Uniqueness	Density Percent (TC)
	Validity	Validate Email Percent (TC)
		Validate Regex Percent (TC)
		Validate Pattern Percent (TC)
		Outlier Rule (TC)
		Is Unique Column (TC)
		Empty Count Range (TC)
		Not Empty Count Range (TC)
		Data Length Range (TC)
		Validate SSN Format (TC)
		Validate Credit Card Format (TC)
		Ends With Symbols (TC)
		Average (TC)
		Distinct (TC)
		Min (TC)
		Max (TC)
		Null Count Range (TC)
		Std Deviation (TC)
		Sum (TC)
		Top Values (TC)
Files	Validity	File Name Contains (FL)
		File Format Validation (FL)
		File Created Date (FL)
		File Size Validation (FL)
		File Row Validation (FL)
File Columns	Validity	All Column Values Matches Given Fixed Value (FC)
		First Letter Of Word Upper Case And Rest Lower Case (FC)
		This Column Value Should Be Unique (FC)
		Word Validation (FC)
		Number Validation (FC)
		Date Validation (FC)
		Numeric Value Between Min And Max Values (FC)
		Value Validation (FC)
		Allow Only Specified Special Character (FC)
		Has Length Of The Value With In Range (FC)
		Columns Unique Percent Value (FC)
		Outliers To Aggregations (FC)
		Has No Null Value (FC)
		Has No Digits or Special Chars (FC)
		Has Digits and Special Chars (FC)
		No Leading or Trailing Spaces (FC)
		Has Substring in String (FC)
		Belongs To One Of Languages (FC)
		Word Count With In Range (FC)
		Column Value Same As Other Column Value (FC)
		Regex Match (FC)
		Column Value Matches One Of Given Values (FC)
Query	Uniqueness	Sql Exact Value (SQL)
	Validity	Custom Sql Exact Value (SP)
		Sql Value Contains (SQL)
		Sql Value Range (SQL)

Data Quality Functions for Tables

Null Density Percent (TB)

Name

Null Density Percent (TB)

Description

The Null Density Percent(TB) function is a data quality function used to calculate the percentage of null values in a table. It determines the count of null values across all table columns and expresses it as a percentage of the total values count.

The purpose of this function is to assess the presence of null values within a table and evaluate whether the null density falls within a specified percentage range. By defining a desired percentage range, you can establish the rule's success criteria.

Formula

NULL DENSITY % = (COUNT OF NULL VALUES IN ALL COLUMNS / (COUNT ALL ROWS X COUNT OF COLUMNS)) * 100

Supported Object Type

The Null Density Percent(TB) function operates on tables. It allows you to evaluate the null density percentage in a table.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the null density percentage should fall. It requires two values, the lower and upper ranges, expressed as percentages between 0 and 100.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Let's consider an example to illustrate the usage of the Null Density Percent(TB) function. We have a table named "LOCATION" as shown below:

COUNTRY	STATE	CITY
USA	VIRGINIA	RICHMOND
CANADA	ONTARIO	null
MEXICO	null	null
INDIA	NEW DELHI	NEW DELHI
UK	LONDON	null
FRANCE	PARIS	null
null	null	null

Suppose we set the input success range to be 0 to 50.

In this case, the NULL values count is 8, and the total count is 21.

To determine the null density percentage, we divide the count of NULL values (8) by the total count (21) and multiply by 100

Null Density Percent is calculated as (8 / 21) * 100 = 38%.

Since the computed value of 38% falls within the specified range of 0 to 50, the rule is considered Successful.

Density Percent (TB)

Name

Density Percent (TB)

Description

The Density Percent(TB) function is a data quality function used to determine the percentage of distinct values in a table. It calculates the sum of distinct values count in each column of a table and returns the value as a percentage of the total values count and expresses it as a percentage.

The purpose of this function is to assess the presence of distinct values within a table and evaluate whether the density falls within a specified percentage range. By setting a desired percentage range, you can define the success criteria for the rule.

Formula

DENSITY % = ((SUM OF COUNT OF DISTINCT VALUES IN EACH COLUMN) / (COUNT OF ALL ROWS X COUNT OF ALL COLUMNS)) * 100

Supported Object Type

The Density Percent(TB) function operates on tables. It allows you to evaluate the density percentage specifically across the columns within a table.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the density percentage should fall. It requires two values, the lower and upper bounds, expressed as percentages between 0 and 100.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Let's consider an example to illustrate the usage of the Density Percent(TB) function. We have a table named "LOCATION" shown below:

COUNTRY	REGION/STATE	CITY
USA	NEW YORK	NEW YORK
USA	ARIZONA	TUCSON
USA	NEW YORK	BUFFALO
USA	TEXAS	PARIS
CANADA	ONTARIO	LONDON
INDIA	NEW DELHI	NEW DELHI
UK	LONDON	LONDON
FRANCE	PARIS	PARIS

Suppose we set the input success range to be 0 to 100.

In this case, the Sum of the Count of Distinct values in each column comes to 5 + 7 + 6 = 18

To determine the density percentage, we divide the count of all rows and columns values (8 X 3) by the total count of distinct values (18) and multiply by 100:

Density Percent is calculated as (18 / (8x3)) * 100 = 75%.

Since the computed value of 75% falls within the specified range of 0 to 100, the rule is considered Successful.

Total Row Count (TB)

Name

Total Row Count (TB)

Description

The Total Row Count(TB) function determines the total number of rows in a table. It provides a straightforward way to obtain the count of rows present in the specified table.

The rule's success is determined by checking if the computed value falls within the specified input range. By defining a range of acceptable values, you can establish criteria for successful rule execution.

Formula

There is no specific formula involved in this function. The Total Row Count(TB) function simply counts the number of rows in the table.

Supported Object Type

The Total Row Count(TB) function operates specifically on tables. It allows you to determine the total number of rows in a table.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the total row count value should fall. It requires two values, the lower and upper bounds, expressed as percentages between 0 and 100.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

To illustrate the usage of the Total Row Count (TB) function, let's consider a table named "EMPLOYEE" that contains 21 rows.

Table: EMPLOYEE

EmployeeID	Name	Department
101	John	HR
102	Mary	IT
103	Robert	Finance
104	Lisa	Marketing
105	Michael	HR
106	David	IT
107	Emily	IT
108	Olivia	IT
109	Amelia	IT
110	Harper	IT

Suppose we set the input success range to be 10 to 30.

In this case, the computed row count is 10, which falls within the specified range.

Therefore, the rule is Successful.

Data Quality Functions for Table Columns

Null Density Percent (TC)

Name

Null Density Percent (TC)

Description

The Null Density Percent(TC) function is a data quality function used to determine the percentage of null values in a specific column of a table. It calculates the count of null values in the specified column and expresses it as a percentage of the total values in that column.

The purpose of this function is to assess the presence of null values within a specific column. It also evaluates whether the null density falls within a specified percentage range. By defining a desired percentage range, you can establish the rule's success criteria.

Formula

NULL DENSITY (%) IN THE TABLE COLUMN = (NUMBER OF NULL VALUES IN THE TABLE COLUMN / TOTAL NUMBER OF VALUES IN THE TABLE COLUMN) * 100

Supported Object Type

The Null Density Percent(TC) function operates specifically on table columns. It allows you to evaluate the null density percentage in a specific column of a table.

Supported Data Types

The Null Density Percent(TC) function supports a wide range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, string, int, bigint, number, long, decimal, double, float, smallint, integer, date, datetime, and timestamp.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the null density percentage should fall. It requires two values, the lower and upper bounds, expressed as percentages between 0 and 100.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Let's consider a table column "COUNTRY" with the following values:

COUNTRY	STATE	CITY
USA	VIRGINIA	RICHMOND
CANADA	ONTARIO	OTTAWA
MEXICO	SONORA	HERMOSILLO
INDIA	NEW DELHI	null
null	LONDON	CITY OF LONDON
null	NEW DELHI	NEW DELHI
null	null	null

Suppose we set the input success range to be 0 to 50.

In this case, there are 3 NULL values out of 7 total values in the "COUNTRY" column. Hence, the Null Density Percent is calculated as (3 * 100) / 7 = 42%.

Since the computed null density percentage of 42% falls within the specified range, the rule is considered Successful.

Empty Or Null Count Percent (TC)

Name

Empty Or Null Count Percent (TC)

Description

The Empty Or Null Count Percent(TC) function is a data quality function used to calculate the percentage of empty or null values in a table column. It determines the proportion of empty or null values among the total values in the column and expresses it as a percentage.

The purpose of this function is to assess the presence of empty or null values within a column and evaluate whether the count falls within a specified percentage range. By setting a desired range, you can define the success criteria for the rule.

Formula

EMPTY OR NULL COUNT PERCENT (%) = (COUNT OF EMPTY OR NULL VALUES IN THE COLUMN / TOTAL COUNT OF VALUES IN THE COLUMN) * 100

Supported Object Type

The Empty Or Null Count Percent(TC) function operates on table columns. It allows you to evaluate the percentage of empty or null values in a specific column.

Supported Data Types

This function supports the following data types within the table column:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the empty or null count percentage should fall. It requires two values, the lower and upper bounds, expressed as percentages between 0 and 100.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let's consider an example to illustrate the usage of the Empty Or Null Count Percent(TC) function. We have a table column named "COUNTRY" with the following values:

COUNTRY	STATE	CITY
USA	VIRGINIA	RICHMOND
CANADA	ONTARIO	OTTAWA
	SONORA	HERMOSILLO
INDIA	NEW DELHI	NEW DELHI
null	LONDON	CITY OF LONDON
	NEW DELHI	NEW DELHI
null	NEW DELHI	NEW DELHI

In this example, the column has 2 empty values and 2 NULL values out of 7 values.

To calculate the empty or null count percent, we divide the count of empty or null values (4) by the total count of values (7) and multiply by 100

Empty Or Null Count Percent is calculated as (4 / 7) * 100 = 57%.

Suppose we set the input success range for this rule to 0 to 20.

In this case, the computed empty or null count percent is 57%.

Since the computed value of 57% is not within the specified range, the rule is considered a Failure.

Not Null And Not Empty Percent (TC)

Name

Not Null And Not Empty Percent (TC)

Description

The Not Null And Not Empty Percent(TC) function is a data quality function used to calculate the percentage of not empty and non-null values in a table column. It determines the proportion of values that are not empty or null out of the total values in the column and expresses it as a percentage.

The purpose of this function is to assess the presence of not empty and non-null values within a column and evaluate whether the count falls within a specified percentage range. By setting a desired range, you can define the success criteria for the rule.

Formula

NOT NULL AND NOT EMPTY PERCENT (%) = (COUNT OF NOT EMPTY AND NON-NULL VALUES IN THE COLUMN / TOTAL COUNT OF VALUES IN THE COLUMN) * 100

Supported Object Type

The Not Null And Not Empty Percent(TC) function operates on table columns. It allows you to evaluate the percentage of values that are not empty and non-null in a specific column.

Supported Data Types

This function supports the following data types within the table column:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, string.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the not null and not empty count percentage should fall. It requires two values, the lower and upper bounds, expressed as percentages between 0 and 100.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let's consider an example to illustrate the usage of the Not Null And Not Empty Percent(TC) function. We have a table column named "COUNTRY" with the following values:

COUNTRY	STATE	CITY
USA	VIRGINIA	RICHMOND
CANADA	ONTARIO	OTTAWA
	SONORA	HERMOSILLO
INDIA	NEW DELHI	NEW DELHI
null	LONDON	CITY OF LONDON
	NEW DELHI	NEW DELHI
null	NEW DELHI	NEW DELHI

In this example, the column has 2 empty values and 2 NULL values out of a total of 7 values.

To calculate the not null and not empty count percent, we divide the count of not empty and non-null values (3) by the total count of values (7) and multiply by 100

Not Null And Not Empty Percent is calculated as (3 / 7) * 100 = 43%.

Suppose we set the input success range for this rule to be 80 to 100.

In this case, the computed not null and not empty count percent is 43%.

Since the computed value of 43% is not within the specified range, the rule is considered a Failure.

Density Percent (TC)

Name

Density Percent (TC)

Description

The Density Percent(TC) function is a data quality function used to determine the extent of unique values (or conversely, repeated values) in a table column and represents it as a percentage. It measures the density or uniqueness of data within a column.

The rule is considered successful when the computed percentage falls within the specified input range. This allows you to define the acceptable range for the uniqueness of values in the column.

Formula

DENSITY PERCENT (%) = (COUNT OF DISTINCT VALUES IN THE COLUMN / TOTAL COUNT OF VALUES IN THE COLUMN) * 100

Supported Object Type

The Density Percent(TC) function operates on table columns. It allows you to assess the uniqueness of values within a specific column.

Supported Data Types

This function supports the following data types within the table column:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the density percentage should fall. It requires two values, the lower and upper bounds, expressed as percentages between 0 and 100.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let's consider an example to illustrate the usage of the Density Percent(TC) function. We have a table column named "DEPTNO" with the following values:

EMPLOYEEID	NAME	DEPTNO
1	John Doe	90
2	Jane Doe	50
3	Alex Lee	30
4	Mark Liu	20
5	Lisa Kim	20
6	Eric Wu	90
7	Sarah Tan	50
8	Tom Wong	10
9	Emma Chen	40
10	Jack Lin	60

In this example, the column has 7 distinct values out of a total of 10 values.

To calculate the density percentage, we divide the count of distinct values (7) by the total count of values (10) and multiply by 100

Density Percent is calculated as (7 / 10) * 100 = 70%.

Suppose we set the input success range for this rule to be 50 to 100.

In this case, the computed density percentage is 70%.

Since the computed value of 70 falls within the specified range, the rule is considered Successful.

Validate Email Percent (TC)

Name

Validate Email Percent (TC)

Description

The Validate Email Percent (TC) function is used to evaluate the percentage of valid email values within a specified column. It checks whether the email values conform to the standard pattern of "username@companydomain.com" or "username@companydomain.in".

Formula

Validate Email Percent = (Passed Records Count / Total Records Count) * 100

Supported Object Type

The Validate Email Percent(TC) function operates on table columns, allowing you to assess the validity of email values within a specific column.

Supported Data Types

The Validate Email Percent (TC) function supports a wide range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, string.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

Percentage (%) Range: Enter the desired range between 0 and 100 to define the success criteria.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "EMAIL" with the following values:

EMPLOYEEID	NAME	EMAIL
1	John Doe	johndoe@example.com
2	Jane Doe	janedoe@example.com
3	Alex Lee	alexlee@example.com
4	Mark Liu	markliu@example.com
5	Lisa Kim	lisa.kim@example.com
6	Eric Wu	ericwu@example.com
7	Sarah Tan	sarah.tan@example.com
8	Tom Wong	tom.wong@example.com
9	Emma Chen	emma.chen@example.com
10	Jack Lin	jacklinexample.com

Suppose we set the input success range to be 80 to 100.

Let's consider a table column "Email" that contains 10 email addresses. Out of these, 9 email addresses pass the validation test, while 1 email addresses fail the validation.

Validate Email Percent = (Passed Records Count / Total Records Count) * 100 = (9 / 100) * 100 = 90%.

In this case, the computed value of 90% falls within the specified range, indicating a successful email validation.

Validate Regex Percent (TC)

Name

Validate Regex Percent (TC)

Description

The Validate Regex Percent(TC) function is used to validate if the values within a table column match a specified regular expression (regex) pattern, such as ^[A-Za-z0-9]*$. It calculates the percentage of matched values in relation to the total number of values in the column.

The purpose of this function is to determine the success rate of regex validation by comparing the computed percentage with a specified input range.

Formula

VALIDATE REGEX PERCENT = (PASSED OBJECTS COUNT / TOTAL OBJECTS COUNT) * 100

Supported Object Type

The Validate Regex Percent(TC) function operates on table columns, allowing you to validate the values within a specific column against a regex pattern.

Supported Data Types

This function supports the following data types within the table column:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

Percentage (%) Range: Enter the range between 0 and 100 to define the acceptable percentage range.

Pattern: Enter the specific pattern (e.g.,^[A-Za-z0-9]*$) that the values should adhere to for successful validation.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let's consider a table column "NOTES" containing a set of values on which the Validate Regex Percent rule is applied.

EMPLOYEEID	NAME	EMAIL	NOTES
1	John Doe	johndoe@example.com	Hello
2	Jane Doe	janedoe@example.com	HELLO
3	Alex Lee	alexlee@example.com	Hello123
4	Mark Liu	markliu@example.com	889976545
5	Lisa Kim	lisa.kim@example.com	Hello OvalEdge!
6	Eric Wu	ericwu@example.com	Hello 234!
7	Sarah Tan	sarah.tan@example.com	HELLO OvalEdge
8	Tom Wong	tom.wong@example.com
9	Emma Chen	emma.chen@example.com	HELLO-234&
10	Jack Lin	jacklin@example.com	HELLO

The regular expression pattern ^[A-Za-z0-9]*$ can be broken down as follows:

^ - This symbol represents the start of the line. It ensures that the pattern matches from the beginning of the input string.
[A-Za-z0-9] - This character class represents a range of characters. In this case, it includes uppercase letters (A-Z), lowercase letters (a-z), and digits (0-9). This means that any of these characters can appear in the input string.
This quantifier specifies that the preceding character class ([A-Za-z0-9]) can occur zero or more times. It allows for an empty string or any combination of uppercase letters, lowercase letters, and digits.
$ - This symbol represents the end of the line. It ensures that the pattern matches all the way to the end of the input string.

Putting it all together, the pattern ^[A-Za-z0-9]*$ will match any string that consists of zero or more uppercase letters, lowercase letters, or digits. It will not match strings containing other characters or special symbols.

Example: "Hello" or “HELLO” or “Hello123”

This input string consists of only uppercase letters, lowercase letters, and digits.

It matches the pattern because it contains zero or more characters that fall within the range specified by the character class [A-Za-z0-9].

Example: "889976545"

This input string consists of only digits.

It matches the pattern because it contains zero or more characters that fall within the range specified by the character class [A-Za-z0-9].

Example: "Hello OvalEdge!" or “Hello 234!” or “HELLO-234&”

This input string contains a space, dash, and an exclamation mark, which are not included in the character class [A-Za-z0-9].

It does not match the pattern because it fails to meet the requirement of having only uppercase letters, lowercase letters, and digits.

Example: ""

This input string is an empty string.

It matches the pattern because it contains zero characters, which satisfies the condition of zero or more characters specified by the * quantifier.

The Passed Objects Count: 7 (values that match the specified regex pattern)

The Failed Objects Count: 3 (values that do not match the specified regex pattern)

Validate Regex Percent = (Passed Objects Count / Total Objects Count) * 100 = (7 / 10) * 100 = 70%

Suppose we set the input success range to be 50 to 100.

In this case, the computed value of 70% falls within the specified range, indicating Successful validation.

Validate Pattern Percent (TC)

Name

Validate Pattern Percent (TC)

Description

The Validate Pattern Percent(TC) function is a data validation function used to assess the matching of values in a table column to a specified pattern. It calculates the percentage of values that match the given pattern and determines whether it falls within the desired range.

The rule is considered successful when the computed percentage of matching values is within the specified input range.

Formula

VALIDATE PATTERN PERCENT = (PASSED OBJECTS COUNT / TOTAL OBJECTS COUNT) * 100

Supported Object Type

Table Columns: The function can be applied to columns within a table for data validation.

Supported Data Types

The Validate Pattern Percent(TC) function supports a wide range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, string, int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

Percentage (%) Range: Enter the range between 0 and 100 to define the acceptable percentage range.

Pattern: Enter the specific pattern (e.g., U, L, D) that the values should adhere to for successful validation.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "NOTES" with the following values:

EMPLOYEEID	NAME	EMAIL	NOTES
1	John Doe	johndoe@example.com	Hello
2	Jane Doe	janedoe@example.com	HELLO
3	Alex Lee	alexlee@example.com	Hello123
4	Mark Liu	markliu@example.com	889976545
5	Lisa Kim	lisa.kim@example.com	Hello OvalEdge!
6	Eric Wu	ericwu@example.com	Hello 234!
7	Sarah Tan	sarah.tan@example.com	HELLO OvalEdge
8	Tom Wong	tom.wong@example.com	HELLO OvalEdge
9	Emma Chen	emma.chen@example.com	HELLO-234&
10	Jack Lin	jacklin@example.com	HELLO OvalEdge

The pattern value "ULD" is not a regular expression pattern itself but a string consisting of three characters: 'U', 'L', and 'D'. Each character represents a specific character class within the pattern. Let's break it down:

U: This character represents the uppercase letter class [A-Z]. It matches any uppercase letter from A to Z.
L: This character represents the lowercase letter class [a-z]. It matches any lowercase letter from a to z.
D: This character represents the digit class [0-9]. It matches any digit from 0 to 9.

When combined, the pattern value "ULD" can be interpreted as a pattern that expects a string consisting of an uppercase letter, followed by a lowercase letter, and ending with a digit.

Example: "Hello" or “HELLO” or “Hello123” or "889976545"

This input string consists of only uppercase letters, lowercase letters, and digits.

It matches the pattern because it contains the characters that fall within the range specified by the character class “ULD”.

Example: "Hello OvalEdge!" or “Hello 234!” or “HELLO-234&”

This input string contains a space, dash, and an exclamation mark, which are not included in the character class “ULD”.

It does not match the pattern because it fails to meet the requirement of having only uppercase letters, lowercase letters, and digits.

The Passed Objects Count: 7 (values that match the specified regex pattern)

The Failed Objects Count: 3 (values that do not match the specified regex pattern)

Validate Pattern Percent = (Passed Objects Count / Total Objects Count) * 100 = (7 / 10) * 100 = 70%

Suppose we set the input success range to be 50 to 100.

In this case, the computed value of 70% falls within the specified range, indicating Successful validation.

Outlier Rule (TC)

Name

Outlier Rule (TC)

Description

The Outlier Rule(TC) function is a data validation function used to identify extreme values within a table column. It aims to determine whether all values in the column fall within a specified input range.

The rule is considered successful when all values in the table column are within the specified range. If any value falls outside the range, the rule is considered failed.

Formula

There is no specific statistical formula for the Outlier Rule(TC) function, as it primarily involves checking whether each value in the column falls within the specified range.

Supported Object Type

The function can be applied to columns within a table for data validation.

Supported Data Types

The Outlier Rule(TC) function supports a range of numeric data types that includes:

int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the extreme values should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "DAILY_SALES" with the following values:

PRODUCTID	NAME	DAILY_SALES
1	LCD MONITOR	5000
2	NOTEBOOK	888
3	CALCULATOR	1000
4	USB PORT	57
5	HEADPHONES	25
6	PRINTER	600
7	LED MONITOR	2000

Let's assume that the outlier rule was applied to the "DAILY_SALES" column. The input success range specified is 500 to 10000.

By comparing each value in the column to the specified range, we find that two values, 25 and 57, fall outside the range.

Consequently, the Outlier Rule(TC) function fails for these values.

The rule provides the following information in case of failure:

Result: Failure
Passed Count: 4
Failed Count: 2
Total Count: 7
Failed Values: [57, 25]

Is Unique Column (TC)

Name

Is Unique Column(TC)

Description

The Is Unique Column(TC) function is designed to determine whether all values in a table column are unique.

Its purpose is to check if there are any duplicate values present in the column.

Formula

Not Applicable

Supported Object Type

The Is Unique Column(TC) function can be applied to table columns, allowing you to determine the uniqueness of values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

Not Applicable

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "EMP_ID" with the following values:

EMPLOYEEID	NAME	EMAIL	NOTES
1001	John Doe	johndoe@example.com	Hello
1002	Jane Doe	janedoe@example.com	HELLO
1002	Jane Doe	janedoe@example.com	HELLO
1003	Mark Liu	markliu@example.com	Hello

In this example, the value 1002 is repeated, indicating that the column contains duplicates. Therefore, the Is Unique Column(TC) function fails for this column.

The output for this example would be as follows:

Result: Failure
Passed Count: 2
Failed Count: 2
Total Count: 4
Failed Values: [1002]

Empty Count Range (TC)

Name

Empty Count Range (TC)

Description

The Empty Count Range(TC) function is designed to determine the number of empty values in a table column. Its purpose is to check if the count of empty values falls within a specified input range.

Formula

Not Applicable

Supported Object Type

The Empty Count Range(TC) function supports table columns. It allows you to determine the count of empty values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the count of empty values should fall. You need to specify the lower and upper bounds as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "DAILY_SALES" with the following values:

PRODUCTID	NAME	DAILY_SALES
1	LCD MONITOR	5000
2	NOTEBOOK
3	CALCULATOR	1000
4	USB PORT
5	HEADPHONES
6	PRINTER	600
7	LED MONITOR	2000

Let's set the input success range for the "Empty Count Range(TC)" function to 0 to 3.

By applying the function to the "DAILY_SALES" column, we find that the computed count of empty values is 3, which falls within the specified range.

Therefore, the rule is considered Successful.

The output for this example would be as follows:

Result: Success
Passed Count: 3
Failed Count: 0
Total Count: 10

Not Empty Count Range (TC)

Name

Not Empty Count Range (TC)

Description

The Not Empty Count Range(TC) function is designed to determine the number of non-empty values in a table column. Its purpose is to check if the count of non-empty values falls within a specified input range.

Formula

Not Applicable

Supported Object Type

The Not Empty Count Range(TC) function can be applied to table columns, allowing you to determine the count of non-empty values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the count of not empty values should fall. You need to specify the lower and upper bounds as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "DAILY_SALES" with the following values:

PRODUCTID	NAME	DAILY_SALES
1	LCD MONITOR	5000
2	NOTEBOOK
3	CALCULATOR	1000
4	USB PORT
5	HEADPHONES
6	PRINTER	600
7	LED MONITOR	2000

Let's set the input success range for the "Not Empty Count Range (TC)" function to 10 to 10000.

By applying the function to the "DAILY_SALES" column, we find that the computed count of non-empty values is 4, which is not within the specified range of 10 to 1000. Therefore, the Not Empty Count Range(TC) function fails for this column.

The output for this example would be as follows:

Result: Failure
Passed Count: 4
Failed Count: 0
Total Count: 7

Data Length Range (TC)

Name

Data Length Range (TC)

Description

The Data Length Range (TC) function is designed to determine the length of values in a table column. Its purpose is to check if the computed value falls within a specified input range.

Formula

Not Applicable

Supported Object Type

The Data Length Range (TC) function supports table columns. It allows you to determine the length of values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the length of values should fall. You need to specify the lower and upper bounds as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "NOTES" with the following string values:

EMPLOYEEID	NAME	EMAIL	NOTES
1001	John Doe	johndoe@example.com	Hello OvalEdge
1002	Jane Doe	janedoe@example.com	HELLO It's Jane - Welcome to OvalEdge
1002	Jane Doe	janedoe@example.com	HELLO It's Jane - Welcome to OvalEdge
1003	Mark Liu	markliu@example.com	Hello OvalEdge

Let's set the input success range for the "Data Length Range (TC)" function to 10 to 50.

By applying the function to the "NOTES" column, the function computes the length of each value in the column and compares it against the specified input success range.

For each value in the "NOTES" column, we calculate its length:

“Hello OvalEdge” has a length of 14 characters
“HELLO It's Jane - Welcome to OvalEdge” has a total of 31 characters, including spaces and hyphens.
“HELLO It's Jane - Welcome to OvalEdge” has a total of 31 characters, including spaces and hyphens.
“Hello OvalEdge” has a length of 14 characters

Since all the values in the "NOTES" column have lengths within the specified range of 10 to 50 characters, the Data Length Range(TC) function is Successful for this column.

Validate SSN Format (TC)

Name

Validate SSN Format (TC)

Description

The Validate SSN Format (TC) function is used to validate the format of Social Security Numbers (SSN) in a table column. It checks if all values in the column match the pattern "XXX-XX-XXXX", where X represents a digit, and converts the matched values to a percentage.

Formula

Not Applicable

Supported Object Type

The Validate SSN Format (TC) function can be applied to table columns, specifically those containing SSN values.

Supported Data Types

The function supports a wide range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, and string.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the percentage of SSN values should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "SSN" with the following values:

EMPLOYEEID	NAME	EMAIL	SSN
1001	John Doe	johndoe@example.com	123-45-6789
1002	Jane Doe	janedoe@example.com	987654321
1003	David Warner	davidw@example.com	23456-7891
1004	Mark Liu	markliu@example.com	34-567-8912

Suppose we set the input success range to be 80 to 100.

Let's apply the Validate SSN Format(TC) function to the "SSN" column. By validating the format of each SSN value, we observe the following:

“123-45-6789” matches with pattern and it is considered as Passed
“987654321” does not match with the pattern and it is considered as Failed
“23456-7891” does not match with the pattern and it is considered as Failed
“34-567-8912” does not match with the pattern and it is considered as Failed

The output for this example would be as follows:

Result: Failure
Passed Count: 1
Failed Count: 3
Total Count: 4
Failed Values: [987654321, 23456-7891, 34-567-8912]

In this example, one out of the four SSN values match the specified format, resulting in a success percentage of 25%.

Percentage = (Passed Count / Total Count) * 100 = 1/4 * 100 = 25%

Since the success percentage falls outside the specified range of 80 to 100, the rule is considered a Failure.

Validate Credit Card Format (TC)

Name

Validate Credit Card Format (TC)

Description

The Validate Credit Card Format (TC) function is used to validate the format of values in a table column based on a specific pattern. It checks if the values match the pattern "XXXX XXXX XXXX XXXX" (e.g., four sets of four digits separated by spaces), and then calculates the percentage of values that match the format.

Formula

Not Applicable

Supported Object Type

The Validate Credit Card Format (TC) function can be applied to table columns, allowing you to validate the format of credit card numbers within a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, and string.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the percentage of Credit Card values should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "CREDITCARD" with the following values:

EMPLOYEEID	NAME	EMAIL	CREDIT CARD
1001	John Doe	johndoe@example.com	1234 5678 9012 3456
1002	Jane Doe	janedoe@example.com	12345 67890 98765
1003	David Warner	davidw@example.com	1456 7654 8978 2201
1004	Mark Liu	markliu@example.com	23454 78763 786346

Suppose we set the input success range to be 80 to 100.

Let's apply the Validate Credit Card Format (TC)function to the "CREDITCARD" column. By validating the format of each CREDIT CARD value, we observe the following:

“1234 5678 9012 3456” matches with pattern and it is considered as Passed
“12345 67890 98765” does not match with the pattern and it is considered as Failed
“1456 7654 8978 2201” matches with pattern and it is considered as Passed
“23454 78763 786346” does not match with the pattern and it is considered as Failed

The output for this example would be as follows:

Result: Failure
Passed Count: 2
Failed Count: 2
Total Count: 4
Failed Values: [12345 67890 98765, 23454 78763 786346]

In this example, two out of the four Credit Card values match the specified format, resulting in a success percentage of 50%.

Percentage = (Passed Count / Total Count) * 100 = 2/4 * 100 = 50%

Since the success percentage falls outside the specified range of 80 to 100, the rule is considered a Failure.

Ends With Symbol (TC)

Name

Ends With Symbol (TC)

Description

The Ends With Symbol (TC) function is used to validate whether all the values in a table column end with a specified symbol. It checks if every value in the column satisfies this condition.

Formula

The formula for this function is not applicable, as it does not involve any numerical calculations or percentages.

Supported Object Type

The Ends With Symbol (TC) function can be applied to table columns, allowing you to validate the ending character of values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

int, decimal, varchar, double, float, String, and smallint.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

The symbol entered in the "Success Value" field, represents the desired ending character (eg. @)

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "EMAIL" with the following values:

EMPLOYEEID	NAME	EMAIL
1001	John Doe	johndoe@example.com
1002	Jane Doe	janedoe@example.com
1003	David Warner	davidw@example.com
1004	Mark Liu	markliu@example.com

Suppose we set the input success range to be 80 to 100.

Let's apply the Ends With Symbol(TC) function to the "EMAIL" column with the symbol "@" as the desired ending character. By checking the ending character of each value, we observe the following:

“johndoe@” matches with the condition, and it is considered as Passed
“janedoe@” matches with the condition, and it is considered as Passed
“davidw@” matches with the condition, and it is considered as Passed
“markliu@” matches with the condition, and it is considered as Passed

The output for this example would be as follows:

Result: Successful
Passed Count: 4
Failed Count: 0
Total Count: 4
Failed Values: [ ]

Since all the values in a table column end with a specified symbol, the rule is considered as Successful.

Average (TC)

Name

Average (TC)

Description

The Average (TC) function is used to determine the average of all the values in a table column. It calculates the mean value by summing up all the values in the column and dividing it by the total count of values.

Formula

Average = Sum of all values / Total count of values

Supported Object Type

The Average (TC) function can be applied to table columns, allowing you to calculate the average of numeric values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the average value should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a table column named "SCORE" with the following numeric values:

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	60
1002	Jane Doe	janedoe@example.com	20
1003	David Warner	davidw@example.com	50
1004	Mark Liu	markliu@example.com	80

Suppose we set the input success range to be 50 to 100.

Let's apply the Average (TC) function to the "SCORE" column. By calculating the average of all the values, we can determine if it falls within the specified range. In this example, the average is computed as follows:

Average = (60 + 20 + 50 + 80) / 10 = 210 / 4 = 52

Since the computed average value of 52 falls within the specified success range of 50 to 100, the rule is considered Successful.

The output for this example would be as follows:

Result: Successful
Average: 50

Distinct (TC)

Name

Distinct (TC)

Description

The Distinct (TC) function can be applied to table columns, allowing you to count the number of unique values within a specific column of a dataset.

Formula

The statistical formula for calculating the distinct count is simply counting the number of unique values in the column.

Supported Object Type

The Distinct (TC) function can be applied to table columns, allowing you to count the number of unique values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the count of distinct values should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a table column named "SALARY" with the following numeric values:

EMPLOYEEID	NAME	EMAIL	SALARY
1001	John Doe	johndoe@example.com	$60000
1002	Jane Doe	janedoe@example.com	$20000
1003	David Warner	davidw@example.com	$20000
1004	Mark Liu	markliu@example.com	$80000

Suppose we set the input success range to be 5 to 10.

Let's apply the Distinct(TC) function to the "SALARY" column. By counting the distinct or unique values, we can determine if it falls within the specified range. In this example, there are 3 unique values in the "SALARY" column:

Distinct Count = 3

Since the computed distinct count value of 3 falls within the specified success range of 5 to 10, the rule is considered Successful.

The output for this example would be as follows:

Result: Successful
Distinct Count: 3

Min (TC)

Name

Min (TC)

Description

The Min (TC) function is used to determine the minimum value of a table column. It calculates the smallest value present in the column.

Formula

The statistical formula for calculating the minimum value is finding the smallest value in the column.

Supported Object Type

The Min (TC) function can be applied to table columns, allowing you to find the minimum value within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the minimum value should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a table column named "DAILY_SALES" with the following values:

PRODUCTID	NAME	DAILY_SALES
1	LCD MONITOR	5000
2	NOTEBOOK	400
3	CALCULATOR	1000
4	USB PORT	250
5	HEADPHONES	700
6	PRINTER	600
7	LED MONITOR	2000

Suppose we set the input success range to 500 to 10000.

Let's apply the Min (TC) function to the "DAILY_SALES" column. By finding the minimum value, we can determine if it falls within the specified range. In this example, the minimum value in the "DAILY_SALES" column is 250.

Minimum Value = 250

Since the computed minimum value of 250 is outside the specified success range of 500 to 10000, the rule is considered Successful.

The output for this example would be as follows:

Result: Successful
Minimum Value: 25

Max (TC)

Name

Max (TC)

Description

The Max (TC) function is used to determine the maximum value in a table column (TC). It calculates the largest value present in the column.

Formula

The statistical formula for calculating the maximum value is finding the largest value in the column.

Supported Object Type

The Max (TC) function can be applied to table columns, allowing you to find the maximum value within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the maximum value should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a table column named "DAILY_SALES" with the following values:

PRODUCTID	NAME	DAILY_SALES
1	LCD MONITOR	5000
2	NOTEBOOK	400
3	CALCULATOR	1000
4	USB PORT	250
5	HEADPHONES	700
6	PRINTER	600
7	LED MONITOR	2000

Suppose we set the input success range to 500 to 10000.

Let's apply the Max (TC) function to this table column. By finding the maximum value, we can determine if it falls within the specified range. In this example, the maximum value in the column is 5000.

Maximum Value = 5000

Since the computed maximum value of 5000 falls within the specified success range of 500 to 10000, the rule is considered Successful.

The output for this example would be as follows:

Result: Successful
Maximum Value: 5000

Null Count Range (TC)

Name

Null Count Range (TC)

Description

The Null Count Range (TC) function is used to determine the count of null values in a table column (TC). It calculates the number of null values present in the column.

Formula

The statistical formula for calculating the null count is simply counting the number of null values in the column.

Supported Object Type

The Null Count Range(TC) function can be applied to table columns, allowing you to find the count of null values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the null count value should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a table column named "EMPLOYEEID" with the following numeric values:

EMPLOYEEID	NAME	EMAIL	SALARY
1001	John Doe	johndoe@example.com	$60000
	Jane Doe	janedoe@example.com	$20000
1003	David Warner	davidw@example.com	$20000
	Mark Liu	markliu@example.com	$80000

Suppose we set the input success range to 0 to 10.

Let's apply the Null Count Range(TC) function to this table column. By counting the null values, we can determine if the count falls within the specified range. In this example, the null count in the column is 3.

Null Count = 3

Since the computed null count of 3 falls within the specified success range of 0 to 10, the rule is considered Successful.

The output for this example would be as follows:

Result: Success
Null Count: 3

Std Deviation (TC)

Name

Std Deviation (TC)

Description

The Std Deviation (TC) function is used to determine the standard deviation of the values in a table column (TC). The standard deviation measures the variation or spread of data around the mean.

A low standard deviation indicates that the data points are closely clustered around the average, while a high standard deviation indicates a larger spread between the data points and the average.

Formula

The formula for calculating the standard deviation is as follows:

Standard Deviation = sqrt( sum( (x - mean)^2 ) / N )

Where:

x represents each individual value in the column
mean represents the average of the values in the column
N represents the total count of values in the column

Supported Object Type

The Std Deviation(TC) function can be applied to table columns, allowing you to calculate the standard deviation of a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the standard deviation value should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a table column named "SCORE" with the following numeric values:

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	600
1002	Jane Doe	janedoe@example.com	200
1003	David Warner	davidw@example.com	200
1004	Mark Liu	markliu@example.com	800

Suppose we set the input success range to 0 to 500.

Let's apply the Std Deviation (TC) function to this table column. By calculating the standard deviation of the values, we can determine if it falls within the specified range. In this example, the average of the dataset is 450, and the standard deviation is approximately 260.

Standard Deviation ≈ 260

Let's assume the input success range specified is 0 to 500. Since the computed standard deviation of 260 falls within the specified range, the rule is considered Successful.

The output for this example would be as follows:

Result: Success
Standard Deviation: 260

Calculation:

To calculate the standard deviation, we follow these steps:

Calculate the mean (average) of the values:

Mean = (600 + 200 + 200 + 800) / 4 = 450

Subtract the mean from each value and square the result:

(600 - 450)^2 = 150^2 = 150 * 150 = 22,500
(200 - 450)^2 = (-250)^2 = 250 * 250 = 62,500
(200 - 450)^2 = (-250)^2 = 250 * 250 = 62,500
(800 - 450)^2 = 350^2 = 350 *350 = 122,500

Calculate the sum of the squared differences:

Sum = 22,500 + 62,500 + 62,500 + 122,500 = 270,000

Divide the sum by the total count of values (N):

Standard Deviation = sqrt(Sum / N)

Note: sqrt indicates Square Root.

In this case, N = 4 (total count of values), so:

Standard Deviation = sqrt(270,000 / 4) = sqrt(67,500) ≈ 259.807

Therefore, the standard deviation of the "SCORE" column is approximately 260.

In this example, the calculated standard deviation of approximately 260 indicates a relatively large spread in score values, as the values deviate significantly from the mean of $450.

Sum (TC)

Name

Sum (TC)

Description

The Sum (TC) function is a rule that calculates the sum of the values in a column. It is commonly used in data analysis and validation processes to assess the total numerical value within a dataset.

Formula

The formula for the Sum(TC) function is as follows:

Sum = Σ (values)

where Σ represents the summation symbol, and "values" refers to the numeric values in the table column.

Supported Object Type

The Sum (TC) function can be applied to table columns, allowing you to calculate the sum of values within a specific column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the sum value should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a table column named "SCORE" with the following numeric values:

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	20
1002	Jane Doe	janedoe@example.com	40
1003	David Warner	davidw@example.com	30
1004	Mark Liu	markliu@example.com	25
1005	Mike Tyson	mtyson@example.com	35

Suppose we set the input success range to 100 to 200.

Let's consider a table column called "SCORE" with the following values: [20, 40, 30, 25, 35].

We will apply the Sum (TC) function to calculate the sum of these values.

Sum = 20 + 40 + 30 + 25 + 35 = 150

As the input success range specified is 100 to 200, and the computed sum is 150, the rule will be considered Successful since the sum falls within the specified range.

The output for this example would be as follows:

Result: Success
Sum: 150

Top Values (TC)

Name

Top Values (TC)

Description

The Top Values (TC) function is used to determine the top values in a column. It evaluates whether all the specified values exist within the computed top 50 values of the column. This function helps in assessing if certain values are among the highest in the dataset and can be used for validation or comparison purposes.

Formula

The Top Values (TC) function compares the specified values with the computed top 50 values of the column to check for their presence.

Supported Object Type

The Top Values (TC) function can be applied to table columns, allowing you to check if specific values are among the top values in a column of a dataset.

Supported Data Types

The function supports a wide range of data types that includes:

Supported Connectors

RDBMS: MySQL, SQL Server, Oracle, PostgreSQL, Snowflake and more.

Input

This input allows you to define the desired range within which the top value should fall. You need to specify the lower and upper ranges as numeric values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a table column named "SCORE" with the following numeric values:

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	100
1002	Jane Doe	janedoe@example.com	89
1003	David Warner	davidw@example.com	78
1004	Mark Liu	markliu@example.com	94
1005	Mike Tyson	mtyson@example.com	99

Suppose we set the input success range to top value-98, top value-45, top value-78

Consider a table column named "SCORE" with the following top values: [100, 89, 78, 94, 99].

Let's assume we want to check if the values [98, 45, 78] exist among the top values of the column.

By comparing the specified values with the computed top 50 values, we observe the following:

Specified values: [98, 45, 78]

Computed top values: [100, 99, 94, 89, 78]

In this example, the specified values 98 and 45 are not present among the top values of the column. Therefore, the Top Values (TC) rule fails.

The output for this example would be as follows:

Result: Failure

Data Quality Functions for Files

File Name Contains (FL)

Name	File Name Contains (FL)
Description	The File Name Contains (FL) function is used to determine if the specified input values exist in the file name. It checks whether all the specified values are present within the file name, allowing you to validate the presence of specific strings or patterns.
Formula	The File Name Contains (FL) function searches for the presence of the specified input values within the file name.
Supported Object Type	The File Name Contains (FL) function is applicable to files. It allows you to check if specific values are present within the file name.
Supported Connectors	Files: S3, NFS, and Azure Data Lake.
Input	The File Name Contains (FL) function requires the following inputs: Alphanumeric Value: Enter one or more alphanumeric strings separated by commas. These strings represent the values you want to check for in the file name. Eg. filenamecontains-profile, filenamecontains-sheet
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed.
Example
Suppose we set the input value as filenamecontains-profile, filenamecontains-sheet Let's apply the File Name Contains (FL) function and check if the file names contain the strings "profile" and "sheet". File name: "Emp_profile.csv" Input strings: "profile", "sheet" In this example, the input string "profile" is found within the file name. Therefore, the File Name Contains(FL) rule is Successful. The output for this example would be as follows: Result: Successful

File Format Validation (FL)

Name	File Format Validation (FL)
Description	The File Format Validation(FL) function is used to validate the format of a file uploaded into the OvalEdge application. It ensures that the file format matches the specified format requirements. Currently, the supported file formats are "csv", "json" and “parquet”.
Formula	The File Format Validation (FL) function checks if the file format of the uploaded file matches the specified format.
Supported Object Type	The File Format Validation (FL) function is applicable to files uploaded into the OvalEdge application. It allows you to validate the format of the uploaded files.
Supported Connectors	Files: S3, NFS, and Azure Data Lake.
Input	Enter the desired file format that the uploaded file should adhere to. Eg. .csv, .json
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed.
Example
Suppose we set the input value as .csv, .json Let's consider a scenario where a user uploads a file named "test_profile.csv" into the OvalEdge application. The desired file format for this case is ".csv". Uploaded file: "test_profile.csv" Desired file format: ".csv" By comparing the file format of the uploaded file with the specified format, we can determine if the rule is successful: In this example, the file format of the uploaded file "test_profile.csv" matches the specified format ".csv". Therefore, the File Format Validation (FL) rule is Successful. The output for this example would be as follows: Result: Successful

File Created Date (FL)

Name	File Created Date (FL)
Description	The File Created Date (FL) function is used to validate if the creation date of a file falls within a specified date range. This function helps determine whether the file was profiled or cataloged within the given timeframe.
Formula	The File Created Date (FL) function compares the creation date of the file with the specified lower and upper date range to check if it falls within that range.
Supported Object Type	The File Created Date (FL) function can be applied to files, allowing you to validate the creation date of a specific file.
Supported Connectors	Files: S3, NFS, and Azure Data Lake.
Input	The File Created Date (FL) function requires the following inputs: Parameter 1: Lower date range - This parameter specifies the start date and time of the range. Parameter 2: Upper date range - This parameter specifies the end date and time of the range.
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed.
Example
Consider a file with the following details: File Name: "test_data.csv" Creation Date: 2022-11-25 02:00:00 pm Let's assume we apply the File Created Date(FL) function to this file and specify the date range as follows: Parameter 1 (Lower date range): 2022-11-25 12:00:00 am Parameter 2 (Upper date range): 2022-11-17 04:55:28 pm By comparing the creation date of the file with the specified date range, we observe that the creation date, 2022-11-25 02:00:00 pm, falls within the range. Therefore, the rule is considered Successful. The output for this example would be as follows: Result: Successful

File Size Validation (FL)

Name	File Size Validation (FL)
Description	The File Size Validation (FL) function is used to validate the size of a selected file. This function helps determine whether the file size falls within a specified range.
Formula	The File Size Validation (FL) function compares the size of the file with the specified lower and upper range to check if it is within that range.
Supported Object Type	The File Size Validation (FL) function can be applied to files, allowing you to validate the size of a specific file.
Supported Connectors	Files: S3, NFS, and Azure Data Lake.
Input	The File Size Validation (FL) function requires the following inputs: Enter the Numeric values in the range: This input specifies the lower and upper range for the file size in bytes.
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed.
Example
Consider a file with the following details: File Name: "sample_data.csv" File Size: 1.03 MB Let's assume we apply the File Size Validation(FL) function to this file and specify the size range as follows: Enter the Numeric values in the range: 1000000 to 1100000 (1 megabytes (MB) to 1.1 megabytes (MB)) By comparing the size of the file with the specified range, we observe that the file size, 1.03 MB, falls within the range. Therefore, the rule is considered Successful. The output for this example would be as follows: Result: Successful

File Row Validation (FL)

Name

File Row Validation (FL)

Description

The File Row Validation (FL) function is used to validate the number of rows in a selected file. This function helps determine whether the total row count falls within a specified range.

Formula

The File Row Validation (FL) function compares the total row count of the file with the specified lower and upper range to check if it is within that range.

Supported Object Type

The File Row Validation (FL) function can be applied to files, allowing you to validate the number of rows in a specific f

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The File Row Validation (FL) function requires the following inputs:

Enter the Numeric values in the range: This input specifies the lower and upper range for the total row count.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

Example

Consider a file with the following details:

The file name is "EMPLOYEE.csv" and contains the data specified below.

EmployeeID	Name	Department
101	John	HR
102	Mary	IT
103	Robert	Finance
104	Lisa	Marketing
105	Michael	HR
106	David	IT
107	Emily	IT
108	Olivia	IT
109	Amelia	IT
110	Harper	IT

Total Row Count: 10

Let's assume we apply the File Row Validation (FL) function to this file and specify the row count range as follows:

Enter the Numeric values in the range: 10 to 30

By comparing the total row count of the file with the specified range, we observe that the row count, 10, falls within the range. Therefore, the rule is considered Successful.

The output for this example would be as follows:

Result: Successful

Data Quality Functions for File Columns

All Column Values Matches Given Fixed Value (FC)

Name

All Column Values Matches Given Fixed Value (FC)

Description

The All Column Values Matches Given Fixed Value (FC) function is used to check if all the values in a file column match a specified fixed value. It evaluates each value in the column and compares it to the provided fixed value to determine if the condition is met.

Formula

Not Applicable

Supported Object Type

The All Column Values Matches Given Fixed Value (FC) function can be applied to file columns, allowing you to check if the values in a specific column of a file match a fixed value.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The All Column Values Matches Given Fixed Value (FC) function requires the following inputs:

Fixed Value: Specify the value that all the values in the file column should match. This value is provided as a parameter in the function configuration.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let consider a file column “Name” with the following values:

File Name: "EMPLOYEE.csv"

EmployeeID	Name	Department
101	Johnson	HR
102	Robert	IT
103	Amelia	Finance
104	Lisa	Marketing
105	Johnston	HR
106	David	IT
107	Johnathan	IT
108	Olivia	IT
109	Williamjohn	IT
110	Harper	IT

Let's assume that we apply the All Column Values Matches Given Fixed Value (FC) function to the "Name" column and specify the fixed value as "Johnson".

By comparing each value in the column to the specified fixed value, we observe the following:

Johnson: Passed
Robert: Failed
Amelia: Failed
Lisa: Failed
Johnston: Failed
David: Failed
Johnathan: Failed
Olivia: Failed
Williamjohn: Failed
Harper: Failed

In this example, the values “Robert”, “Amelia”, “Lisa”, “Johnston”, “David”, “Johnathan”, “Olivia”, “Williamjohn”, and “Harper” do not match the specified fixed value of "Johnson". Therefore, the All Column Values Matches Given Fixed Value (FC) rule fails for these values.

The output for this example would be as follows:

Result: Failure
Passed Count: 1
Failed Count: 9
Total Count: 10
Failed Values: [Robert, Amelia, Lisa, Johnston, David, Johnathan, Olivia, Williamjohn, Harper]

First Letter Of Word Upper Case And Rest Lower Case (FC)

Name

First Letter Of Word Upper Case And Rest Lower Case (FC)

Description

The First Letter Of Word Upper Case And Rest Lower Case (FC) function is used to check if every word in a file column has its first letter in uppercase and the rest of the letters in lowercase. It evaluates each value in the column and verifies if the condition is met.

Formula

Not Applicable

Supported Object Type

The First Letter Of Word Upper Case And Rest Lower Case (FC) function can be applied to file columns. It allows you to check if the values in a specific column of a file have their first letter in uppercase and the rest of the letters in lowercase.

Supported Data Types

The function supports a range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, and string.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The First Letter Of Word Upper Case And Rest Lower Case (FC) function does not require any specific inputs. It operates solely based on the values in the file column.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let consider a file column “Name” with the following values:

File Name: "EMPLOYEE.csv"

EmployeeID	Name	Department
101	Johnson	HR
102	robertW	IT
103	ameliaF	Finance
104	Lisa	Marketing
105	Johnston	HR

Let's apply the First Letter Of Word Upper Case And Rest Lower Case (FC) function to the "Name" column.

By evaluating each value in the column, we observe the following:

Johnson: Passed
robertW: Failed
ameliaF: Failed
Lisa: Passed
Johnston: Passed

In this example, the values "robertW" and "ameliaF" do not have their first letter in uppercase and the rest of the letters in lowercase. Therefore, the First Letter Of Word Upper Case And Rest Lower Case (FC) rule fails for these values.

The output for this example would be as follows:

Result: Failure
Passed Count: 3
Failed Count: 2
Total Count: 5
Failed Values: [robertW, ameliaF]

This Column Value Should Be Unique (FC)

Name

This Column Value Should Be Unique (FC)

Description

The This Column Value Should Be Unique(FC) function is used to check if all the values in a file column are unique, meaning there are no repeated values. It evaluates each value in the column and verifies if the condition is met.

Formula

Not Applicable

Supported Object Type

The This Column Value Should Be Unique (FC) function can be applied to file columns. It allows you to check if the values in a specific column of a file are unique, with no repeated values.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The This Column Value Should Be Unique (FC) function does not require any specific inputs. It operates solely based on the values in the file column.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let consider a file column “Name” with the following values:

File Name: "EMPLOYEE.csv"

EmployeeID	Name	Department
101	Johnson	HR
102	Robert	IT
103	Amelia	Finance
104	Lisa	Marketing
105	Johnson	HR

Let's apply the This Column Value Should Be Unique (FC) function to the "Name" column.

By evaluating each value in the column, we observe the following:

Johnson: Passed
Robert: Passed
Amelia: Passed
Lisa: Passed
Johnson: Failed

In this example, the value "Johnson" is repeated 2 times, violating the uniqueness condition. Therefore, the This Column Value Should Be Unique (FC) rule fails for this value.

The output for this example would be as follows:

Result: Failure
Passed Count: 4
Failed Count: 1
Total Count: 5
Failed Values: [Johnson]

Word Validation (FC)

Name

Word Validation (FC)

Description

The Word Validation(FC) function is used to check if the total number of spaces in the values of a file column meets certain criteria. It evaluates each value in the column and verifies if the conditions are satisfied. The conditions include the maximum total number of spaces allowed and the maximum number of spaces between any two words (if specified).

Formula

Not Applicable

Supported Object Type

The Word Validation (FC) function can be applied to file columns. It allows you to validate the values in a specific column of a file based on the conditions defined.

Supported Data Types

The function supports a range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, and string.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Word Validation (FC) function requires the following inputs:

Maximum total number of spaces: Specify the maximum number of spaces allowed in the values of the file column. This value is provided as "Parameter 1".
Maximum number of spaces between any two words (Optional): Specify the maximum number of spaces allowed between any two words in the values of the file column. This value is provided as "Parameter 2".

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "NOTES" with the following string values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	NOTES
1001	John Doe	johndoe@example.com	Hello OvalEdge
1002	Jane Doe	janedoe@example.com	HELLO It's Jane - Welcome to OvalEdge
1002	Jane Doe	janedoe@example.com	HELLO It's Jane - Welcome to OvalEdge
1003	Mark Liu	markliu@example.com	Hello OvalEdge

Let's apply the Word Validation(FC) function to the "Text" column and set the parameters as follows:

Parameter 1: Maximum total number of spaces allowed = 10

Parameter 2: Maximum number of spaces between any two words = 1

By evaluating each value in the column, we observe the following:

Hello OvalEdge
- Maximum total number of spaces = 1
- Maximum number of spaces between any two words = 1
- Result = Passed
HELLO It’s Jane - Welcome to OvalEdge
- Maximum total number of spaces = 6
- Maximum number of spaces between any two words = 1
- Result = Passed

In this example,

The total number of spaces in the column is less than the maximum of 10 specified in Parameter 1. Hence, the Word Validation(FC) rule passes for Parameter 1.
The maximum number of spaces between any two words in the column is equal to the 1 specified in Parameter 2. Hence, the Word Validation(FC) rule passes for Parameter 2.

The output for this example would be as follows:

Result: Successful
Passed Count: 4
Failed Count: 0
Total Count: 4
Failed Values: [ ]

Number Validation (FC)

Name

Number Validation (FC)

Description

The Number Validation (FC) function is used to validate the format of values in a file column, specifically for integer or floating-point formats (int/float). The function checks if the values in the column match the specified format based on the provided parameters. The parameters include the number format (int/float), the floating-point decimal limit, the integer number of digits limit, and whether the values in the column contain commas or not.

Formula

Not Applicable

Supported Object Type

The Number Validation (FC) function can be applied to file columns. It allows you to validate the format of values in a specific column of a file based on the specified number format and other criteria.

Supported Data Types

The function supports a range of data types that includes:

int, bigint, number, long, decimal, double, float, smallint, and integer.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Number Validation (FC) function requires the following inputs:

Parameter 1: Number format (int/float): Specify the desired number format (int or float) for the values in the file column.
Parameter 2: Floating point decimals limit: Specify the maximum number of decimal places allowed for floating-point values.
Parameter 3: Integer number of digits limit: Specify the maximum number of digits allowed for integer values.
Parameter 4: Commas in values (Y/N): Specify whether the values in the column contain commas or not. If the values contain commas, set this parameter to "Y"; otherwise, set it to "N".

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "SCORE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	20
1002	Jane Doe	janedoe@example.com	40
1003	David Warner	davidw@example.com	30
1004	Mark Liu	markliu@example.com	25
1005	Mike Tyson	mtyson@example.com	35

Let's apply the Number Validation (FC) function to the "SCORE" column with the following parameters:

Parameter 1: Number format (int/float) = int

Parameter 3: Integer number of digits limit = 2

By evaluating each value in the column, we observe the following:

20 = Passed
40 = Passed
30 = Passed
25 = Passed
35 = Passed

In this example, all values in the "SCORE" column are in the integer format and have a maximum of two digits, which matches the specified criteria. Therefore, the Number Validation (FC) rule passes, indicating that all the values in the file column respect the condition.

Date Validation (FC)

Name

Date Validation (FC)

Description

The Date Validation (FC) function is used to validate the format and range of dates in a file column. The function checks if the date format in the column values matches the specified format and if the year of each date falls within the specified range. The format of the dates and the range of years are provided as parameters.

Formula

Not Applicable

Supported Object Type

The Date Validation (FC) function can be applied to file columns. It allows you to validate the format and range of dates in a specific column of a file.

Supported Data Types

The function supports a range of data types that includes:

date, datetime, and timestamp data types.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Date Validation (FC) function requires the following inputs:

Parameter 1: Date Format: Specify the desired format for the dates in the file column.
Parameter 2: Minimum Year: Specify the minimum year allowed for the dates. Dates with a year earlier than the specified minimum year will fail the validation.
Parameter 3: Maximum Year: Specify the maximum year allowed for the dates. Dates with a year later than the specified maximum year will fail the validation.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "EMP_JOINING_DATE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	EMP_JOINING_DATE
1001	John Doe	johndoe@example.com	01-01-2011
1002	Jane Doe	janedoe@example.com	03-15-2012
1003	David Warner	davidw@example.com	07-21-2013
1004	Mark Liu	markliu@example.com	12-31-2012
1005	Mike Tyson	mtyson@example.com	11-11-2011

Let's apply the Date Validation (FC) function to the "EMP_JOINING_DATE" column with the following parameters:

Parameter 1: Date Format = YYYY
Parameter 2: Minimum Year = 2011
Parameter 3: Maximum Year = 2014

By validating each date in the column, we observe the following:

“01-01-2011” = Passed
“03-15-2012” = Passed
“07-21-2013” = Passed
“12-31-2012” = Passed
“11-11-2011” = Passed

In this example, all the dates in the "EMP_JOINING_DATE" column match the specified format of YYYY and fall within the range of 2011 to 2014. Therefore, the Date Validation (FC) rule is successful, indicating that all the values in the file column meet the format and range criteria.

The output for this example would be as follows:

Result: Successful
Passed Count: 4
Failed Count: 0
Total Count: 4

Numeric Value Between Min And Max Values (FC)

Name

Numeric Value Between Min And Max Values (FC)

Description

The Numeric Value Between Min And Max Values (FC) function is used to validate whether the values in a file column fall within a specified range. The function checks if each value in the column is greater than or equal to the specified minimum value and less than or equal to the specified maximum value.

Formula

Not Applicable

Supported Object Type

The Numeric Value Between Min And Max Values (FC) function can be applied to file columns. It allows you to validate whether the values in a specific column of a file fall within a specified range.

Supported Data Types

The function supports a range of data types that includes:

integer (int, bigint, smallint, integer) and floating-point (number, long, decimal, double, float) data types.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Numeric Value Between Min And Max Values(FC) function requires the following inputs:

Parameter 1: Minimum Value: Specify the minimum value allowed for the column values. Values below this minimum will fail the validation.
Parameter 2: Maximum Value: Specify the maximum value allowed for the column values. Values above this maximum will fail the validation.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "SCORE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	11
1002	Jane Doe	janedoe@example.com	12
1003	David Warner	davidw@example.com	13
1004	Mark Liu	markliu@example.com	17
1005	Mike Tyson	mtyson@example.com	19

Let's apply the Numeric Value Between Min And Max Values (FC) function to the "Values" column with the following parameters:

Parameter 1: Minimum Value = 1
Parameter 2: Maximum Value = 9

By validating each value in the column, we observe the following:

“11” = Passed
“12” = Passed
“13” = Passed
“17” = Passed
“19” = Passed

In this example, all the values in the "Values" column fall within the specified range of 1 to 9. Therefore, the Numeric Value Between Min And Max Values (FC) rule is successful, indicating that all the values in the file column meet the range criteria.

The output for this example would be as follows:

Result: Successful
Passed Count: 5
Failed Count: 0
Total Count: 5

Value Validation (FC)

Name

Value Validation (FC)

Description

The Value Validation (FC) function is used to validate whether all values in a file column are the same. The function checks if every value in the column matches a specific value.

Formula

Not Applicable

Supported Object Type

The Value Validation (FC) function can be applied to file columns. It allows you to validate whether all values in a specific column of a file are the same.

Supported Data Types

The function supports a range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, string, date, datetime, and timestamp.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Value Validation (FC) function does not require any specific inputs. It compares each value in the file column to check if they are identical.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "STATUS" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	STATUS
1001	John Doe	johndoe@example.com	Permanent
1002	Jane Doe	janedoe@example.com	Permanent
1003	David Warner	davidw@example.com	Permanent
1004	Mark Liu	markliu@example.com	Permanent
1005	Mike Tyson	mtyson@example.com	Permanent

Let's apply the Value Validation (FC) function to the "STATUS" column. Since all the values in the column are identical and match the value "Permanent", the function would result in a success.

The output for this example would be:

Result: Successful
Passed Count: 5
Failed Count: 0
Total Count: 5

Allow Only Specified Special Character (FC)

Name

Allow Only Specified Special Character (FC)

Description

The Allow Only Specified Special Character (FC) function is used to validate if the values in a file column contain only one type of special character specified in the Input Parameter. The function checks if the column values consist of only the specified special character.

Formula

Not Applicable

Supported Object Type

The Allow Only Specified Special Character (FC) function can be applied to file columns. It allows you to validate if the values in a specific column of a file contain only one type of special character.

Supported Data Types

The function supports a range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, string, date, datetime, and timestamp.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Allow Only Specified Special Character (FC) function requires the following input:

Parameter 1: Special Character - The special character that should be allowed in the column values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "EMAIL" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	STATUS
1001	John Doe	johndoe@example.com	Permanent
1002	Jane Doe	janedoe@example.com	Permanent
1003	David Warner	davidw@example.com	Permanent
1004	Mark Liu	markliu@example.com	Permanent
1005	Mike Tyson	mtyson@example.com	Permanent

Let's apply the Allow Only Specified Special Character (FC) function to the "EMAIL" column with the special character "@" specified in "Parameter 1".

Since all the values in the column contain only the "@" special character, the function would result in a success.

The output for this example would be:

Result: Successful
Passed Count: 5
Failed Count: 0
Total Count: 5

Has Length Of The Value Within Range (FC)

Name

Has Length Of The Value Within Range (FC)

Description

The Has Length Of The Value Within Range (FC) function is used to validate if the length of characters in the column values falls within the specified range. It checks if the number of characters in each value of the file column is within the specified minimum and maximum range.

Formula

Not Applicable

Supported Object Type

The Has Length Of The Value Within Range (FC) function can be applied to file columns. It allows you to validate if the length of characters in a specific column of a file falls within a specified range.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Has Length Of The Value Within Range (FC) function requires the following input:

Parameter 1: Minimum number of characters allowed.
Parameter 2: Maximum number of characters allowed.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "CITY" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	CITY
1001	John Doe	johndoe@example.com	Atlanta
1002	Jane Doe	janedoe@example.com	New York
1003	David Warner	davidw@example.com	Orlando
1004	Mark Liu	markliu@example.com	Dallas
1005	Mike Tyson	mtyson@example.com	San Francisco

Let's apply the Has Length Of The Value Within Range (FC) function to the "CITY" column with a minimum length of 4 characters and a maximum length of 10 characters specified in "Parameter 1" and "Parameter 2" respectively.

Since all the city names have a length between 4 and 15 characters, the function would result in success.

The output for this example would be:

Result: Successful
Passed Count: 5
Failed Count: 0
Total Count: 5

Columns Unique Percent Value (FC)

Name

Columns Unique Percent Value (FC)

Description

The Columns Unique Percent Value (FC) function is used to check if all the values in a file column are unique, meaning there are no repeated values. It determines the uniqueness percentage of the column values.

Formula

Not Applicable

Supported Object Type

The Columns Unique Percent Value (FC) function can be applied to file columns. It allows you to validate if the values in a specific column of a file are unique.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Columns Unique Percent Value (FC) function does not require any specific input.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "DEPARTMENT" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	DEPARTMENT
1001	John Doe	johndoe@example.com	IT
1002	Jane Doe	janedoe@example.com	IT
1003	David Warner	davidw@example.com	IT
1004	Mark Liu	markliu@example.com	HR
1005	Mike Tyson	mtyson@example.com	IT

Let's apply the Columns Unique Percent Value (FC) function to the "DEPARTMENT" column. Since the value "IT" is repeated four times, the function would result in failure.

The output for this example would be:

Result: Failure
Passed Count: 1
Failed Count: 4
Total Count: 5
Failed Values: IT (Non Unique Value)

Outliers To Aggregations (FC)

Name

Outliers To Aggregations (FC)

Description

The Outliers To Aggregations (FC) function is used to evaluate whether the deviation of a column value from the average of all values in the column is less than a specified percentage.

Its purpose is to identify outliers within a column based on their deviation from the average.

Formula

The formula for computing the deviation percentage is as follows:

DEV % = (ABS(COLUMN VALUE - AVERAGE) / AVERAGE) * 100

Supported Object Type

The Outliers To Aggregations (FC) function can be applied to file columns. It allows you to identify and evaluate outliers within a file column based on their deviation from the average.

Supported Data Types

The function supports a range of data types that includes:

int, bigint, number, long, decimal, double, float, smallint, integer.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Outliers To Aggregations (FC) function requires the following input:

Parameter 1: Average - This input represents the average of all values in the column.
Parameter 2: Percent - This input specifies the maximum allowable deviation percentage.
Parameter 3: File Column - This input identifies the column on which the rule is applied.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "SCORE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	5
1002	Jane Doe	janedoe@example.com	10
1003	David Warner	davidw@example.com	15
1004	Mark Liu	markliu@example.com	20
1005	Mike Tyson	mtyson@example.com	25

Let's apply the "Outliers To Aggregations(FC)" function. We have a file column with the following values: [5, 10, 15, 20, 25]. The average of these values is 15.

Suppose we set the parameter values as follows:

Parameter 1 (Average): 15
Parameter 2 (Percent): 50
Parameter 3 (File Column): [5, 10, 15, 20, 25]

We can calculate the deviation percentage for each value in the column:

Deviation for 5: (ABS(5 - 15) / 15) * 100 = 66.67% - Failed
Deviation for 10: (ABS(10 - 15) / 15) * 100 = 33.33% - Passed
Deviation for 15: (ABS(15 - 15) / 15) * 100 = 0% - Passed
Deviation for 20: (ABS(20 - 15) / 15) * 100 = 33.33% - Passed
Deviation for 25: (ABS(25 - 15) / 15) * 100 = 66.67% - Failed

In this case, the two deviation percentages are not within the specified percentage range of 50%. Therefore, the rule is considered Failure.

The output for this example would be:

Result: Failure
Passed Count: 3
Failed Count: 2
Total Count: 5
Failed Values: [5, 25]

Has No Null Value (FC)

Name

Has No Null Value (FC)

Description

The Has No Null Value (FC) function is used to determine whether a file column contains any null values. Its purpose is to validate if there are no null values present in the column.

Formula

Not Applicable

Supported Object Type

The Has No Null Value (FC) function can be applied to file columns. It allows you to check for the presence of null values within a file column.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Has No Null Value (FC) function requires no specific input parameters. It examines the file column to determine if there are any null values present.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "SCORE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	SCORE
1001	John Doe	johndoe@example.com	5
1002	Jane Doe	janedoe@example.com	10
1003	David Warner	davidw@example.com
1004	Mark Liu	markliu@example.com	20
1005	Mike Tyson	mtyson@example.com	25

In this case, there is a null value present after the value 10 in the file column. Therefore, the "Has No Null Value (FC)" rule has failed.

The output for this example would be:

Result: Failure
Passed Count: 4
Failed Count: 1
Total Count: 5
Failed Values: [null]

Has No Digits or Special Chars (FC)

Name

Has No Digits or Special Chars (FC)

Description

The Has No Digits or Special Chars (FC) function is used to determine whether a file column contains any special characters or digits. Its purpose is to validate if the column contains only alphabetic characters and does not have any special characters or digits.

Formula

Not Applicable

Supported Object Type

The Has No Digits or Special Chars (FC) function can be applied to file columns. It allows you to check for the presence of special characters or digits within a file column.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Has No Digits or Special Chars (FC) function requires no specific input parameters. It examines the file column to determine if there are any special characters or digits present.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "ADDRESS" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	ADDRESS
1001	John Doe	johndoe@example.com	Suite:101, Emerald Towers, Jacksonville, Florida
1002	Jane Doe	janedoe@example.com	Street 21, Park Avenue, New York.
1003	David Warner	davidw@example.com	Suite 120 &122, Twin Towers, Dallas.

In this case, the value in the file column contains numbers and special characters. Therefore, the "Has No Digits or Special Chars (FC)" rule has failed.

The output for this example would be:

Result: Failure
Passed Count: 0
Failed Count: 3
Total Count: 3
Failed Values: [“Suite:101, Emerald Towers, Jacksonville, Florida”, “Street 21, Park Avenue, New York.”, “Suite 120 &122, Twin Towers, Dallas.”]

Has Digits and Special Chars (FC)

Name

Has Digits and Special Chars (FC)

Description

The Has Digits and Special Chars (FC) function is used to determine whether all the values in a file column contain both digits and special characters. Its purpose is to validate if the column values consist of a combination of digits and special characters.

Formula

Not Applicable

Supported Object Type

The Has Digits and Special Chars (FC) function can be applied to file columns. It allows you to check if all values in a file column contain both digits and special characters.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Has Digits and Special Chars (FC) function requires no specific input parameters. It examines the file column values to determine if they consist of a combination of digits and special characters.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "ADDRESS" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	ADDRESS
1001	John Doe	johndoe@example.com	Suite:101, Emerald Towers, Jacksonville, Florida
1002	Jane Doe	janedoe@example.com	Street 21, Park Avenue, New York.
1003	David Warner	davidw@example.com	Suite 120 &122, Twin Towers, Dallas.

In this case, the values in the "ADDRESS" column contain both digits and special characters. Therefore, the "Has Digits and Special Chars (FC)" rule is considered successful.

The output for this example would be:

Result: Successful
Passed Count: 3
Failed Count: 0
Total Count: 3
Failed Values: []

No Leading or Trailing Spaces (FC)

Name

No Leading or Trailing Spaces (FC)

Description

The No Leading or Trailing Spaces (FC) function is used to determine whether there are any leading or trailing spaces in a file column. Its purpose is to check if the column values have any spaces at the beginning or end.

Formula

Not Applicable

Supported Object Type

The No Leading or Trailing Spaces (FC) function can be applied to file columns. It allows you to validate if the column values have any leading or trailing spaces.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The No Leading or Trailing Spaces (FC) function requires no specific input parameters. It examines the file column values to determine if they contain any leading or trailing spaces.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "ADDRESS" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	ADDRESS
1001	John Doe	johndoe@example.com	Suite:101, Emerald Towers, Jacksonville, Florida
1002	Jane Doe	janedoe@example.com	Street 21, Park Avenue, New York.
1003	David Warner	davidw@example.com	Suite 120 &122, Twin Towers, Dallas.

In this case, the values in the "ADDRESS" column contain a leading space at the beginning. Therefore, the "No Leading or Trailing Spaces (FC)" rule is considered a failure.

The output for this example would be:

Result: Failure
Passed Count: 1
Failed Count: 2
Total Count: 3
Failed Values: ["Suite:101, Emerald Towers, Jacksonville, Florida", “Suite 120 &122, Twin Towers, Dallas.”]

Has Substring in String (FC)

Name

Has Substring in String (FC)

Description

The Has Substring in String (FC) function is used to determine whether a specified substring is present within the values of a file column. Its purpose is to check if the substring exists within the column values.

Formula

Not Applicable

Supported Object Type

The Has Substring in String (FC) function can be applied to file columns. It allows you to validate if a specific substring is present within the column values.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Has Substring in String (FC) function requires the following input:

Parameter 1: Substring - Specify the substring to be searched within the file column values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "EMAIL" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	ADDRESS
1001	John Doe	johndoe@example.com	Suite:101, Emerald Towers, Jacksonville, Florida
1002	Jane Doe	janedoe@example.org	Street 21, Park Avenue, New York.
1003	David Warner	davidw@example.in	Suite 120 &122, Twin Towers, Dallas.

Suppose we want to check if the substring ".com" is present in the column values.

In this case, the substring ".com" is found in the value "johndoe@example.com". Therefore, the "Has Substring in String (FC)" rule is considered successful.

The output for this example would be:

Result: Successful
Passed Count: 1
Failed Count: 0
Total Count: 3

Belongs To One Of Languages (FC)

Name

Belongs To One Of Languages (FC)

Description

The Belongs To One Of Languages (FC) function is used to determine whether the values of a file column belong to a specified list of languages. Its purpose is to check if the column values are within the provided languages.

Formula

Not Applicable

Supported Object Type

The Belongs To One Of Languages (FC) function can be applied to file columns. It allows you to validate if the values of a column are within the provided list of languages.

Supported Data Types

The function supports a range of data types that includes:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Belongs To One Of Languages (FC) function requires the following input:

Parameter 1: Languages - Specify the list of languages separated by commas that the column values should belong to.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "LANGUAGE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	LANGUAGE
1001	John Doe	johndoe@example.com	English
1002	Jane Doe	janedoe@example.org	French
1003	David Warner	davidw@example.in	Spanish

Suppose we specify "English" as the list of languages in Parameter 1.

In this case, the column values contain two languages other than "English" (Spanish and French). Therefore, the "Belongs To One Of Languages (FC)" rule is considered a failure.

The output for this example would be:

Result: Failure
Passed Count: 2
Failed Count: 2
Total Count: 4
Failed Values: [Spanish, French]

Word Count Within Range (FC)

Name

Word Count Within Range (FC)

Description

The Word Count Within Range (FC) function is used to check if the number of words in a file column falls within a specified range. Its purpose is to determine if the column values have a word count within the user-defined range.

Formula

Not Applicable

Supported Object Type

The Word Count Within Range (FC) function can be applied to file columns. It allows you to validate if the number of words in a column falls within the specified range.

Supported Data Types

The function supports a range of data types that includes:

character, char, ntext, tinytext, text, mediumtext, longtext, nchar, varchar, nvarchar, varchar2, nvarchar2, string.

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Word Count Within Range (FC) function requires the following input:

Parameter 1: Minimum Count - Specify the minimum number of words allowed in the column values.
Parameter 2: Maximum Count - Specify the maximum number of words allowed in the column values.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file column named "NEAREST_HOSPITAL" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	NEAREST_HOSPITAL
1001	John Doe	johndoe@example.com	Child Health Center
1002	Jane Doe	janedoe@example.org	Diagnostic & Treatment Center
1003	David Warner	davidw@example.in	Nursing Home

Suppose we set the minimum count (Parameter 1 = 3) and the maximum count (Parameter 2 = 4).

In this case, the first two values have a word count within the specified range (3-4 words), but the third value "Nursing Home" has only two words. Therefore, the "Word Count Within Range (FC)" rule is considered a failure.

The output for this example would be:

Result: Failure
Passed Count: 2
Failed Count: 1
Total Count: 3
Failed Value: [Nursing Home]

Column Value Same As Other Column Value (FC)

Name

Column Value Same As Other Column Value (FC)

Description

The Column Value Same As Other Column Value (FC) function is used to check if the values of a selected column match the values in another specified column. Its purpose is to determine if the selected column values are identical to the values in the specified column.

Formula

Not Applicable

Supported Object Type

The Column Value Same As Other Column Value (FC) function can be applied to file columns. It allows you to compare the values of one column with the values in another specified column.

Supported Data Types

The function supports a wide range of data types, including:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Column Value Same As Other Column Value (FC) function requires the following input:

Parameter 1: Other Column - Specify the name of the column whose values will be compared against the selected column.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file columns named "Q1_SCORE" and "Q2_SCORE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	Q1_SCORE	Q2_SCORE
1001	John Doe	johndoe@example.com	10	10
1002	Jane Doe	janedoe@example.org	5	10
1003	David Warner	davidw@example.in	20	10

Suppose we set "Q2_SCORE" as Parameter 1 to compare against the values of "Q1_SCORE".

In this case, all the values in "Q1_SCORE" match the corresponding values in "Q2_SCORE" except for the last two values. Therefore, the "Column Value Same As Other Column Value (FC)" rule is considered a failure.

The output for this example would be:

Result: Failure
Passed Count: 1
Failed Count: 2
Total Count: 3
Failed Values: [5,20]

Regex Match (FC)

Name

Regex Match (FC)

Description

The Regex Match (FC) function is used to verify if all values in a column match a specified regular expression pattern. It checks whether the values in the column adhere to the provided regex format and determines the success or failure of the rule.

Formula

Not Applicable

Supported Object Type

The Regex Match (FC) function can be applied to file columns. It allows you to validate the values in a column against a specified regex pattern.

Supported Data Types

The function supports a variety of data types, including:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Regex Match (FC) function requires the following input:

Parameter 1: Regex Pattern - Specify the regular expression pattern that the column values should match.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Let's consider a file column "NOTES" containing a set of values on which the Regex Match rule is applied.

EMPLOYEEID	NAME	EMAIL	NOTES
1	John Doe	johndoe@example.com	Hello
2	Jane Doe	janedoe@example.com	HELLO
3	Alex Lee	alexlee@example.com	Hello123
4	Mark Liu	markliu@example.com	889976545
5	Lisa Kim	lisa.kim@example.com	Hello OvalEdge!
6	Eric Wu	ericwu@example.com	Hello 234!
7	Sarah Tan	sarah.tan@example.com	HELLO OvalEdge
8	Tom Wong	tom.wong@example.com
9	Emma Chen	emma.chen@example.com	HELLO-234&
10	Jack Lin	jacklin@example.com	HELLO

The regular expression pattern ^[A-Za-z0-9]*$ can be broken down as follows:

^ - This symbol represents the start of the line. It ensures that the pattern matches from the beginning of the input string.
[A-Za-z0-9] - This character class represents a range of characters. In this case, it includes uppercase letters (A-Z), lowercase letters (a-z), and digits (0-9). This means that any of these characters can appear in the input string.
This quantifier specifies that the preceding character class ([A-Za-z0-9]) can occur zero or more times. It allows for an empty string or any combination of uppercase letters, lowercase letters, and digits.
$ - This symbol represents the end of the line. It ensures that the pattern matches all the way to the end of the input string.

Example: "Hello" or “HELLO” or “Hello123”

This input string consists of only uppercase letters, lowercase letters, and digits.

It matches the pattern because it contains zero or more characters that fall within the range specified by the character class [A-Za-z0-9].

Example: "889976545"

This input string consists of only digits.

It matches the pattern because it contains zero or more characters that fall within the range specified by the character class [A-Za-z0-9].

Example: "Hello OvalEdge!" or “Hello 234!” or “HELLO-234&”

This input string contains a space, dash, and an exclamation mark, which are not included in the character class [A-Za-z0-9].

It does not match the pattern because it fails to meet the requirement of having only uppercase letters, lowercase letters, and digits.

Example: ""

This input string is an empty string.

It matches the pattern because it contains zero characters, which satisfies the condition of zero or more characters specified by the * quantifier.

The output for this example would be:

Result: Successful
Passed Count: 10
Failed Count: 0
Total Count: 10
Failed Values: []

Column Value Matches One Of Given Values (FC)

Name

Column Value Matches One Of Given Values (FC)

Description

The Column Value Matches One Of Given Values (FC) function is used to determine if any value in the selected column matches with the value(s) provided in Parameter 1. It evaluates whether any of the values in the column match the specified value(s) and determines the success or failure of the rule.

Formula

Not Applicable

Supported Object Type

The Column Value Matches One Of Given Values(FC) function can be applied to file columns. It allows you to compare the values in a column against the provided value(s).

Supported Data Types

The function supports a variety of data types, including:

Supported Connectors

Files: S3, NFS, and Azure Data Lake.

Input

The Column Value Matches One Of Given Values(FC) function requires the following inputs:

Parameter 1: Values - Specify the value(s) that will be compared with the values in the selected column.
Parameter 2: Separator - Specify the separator used to separate the values in Parameter 1.

Output

The output of the function can be one of the following:

Success or Failure or Undetermined or Not Executed or Execution Failed.

On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.

Example

Consider a file columns named "Q1_SCORE" and "Q2_SCORE" with the following numeric values:

File Name: "EMPLOYEE.csv"

EMPLOYEEID	NAME	EMAIL	Q1_SCORE	Q2_SCORE
1001	John Doe	johndoe@example.com	10	10
1002	Jane Doe	janedoe@example.org	5	10
1003	David Warner	davidw@example.in	20	10

Suppose we set the input parameters as follows:

Parameter 1: "Doe|Warner"
Parameter 2: "|"

In this case, the value "Doe" from Parameter 1 matches with the value "Doe" in the column, and the value "Warner" from Parameter 1 matches with the value "Warner" in the column. Therefore, the "Column Value Matches One Of Given Values (FC)" rule is considered successful.

The output for this example would be:

Result: Successful
Passed Count: 3
Failed Count: 0
Total Count: 3
Failed Values: []

Data Quality Functions for Query

Sql Exact Value (SQL)

Name	Sql Exact Value (SQL)
Description	The Sql Exact Value(SQL) function is used to determine the success or failure based on the return value of an associated SQL code object in the source system. If the SQL code object returns a value of 1 after execution, the rule is considered successful. Otherwise, it is considered a failure.
Formula	Not Applicable
Supported Object Type	The Sql Exact Value(SQL) function is applicable to queries. It allows you to execute SQL code and evaluate the return value.
Supported Connectors	RDBMS: MySQL, SQL Server, PostgreSQL, Snowflake and more.
Input	Not Applicable
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed. On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.
Example
Consider an SQL code object that checks if a specific condition is met in a table. The SQL code is as follows: SQL Code: SELECT COUNT() FROM employees WHERE salary > 50000;* The objective is to determine if the number of employees with a salary greater than 50,000 is 1 or 0. Based on this, the Sql Exact Value(SQL) rule will evaluate the success or failure of the rule. Let's assume that the SQL code object returns a value of 1. In this case, the Sql Exact Value(SQL) rule will be considered successful. On the other hand, if the SQL code object returns a value of 0, the rule will be considered a failure. The output of the Sql Exact Value(SQL) rule will provide the following information: Passed Count: The number of times the rule was executed and resulted in success. Failed Count: The number of times the rule was executed and resulted in failure. Result: The overall result of the rule execution, indicating if it was successful or failed. For example, let's say the rule was executed 10 times. Out of these, the SQL code object returned a value of 1 in 8 instances and a value of 0 in 2 instances. Output: Passed Count: 8 Failed Count: 2 Result: Failure In this case, the rule would be considered a failure because the SQL code object returned a value of 0 in 2 instances, indicating that the condition (salary > 50000) was not met for those employees.

Custom Sql Exact Value (SP)

Name	Custom Sql Exact Value (SP)
Description	The Custom Sql Exact Value (SP) function determines the success or failure based on the return value of an associated code object (stored procedure) in the source system. If the code object returns a value of 1 after execution, the rule is considered successful. Otherwise, it is considered a failure.
Formula	Not Applicable
Supported Object Type	The Custom Sql Exact Value (SP) function is applicable to queries. It allows you to execute code objects (stored procedure) and evaluate the return value.
Supported Connectors	RDBMS: MySQL, SQL Server, PostgreSQL, Snowflake and more.
Input	The Custom Sql Exact Value (SP) function requires the following inputs: Success Values: The inputs and outputs of the stored procedure in a specific format. Each value should be separated by a semicolon (;), and the format should be as follows: Input,ColumnName,Value,DataType; (for input parameters) Output,ColumnName,DataType; (for output parameters) Example: Input,Salary,140000.4500,DECIMAL;Input,Age,14,INT;OUTPUT,ReturnSalary,INT;
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed. On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.
Example
Consider a stored procedure that performs certain calculations and returns a value of 1 or 0 based on specific conditions. The objective is to evaluate the success or failure of the rule based on the return value of the stored procedure. Let's assume the stored procedure is designed to return a value of 1 if certain conditions are met and 0 otherwise. In this case, if the stored procedure returns a value of 1, the Custom Sql Exact Value (SP) rule will be considered successful. Conversely, if the stored procedure returns a value of 0, the rule will be considered a failure. The output of the Custom Sql Exact Value (SP) rule will provide the following information: Result: Indicates whether the rule was successful or failed. On Failure: Displays the following details: Passed Count: The number of times the rule was executed and resulted in success. Failed Count: The number of times the rule was executed and resulted in failure. Total Count: The total number of times the rule was executed. Failed Values: The specific values or conditions that resulted in failure. For example, let's say the rule was executed 10 times. Out of these, the stored procedure returned a value of 1 in 8 instances and a value of 0 in 2 instances. Output: Passed Count: 8 Failed Count: 2 Result: Failure In this case, the rule would be considered a failure because the stored procedure returned a value of 0 in 2 instances, indicating that the specified conditions were not met.

Sql Value Contains (SQL)

Name	Sql Value Contains (SQL)
Description	The Sql Value Contains (SQL) rule determines the success or failure based on the return value of an associated code object (query) in the source system. The rule is considered successful if the code object returns a value that exists in the specified list of values after execution. Otherwise, it is considered a failure.
Formula	Not Applicable
Supported Object Type	The Sql Value Contains (SQL) function is applicable to queries. It allows you to execute SQL code and evaluate the return value.
Supported Connectors	RDBMS: MySQL, SQL Server, PostgreSQL, Snowflake and more.
Input	The Sql Value Contains (SQL function requires the following inputs: Parameter 1: Specify the discrete values separated by commas
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed. On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.
Example
Consider a scenario where a code object (query) is executed and returns a specific value. Let's assume the code object returns a value of "B". We have a list of values [A, B, C, D, E]. In this case, the Sql Value Contains (SQL) rule will be considered successful because the returned value "B" exists in the list. Output: Result: Successful

Sql Value Range (SQL)

Name	Sql Value Range (SQL)
Description	The Sql Value Range (SQL) rule determines the success or failure based on the return value of an associated SQL code object in the source system. The rule is considered successful if the code object returns a value that falls within the specified range of values after execution. Otherwise, it is considered a failure.
Formula	Not Applicable
Supported Object Type	The Sql Value Range (SQL) function is applicable to queries. It allows you to execute SQL code and evaluate the return value.
Supported Connectors	RDBMS: MySQL, SQL Server, PostgreSQL, Snowflake and more.
Input	The Sql Value Range (SQL function requires the following inputs: Parameter 1: Specify the minimum value Parameter 1: Specify the maximum value
Output	The output of the function can be one of the following: Success or Failure or Undetermined or Not Executed or Execution Failed. On Failure, the rule shows the Passed Count, Failed Count, Total Count, and Failed Values.
Example
Let's consider the following SQL query as the associated code object: SQL Code: SELECT COUNT() FROM employees WHERE salary > 50;* In this example, the SQL code object counts the number of employees whose salary is greater than 50. To apply the Sql Value Range (SQL) rule, we define the input values as follows: Parameter 1: 0 (minimum value) Parameter 2: 100 (maximum value) Based on the specified range, the Sql Value Range (SQL) rule will evaluate the success or failure of the rule. Suppose the query returns a value of 75. In this case, the Sql Value Range (SQL) rule will be considered successful because the returned value falls within the specified range of 0 to 100. Output: Passed Count: 1 Failed Count: 0 Total Count: 1 Result: Successful In this example, the rule passes as the returned value of 75 falls within the specified range.