Various functions in Data Quality Rules

We have various Data Quality Rule types for the following:

  • Tables
  • Table Columns
  • File Columns

Data Quality Rule types for Tables

 

Rule Type

Parameter

DENSITY PERCENT:  Returns TRUE if the computed density percentage falls in the range else FALSE

Param-1: Lower Range

Param-2: Upper Range

NULL DENSITY PERCENT:  Returns TRUE if the computed null density percentage falls in the range else FALSE

Param-1: Lower Range

Param-2: Upper Range

TOTAL ROW COUNT:  Returns TRUE if the total row count in table falls in the range else FALSE

Param-1: Lower Range

Param-2: Upper Range


Data Quality Rule types for Table Columns

 

Rule Type

Parameter

AVERAGE: Returns TRUE if the average computed falls in the range else FALSE

Param-1: Lower Range

Param-2: Upper Range

DENSITY PERCENT: Returns TRUE if the density computed falls in the range else FALSE

Param-1: Lower Range

Param-2: Upper Range

DISTINCT: Returns TRUE if the count of distinct values falls in the range; else FALSE

Param-1: Lower Range

Param-2: Upper Range

MAX:  Returns TRUE if the min value falls in the range; else FALSE

Param-1: Lower Range

Param-2: Upper Range

MIN: Returns TRUE if the max value falls in the range; else FALSE

Param-1: Lower Range

Param-2: Upper Range

NULL COUNT: Returns TRUE if the count of null values falls in the range; else FALSE

Param-1: Lower Range

Param-2: Upper Range

NULL DENSITY PERCENT: Returns TRUE if the computed null density falls in the percentage range; else FALSE

Param-1: Lower Range

Param-2: Upper Range

STD DEVIATION:  Returns TRUE if the computed standard deviation falls in the range; else FALSE

Param-1: Lower Range

Param-2: Upper Range

SUM:  Returns TRUE if the sum of all cells computed falls in the range; else FALSE

Param-1: Lower Range

Param-2: Upper Range

TOP VALUES:  Returns TRUE if the top values of column matches with at least one value in param 1; else FALSE

Param-1: Comma separated values that user is expecting to be in top values

VALIDATE EMAIL: Returns TRUE if format of the elements in the column is “%_@_%._%”, else FALSE

No Parameter

VALIDATE PATTERN: Returns TRUE if the defined pattern (U,L,D or combination of any) matches with the column pattern; else FALSE

Param-1: Pattern Format

VALIDATE REGEX: Returns TRUE if the defined REGEX matches with the column values; else FALSE

Param-1: REGEX


Data Quality Rule types for File columns

 

Rule Type

Parameter

HAS_NOT_NULL:(Has no null value) returns true if there is no null value; else it returns false.

No Parameter

HAS_NO_DIGITS_OR_SPECIAL_CHARS:(Has No digits or special chars) returns true if there are no special characters or any digits else returns false.

No Parameter.

HAS_FIRST_AND_LAST_CHAR:(No leading or trailing spaces) returns true if there are no leading or trailing spaces, else it returns false.

No Parameter

HAS_SUBSTRING_IN_STRING:(Has substring in string) returns true if the substring value is present in the given string; else it returns false.

Param1: it contains a substring value.



HAS_LANGUAGE:(Belongs to one of the languages) returns true if the language of the column matches the provided language passed in parameter1; else returns false.

Param1: it contains the language to be specified in the column data.

HAS_WORD_RANGE:(Word count within range)rule returns true if the number of words present in the column data is in the range given by the user; otherwise returns false.

Param-1: Minimum number of words

Param-2: Maximum number of words

HAS_EXACT_VALUE:(Column Value same as another Column Value)

returns true if the value of the column given in parameter1 matches with the value given in the column value on which we are running the rule; else it returns false.

Param1: “column name” to be matched.

HAS_REGEX_MATCH:(Regex match) returns true if the data in the column matches with the regex given in the parameter1 else it returns false.

Param1: Regex to be matched.

HAS_COLUMN_VALUE_MATCH:(Column Value matches one of given values) returns true if the value of the given column matches with any one of the values of the parameter1 else it returns false.

Param1: It contains the values to be matched with a given column value.


Param2: Separator.

HAS_COLUMN_MATCH_FIXED_VALUE :(All Column Values matches given fixed value) returns true if the value in the column matches with the fixed value given in the parameter1 else returns false.

Param1: Value to be matched.

HAS_FIRST_UPPER_REST_LOWER:(First Letter of Word Upper case and Rest Lower Case) returns true if the first letter of the word is in uppercase and the rest of the word is in lowercase; else it returns false.

No Parameter

HAS_UNIQUE_VALUE:(This Column Value Should Be Unique) returns true if the value in the column is unique i.e. (No duplicate value) else it returns false

No Parameter

WORD_VALIDATION:(Word Validation)

returns true if the total number of spaces in the column data are less than or equal to the value given in the param1 else it returns false. If we want to limit the number of spaces between words, we need to pass the limit in param2. If the no spaces count between words exceeds the limit it returns false else it returns true.

Param-1: Total no of spaces allowed. 




Param-2: Total no of spaces allowed between words.

NUMBER_VALIDATION:(Number Validation) returns true if the data type in the column(int/float) value matches the parameter1 value which we are passing, otherwise it returns false. If column data type is float and we want to limit the decimal places then we pass decimal limits in parameter2.if decimals exceed the param2 limit then it returns false else true. if column data type is int and we want to limit the number of digits then we pass decimal limits in parameter3. If no of digits exceed the limit in param3 returns false, else true. If we do not want any “commas” in the column data we should pass: “N” else we should pass “T” in param4. If we pass “Y” and we have commas in column data returns true, else returns false. If we pass “N” in param4 and we have no commas in the column data, it returns true, else returns false.

param1: int/float


param2: floating point decimals limit


param3: integer number of digits limit


param4:” Y”/” N” (if user wants commas, then “Y” else “N”)

DATE_VALIDATION: (Date Validation)

returns true if the date format in values matches with the date format given in parameter1; else returns false. If we want to limit the years between the range, we need to pass the minimum year in param2 and maximum year in param3. If the year in column data is in between the minimum and maximum year specified by the user then it returns true, else returns false. (Exclusive values).

param1- which contains the format. 


param-2 which contains the minimum year.


param-3 which contains the maximum year.

HAS_DIGITS_AND_SPECIAL_CHARS:(Has digits and special chars) returns true if the value in the column is alphanumeric and it should contain digits and special characters; else it returns false. There is no need for passing parameters.



No Parameter

HAS_CONSTANT_VALUE: (Value Validation) returns true if the column has the same value in all the rows i.e. (constant value) then it returns false. There is no need for passing parameters.


No Parameter

HAS_ONLY_ONE_SPECIAL_CHAR:(Allow only one special Character) returns true if the column values have only one type of special character which passes in parameter1 else returns false. We must pass the special character in param1 which we want to be in column data.



Param1: special character allowed.

HAS_LENGTH_RANGE:(Has length of the value within range) returns true if the character in the value is within the range; else returns false. We must pass the minimum value range in param1 and maximum value range in param2. 

Param1: minimum no of characters allowed.

Param2: maximum no of characters allowed.



HAS_NUMERIC_VALUE_RANGE :(Numeric Value between min and max) returns true if the number value is within the range, else returns false. We must pass the minimum value range in param1 and maximum value range in param2. 

Param1: minimum number allowed.

Param2: maximum number allowed.