Governance Catalog

Data Quality Rules

Overview

Data Quality is a process of regularly monitoring, controlling, verifying, and correcting the quality of data that comes into the systems. Typically, systems in an enterprise landscape will have accumulated large volumes of data, and getting insights into this quality involves significant effort and technical skills that are always at a premium. 

Data Quality Rule is a set of processes written on a data object to monitor regularly, control, and verify data quality whenever data transformation happens.

Rules are applicable to the following data sources:

  1. DBMS Tables & Columns like RDBMS, ODBMS, NoSQL, etc.
  2. Structured data files like .csv, .xls, .xml, .json, pqt etc. 

DQR Lifecycle

A data quality rule's lifecycle begins with the design and development process, followed by publication. The rules are scheduled here, the results are generated, alerts are notified, and finally, it completes the cycle by creating a service desk ticket to improve the quality of the rules.

  dqrcycle

The main components of the DQR Rule

  • Function: It is the rule function that is executed on the data of the target data object.
  • Target Data Object: The Table, Table Column, Files, and File column on which the rule is applied.
  • The Rule type: is inferred from the function chosen for the rule.
  • Custom Rules: Custom rules require user inputs for data validation based on interpretation values 
    • Boolean. It is a default value.
    • Value. The rule query result should match the value specified for success.
    • Range. The rule query result should be within the range specified.
Roles and Permissions to Access the Data Quality Rule
Roles Permission Description
Author & Analytical User

Create

Update

Delete

Execute

Can create, update, delete and execute a Data Quality Rule
Business User Read-only Cannot create a Data Quality Rule

Rules Attributes: 

Data Quality rules are set at the following three variable levels:

  1. Object Level: Object-level variables work at the Database tables or columns. 
  2. Field Level: Field-level variables work at the Database Schema level, tables, or columns. 
  3. Custom SQL: Custom SQL works for cataloged SQL from the query sheet.

Data Quality Rule main page

The Data Quality Rule main page displays the list of defined Data Quality Rules created on data objects. 

dqrhomepg-1

DQR 

Description

Rule Name

Displays the name of the DQR rule

Purpose 

Displays a brief description of the necessity of the rule

Tags

Displays the DQR associated with tags. The Tags assigned to the database object having the DQR rule applied. Click on the edit icon to edit, assign, or remove tags to the object.

Status

To alter the rule settings, you need to change the rule status to DRAFT.

To complete the rule settings, you need to set the rule status to ACTIVE.

Rule 

The factor/variable the rule can work

Dimension

Following are the various dimensions considered at the time of rule creation.

Dimensions

Description

Accuracy

The data must be accurate and truthful, representing the real-life entity

Completeness

The data should be assigned a value (which means the completeness of its object record).

Consistency

The data values in one data set are consistent with values in another.

Integrity

Intended references from data in one column of a table to data in another column of the same or different table is valid.

Timeliness

The time expectation for accessibility and availability of information

Uniqueness

No entity exists more than once within the data set

Conformity

To follow the set of standard data definitions like data type, size, and format

Validity

Within the defined value ranges

Steward 

Approver who can approve the data quality rule

Execution Status 

It gives the status of the rule, whether it has been executed successfully or any error occurred. It will display the following status to understand the rule execution.

  • Not Executed: The rule has not yet been executed and is in progress. 
  • Success: Rule has been successfully executed
  • Execution Failed: Execution of DQR rule has failed.
  • Failure: Rule has failed.

Last Run On

It displays the date and time on which the rule was most recently executed.

Created By

It displays the name of the user who created the rule.

Created Date

It displays the date and time on which the rule was created.

DQR main page features are as follows: 

DQR Features

Description

Add

To add a new rule

Addition Info Search (Filters)

To search for Addition Information

Reset  

To reset the data list sorted earlier. For example, if you have sorted the domain column, if you click on the reset button, the sorted data will revert to the default sequence.

Nine Dots

User actions from Nine Dots options:

  • Add a Tag
  • Remove a Tag
  • Delete Data Quality Rule

You perform the following tasks within the Data Quality Rule module:

  1. Create a new data quality rule.
  2. View the list of data objects on which the rule acts.
  3. Edit or set the threshold ranges or values for the selected rule type.
  4. Change rule status to DRAFT for editing the rule settings.
  5. Change the rule status to ACTIVE to finalize the rule settings.
  6. Copy the rule to a watchlist for quick monitoring.
  7. Set an alert or ticket system when the rule PASSES or FAILS.
  8. Schedule the execution of data quality rules at a preferred time.
  9. Delete Data Quality Rules.

Additionally, you can accomplish the following:

  1. Filter the data quality rules based on a Domain or rule type.
  2. Search for a Data Quality Rule based on Rule Name, purpose, or Steward.
  3. Sort the rules based on the rule Creation Date.

Creating a Data Quality Rule

To create a data quality rule.

  1. Go to the Governance Catalog > Data Quality >Data Quality Rule.
  2. Click + icon to create a new data quality rule. The data quality rule page is displayed.
  3. On the Add Rule page, enter the following:
    1. Rule Name - Name that identifies the rule to be written.
    2. Purpose - Description defining the need for this data quality rule.
  4. If the DQR is created with the details mentioned earlier, select the newly added rule (under the Rule Name column) from the home page.
  5. Edit a few or all to define the rule conditions.

    Note:

    • You only collect high-level details of DQR while adding Rules. Type of Rule, Rule object is collected within the detail page.
    • The rule editor can do any of the following when the rule is in a DRAFT state, as shown above under Status (top right corner). Edit the Purpose if more description is required under ‘Rule Level.’

    DQR User Actions

    Users can perform the following user action using nine dots options. 

    Delete Data Quality Rule

    You can remove the Data Quality Rules that are no longer needed.

    To delete a data quality rule,

    1. Select and click a data quality rule name.
    2. Click the Nine Dots and choose the Delete Rule option to delete the selected rule.
    3. The selected rule gets deleted.
    Add Tag

    You can add a tag in two ways.

    • Edit icon: You can click on the Edit icon under the Tag column that will display the add tag popup window. You can click on the +New button to add or update the tag.  
    • Nine Dots: Go to Nine dots, and choose to add tag option to add a new tag. will display the add tag popup window. You can click on the +New button to add or update the tag.
    Remove Tag

    You can remove the assigned tag on data objects in two ways.

    • Edit icon: You can click on the Edit icon under the Tag column that will display the Remove Tag popup window. You can click on – the remove button to remove a tag.
    • Nine dots: Go to Nine dots, choose to Remove tag option, and it will display the remove tag popup window. You can click on  – it to remove the assigned tag.  
    Run DQR Rule

    From the Nine Dots, select Run DQR Rule to execute a DQR.

    DQR Summary

    The Data Quality Rule Summary shows tasks that could be performed on a newly created rule. 
    DQRsummarypg-1

    DQR summary tab consists of the following features:

    Summary Page Features

    Description

    Purpose

    You may use the edit icon to modify the descriptive statement defining the need for the rule.

    Corrective Action

    Corrective measures can be provided to resolve the data quality issue on the objects. You can provide corrective measures to resolve the data quality issue on the objects. For example, the Table object has 1000 rows, and then we will run the rule with the function Total row count (TB) with a success range of 900-1100. The rule is successful because the total row is in the range. If the total number of rows is not in the range, the rule fails, and you can provide the corrective action of Alerts, Tickets, and descriptions. 

    Rule Level 

    It provides a list of functions to create a policy rule. Under ‘Rule Level’.

    Function 

    The following default Functions existing in the OvalEdge

    • SQL Functions: SQL EXACT VALUE, SQL VALUE CONTAINS, SQL VALUE RANGE
    • File Functions (FL) : FILE CREATED DATE(FL), FILE SIZE VALIDATION(FL), FILE FORMAT VALIDATION(FL), FILENAME CONTAINS(FL), FILE ROW VALIDATION(FL)
    • File Column Functions(FC) : 

    HAS_NOT_NULL, HAS_NO_DIGITS_OR_SPECIAL_CHARS, HAS_FIRST_AND_LAST_CHAR, HAS_SUBSTRING_IN_STRING, HAS_LANGUAGE, HAS_WORD_RANGE, HAS_EXACT_VALUE, HAS_REGEX_MATCH, HAS_COLUMN_VALUE_MATCH, HAS_COLUMN_MATCH_FIXED_VALUE, HAS_FIRST_UPPER_REST_LOWER, HAS_UNIQUE_VALUE,WORD_VALIDATION, NUMBER_VALIDATION, DATE_VALIDATION

    Success Percentage

    Some Functions take values in Range and some take in percentages. 

    • For example, the MAX function can be successful if the Max value of the column(order) is between 20000 to 25000.
    • Whereas the NULL COUNT function can be successful if the null count of a column(address) is between 2% to 5% of the total rows.

    Dimension

    • Functions are categorized into Dimensions for Reporting purposes. 
    • Each Function is associated with a dimension. You will have the ability to change it. 
    • After creating a Rule with a Function (having a Dimension), we calculate the Data Quality Index for each Data Object based on the number of Rules, successful and failed executions, no. of tickets reported, and the number of tickets resolved.
    • On the Data Quality index page, these Dimensions are used to show the completeness of an object against each Dimension.

    Execution Information

    Execution information provides the following information:

    • Last Status - Job-status, whether it is SUCCESS / FAILED
    • Last Run Time - It is the latest date and time of the job submitted
    • Rule Object - Graphical representation of the object about its object name, the status of the job, and last run date and time.

    Terms

    A business term can be assigned to the database object having the DQR rule applied.

    Tags

    A tag can be associated with the database object with the DQR rule applied.

    Additional Fields

    Additional fields are known as Custom Fields. Custom fields allow OvalEdge users to add more metadata content for each of the data objects in your Organization. OvalEdge provides a list of statistics or attributes by default, such as table count, row count, column count, total row count, Minimum value, Maximum value, Null count, distinct count, density, and null density. You can create additional fields called Custom fields if you want to add more information about the data objects.

    Tops Users 

    Top users are experts who interact mostly with the selected data object and can answer your question.

    You can view the Top Users in the top panel with the last seen Date & Time details. Click on the Top Users icon to get the complete list of all the users who have interacted with the selected data to object, along with the Last seen Date & Time. 

    Example: In a rule summary page, click the top user’s icon to view - All Users, Last seen Date, Views, Query Executions, and Comments.

    Alert

    An ALERT message can be set up in the rule editor of the data quality rule to alert the STEWARD of the rule if a failure occurs. You can select an option, YES, to notify alerts on the failure of rules. 

    Ticket

    A service Desk ticket is created to raise to report the data quality issue. You can select the option YES to raise the service desk request on the occurrence of any data quality issue. 

    Schedule

    You can schedule the Rule to run at a specified time. At the run time, it does the table profiling and executes the rule with the profile information.

    Note:

    • Select the checkbox next to the scheduled date to set a specific time for executing the rule. Choose the interval and time the data quality rule should be executed.
    • Example: The Data Quality rule is scheduled to run Daily at 05 hours and 00 MINUTES.
    • The Schedule can be edited only when the Data Quality Rule is in DRAFT status. To edit a Data quality rule, click the Nine Dots and change the status to DRAFT.

    Collaboration 

     The DQR issues can be notified using the collaboration message icon. 

    DQR User Actions   

    Users can perform the following user action using the DQR Nine Dots option available in DQR Summary.

    • Change status to active
      When the DQR is in DRAFT status, you can do the changes. Once the modifications are complete, select the Nine Dots to change the status to ACTIVE.
    • Update Steward
      From the Nine Dots, select Update Steward to update the default Steward of the DQR rule on the top right corner. You can assign different users based on your selection from the pop-up menu.
    • Add or Remove from my watch list
      DQR's watch list has the capability to notify you when a significant percentage of the data objects associated with DQR change.  Select the Watchlist option from the Nine Dots to add DQR to the watchlist notification. 
    • Delete Rules
      You can remove the Data Quality Rules that are no longer needed. Select the data quality rule name, go to Nine Dots, and choose to Delete the rule to delete the selected rule. The selected rules get deleted. 
    • Delete Associated object
      You can remove the association of the object. From the Nine Dots, select Delete Associated object to delete the contacts.
    • Run Now
      Run at any point in time, the rule will use the profiling information if the table has been profiled; otherwise, it will query the source system to execute the rule. Select the Run now option from the Nine Dots to execute the DQR. 
    • Delete Contacts
      You can remove the contact no longer needed. 
      From the Nine Dots, select Delete Contacts to delete the contacts. 

    Catalog Objects 

    The catalog objects tab displays information about a rule associated with a table object.  Data quality rules are default executed whenever the data object associated with the data rule is PROFILED. 
    catalogobject

    The catalog tab displays the following information:

    Catalog Objects

    Description

    Database

    Displays the name of the database 

    Schema

    Displays the name of the schema that belongs to the database

    Table

    Displays the name of the table that belongs to the schema

    Column

    Displays the name of the column that belongs to the schema

    Last Run on

    Displays the last run on the date and time rule executed 

    Last Status

    Displays the status of the rule that was executed recently. The execution status could be Success or Failure.

    History

    Displays the history of result execution of that associated rule to an object

    Fail Row Fetch Limit: An associate object has a column called Fail Row Fetch Limit, which allows setting the number of failed records fetched for a set of datasets. Using the setting, you can restrict the number of failing records for the rule you have applied or associated. By default, the maximum limit to fetch the failed records is 50. The fail row fetch limit is added based on the function name. Regex null count range email pattern is a unique column, and file column functions.

    Once the Object is associated with a DQR, you can perform the following activities. 
    • Change Status to Active: Once a DQR is associated with the object, you can change the status of the Draft DQR to active. It means a Rule should contain all the mandatory fields like rule name, purpose, function name, and its dimension, based on the function if it has success range/values (Inputs) and finally, the targeted objects(the objects associated with the rule). And the optional fields like alert, ticket, and scheduling (it should enable only DQ Rule in Active Status)
    • Remove From Mywatchlist: this feature allows you to remove the DQR from my watchlist.
    • Delete Rule: this feature allows you to delete the Rule associated with the data objects.
    • Delete Associated Objects: If no longer required, you can withdraw the DQR rule from the associated objects. From the Nine Dots icon, select Delete Associate Object. 
    • In Associate Object, select an associated object that needs to be removed. 
    • From the Nine Dots, select the Delete Associate Object, and a popup window message is displayed to confirm the deletion of the object. 
    • Run Now: To execute the DQR
    • Update Steward: You can assign or update the steward of a DQR 
    • Delete Contact: You can delete the contacts if no longer needed.

    Executions

    Execution tabs display the objects (Table/Table Column/File/File Column) level results with run ID, result, passed object count, and failed object counts.

    executionmainpage-1
    The Execution result tab displays the following information:

    Execution Tab

    Description

    Logs

    It displays the Logs of rules execution 

    Jobs Id

    It is the Id associated with a rule that is under execution. 

    Run Id

    It is a unique Id that shows the execution results for each run.

    Results

    Displays whether the result of the execution result of the rule is a success or failure.

    Passed object count

    If a Rule Function has Statistics based on the Rule Validation, then it will display the passed count of an object in the PassedObjectCount Column; or otherwise, it won't display any statistics for an object

    failed object counts

    If a Rule Function has Statistics based on the Rule Validation, then it will display the failed count of an object in the FailedObjectCount Column, or otherwise, it won't display any statistics for an object

    Total object count

    Displays the sum of the Passed & Failed count of the object. 

    Start time

    Displays the date & time on which the execution of rules started.

    End time

    Displays the date & time on which the execution of rules ended.

    Duration

    Displays the time taken for the execution of the rule. 

    Run by 

    Displays the user name who has run the rule. 

    Note: The execution result displays details depending on the object type. The execution result displays the above information if the object type is a File. In the case of a Table type object, the execution result will be displayed with the following details: Run ID, Database, Schema, Table, Last Run On, and Last Status (if any)

    Results 

    The results tab provides information on DQR rules that were associated with an object and executed. The executed rule is listed with a unique Run ID to evaluate the execution.

    resultdisplayed

    Alerts

    When a data quality rule fails, the rule editor can set an ALERT message to the STEWARD / Watcher of the Data Quality Rule to act. Also, suppose data FAILS to pass the quality rule. In that case, the system can report if you select the option to raise a service desk TICKET and send a notification message to the assigned STEWARD of the DAG at the schema level.

    Note: When a user is Steward & Watcher, if DQR has Alert and Tickets are enabled, only the Ticket will be raised.

    Tickets 

    Notifying users when the data quality rule fails

    When a data quality rule fails, the rule editor can set an ALERT message to the STEWARD / Watcher of the Data Quality Rule to act. Also, suppose data FAILS to pass the quality rule. In that case, the system can report if you select the option to raise a service desk TICKET and send a notification message to the assigned STEWARD of the DAG at the schema level.
    dqrticketalert

    Service Request

    Report Data Quality Issues through Data Catalog

    Data Catalog: In Data Catalog, If you want to raise any data quality issue at the table, column, file, or file column level, you can directly click on the Nine Dots icon and select ‘Request Data Quality Issue,’ which will route to ‘Data Quality Index’ page. 

    Report Data Quality Issues through Quality Index

    1. If you want to raise any data quality issue at the table, column, file, or file column level, navigate directly to Data Catalog > Summary page> ‘Report Problem,’ which will route to the ‘Data Quality Index’ page. 
    2. Click the Create Ticket button and display the New service request page.
    3. It will display a new service request page
    4. Enter details on the request page and click on the Save button to generate the service desk ticket.

    Contacts

    To add a Contact

    1. Navigate to the Governance Catalog > Data Quality > Data Quality Rule.
    2. You can view the individual Data Quality details by clicking a Data Quality Name, and it shows the Summary, Catalog Objects, Execution, Results, Service Request, and Contacts.
    3. Click on the Contacts tab. 
    4. Click on the +Add Contacts button, and select OE user.
      OE Users: OvalEdge Users.
      Existing External users: External users are available in the OvalEdge application.
      New External Users: This allows the addition of a new external user in the OvalEdge.
    5. Select the notification type as 
      1. Notify on Edit: The user would like to receive an update on the policy rule.
      2. Notify on Fail: The user would like to receive a notification when a policy rule fails.
      3. Notify on Passed: The user would like to receive a notification when a policy rule is passed. 
    6. Click on the +Add Contact button, OE user gets successfully added.  

    Note: A User can also manage the notification from the contact page.

    To Delete a Contact

    1. Navigate to the Governance Catalog > Data Quality > Data Quality Rules
    2. Click on the Contacts tab. 
    3. Select a contact. 
    4. Click on the nine dots and select the Delete Contacts option; a confirm window with the Yes or No option is displayed. Click on the Yes button, and the selected contact is deleted. 

    Copyright © 2019, OvalEdge LLC, Peachtree Corners GA USA