Overview
Data Quality is a process of regularly monitoring, controlling, verifying, and correcting the quality of data that comes into the systems. Typically, systems in an enterprise landscape will have accumulated large volumes of data, and getting insights into this quality involves significant effort and technical skills that are always at a premium.
Data Quality Rule is a set of processes written on a data object to monitor regularly, control, and verify data quality whenever data transformation happens.
Rules are applicable to the following data sources:
- DBMS Tables & Columns like RDBMS, ODBMS, NoSQL, etc.
- Structured data files like .csv, .xls, .xml, .json, pqt etc.
DQR Lifecycle
A data quality rule's lifecycle begins with the design and development process, followed by publication. The rules are scheduled here, the results are generated, alerts are notified, and finally, it completes the cycle by creating a service desk ticket to improve the quality of the rules.
The main components of the DQR Rule
- Function: It is the rule function that is executed on the data of the target data object.
- Target Data Object: The Table, Table Column, Files, and File column on which the rule is applied.
- The Rule type: is inferred from the function chosen for the rule.
- Custom Rules: Custom rules require user inputs for data validation based on interpretation values
- Boolean. It is a default value.
- Value. The rule query result should match the value specified for success.
- Range. The rule query result should be within the range specified.
Roles | Permission | Description |
Author & Analytical User |
Create Update Delete Execute |
Can create, update, delete and execute a Data Quality Rule |
Business User | Read-only | Cannot create a Data Quality Rule |
Rules Attributes:
Data Quality rules are set at the following three variable levels:
- Object Level: Object-level variables work at the Database tables or columns.
- Field Level: Field-level variables work at the Database Schema level, tables, or columns.
- Custom SQL: Custom SQL works for cataloged SQL from the query sheet.
Data Quality Rule main page
The Data Quality Rule main page displays the list of defined Data Quality Rules created on data objects.
DQR |
Description |
||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Rule Name |
Displays the name of the DQR rule |
||||||||||||||||||
Purpose |
Displays a brief description of the necessity of the rule |
||||||||||||||||||
Tags |
Displays the DQR associated with tags. The Tags assigned to the database object having the DQR rule applied. Click on the edit icon to edit, assign, or remove tags to the object. |
||||||||||||||||||
Status |
To alter the rule settings, you need to change the rule status to DRAFT. To complete the rule settings, you need to set the rule status to ACTIVE. |
||||||||||||||||||
Rule |
The factor/variable the rule can work |
||||||||||||||||||
Dimension |
Following are the various dimensions considered at the time of rule creation.
|
||||||||||||||||||
Steward |
Approver who can approve the data quality rule |
||||||||||||||||||
Execution Status |
It gives the status of the rule, whether it has been executed successfully or any error occurred. It will display the following status to understand the rule execution.
|
||||||||||||||||||
Last Run On |
It displays the date and time on which the rule was most recently executed. |
||||||||||||||||||
Created By |
It displays the name of the user who created the rule. |
||||||||||||||||||
Created Date |
It displays the date and time on which the rule was created. |
DQR main page features are as follows:
DQR Features |
Description |
Add |
To add a new rule |
Addition Info Search (Filters) |
To search for Addition Information |
Reset |
To reset the data list sorted earlier. For example, if you have sorted the domain column, if you click on the reset button, the sorted data will revert to the default sequence. |
Nine Dots |
User actions from Nine Dots options:
|
You perform the following tasks within the Data Quality Rule module:
- Create a new data quality rule.
- View the list of data objects on which the rule acts.
- Edit or set the threshold ranges or values for the selected rule type.
- Change rule status to DRAFT for editing the rule settings.
- Change the rule status to ACTIVE to finalize the rule settings.
- Copy the rule to a watchlist for quick monitoring.
- Set an alert or ticket system when the rule PASSES or FAILS.
- Schedule the execution of data quality rules at a preferred time.
- Delete Data Quality Rules.
Additionally, you can accomplish the following:
- Filter the data quality rules based on a Domain or rule type.
- Search for a Data Quality Rule based on Rule Name, purpose, or Steward.
- Sort the rules based on the rule Creation Date.
Creating a Data Quality Rule
To create a data quality rule.
- Go to the Governance Catalog > Data Quality >Data Quality Rule.
- Click + icon to create a new data quality rule. The data quality rule page is displayed.
- On the Add Rule page, enter the following:
- Rule Name - Name that identifies the rule to be written.
- Purpose - Description defining the need for this data quality rule.
- If the DQR is created with the details mentioned earlier, select the newly added rule (under the Rule Name column) from the home page.
- Edit a few or all to define the rule conditions.
Note:
- You only collect high-level details of DQR while adding Rules. Type of Rule, Rule object is collected within the detail page.
- The rule editor can do any of the following when the rule is in a DRAFT state, as shown above under Status (top right corner). Edit the Purpose if more description is required under ‘Rule Level.’
DQR User Actions
Users can perform the following user action using nine dots options.
Delete Data Quality Rule
You can remove the Data Quality Rules that are no longer needed.
To delete a data quality rule,
- Select and click a data quality rule name.
- Click the Nine Dots and choose the Delete Rule option to delete the selected rule.
- The selected rule gets deleted.
You can add a tag in two ways.
- Edit icon: You can click on the Edit icon under the Tag column that will display the add tag popup window. You can click on the +New button to add or update the tag.
- Nine Dots: Go to Nine dots, and choose to add tag option to add a new tag. will display the add tag popup window. You can click on the +New button to add or update the tag.
You can remove the assigned tag on data objects in two ways.
- Edit icon: You can click on the Edit icon under the Tag column that will display the Remove Tag popup window. You can click on – the remove button to remove a tag.
- Nine dots: Go to Nine dots, choose to Remove tag option, and it will display the remove tag popup window. You can click on – it to remove the assigned tag.
From the Nine Dots, select Run DQR Rule to execute a DQR.
DQR Summary
The Data Quality Rule Summary shows tasks that could be performed on a newly created rule.
DQR summary tab consists of the following features:
Summary Page Features |
Description |
---|---|
Purpose |
You may use the edit icon to modify the descriptive statement defining the need for the rule. |
Corrective Action |
Corrective measures can be provided to resolve the data quality issue on the objects. You can provide corrective measures to resolve the data quality issue on the objects. For example, the Table object has 1000 rows, and then we will run the rule with the function Total row count (TB) with a success range of 900-1100. The rule is successful because the total row is in the range. If the total number of rows is not in the range, the rule fails, and you can provide the corrective action of Alerts, Tickets, and descriptions. |
Rule Level |
It provides a list of functions to create a policy rule. Under ‘Rule Level’. |
Function |
The following default Functions existing in the OvalEdge
HAS_NOT_NULL, HAS_NO_DIGITS_OR_SPECIAL_CHARS, HAS_FIRST_AND_LAST_CHAR, HAS_SUBSTRING_IN_STRING, HAS_LANGUAGE, HAS_WORD_RANGE, HAS_EXACT_VALUE, HAS_REGEX_MATCH, HAS_COLUMN_VALUE_MATCH, HAS_COLUMN_MATCH_FIXED_VALUE, HAS_FIRST_UPPER_REST_LOWER, HAS_UNIQUE_VALUE,WORD_VALIDATION, NUMBER_VALIDATION, DATE_VALIDATION |
Success Percentage |
Some Functions take values in Range and some take in percentages.
|
Dimension |
|
Execution Information |
Execution information provides the following information:
|
Terms |
A business term can be assigned to the database object having the DQR rule applied. |
Tags |
A tag can be associated with the database object with the DQR rule applied. |
Additional Fields |
Additional fields are known as Custom Fields. Custom fields allow OvalEdge users to add more metadata content for each of the data objects in your Organization. OvalEdge provides a list of statistics or attributes by default, such as table count, row count, column count, total row count, Minimum value, Maximum value, Null count, distinct count, density, and null density. You can create additional fields called Custom fields if you want to add more information about the data objects. |
Tops Users |
Top users are experts who interact mostly with the selected data object and can answer your question. You can view the Top Users in the top panel with the last seen Date & Time details. Click on the Top Users icon to get the complete list of all the users who have interacted with the selected data to object, along with the Last seen Date & Time. Example: In a rule summary page, click the top user’s icon to view - All Users, Last seen Date, Views, Query Executions, and Comments. |
Alert |
An ALERT message can be set up in the rule editor of the data quality rule to alert the STEWARD of the rule if a failure occurs. You can select an option, YES, to notify alerts on the failure of rules. |
Ticket |
A service Desk ticket is created to raise to report the data quality issue. You can select the option YES to raise the service desk request on the occurrence of any data quality issue. |
Schedule |
You can schedule the Rule to run at a specified time. At the run time, it does the table profiling and executes the rule with the profile information. Note:
|
Collaboration |
The DQR issues can be notified using the collaboration message icon. |
DQR User Actions
Users can perform the following user action using the DQR Nine Dots option available in DQR Summary.
- Change status to active
When the DQR is in DRAFT status, you can do the changes. Once the modifications are complete, select the Nine Dots to change the status to ACTIVE. - Update Steward
From the Nine Dots, select Update Steward to update the default Steward of the DQR rule on the top right corner. You can assign different users based on your selection from the pop-up menu. - Add or Remove from my watch list
DQR's watch list has the capability to notify you when a significant percentage of the data objects associated with DQR change. Select the Watchlist option from the Nine Dots to add DQR to the watchlist notification. - Delete Rules
You can remove the Data Quality Rules that are no longer needed. Select the data quality rule name, go to Nine Dots, and choose to Delete the rule to delete the selected rule. The selected rules get deleted. - Delete Associated object
You can remove the association of the object. From the Nine Dots, select Delete Associated object to delete the contacts. - Run Now
Run at any point in time, the rule will use the profiling information if the table has been profiled; otherwise, it will query the source system to execute the rule. Select the Run now option from the Nine Dots to execute the DQR. - Delete Contacts
You can remove the contact no longer needed.
From the Nine Dots, select Delete Contacts to delete the contacts.
Catalog Objects
The catalog objects tab displays information about a rule associated with a table object. Data quality rules are default executed whenever the data object associated with the data rule is PROFILED.
Catalog Objects |
Description |
Database |
Displays the name of the database |
Schema |
Displays the name of the schema that belongs to the database |
Table |
Displays the name of the table that belongs to the schema |
Column |
Displays the name of the column that belongs to the schema |
Last Run on |
Displays the last run on the date and time rule executed |
Last Status |
Displays the status of the rule that was executed recently. The execution status could be Success or Failure. |
History |
Displays the history of result execution of that associated rule to an object |
Fail Row Fetch Limit: An associate object has a column called Fail Row Fetch Limit, which allows setting the number of failed records fetched for a set of datasets. Using the setting, you can restrict the number of failing records for the rule you have applied or associated. By default, the maximum limit to fetch the failed records is 50. The fail row fetch limit is added based on the function name. Regex null count range email pattern is a unique column, and file column functions.
Once the Object is associated with a DQR, you can perform the following activities.- Change Status to Active: Once a DQR is associated with the object, you can change the status of the Draft DQR to active. It means a Rule should contain all the mandatory fields like rule name, purpose, function name, and its dimension, based on the function if it has success range/values (Inputs) and finally, the targeted objects(the objects associated with the rule). And the optional fields like alert, ticket, and scheduling (it should enable only DQ Rule in Active Status)
- Remove From Mywatchlist: this feature allows you to remove the DQR from my watchlist.
- Delete Rule: this feature allows you to delete the Rule associated with the data objects.
- Delete Associated Objects: If no longer required, you can withdraw the DQR rule from the associated objects. From the Nine Dots icon, select Delete Associate Object.
- In Associate Object, select an associated object that needs to be removed.
- From the Nine Dots, select the Delete Associate Object, and a popup window message is displayed to confirm the deletion of the object.
- Run Now: To execute the DQR
- Update Steward: You can assign or update the steward of a DQR
- Delete Contact: You can delete the contacts if no longer needed.
Executions
Execution tabs display the objects (Table/Table Column/File/File Column) level results with run ID, result, passed object count, and failed object counts.
The Execution result tab displays the following information:
Execution Tab |
Description |
---|---|
Logs |
It displays the Logs of rules execution |
Jobs Id |
It is the Id associated with a rule that is under execution. |
Run Id |
It is a unique Id that shows the execution results for each run. |
Results |
Displays whether the result of the execution result of the rule is a success or failure. |
Passed object count |
If a Rule Function has Statistics based on the Rule Validation, then it will display the passed count of an object in the PassedObjectCount Column; or otherwise, it won't display any statistics for an object |
failed object counts |
If a Rule Function has Statistics based on the Rule Validation, then it will display the failed count of an object in the FailedObjectCount Column, or otherwise, it won't display any statistics for an object |
Total object count |
Displays the sum of the Passed & Failed count of the object. |
Start time |
Displays the date & time on which the execution of rules started. |
End time |
Displays the date & time on which the execution of rules ended. |
Duration |
Displays the time taken for the execution of the rule. |
Run by |
Displays the user name who has run the rule. |
Note: The execution result displays details depending on the object type. The execution result displays the above information if the object type is a File. In the case of a Table type object, the execution result will be displayed with the following details: Run ID, Database, Schema, Table, Last Run On, and Last Status (if any)
Results
The results tab provides information on DQR rules that were associated with an object and executed. The executed rule is listed with a unique Run ID to evaluate the execution.
Alerts
When a data quality rule fails, the rule editor can set an ALERT message to the STEWARD / Watcher of the Data Quality Rule to act. Also, suppose data FAILS to pass the quality rule. In that case, the system can report if you select the option to raise a service desk TICKET and send a notification message to the assigned STEWARD of the DAG at the schema level.
Note: When a user is Steward & Watcher, if DQR has Alert and Tickets are enabled, only the Ticket will be raised.
Tickets
Notifying users when the data quality rule fails
When a data quality rule fails, the rule editor can set an ALERT message to the STEWARD / Watcher of the Data Quality Rule to act. Also, suppose data FAILS to pass the quality rule. In that case, the system can report if you select the option to raise a service desk TICKET and send a notification message to the assigned STEWARD of the DAG at the schema level.
Service Request
Report Data Quality Issues through Data Catalog
Data Catalog: In Data Catalog, If you want to raise any data quality issue at the table, column, file, or file column level, you can directly click on the Nine Dots icon and select ‘Request Data Quality Issue,’ which will route to ‘Data Quality Index’ page.
Report Data Quality Issues through Quality Index
- If you want to raise any data quality issue at the table, column, file, or file column level, navigate directly to Data Catalog > Summary page> ‘Report Problem,’ which will route to the ‘Data Quality Index’ page.
- Click the Create Ticket button and display the New service request page.
- It will display a new service request page
- Enter details on the request page and click on the Save button to generate the service desk ticket.
Contacts
To add a Contact
- Navigate to the Governance Catalog > Data Quality > Data Quality Rule.
- You can view the individual Data Quality details by clicking a Data Quality Name, and it shows the Summary, Catalog Objects, Execution, Results, Service Request, and Contacts.
- Click on the Contacts tab.
- Click on the +Add Contacts button, and select OE user.
OE Users: OvalEdge Users.
Existing External users: External users are available in the OvalEdge application.
New External Users: This allows the addition of a new external user in the OvalEdge. - Select the notification type as
- Notify on Edit: The user would like to receive an update on the policy rule.
- Notify on Fail: The user would like to receive a notification when a policy rule fails.
- Notify on Passed: The user would like to receive a notification when a policy rule is passed.
- Click on the +Add Contact button, OE user gets successfully added.
Note: A User can also manage the notification from the contact page.
To Delete a Contact
- Navigate to the Governance Catalog > Data Quality > Data Quality Rules
- Click on the Contacts tab.
- Select a contact.
- Click on the nine dots and select the Delete Contacts option; a confirm window with the Yes or No option is displayed. Click on the Yes button, and the selected contact is deleted.
Copyright © 2019, OvalEdge LLC, Peachtree Corners GA USA