Certifications - A Deep Dive

Overview

Certifying a data object involves an evaluation by Governance Stakeholders in the organization, such as Stewards and Owners, who have expertise in and around the data object. OE_ADMIN users can also certify data objects.

This evaluation determines whether the data object is business-ready for consumers, and instills confidence by assigning one of the certification statuses: Certified, Cautioned, Violated, Inactive, or None. Using certifications helps manage the risks associated with data handling, promotes consistent use across departments, and supports operational efficiency by reducing the likelihood of errors. Consequently, certified data helps uphold the integrity of business analytics and intelligence, fostering better customer relationships and strategic business outcomes.

Various Certification Statuses

OvalEdge provides the following certification statuses. Each one is explained in the table that follows the list.

  1. Certified
  2. Cautioned
  3. Violated
  4. Inactive
  5. None

| Certification Status | Description | Example |
|---|---|---|
| Certified | Indicates that the data object is reliable, accurate, and meets predefined quality standards. | A sales report that has been certified ensures that the figures are accurate and can be trusted for decision-making. |
| Cautioned | Indicates that the data object may have minor issues or potential data quality concerns. The data object is usable, but users should exercise additional scrutiny when using it. | A ‘customer details’ table with some duplicate entries might carry a Cautioned icon, reminding users to review the data for accuracy before relying on it. |
| Violated | Indicates that the data object has significant quality issues. The data object should be used with caution because it may contain misleading or incorrect information. | A financial statement with discrepancies that compromise its accuracy and integrity. |
| Inactive | Indicates that the data object is currently not in use or is obsolete. | Any table that is no longer in use. |
| None | Indicates that the data object has not undergone certification or been assessed for its quality. | Typically used for newly added data objects or those that have not yet been evaluated by governance stakeholders. |


The data objects that can be certified are:

  1. Tables
  2. Table Columns
  3. Files
  4. File Columns
  5. Reports
  6. Report Columns

Certifying Objects

Data objects can be certified in two ways:

Manual Certification involves certifying a data object by choosing the relevant option from the 9-dots action menu.

  • Stakeholders involved in governance, including Owners, Stewards, Custodians, and additional roles (configured as 4, 5, 6), can manually assign certifications to specific data objects. Additionally, OE_ADMIN holds the authority to certify data objects.

Automatic Certification applies certification to data objects without manual navigation, which reduces manual effort and the risk of human error.

  • OE_ADMIN can create certification policies and execute them on data objects, automating the certification process for selected data objects.

Note: Users with Meta-Write and Meta-Read privileges are limited to viewing the certifications applied without the ability to modify or assign new certifications.
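The role rules described above can be summarized as a small lookup. This is only an illustrative sketch: the role labels (including "Role 4" to "Role 6" for the additional configured roles) and the function names are hypothetical and are not part of any OvalEdge API.

```python
# Hypothetical sketch: role labels follow the text above; none of this is an OvalEdge API.
GOVERNANCE_ROLES = frozenset(
    {"Owner", "Steward", "Custodian", "Role 4", "Role 5", "Role 6"}
)
ADMIN_ROLE = "OE_ADMIN"


def can_certify_manually(roles_on_object):
    """True if any of the user's roles on the object allows manual certification."""
    roles = set(roles_on_object)
    return ADMIN_ROLE in roles or bool(roles & GOVERNANCE_ROLES)


def can_manage_policies(roles_on_object):
    """Only OE_ADMIN creates and executes certification policies."""
    return ADMIN_ROLE in set(roles_on_object)
```

Under these assumptions, a user who holds only Meta-Read or Meta-Write on an object fails both checks and can only view the applied certifications.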

Manual Certification

Manual certification can be applied to data objects from two different places:

  1. Data Catalog List View page
  2. Object Summary page

Applying Certification from the Data Catalog List View page

  • The List View within the Data Catalog lets users update certification statuses for multiple data objects in bulk. To do so, select the appropriate data object tab, select the checkboxes of the desired data objects, and then choose the "Apply Certification" option from the 9-dots action menu.

  • Additionally, in the List View, hovering over the corresponding data object's line item reveals the edit icon. Clicking on this icon enables users to apply their preferred certification on a selected data object.

Applying Certification from the Object Summary Page

On the Object Summary page, certifications can be assigned by clicking on the 9-dot action icon. After selecting the “Apply Certification” option, a popup appears where users can choose the appropriate certification option, provide comments, and apply the certification.

The applied certification details can be reviewed using the icon on the data object summary page. Viewers with Meta-Read/Meta-Write access can click on the certification icon to view the history of certifications applied to the data object, offering transparency and context regarding its certification status over time.

Caution Downstream Objects

If a data object is certified as Cautioned, users can also caution all the downstream objects associated with it, to inform consumers of issues or sensitive data. From the 9-dots icon, select "Process upstream or downstream objects" and choose the "Caution all downstream objects" radio button.

Downstream caution can also be applied while certifying an object as Cautioned. In that case, the user selects the "Caution all Downstream objects" checkbox. Downstream objects then show a Caution icon with a lineage icon above it, indicating that the caution was applied downstream from a root object.


To remove the caution certification from downstream objects, apply the "Remove Caution all the downstream objects" option from the 9-dots menu on the root object's page. Until “Remove Caution from all the downstream objects” is applied from the root object, the certification type cannot be changed for either the root object or its downstream objects.
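Conceptually, cautioning downstream objects is a traversal of the lineage graph starting at the root object. The sketch below illustrates that idea with a breadth-first walk. The lineage dictionary, the object names, and the "Cautioned (from lineage)" label are assumptions made for illustration; how OvalEdge stores lineage and whether it overwrites an existing status is not stated in this article.

```python
from collections import deque


def downstream_of(lineage, root):
    """Collect every object reachable downstream of root (root itself excluded)."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, ()):
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def caution_all_downstream(lineage, statuses, root):
    """Return a copy of statuses with every downstream object marked as cautioned.

    Assumption: the propagated caution replaces whatever status was there before.
    """
    updated = dict(statuses)
    for obj in downstream_of(lineage, root):
        updated[obj] = "Cautioned (from lineage)"
    return updated


# Hypothetical lineage: each key feeds the objects in its list.
lineage = {
    "orders": ["orders_clean"],
    "orders_clean": ["sales_report", "revenue_kpi"],
    "customers": ["customer_report"],
}
statuses = {"orders": "Cautioned", "orders_clean": "None", "sales_report": "Certified"}
result = caution_all_downstream(lineage, statuses, "orders")
```

Here only the objects fed by "orders" are affected; "customer_report" sits on a separate lineage branch and is left alone.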


Automatic Certification

Multiple certification policies can be created, each configured with its own policy rules. When executed, these rules certify data objects based on specific criteria: the presence of a business description, Data Quality Index, Glossary term, or lineage information on the data object, each subject to a user-defined threshold. Objects that meet the criteria are automatically given the "Certified" status, providing a straightforward, automated method for data certification. Automatic certification can be applied to Tables, Files, Reports, and APIs.

OE_ADMIN can automate the certification process by navigating to the Governance Catalog from the left panel menu and selecting Certification Policy. The Certification Policy list view page presents existing policies, and users can create or add new policies through the "Add Policy" button.



The table below describes the attributes on the Certification Policy List View page:

| Certification Policy List View Details | Description |
|---|---|
| Domain | Displays the domain name. |
| Name | Displays the user-defined name of the certification policy. |
| Type | Displays the certification type. Currently, the certification policy supports "certified". |
| Status | Indicates whether the policy is active or inactive. The policy must be active for the algorithm to run in the background and work on certifying data objects. |
| Certified Date | Displays the date on which the rule is run. |
| Policy Description | Provides information about the purpose of creating the policy, typically noted by users. |
| Created By | Displays the name of the user who has created the policy. |

 

Adding New Policies

OE_ADMIN users can create certification policies using the "Add Policy" button. When adding a policy, the user must select a Domain from the drop-down menu and provide a Name for the policy. Because Automatic Certification can only apply the "Certified" status to objects, the Policy Type field is grayed out and displays "Certified" by default.

Certification Policy Summary Page

Once a policy is created, it appears as a new line item on the main Certification Policy List View page. Clicking the policy name opens the detailed Policy Summary page, where OE_ADMIN users can configure the policy's execution details and rules.

A newly created policy is initially saved in Draft status. Hovering over an editable field reveals a pencil icon for quick modification. Only OE_ADMIN can edit the policy; Meta-Write and Meta-Read users can only view the policy details.

Policy Title: The title showcases the policy name, an editable field.

Policy Description: Displays the policy description, outlining the purpose of the policy and other relevant details.

Policy Execution Details: It provides users with a comprehensive configuration interface for specifying how a certification policy should be executed. This section allows users to define the following aspects:

  • Policy Type: "Certified" is displayed by default and grayed out, because it is the only certification type that policies support.
  • Trigger Type: When the “On Update” checkbox is selected, the policy rule runs automatically whenever an update is made to the selected data object’s Business Glossary Terms, Business Description, or Lineage. When the box is unchecked, the policy must be run manually each time certification needs to be applied.
  • Trigger Objects: Users specify the data object type (Table, File, Report, or API) on which the certification policy is executed. A policy can be run for only one object type at a time. If "Table" is chosen, the policy executes on all tables cataloged in the application when the policy rule is run.
  • Schedule: Once the policy execution details are configured, users can schedule the policy to run on a recurring basis: yearly, monthly, weekly, daily, or hourly. Scheduling removes the need to run the rules manually.

Note: Scheduling is triggered only when the policy is in an active state.
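The execution details above can be pictured as a small configuration record with a few validation rules. The following is a minimal sketch under stated assumptions: the class name, field names, and status strings are hypothetical and do not reflect how OvalEdge stores a policy internally.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the text above; the identifiers themselves are illustrative.
TRIGGER_OBJECTS = {"Table", "File", "Report", "API"}
SCHEDULE_FREQUENCIES = {"yearly", "monthly", "weekly", "daily", "hourly"}


@dataclass
class PolicyExecution:
    trigger_object: str              # exactly one object type per policy
    on_update: bool = False          # the "On Update" trigger checkbox
    schedule: Optional[str] = None   # None means no recurring schedule
    status: str = "Draft"            # new policies start as Draft
    policy_type: str = "Certified"   # the only supported type

    def __post_init__(self):
        if self.trigger_object not in TRIGGER_OBJECTS:
            raise ValueError(f"unsupported trigger object: {self.trigger_object}")
        if self.schedule is not None and self.schedule not in SCHEDULE_FREQUENCIES:
            raise ValueError(f"unsupported schedule: {self.schedule}")
        if self.policy_type != "Certified":
            raise ValueError("only the 'Certified' policy type is supported")

    def schedule_will_fire(self):
        """A schedule only takes effect while the policy is Active."""
        return self.schedule is not None and self.status == "Active"
```

For example, `PolicyExecution("Table", schedule="daily")` is valid, but its schedule has no effect until the status is changed to "Active".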

 

Policy Rules

Policy rules define the logic behind the application of automatic certification. The four policy rule types are:

  1. Business Description Present
  2. Glossary Term Present
  3. Lineage Present
  4. Data Quality Score

Business Description Present: Certifies objects only when they have a Business Description. The specified percentage sets the minimum proportion of an object's columns that must have Business Descriptions. For example, if the percentage is set to 50, an object with a Business Description is certified only if at least 50% of its columns also have Business Descriptions.

Note: The certification process requires the data object to have a business description; failure to meet this criterion will result in the data object not being certified, even if the columns surpass the specified threshold.
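The two conditions of this rule (an object-level description plus a column-level percentage) can be sketched as follows. The function name and inputs are hypothetical, and the handling of an object with no columns is an assumption, since the article does not cover that case.

```python
def passes_business_description_rule(object_description, column_descriptions, min_percent):
    """Illustrative check for the Business Description Present rule.

    object_description: description text of the table/file/report (may be None or empty)
    column_descriptions: one entry per column, None or "" when missing
    min_percent: threshold from the rule, e.g. 50
    """
    # The object itself must have a description, regardless of column coverage.
    if not object_description or not object_description.strip():
        return False
    # Assumption: an object with no columns cannot satisfy a column percentage.
    if not column_descriptions:
        return False
    described = sum(1 for d in column_descriptions if d and d.strip())
    # Integer comparison avoids floating-point rounding at the threshold.
    return described * 100 >= min_percent * len(column_descriptions)
```

With a threshold of 50, a described table with 2 of 4 columns described passes, while the same columns on a table without its own description do not.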


Glossary Term Present: Certifies objects based on the presence of Business Glossary Terms. The defined percentage sets the minimum proportion of an object's columns that must be associated with a glossary term. For instance, if the specified percentage is 50, objects that have at least 50% of their columns linked to Business Glossary Terms qualify for certification.

Note: The certification process requires the data object to have a glossary term associated; failure to meet this criterion will result in the data object not being certified, even if the columns surpass the specified threshold.



Lineage Present: Establishes certification criteria based on an object's lineage, specifically its upstream connections. Only objects with at least the specified minimum number of upstream levels in lineage qualify for certification. For example, if the minimum upstream level is set to 5, objects with 5 or more upstream levels in lineage qualify for certification.

Note: The certification process requires the data object to have upstream lineage; an object without upstream lineage will not be certified.
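One way to read "upstream levels" is as the number of hops that can be walked upstream from the object, level by level. The sketch below uses that interpretation; it is an assumption, as the article does not define how levels are counted, and the `upstream` mapping and object names are hypothetical.

```python
def upstream_levels(upstream, obj):
    """Count how many levels of upstream objects exist above obj.

    upstream maps each object to the objects it is directly derived from.
    Objects already visited are skipped, so cycles cannot loop forever.
    """
    levels = 0
    seen = {obj}
    frontier = {obj}
    while True:
        parents = {
            p for node in frontier for p in upstream.get(node, ()) if p not in seen
        }
        if not parents:
            return levels
        seen |= parents
        frontier = parents
        levels += 1


def passes_lineage_rule(upstream, obj, min_upstream_level):
    """Illustrative check: the object needs at least the configured number of levels."""
    return upstream_levels(upstream, obj) >= min_upstream_level


# Hypothetical chain: raw -> staging -> mart -> report
upstream = {"report": ["mart"], "mart": ["staging"], "staging": ["raw"]}
```

In this chain, "report" has three upstream levels, so it satisfies a minimum of 3 but not a minimum of 4, and "raw" has none.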


Data Quality Score: 

This rule uses two criteria: the DQ Score threshold, and the Percentage, which sets the minimum proportion of an object's columns that must meet that DQ Score threshold for certification.

For example, consider a table with 100 columns. If the DQ Score threshold is set to 80 and the percentage is set to 70, then at least 70% of the columns (70 or more) must individually achieve a DQ Score of 80 or above for the table to qualify for certification. This dual-criteria approach assesses both the overall quality of the object and the quality of its individual columns.

Note: The certification process requires the data object itself to surpass the specified DQ Score; failure to meet this criterion will result in the data object not being certified, even if the columns surpass the specified threshold.
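The 100-column example can be checked with a short sketch. The function and parameter names are hypothetical. Treating the object-level comparison as "greater than or equal to" is an assumption; the article says the object must "surpass" the score, but describes columns as needing the score "or above".

```python
def passes_dq_rule(object_score, column_scores, score_threshold, min_percent):
    """Illustrative check for the Data Quality Score rule."""
    # Object-level requirement (assumed here to be >=, see the lead-in).
    if object_score < score_threshold:
        return False
    # Assumption: an object with no scored columns cannot satisfy the percentage.
    if not column_scores:
        return False
    qualifying = sum(1 for score in column_scores if score >= score_threshold)
    # Integer comparison avoids floating-point rounding at the threshold.
    return qualifying * 100 >= min_percent * len(column_scores)


# 100 columns: 70 score 85, 30 score 60; threshold 80, percentage 70.
column_scores = [85] * 70 + [60] * 30
qualifies = passes_dq_rule(90, column_scores, score_threshold=80, min_percent=70)
```

With 70 of 100 columns at or above 80, the percentage criterion is met exactly; dropping to 69 qualifying columns, or an object-level score below 80, would fail the rule.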

Scenario-Based Example

For a single policy, multiple policy rules can be established and executed on the selected data objects. Consider a scenario where a certification policy is designed to certify tables cataloged in the application. Various rules can be created based on specific requirements. 

For instance:

Rule 1: Certify tables only when a business description is present for at least 50% of the table's columns.

Rule 2: Certify tables when a glossary term is present for at least 70% of the table's columns.

In the background, the algorithm assesses all the columns within the table against the specified criteria. For example, if a table has 10 columns and only 4 of them have an associated business description (40%, below the 50% threshold), Rule 1 is not satisfied and certification will not be applied. However, if the same table has terms assigned to 8 of its 10 columns (80%, meeting the 70% requirement), Rule 2 is satisfied and, when run on its own, certifies the table. This approach provides a clear way to apply certification based on specified conditions.

A given rule can be set only once; a new rule cannot be created if the same rule already exists.

The created rules are displayed in the Policy Rules section at the bottom of the page.

 

Actions on Policy

The rules created are displayed as separate line items in the Policy Rules section at the bottom, featuring details like Rule Name, Rule Description, Percentage/Upstream Level, DQ Score, Creator, and Modification Date. Checkboxes are available for users to execute actions such as changing the status to active/inactive, running the rule, deleting policy rules, or the entire policy using the 9-dots options.

Change Status Active - Draft

  • A newly created Certification Policy is saved as Draft.
  • Certification policies in Active status cannot be edited; they must be in Draft status for any modification.
  • A rule cannot be executed while the policy is in Draft status; the policy must be Active. Scheduling is likewise triggered only when the policy is Active.

Run the Policy

Once a policy has been created and modified in its Draft state, use the 9-dot icon to “Change Status to Active”. In this state, the policy itself cannot be edited except for its Schedule. 

Note: The policy can only be executed when it is Active.

When the policy is run, objects that meet the criteria set forth in the policy rules will be certified. Multiple policy rules can be selected and run simultaneously. If a policy incorporates two or more rules, an object must fulfill all the specified rules to achieve certification.
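The "all rules must pass" behavior amounts to a logical AND over the selected rules. The sketch below illustrates it with a table that has 10 columns, 4 with descriptions and 8 with glossary terms. All names are hypothetical, the object-level requirements of each rule are left out for brevity, and the treatment of an empty rule list is an assumption.

```python
def coverage_percent(flags):
    """Percentage of columns for which the flag is True."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def run_policy(policy_status, rules, obj):
    """Return True if obj would be certified by running all the given rules together."""
    if policy_status != "Active":
        raise RuntimeError("a policy can only be run when it is Active")
    if not rules:
        return False  # assumption: running no rules certifies nothing
    return all(rule(obj) for rule in rules)


def description_rule(obj):
    # Business description on at least 50% of columns.
    return coverage_percent(obj["has_description"]) >= 50


def glossary_rule(obj):
    # Glossary term on at least 70% of columns.
    return coverage_percent(obj["has_term"]) >= 70


# Hypothetical table: 4 of 10 columns described, 8 of 10 linked to terms.
table = {
    "has_description": [True] * 4 + [False] * 6,
    "has_term": [True] * 8 + [False] * 2,
}
```

Running both rules together leaves this table uncertified because the description rule fails at 40%, whereas running only the glossary rule would certify it at 80%.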

Delete Policy Rules

Policy rules can be deleted only when the selected certification policy is in the Draft state. Once a policy rule is deleted, it disappears from the list of Policy rules.

Delete Policy

A policy can be deleted only when it is in the Draft state. 
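Taken together, the Draft and Active rules in this section behave like a two-state lifecycle. The class below is a minimal sketch of those constraints; the class and method names are hypothetical and are not an OvalEdge API.

```python
class CertificationPolicy:
    """Illustrative model of the Draft/Active constraints described above."""

    def __init__(self, name, rules=()):
        self.name = name
        self.rules = list(rules)
        self.schedule = None
        self.status = "Draft"  # new policies start as Draft

    def _require_draft(self, action):
        if self.status != "Draft":
            raise RuntimeError(f"cannot {action} while the policy is {self.status}")

    def change_status(self, status):
        if status not in ("Draft", "Active"):
            raise ValueError(f"unknown status: {status}")
        self.status = status

    def rename(self, name):
        self._require_draft("edit the policy")
        self.name = name

    def delete_rule(self, rule):
        self._require_draft("delete a policy rule")
        self.rules.remove(rule)

    def set_schedule(self, frequency):
        # The Schedule is the one field that stays editable while Active.
        self.schedule = frequency

    def can_run(self):
        return self.status == "Active"

    def can_delete(self):
        return self.status == "Draft"
```

In this model, switching a policy to Active enables running and scheduling but blocks edits and deletion until it is moved back to Draft.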

Associated Data

When a policy is run and the objects are certified based on the policy rules, the associated objects are displayed on the Associated Data page of the Policy rule. The associated data is split into 3 object types: Tables, Files, and Reports.


For Tables, the List view displays the below-mentioned attributes:

  1. Database
  2. Schema
  3. Table
  4. Title
  5. Business Description
  6. Row count

For Files, the List view displays the below-mentioned attributes:

  1. Type
  2. File Name
  3. File Location
  4. Business Description
  5. Created Date
  6. Changed on
  7. Popularity

For Reports, the List view displays the below-mentioned attributes:

  1. Report Group
  2. Report Name
  3. Title
  4. Type
  5. Business Description