
Certifications

Certification of data catalog objects indicates how reliable the crawled and curated metadata is. Together with other indicators such as the Data Quality Score, Endorsement Rating, and Popularity Score, certification builds users' trust, enabling them to use the data confidently for various analytical purposes.

Certification applies clear labels that indicate the status of each object, making its quality and reliability transparent. A "Certified" label denotes that an object is complete and reliable, and therefore suitable for analytical purposes. Conversely, objects with inconsistencies or redundancies are labeled "Violated" or "Inactive," cautioning users to approach them carefully or disregard them entirely.

Data catalog objects can be certified manually by the objects' governance stakeholders and by Administrators, or automatically through certification policies, which only Administrators can configure.


Certification Labels

Five certification labels are supported:

  1. Certified
  2. Cautioned
  3. Violated
  4. Inactive
  5. None

Certified
Description: The data object is reliable, accurate, and meets predefined quality standards.
Example: A certified sales report ensures that the figures are accurate and can be trusted for decision-making.

Cautioned
Description: The data object may have minor issues or potential data quality concerns. It is usable, but users should apply additional scrutiny when relying on it.
Example: A "customer details" table with some duplicate entries might carry the Cautioned label, reminding users to review the data for accuracy before relying on it.

Violated
Description: The data object has significant quality issues. It should be used with caution, as it may contain misleading or incorrect information.
Example: A financial statement with discrepancies that compromise its accuracy and integrity.

Inactive
Description: The data object is obsolete or no longer in use.
Example: Any table that is no longer in use.

None
Description: The data object has not undergone certification or quality assessment.
Example: Typically used for newly added data objects or for objects that governance stakeholders have not yet evaluated.



The data objects that can be certified are:

  1. Tables
  2. Table Columns
  3. Files
  4. File Columns
  5. Reports
  6. Report Columns
  7. APIs
  8. API Attributes

Certifying Objects

Data objects can be certified in two ways:

  • Manual Certification: Selected objects are certified individually or in bulk.
  • Automatic Certification through Certification Policies: Policies can be defined to certify objects automatically based on different rules.

The following users can apply manual certifications:

  • Governance stakeholders (users holding any one of the six governance roles of an object)
  • Admin

The following users can apply automated certifications:

  • Admin

Users with Meta-Write and Meta-Read privileges can view the certifications applied, but cannot modify or assign new ones.

Manual Certification

Manual certification can be applied to data objects from two different places:

  1. Data Catalog List View
  2. Object Summary page 

Only the object's Governance Role users or OE_ADMIN users can apply certifications.


Applying Certification from the Data Catalog List View 

  • Certification labels can be applied to multiple data objects at once using the 9-dots menu > Change Certification Type option.

Applying Certification from the Object Summary Page

  • Certification labels can be applied to a data object using the 9-dots menu > Apply Certification option.
  • The Certification History shows the changes made to the object's certification label over time. Click the certification label to view this history.

Caution Downstream Objects

An object's Cautioned label can be propagated to its downstream objects through lineage relationships, provided that lineage has been built for the object. The propagated label overrides any certification label applied directly to a downstream object until the propagated label is removed.

This alerts users to potential issues or concerns inherited from the cautioned data object.

  • The caution applies across all levels of the lineage.
  • Downstream objects are the destination objects displayed on the right side of the data flow in a lineage; they show how the data is ultimately used in reports, dashboards, or critical analyses.
  • The certification label of a cautioned downstream object cannot be changed until the caution propagation is removed from the cautioned origin object.
  • The certification label of the cautioned origin object cannot be changed until the propagated caution label is removed from its downstream objects.
  • When the propagated caution is removed, each downstream object reverts to its previously applied certification status.
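The propagation and revert behavior above can be pictured as a traversal of the lineage graph. The following Python sketch is a simplified model, not OvalEdge's implementation or API: it assumes lineage is available as a hypothetical dictionary mapping each object name to its direct downstream objects, and labels as a dictionary of object names to label strings. It remembers each downstream object's prior label so that removing the caution can restore it.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage model: object name -> names of its direct downstream objects.
Lineage = Dict[str, List[str]]


def downstream_objects(lineage: Lineage, origin: str) -> Set[str]:
    """Collect every object downstream of `origin`, across all lineage levels."""
    seen: Set[str] = set()
    queue = deque(lineage.get(origin, []))
    while queue:
        obj = queue.popleft()
        if obj == origin or obj in seen:
            continue  # skip the origin itself and objects already visited
        seen.add(obj)
        queue.extend(lineage.get(obj, []))
    return seen


def propagate_caution(labels: Dict[str, str], lineage: Lineage, origin: str) -> Dict[str, str]:
    """Label the origin and all downstream objects Cautioned.

    Returns the downstream objects' previous labels so they can be restored later.
    Objects with no label yet are recorded as "None" (not certified).
    """
    labels[origin] = "Cautioned"
    previous: Dict[str, str] = {}
    for obj in downstream_objects(lineage, origin):
        previous[obj] = labels.get(obj, "None")
        labels[obj] = "Cautioned"
    return previous


def remove_propagated_caution(labels: Dict[str, str], previous: Dict[str, str]) -> None:
    """Revert each downstream object to its previously applied label."""
    for obj, label in previous.items():
        labels[obj] = label


# Usage with hypothetical object names:
lineage = {"orders_raw": ["orders_clean"], "orders_clean": ["sales_report"]}
labels = {"orders_raw": "Certified", "orders_clean": "Certified", "sales_report": "Inactive"}
previous = propagate_caution(labels, lineage, "orders_raw")
print(labels["sales_report"])  # Cautioned
remove_propagated_caution(labels, previous)
print(labels["sales_report"])  # Inactive
```

The breadth-first walk reflects the rule that the caution applies across all lineage levels; the "labels are locked until the caution is removed" behavior is not modeled here.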

Cautioning Downstream Objects

  • When certifying an object as "Cautioned", select the "Caution Downstream Objects" option to propagate the Cautioned label to downstream objects in the lineage.
  • Alternatively, on the object summary page, use 9-dots > Process upstream or downstream objects and select the "Caution all downstream objects" radio button.

Automatic Certification

Automatic certification certifies data objects based on predefined criteria, without manual intervention. Certification policies define rules that evaluate data objects against specific conditions, such as the presence of certain attributes or meeting predefined thresholds. When a data object meets the criteria in a policy, it is certified automatically.

Automatic certification can assign only the "Certified" status; it does not support the "Cautioned," "Violated," or "Inactive" statuses.


OE_ADMIN users can automate certification from the Certification Policy module. The Certification Policy List View page displays existing policies, and new policies can be created using the "Add Policy" button.


The Certification Policy List View page displays the following attributes:

  • Domain: Displays the domain name.
  • Name: Displays the user-defined name of the certification policy.
  • Type: Displays the certification type. Currently, certification policies support only "Certified".
  • Status: Indicates whether the policy is active or inactive. The policy must be active for the algorithm to run in the background and certify data objects.
  • Certified Date: Displays the date on which the rule was run.
  • Policy Description: Describes the purpose of the policy, as entered by the user.
  • Created By: Displays the name of the user who created the policy.


Adding New Policies

OE_ADMIN users can create certification policies using the "Add Policy" button. When adding a policy, select a Domain from the drop-down menu and provide a name for the policy. Because automatic certification can apply only the "Certified" label, the Policy Type field is grayed out and displays "Certified" by default.

Certification Policy Summary Page

Once a policy is created, a new line item is displayed in the List View page. Clicking on the Policy name navigates the user to the detailed Policy Summary Page.

The newly created policy is initially saved in Draft status. To edit a field, hover over it and click the pencil icon that appears. Only OE_ADMIN users can edit policies; Meta-Write and Meta-Read users can only view policy details.

Policy Title: Displays the policy name. This field is editable.

Policy Description: Displays the policy description, outlining the purpose of the policy and other relevant details.

Policy Execution Details: Configures how the certification policy is executed. This section defines the following:

  • Policy Type: "Certified" is displayed by default and grayed out, because it is the only supported certification type.
  • Trigger Type: When the "On Update" checkbox is selected, the policy rules run automatically whenever the selected data object's Business Glossary Terms, Business Description, or Lineage is updated. Clear the checkbox to run the policy manually each time, certifying data objects as needed.
  • Trigger Objects: Specifies the data object type (Table, File, Report, or API) on which the certification policy runs. Only one object type can be selected at a time. For example, if "Table" is chosen, the policy runs on all tables cataloged in the application.
  • Schedule: Once the policy execution details are configured, a recurring schedule (yearly, monthly, weekly, daily, or hourly) can be set so the policy runs automatically without manual intervention.

It's important to note that scheduling functionality is triggered only when the policy is in an active state. 
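The execution settings above can be summarized as a small configuration object. This is a minimal Python sketch under stated assumptions: the class, field names, and helper functions are hypothetical and do not correspond to any OvalEdge API. The allowed values (object types, schedule frequencies, On Update fields) come from the text; the assumption that a Draft policy never runs on update is inferred from the rule that Draft policies cannot be executed.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the text; the identifiers themselves are hypothetical.
TRIGGER_OBJECT_TYPES = {"Table", "File", "Report", "API"}
SCHEDULE_FREQUENCIES = {"yearly", "monthly", "weekly", "daily", "hourly"}
ON_UPDATE_FIELDS = {"business_glossary_terms", "business_description", "lineage"}


@dataclass
class CertificationPolicy:
    name: str
    domain: str
    trigger_object: str              # exactly one object type per policy
    on_update: bool = False          # the "On Update" checkbox
    schedule: Optional[str] = None   # optional recurring schedule
    status: str = "Draft"            # new policies start in Draft
    policy_type: str = "Certified"   # the only supported type

    def __post_init__(self) -> None:
        if self.policy_type != "Certified":
            raise ValueError("Only the 'Certified' policy type is supported")
        if self.trigger_object not in TRIGGER_OBJECT_TYPES:
            raise ValueError(f"Unsupported trigger object: {self.trigger_object}")
        if self.schedule is not None and self.schedule not in SCHEDULE_FREQUENCIES:
            raise ValueError(f"Unsupported schedule: {self.schedule}")


def runs_on_update(policy: CertificationPolicy, changed_field: str) -> bool:
    """True if an update to `changed_field` should trigger the policy."""
    return (
        policy.status == "Active"   # assumption: Draft policies never run
        and policy.on_update
        and changed_field in ON_UPDATE_FIELDS
    )


def schedule_is_live(policy: CertificationPolicy) -> bool:
    """Scheduled runs fire only for Active policies that have a schedule."""
    return policy.status == "Active" and policy.schedule is not None


# Usage with a hypothetical policy:
policy = CertificationPolicy("Certify documented tables", "Finance", "Table",
                             on_update=True, schedule="daily")
print(runs_on_update(policy, "lineage"))  # False: the policy is still Draft
policy.status = "Active"
print(runs_on_update(policy, "lineage"))  # True
```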

Policy Rules

Policy rules define the logic for applying automatic certification. There are four policy rule types:

  1. Business Description present
  2. Glossary term present
  3. Lineage present
  4. Data Quality score

Business Description Present

Certifies objects only when they have a Business Description. The specified percentage sets the minimum proportion of an object's columns that must have Business Descriptions defined. For example, if the percentage is set to 50, an object is certified only if it has its own Business Description and at least 50% of its columns have Business Descriptions.

The main data object (for example, the table) must itself have a Business Description. If it does not, the object is not certified, even if its columns exceed the specified threshold.
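The two-part check described above (main object first, then column coverage) can be sketched as follows. This is a minimal Python illustration, not the product's algorithm: the `Column` and `CatalogObject` classes are hypothetical, whitespace-only descriptions are assumed to count as missing, and "at least N%" is treated as greater than or equal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    business_description: Optional[str] = None


@dataclass
class CatalogObject:
    name: str
    business_description: Optional[str] = None
    columns: List[Column] = field(default_factory=list)


def has_text(value: Optional[str]) -> bool:
    """Treat missing or whitespace-only descriptions as absent (assumption)."""
    return bool(value and value.strip())


def business_description_rule(obj: CatalogObject, min_percent: float) -> bool:
    """Certify only if the object and enough of its columns are described."""
    # The main object must have its own Business Description.
    if not has_text(obj.business_description):
        return False
    described = sum(1 for col in obj.columns if has_text(col.business_description))
    # Integer-safe form of: described / total >= min_percent / 100.
    # In this sketch an object with no columns passes (0 >= 0).
    return described * 100 >= min_percent * len(obj.columns)


# Usage with a hypothetical table: 5 of 10 columns described.
cols = [Column(f"c{i}", "desc" if i < 5 else None) for i in range(10)]
table = CatalogObject("orders", "Customer orders", cols)
print(business_description_rule(table, 50))  # True
table.business_description = None
print(business_description_rule(table, 50))  # False: main object not described
```

The Glossary Term Present rule follows the same shape, with glossary-term associations in place of descriptions.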


Glossary Term Present

Certifies objects based on the association of Business Glossary Terms with data objects. The specified percentage sets the minimum proportion of an object's columns that must be linked with Glossary Terms. For example, if the percentage is set to 50, objects with at least 50% of their columns linked with Business Glossary Terms qualify for certification.

The main data object must itself have an associated glossary term; if it does not, the object is not certified, even if its columns surpass the specified threshold.

Lineage Present

Establishes certification criteria based on an object's lineage, particularly its upstream connections. The specified minimum upstream level ensures that only objects with at least that many upstream levels in lineage qualify for certification. For example, if the minimum upstream level is set to 5, objects with 5 or more upstream levels in lineage qualify for certification.

The main data object must have upstream lineage; if it does not, the object is not certified, even if its columns exceed the specified threshold.
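How "upstream levels" are counted is not spelled out above. The Python sketch below assumes one plausible interpretation, the length of the longest chain of upstream hops above the object, and assumes the lineage graph is acyclic. The `upstream` dictionary and function names are hypothetical illustrations, not an OvalEdge interface.

```python
from typing import Dict, List, Optional

# Hypothetical lineage model: object name -> names of its direct upstream objects.
Upstream = Dict[str, List[str]]


def upstream_levels(upstream: Upstream, obj: str, memo: Optional[Dict[str, int]] = None) -> int:
    """Length of the longest chain of upstream hops above `obj` (0 if none).

    Assumes the lineage graph is acyclic; results are memoized per object.
    """
    if memo is None:
        memo = {}
    if obj not in memo:
        parents = upstream.get(obj, [])
        memo[obj] = max((1 + upstream_levels(upstream, p, memo) for p in parents), default=0)
    return memo[obj]


def lineage_rule(upstream: Upstream, obj: str, min_levels: int) -> bool:
    levels = upstream_levels(upstream, obj)
    # The main object must have upstream lineage at all, and enough levels of it.
    return levels >= 1 and levels >= min_levels


# Usage with hypothetical object names: a chain of three upstream hops.
upstream = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["orders_clean"],
    "orders_clean": ["orders_raw"],
}
print(upstream_levels(upstream, "sales_report"))  # 3
print(lineage_rule(upstream, "sales_report", 3))  # True
print(lineage_rule(upstream, "sales_report", 5))  # False
```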


Data Quality Score

Certification is based on two criteria: the DQ Score threshold, which sets the minimum Data Quality Score required, and the Percentage, which sets the minimum proportion of an object's columns that must meet the DQ Score threshold.

For example, consider a table with 100 columns. If the DQ Score threshold is set to 80 and the percentage is set to 70, at least 70% of the columns (70 or more) must each achieve a DQ Score of 80 or above for the table to qualify for certification. This dual-criteria approach assesses both the overall quality of the dataset and the quality of its individual data attributes.

The main data object must itself surpass the specified DQ Score; if it does not, the object is not certified, even if its columns surpass the specified threshold.
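The 100-column example above can be checked with a short Python sketch of the dual criteria. The function name and signature are hypothetical; the sketch also assumes that a score equal to the threshold counts as meeting it for both the main object and its columns, although the product may compare the main object's score strictly.

```python
from typing import Sequence


def dq_score_rule(
    object_score: float,
    column_scores: Sequence[float],
    threshold: float,
    min_percent: float,
) -> bool:
    """Dual-criteria DQ Score check for one data object (illustrative only)."""
    # Main object first. Assumption: a score equal to the threshold passes.
    if object_score < threshold:
        return False
    passing = sum(1 for score in column_scores if score >= threshold)
    # Integer-safe form of: passing / total >= min_percent / 100.
    return passing * 100 >= min_percent * len(column_scores)


# The example from the text: 100 columns, threshold 80, percentage 70.
scores = [85.0] * 70 + [60.0] * 30
print(dq_score_rule(82.0, scores, threshold=80, min_percent=70))  # True: 70 of 100 columns pass
print(dq_score_rule(82.0, [85.0] * 69 + [60.0] * 31, threshold=80, min_percent=70))  # False: only 69
print(dq_score_rule(75.0, scores, threshold=80, min_percent=70))  # False: main object below 80
```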

Scenario-Based Example

A single policy can contain multiple policy rules, all executed on the selected data objects. Consider a certification policy designed to certify the tables cataloged in the application. Various rules can be created based on specific requirements.


For instance:

Rule 1: Certify tables only when a Business Description is present for at least 50% of the table's columns.

Rule 2: Certify tables only when a glossary term is present for at least 70% of the table's columns.

In the background, the algorithm assesses all the columns within each table against the specified criteria. For example, if a table has 10 columns and only 4 of them have a Business Description (less than 50%), the table fails Rule 1. If the same table has glossary terms assigned to 8 of its 10 columns (meeting the 70% requirement), it passes Rule 2. Because an object must satisfy every rule in the policy to be certified, this table is not certified until it also meets Rule 1.
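The scenario above can be reproduced with a minimal Python sketch in which a policy certifies an object only if every rule passes. `TableProfile` and the rule functions are hypothetical stand-ins for the product's internal evaluation; the table-level checks reflect the requirement that the main object itself has a description or glossary term.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TableProfile:
    has_description: bool            # Business Description on the table itself
    has_glossary_term: bool          # glossary term on the table itself
    column_descriptions: List[bool]  # one flag per column
    column_terms: List[bool]         # one flag per column


def coverage_ok(flags: List[bool], min_percent: int) -> bool:
    """True if at least min_percent of the flags are set."""
    return sum(flags) * 100 >= min_percent * len(flags)


def rule_1(t: TableProfile) -> bool:
    """Business Description present on at least 50% of columns."""
    return t.has_description and coverage_ok(t.column_descriptions, 50)


def rule_2(t: TableProfile) -> bool:
    """Glossary term present on at least 70% of columns."""
    return t.has_glossary_term and coverage_ok(t.column_terms, 70)


def certify(t: TableProfile, rules: List[Callable[[TableProfile], bool]]) -> bool:
    """An object is certified only if it satisfies every rule in the policy."""
    return all(rule(t) for rule in rules)


# The 10-column table from the scenario.
table = TableProfile(
    has_description=True,
    has_glossary_term=True,
    column_descriptions=[True] * 4 + [False] * 6,  # 4 of 10 described
    column_terms=[True] * 8 + [False] * 2,         # 8 of 10 with terms
)
print(rule_1(table), rule_2(table))      # False True
print(certify(table, [rule_1, rule_2]))  # False
```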

Each rule can be added only once; a rule that already exists cannot be created again.


Actions on Policy

The created rules are displayed as separate line items in the Policy Rules section at the bottom of the page, with details such as Rule Name, Rule Description, Percentage/Upstream Level, DQ Score, Creator, and Modification Date. Select rules using the checkboxes, then use the 9-dots menu to change the status to active/inactive, run the rules, delete policy rules, or delete the entire policy.

Change Status Active - Draft

  • A newly created certification policy is saved in Draft status.
  • Certification policies in Active status cannot be edited; they must be in Draft status for any modifications.
  • Rules cannot be executed while the policy is in Draft status; the policy must be Active. Scheduled runs are also triggered only when the policy is Active.
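The Draft/Active rules in this section, together with the one-rule-per-type restriction, behave like a small state machine. The Python class below is a hypothetical model for illustration only; its method names are not OvalEdge operations, and `run()` merely returns the rules that would be evaluated.

```python
from typing import List, Optional


class CertificationPolicyLifecycle:
    """Hypothetical model of the Draft/Active rules described above."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = "Draft"               # new policies are saved as Draft
        self.rules: List[str] = []
        self.schedule: Optional[str] = None

    def _require(self, status: str, action: str) -> None:
        if self.status != status:
            raise PermissionError(f"{action} requires {status} status; policy is {self.status}")

    def add_rule(self, rule: str) -> None:
        self._require("Draft", "Editing rules")
        if rule in self.rules:
            raise ValueError(f"Rule already exists: {rule}")  # each rule can be set once
        self.rules.append(rule)

    def delete_rule(self, rule: str) -> None:
        self._require("Draft", "Deleting rules")
        self.rules.remove(rule)

    def set_schedule(self, schedule: str) -> None:
        # The schedule stays editable even while the policy is Active.
        self.schedule = schedule

    def activate(self) -> None:
        self.status = "Active"

    def return_to_draft(self) -> None:
        self.status = "Draft"

    def run(self) -> List[str]:
        self._require("Active", "Running the policy")
        return list(self.rules)  # placeholder: the rules that would be evaluated

    def can_delete(self) -> bool:
        return self.status == "Draft"


# Usage with a hypothetical policy:
policy = CertificationPolicyLifecycle("Certify documented tables")
policy.add_rule("Business Description present")
policy.activate()
policy.set_schedule("daily")
print(policy.run())  # ['Business Description present']
```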

Run the Policy

Once a policy has been created and modified in its Draft state, use the 9-dots menu to select "Change Status to Active". In the Active state, the policy cannot be edited except for its Schedule.

The policy can only be executed when it is Active.

When the policy is run, objects that meet the criteria set forth in the policy rules will be certified. Multiple policy rules can be selected and run simultaneously. If a policy incorporates two or more rules, an object must fulfill all the specified rules to achieve certification.

Delete Policy Rules

Policy rules can be deleted only when the selected certification policy is in the Draft state. Once a policy rule is deleted, it disappears from the list of Policy rules.

Delete Policy

A policy can be deleted only when it is in the Draft state. 

Associated Data

When a policy is run and the objects are certified based on the policy rules, the associated objects are displayed on the Associated Data page of the Policy rule. The associated data is split into four object types: Tables, Files, Reports, and APIs.


For Tables, the List view displays the following attributes:

  1. Database
  2. Schema
  3. Table
  4. Title
  5. Business Description
  6. Row count

For Files, the List view displays the following attributes:

  1. Type
  2. File Name
  3. File Location
  4. Business Description
  5. Created Date
  6. Changed on
  7. Popularity

For Reports, the List view displays the following attributes:

  1. Report Group
  2. Report Name
  3. Title
  4. Type
  5. Business Description

Copyright © 2024, OvalEdge LLC, Peachtree Corners, GA, USA