
Data Catalog - A Deep Dive

Leveraging Data Catalog for Effective Data Management

OvalEdge offers a comprehensive data governance solution that includes an advanced Data Catalog and self-service tools, enabling users to understand how data is used for analytics, where it comes from, and what other data it relates to.

The catalog crawls an organization's metadata by connecting to various source systems, such as transactional databases, NoSQL databases, and data warehouses, and stores the metadata from these systems in a single source of reference. The connections are kept live so that new updates to the metadata and data are retrieved on a scheduled basis.

OvalEdge catalogs metadata, which is data about data (Table Name, Source Description, etc.), into a unified view. The cataloged metadata can then be curated (added to, organized, and managed) to extract maximum value from the data. The Data Catalog is designed to help organizations find and consume relevant information quickly and efficiently, saving valuable time and resources.

  • Centralized Repository: OvalEdge's Data Catalog is a centralized repository, consolidating metadata of data objects from various sources into a unified platform, facilitating efficient data management and accessibility.
  • Metadata Updates and Version Control: The Data Catalog enables metadata updates and version control, ensuring that users can access the most recent and accurate metadata, eliminating confusion regarding metadata versions and updates.
  • Clear Data Ownership and Lineage Tracking: OvalEdge's Data Catalog provides clear visibility into data ownership and comprehensive data lineage, enabling users to trace the origins and transformations of data, fostering data transparency and accountability.
  • Data Security and Compliance Measures: OvalEdge's Data Catalog implements data security measures and compliance standards, ensuring that sensitive data remains protected and that regulatory requirements are met, mitigating the risk of data breaches and compliance violations.
  • Efficient Collaboration and Communication: The Data Catalog's collaborative features streamline communication among stakeholders, fostering an environment of shared insights and knowledge that promotes informed decision-making and effective data utilization.
  • Analytics and Insights: OvalEdge's Data Catalog provides various analytics and insights, enabling users to derive valuable data-driven insights and make informed decisions based on comprehensive data analysis and visualization.

By addressing these critical aspects, OvalEdge's Data Catalog empowers organizations to effectively manage their data objects, optimize data usage, and drive informed decision-making, thereby enhancing overall operational efficiency and productivity.

Building the Data Catalog

OvalEdge's Data Catalog integrates various data sources through OvalEdge connectors and streamlines crawling metadata and profiling data objects.

  • Connector Integration and Setup: OvalEdge's Data Catalog integrates with a wide range of connectors, allowing businesses to connect various data sources and parse metadata from data objects such as Tables, Files, Reports, and APIs.
  • Automated Crawling of Data Objects: Connectors automate the crawling process, retrieving metadata from the source systems. This ensures that all relevant data objects are methodically collected and stored within the Data Catalog.
  • Comprehensive Data Profiling Capabilities: OvalEdge's Data Catalog offers data profiling capabilities that give users a thorough understanding of the attributes and quality of their data objects, with insights such as Row Count, Column Count, Density, Null Density, and other relevant statistics. These insights support well-informed decision-making and data-centric analytics.

The image below shows the customizable crawling and profiling settings on the Connectors page under Administration. Businesses can configure these processes to their specific requirements and preferences so that the catalog reflects the organization's unique data landscape.

Navigating Across the Data Catalog

In this segment, we will explore the range of data objects available, the different navigation methods within the OvalEdge Data Catalog, and the diverse features and functionalities incorporated to enhance data governance.

Data Objects

The metadata crawled from remote data sources is organized into separate tabs across the top of the Data Catalog main page, one for each data object type: Databases, Tables, Table Columns, Files, File Columns, Reports, Report Columns, Codes, and APIs.

Unified List View of Data Objects 

The List View gives Viewer users a single, consolidated place to view metadata for all data object types, including crawling, profiling, and metadata information.

  • Users can apply filters to retrieve specific data objects.
  • Hovering over a data object shows a quick-view pop-up with basic details, making the object easy to identify. Clicking "View Details" opens the detailed summary page for further examination.

To learn more about the metadata provided on Data Catalog object pages, please visit: Exploring Data Object Details

Quick & Easy Data Discovery

Streamline data discovery with OvalEdge's Global Search, available in the main header and accessible from any page in OvalEdge. On the Data Catalog List View, users can further refine their search with filter, sort, and search options based on data object attributes.

  • Filter: Narrow down the results by selecting options from drop-down lists of predefined filters. The Data Catalog can filter objects by Connectors, Schemas, Tags, Terms, Governance Roles, and Certification Status, and a search bar within the filter window helps find the desired values.
  • Sort: Arrange the results in ascending or descending alphabetical order. Results sorted by the first column can then be further sorted by the remaining columns. Fields that combine letters and numbers are sorted alphanumerically (character by character), not by numeric value.
  • Search: Find specific data objects using keywords or phrases to pinpoint them within a large data ecosystem. The Conditional Search option, represented by the eight-dots icon next to the search field, further refines the results by including or excluding keywords.
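To make the sort and conditional-search behavior concrete, here is a minimal Python sketch. It is not OvalEdge code: `alphanumeric_sort` and `conditional_search` are hypothetical names, and the rule that a result must contain every include keyword and none of the exclude keywords is an assumption about how the conditional search combines keywords.

```python
def alphanumeric_sort(names, descending=False):
    """Sort names as plain strings, comparing character by character.

    Digits are compared as characters, not as numbers.
    """
    return sorted(names, reverse=descending)


def conditional_search(names, include=(), exclude=()):
    """Hypothetical include/exclude keyword filter (case-insensitive).

    A name is kept only if it contains every include keyword and
    none of the exclude keywords.
    """
    include = [keyword.lower() for keyword in include]
    exclude = [keyword.lower() for keyword in exclude]
    results = []
    for name in names:
        lowered = name.lower()
        if all(k in lowered for k in include) and not any(k in lowered for k in exclude):
            results.append(name)
    return results


print(alphanumeric_sort(["Table10", "Table2", "Table1"]))
# ['Table1', 'Table10', 'Table2']

objects = ["customer_orders", "customer_archive", "product_catalog"]
print(conditional_search(objects, include=["customer"], exclude=["archive"]))
# ['customer_orders']
```

Because strings are compared character by character, "Table10" lands between "Table1" and "Table2", which is the alphanumeric ordering described above.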

Note: Meta-Write users can configure search keywords on individual data objects, making those objects discoverable in Global Search through the added keywords.

Configure Views

The Configure View feature provides all users (Meta-Read or Meta-Write access) the flexibility to tailor the List View Columns according to their preferences, allowing for a personalized display of data object attributes and metadata details. Users can select the default System View format on the Catalog's List Page (displaying a set of predefined attributes), explore public views if available, or create a customized view that aligns with specific requirements. 

To learn more about the Configure View provided in Data Catalog List Pages, please visit: Configuring Views in Data Catalog

Data Objects Summary View - Attribute Details

The summary page of a data object offers a concise yet comprehensive overview of its metadata, including key dates (crawled date, meta-sync date, modified date, etc.) and the fields relevant to that object. This information helps users understand the data object for analysis and decision-making.

Users with Meta-Write privileges can manage and refine data object details directly from the summary page. They can edit descriptions; assign tags, Business Glossary terms, and custom fields; and certify and endorse data objects, tailoring each object to specific requirements.

Some of the most common fields and features in the Summary Pages are mentioned below: 

Basic Data Object Details

  • Business Description: A simple explanation of how the data asset is used in the business and its importance.
  • Technical Description: A clear overview of how the data asset is formatted and structured from a technical perspective.
  • Source Description: The comments defined on the data object in the remote data source, fetched and displayed as the source description.

Note: This field can’t be changed in OvalEdge and represents what's in the source system. 

  • Terms / Tags: Displays the terms or tags associated with the data object.
  • Top Users: Displays the list of active users who have engaged with the data object, along with their scores. By default, the owner, steward, custodian, and governance roles (if configured) are displayed at the top. Other users who have interacted with the data object are listed by activity count.

Here's how the scoring works:

  • When a user accesses the data object summary page, the user view score is increased by one.
  • If a query is executed on the data object using the Query Sheet, the user query score is incremented by one.
  • When users add comments to the data object, their comment score is also increased by one.
  • Quality Score & View Dashboard: Displays the quality score. View Dashboard opens the dashboard page, which shows the complete score details, including the service requests raised and the data quality rules run on the object.
  • Open Quality Issues & Report New Issue: Displays the number of unresolved service requests and lets users raise a new service request on the data object by clicking the "Create Tickets" button.
  • Access Instructions: Guidelines provided by the Integration Admin when a remote connector is integrated. They describe the access permissions and steps users need to work with that specific system or connector.
  • View Crawl History: Opens Advanced Tools > Data & Metadata Changes > Metadata Changes, which lists the changes the object has undergone across crawls. It shows newly added, updated, and deleted data objects, along with column name, column size, column position, and addition or deletion status.
  • View Profile History: Opens Advanced Tools > Data & Metadata Changes > Data Changes, which presents the profiling history and details of previously executed profiles. All profiling statistics, including column names and values such as null count, distinct values, top value, minimum value, and maximum value, are organized in a tabular list view for easy comparison of statistics from different points in time.
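The Top Users scoring rules can be sketched in a few lines of Python. This is an illustration, not OvalEdge's implementation: the event format, the user names, and the choice to rank non-governance users by the sum of their view, query, and comment scores (one reading of "activity count") are assumptions.

```python
from collections import defaultdict

# Each event adds one point to the matching score, per the rules above.
SCORE_FOR_ACTION = {
    "view": "view_score",        # opening the summary page
    "query": "query_score",      # running a query in the Query Sheet
    "comment": "comment_score",  # adding a comment
}


def tally_scores(events):
    """events: iterable of (user, action) pairs, action in SCORE_FOR_ACTION."""
    scores = defaultdict(lambda: {"view_score": 0, "query_score": 0, "comment_score": 0})
    for user, action in events:
        scores[user][SCORE_FOR_ACTION[action]] += 1
    return dict(scores)


def rank_top_users(scores, governance_users=()):
    """Governance-role holders first, then other users by total activity (highest first)."""
    others = [user for user in scores if user not in governance_users]
    # sorted() is stable, so users with equal activity keep their first-seen order.
    others = sorted(others, key=lambda user: sum(scores[user].values()), reverse=True)
    return list(governance_users) + others


events = [("ana", "view"), ("ana", "query"), ("ana", "comment"), ("raj", "view")]
events += [("lee", "view")] * 4
scores = tally_scores(events)
print(scores["ana"])  # {'view_score': 1, 'query_score': 1, 'comment_score': 1}
print(rank_top_users(scores, governance_users=["kim"]))  # ['kim', 'lee', 'ana', 'raj']
```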

Catalog Details 

  • Importance Score: The score shows how vital a Table object is across the database based on the number of relationships and the lineage (downstream objects) associated with the Table.

The Importance score is calculated based on the below formula: 

Importance score = [200 X (no. of rows in the Table/Max row count in schema no. of columns)] + [0.1 X no of Important Columns] + [no of downstream X 7] + [PK or FK X 3].


  • Popularity Score: Indicates how widely a data object is used within a group of users. It is calculated from the number of times users interact with the data object, for example by viewing, endorsing, commenting on, and tagging it. The table below lists actions and their impact on the popularity score: some actions increase it, some decrease it, and some have no effect on it.

Action | Popularity Score
Navigate to the data object summary page | +1
Refresh the summary page | +1
Add a comment | +1
Endorsement Rating (5-star rating) | Varies with the rating: +1 for a one-star rating, +5 for a five-star rating
Endorsement Rating (Red Flag) | -10
Add Tag | +4
Add Term | +3

  • Environment: Displays the environment in which the connector is established, such as development, production, or QA.
  • Additional Fields: Custom fields let users add information to data beyond the standard fields. This additional context makes data analysis more meaningful and helps uncover insights that might not be apparent from the raw data alone.
  • Last Modified Date and Last Modified By: Display the date and time of the most recent metadata change made to the data object in the application, and the user who made it.
  • Last Anomaly Detection Date: Displays the date and time on which anomaly detection was last triggered on the data object.
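The Popularity Score table translates directly into a lookup of point values. The minimal Python sketch below sums them for a set of interactions; the action keys are hypothetical, and because the table only gives values for one-star (+1) and five-star (+5) ratings, the sketch assumes each star rating adds one point per star.

```python
# Point values from the Popularity Score table. The action keys are
# hypothetical names used only in this sketch.
ACTION_POINTS = {
    "view_summary": 1,      # navigate to the summary page
    "refresh_summary": 1,   # refresh the summary page
    "add_comment": 1,
    "red_flag": -10,        # Endorsement Rating (Red Flag)
    "add_tag": 4,
    "add_term": 3,
}


def popularity_score(actions, star_ratings=()):
    """Sum the effect of each action and each star rating.

    Assumption: a star rating adds one point per star (1 to 5 points).
    """
    score = sum(ACTION_POINTS[action] for action in actions)
    for stars in star_ratings:
        if not 1 <= stars <= 5:
            raise ValueError(f"star rating must be between 1 and 5, got {stars}")
        score += stars
    return score


print(popularity_score(["view_summary", "view_summary", "add_tag", "add_comment"], star_ratings=[5]))
# 12
print(popularity_score(["view_summary", "red_flag"]))
# -9
```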

Crawl & Profile Details

  • Created Date & Created By: Display the date on which the object was first crawled and the user who initiated that crawl.
  • Last Crawled Date: Displays the date when the object was last crawled from the data source.
  • Last Meta Sync Date: The timestamp of the most recent change made to the object at the source system, as captured by the latest crawl. It is not simply the latest crawl date: if an object is crawled and nothing has changed since the last meta sync date, the date stays the same. The Last Meta Sync Date is updated only when the object changes at the source and a subsequent crawl picks up that change.
  • Last Profiled Date: Displays the date when the object was last profiled.
  • Last Populated Date: Displays the date on which the object's metadata was last updated or modified using OvalEdge APIs.
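The Last Meta Sync Date rule can be expressed as a small update function. This Python sketch assumes the source system reports a last-modified timestamp for each object during a crawl; the function name `next_metasync_date` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def next_metasync_date(current: Optional[datetime], source_modified: datetime) -> datetime:
    """Return the Last Meta Sync Date to store after a crawl.

    `current` is the stored date (None if the object has never been synced);
    `source_modified` is the object's last-modified time reported by the
    source during this crawl. The date moves only when the source reports
    a newer change; the crawl date itself is never used.
    """
    if current is None or source_modified > current:
        return source_modified
    return current


synced = next_metasync_date(None, datetime(2024, 3, 1))
synced = next_metasync_date(synced, datetime(2024, 3, 1))   # no change: stays 2024-03-01
synced = next_metasync_date(synced, datetime(2024, 4, 15))  # new change: moves to 2024-04-15
print(synced)  # 2024-04-15 00:00:00
```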

Data Preview

Users with the appropriate data access permissions (Data Read, Data Preview, or Data Write) can view a sample set of profiled data for a data object on the Data Preview page; the sample set is configurable in Connector Settings. This gives data consumers a practical sense of the data and a deeper understanding of the object.

Data Lineage Tracking

Data lineage tracks the complete path of a data object, revealing its origins and destinations and ensuring data traceability. Integrated into the Data Catalog, this functionality helps organizations improve data quality by identifying and addressing the root causes of issues, supporting more informed decision-making. Users can also build their own lineage using various methods and techniques.

To learn more about lineage and how to build it, please visit: Lineage Overview


View and Build Entity Relationships

Entity relationships in the Data Catalog are visual representations of the associations between tables across databases in the system, based on column-level primary key and foreign key relationships. For example, an analyst who needs a unified view of customer ID information can use the Entity Relationships feature to understand customer data scattered across multiple files in disparate systems. Users can also build their own relationships using various methods and tools.

To learn more about Entity Relationships and how to build them, please visit: ER - Draft

Refer Objects & View References 

Any user with Meta-Read or Meta-Write access can reference (link to) data objects and users in text fields such as Collaboration Messages, Business Descriptions, and Technical Descriptions by using the "@" annotation. Every place where an object is referenced is listed in that object's References tab. This lets users trace links and understand how data objects are interconnected, making related content easier to discover and navigate within the Data Catalog.


Endorsing & Certifying Data Objects

Endorsement Rating: Any user with access to a data object, whether Meta-Read or Meta-Write, can give it an endorsement rating. The rating reflects the user's opinion of the object's quality and usefulness, based on their experience with it or their requirements. Users assign a rating out of five stars: a five-star rating signifies high-quality, reliable data, while a one-star rating may indicate poor data quality or other concerns.

Certification: An evaluation conducted by OE_ADMIN users or governance stakeholders, such as administrators, stewards, and owners, who have expertise in the data object. The evaluation determines whether the object is business-ready for consumers and assigns it a certification status: Certify, Violation, Caution, Inactive, or None. The status signals the level of trust in the object, guiding consumers in making business-critical decisions.

Collaboration

Each data object's summary page in the Data Catalog includes a collaboration window where users can exchange insights and comments and hold real-time discussions about the object with key stakeholders such as data owners, data stewards, custodians, and prominent users. This direct, instantaneous channel reduces reliance on lengthy email threads and streamlines obtaining and sharing information, creating a more productive collaborative setting and fostering decisions based on collective insights and expertise.

Curation & Metadata Management 

The crawled and cataloged metadata might not provide all the necessary information about the data as it typically only consists of the data object name or basic source description. Moreover, data object names can be technical and challenging for data consumers to comprehend. This lack of context can make it difficult for data consumers to understand the data's business value. 

Curation plays a crucial role in enriching the data objects with additional details to enhance their contextual understanding. By adding value and context to the data objects, curation simplifies the process of data discovery, analysis, and management. It is a fundamental step that should be performed before presenting the data objects for final consumption. By curating the metadata, business value is effectively enhanced and communicated, enabling data consumers to make informed decisions and better utilize their data objects.

Note: Only users with Meta-Write permissions can edit and update (curate) data objects.

Edit Data Object Title, Business & Technical Descriptions 

The summary page of each data object in the catalog offers a concise overview of its key attributes. Users with Meta-Write permission can edit the object's name and write business and technical descriptions that define the data object and its purpose, helping other users easily understand the purpose and contents of a particular data set.

Assigning Tags to Cataloged Objects

Tags group data objects together, improving management and discovery within OvalEdge. A tag can be assigned to multiple data objects, and each data object can have multiple tags. The primary purpose of assigning tags is to make it easy to find all data objects associated with a specific tag. For instance, assigning a tag like "customer details" to relevant data objects lets you identify all objects associated with customer information.

To learn more about Tags and how to assign them, please visit: Tags - A Deep Dive
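Because tags and data objects have a many-to-many relationship, a tag index is naturally modeled as two maps of sets. The Python sketch below is illustrative only; the `TagIndex` class and the object names are hypothetical and do not reflect OvalEdge's internal storage.

```python
from collections import defaultdict


class TagIndex:
    """Many-to-many mapping between tags and data objects (illustrative only)."""

    def __init__(self):
        self._objects_by_tag = defaultdict(set)
        self._tags_by_object = defaultdict(set)

    def assign(self, tag, obj):
        self._objects_by_tag[tag].add(obj)
        self._tags_by_object[obj].add(tag)

    def remove(self, tag, obj):
        self._objects_by_tag[tag].discard(obj)
        self._tags_by_object[obj].discard(tag)

    def objects_with(self, tag):
        # .get() avoids creating empty entries for unknown tags.
        return set(self._objects_by_tag.get(tag, set()))

    def tags_of(self, obj):
        return set(self._tags_by_object.get(obj, set()))


index = TagIndex()
index.assign("customer details", "crm.customers")
index.assign("customer details", "billing.accounts")
index.assign("pii", "crm.customers")
print(sorted(index.objects_with("customer details")))  # ['billing.accounts', 'crm.customers']
print(sorted(index.tags_of("crm.customers")))          # ['customer details', 'pii']
```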

Assigning Terms to Cataloged Objects

Business Glossary Terms represent a fundamental component of OvalEdge, crucial for understanding the organization's key performance indicators (KPIs), operational procedures, specialized terminology, abbreviations, and pertinent industry-specific language. By establishing connections between terms and cataloged data objects, users can enrich the context and significance of these data objects, enabling a more comprehensive and insightful understanding of the data landscape.

To learn more about Business Glossary Terms, please visit: Business Glossary Terms.

Adding Custom Fields for Metadata Enhancement

Custom fields let users add metadata to data objects that is not covered by the standard fields on the summary page. There are four types of custom fields: Text, Code, Number, and Date. Text fields store descriptive details, Code fields let users create and store predefined options, Number fields handle numerical data, and Date fields capture specific dates and times. Custom fields enhance searchability and usefulness by providing additional context and categorization options for the data.

To learn more about Custom Fields and how to use them, please visit: Custom Fields Deep Dive Document
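The four custom field types can be modeled with a small validator. In this Python sketch, the `CustomField` class, the example field, and the validation rules (for instance, that a Code field accepts only its predefined options) are assumptions for illustration, not OvalEdge's API.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class FieldType(Enum):
    TEXT = "Text"
    CODE = "Code"
    NUMBER = "Number"
    DATE = "Date"


@dataclass(frozen=True)
class CustomField:
    name: str
    field_type: FieldType
    options: tuple = ()  # predefined options, used only by Code fields

    def accepts(self, value) -> bool:
        if self.field_type is FieldType.TEXT:
            return isinstance(value, str)
        if self.field_type is FieldType.CODE:
            return value in self.options
        if self.field_type is FieldType.NUMBER:
            # bool is a subclass of int in Python, so exclude it explicitly.
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        if self.field_type is FieldType.DATE:
            return isinstance(value, date)  # datetime is a subclass of date
        return False


sensitivity = CustomField("Sensitivity", FieldType.CODE, options=("Public", "Internal", "Restricted"))
print(sensitivity.accepts("Internal"))  # True
print(sensitivity.accepts("Secret"))    # False
```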

Bulk Updates to Metadata Using Advanced Tools

Bulk Actions through Load Metadata from Files (LMDF)

Load Metadata from Files (LMDF) is the process of ingesting metadata into the application in bulk, which makes large volumes of metadata manageable. Users can upload new metadata or update existing metadata for Data Catalog objects using the specified template (.xlsx file). OvalEdge provides LMDF templates for every data object type available in the Data Catalog.

OE_ADMIN users can access Load Metadata from Files by navigating to Advanced Tools from the left panel.

To learn more about LMDF and how to use it, please visit: Load Metadata from Files

Bulk Actions through OvalEdge APIs

OvalEdge offers a comprehensive API Library for managing the different components of the OvalEdge application. It documents the endpoints, authentication procedures, and request and response formats. For Data Catalog objects, users can use the OvalEdge POST APIs to update multiple metadata types in bulk.

To learn more about OvalEdge APIs and how to use them, please visit: OvalEdge APIs

Self-Service Tools

The Data Catalog offers an array of self-service tools that streamline common tasks. Users can raise service requests through the Service Desk and track their progress, certify data objects through the Certifications module to improve data reliability and trustworthiness, and add data objects to the Watchlist to stay informed about relevant changes or updates. Together, these tools help users carry out complex data operations with ease and efficiency.

Adding Data Objects to My Watchlist

When users add a data object to their watchlist, they receive real-time notifications about any changes or updates to that object, both within the catalog and in the remote data sources. These timely notifications help users make well-informed decisions and use the data effectively to meet their business needs.

To learn more about the Watchlist and how to use it, please visit: My Watchlist Deep Dive
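Watchlist notifications follow a publish/subscribe pattern: users subscribe to objects, and each change fans out to every subscriber. The Python sketch below is a hypothetical model of that idea; the `Watchlist` class and message format are assumptions, and real delivery channels are out of scope.

```python
from collections import defaultdict


class Watchlist:
    """Minimal publish/subscribe sketch of watchlist notifications."""

    def __init__(self):
        self._watchers = defaultdict(set)  # data object -> users watching it

    def add(self, user, obj):
        self._watchers[obj].add(user)

    def remove(self, user, obj):
        self._watchers[obj].discard(user)

    def notify(self, obj, change):
        """Return one (user, message) notification per watcher of obj."""
        return [(user, f"{obj}: {change}") for user in sorted(self._watchers.get(obj, set()))]


watchlist = Watchlist()
watchlist.add("ana", "sales.orders")
watchlist.add("raj", "sales.orders")
print(watchlist.notify("sales.orders", "new column added"))
# [('ana', 'sales.orders: new column added'), ('raj', 'sales.orders: new column added')]
```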

Adding Data Objects to Impact Analysis

Once a data object is added to a predefined Impact Analysis and the analysis is executed, OvalEdge analyzes the object's data lineage, including its relationships and dependencies. The results list the other data objects that could be affected if the data object in the Impact Analysis is changed or altered, presented in a clear tabular format with a count of the affected objects.
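At its core, finding impacted objects is a downstream traversal of the lineage graph. The Python sketch below assumes lineage is available as a mapping from each object to the objects that consume it directly (a hypothetical format); OvalEdge's own analysis may weigh additional relationships and dependencies.

```python
from collections import deque


def impacted_objects(lineage, start):
    """Return every object reachable downstream of `start`.

    `lineage` maps each object to the objects that consume it directly.
    Breadth-first traversal with a visited set, so shared and cyclic
    paths are counted once.
    """
    impacted = set()
    queue = deque(lineage.get(start, []))
    while queue:
        obj = queue.popleft()
        if obj == start or obj in impacted:
            continue
        impacted.add(obj)
        queue.extend(lineage.get(obj, []))
    return impacted


lineage = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.revenue", "report.sales"],
    "mart.revenue": ["report.sales"],
}
affected = impacted_objects(lineage, "raw.orders")
print(len(affected), sorted(affected))
# 3 ['mart.revenue', 'report.sales', 'stg.orders']
```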

Adding Data Quality Rules and performing Anomaly Detection 

Within the Data Catalog, users can add Data Quality Rules to a specific data object, which contribute to evaluating the object's Quality Score. Users can also run Anomaly Detection on the same data object to identify sudden irregularities in its most recently profiled data.

To learn more about Data Quality and its Rules, please visit:

Adding Data Objects to Projects

Projects in OvalEdge provide shared workspaces where groups of users collaborate on data objects such as Tables, Files, and Reports. Within a project, users can exchange insights, work together on data objects, and monitor progress, keeping project managers and relevant stakeholders up to date on the project's status. In the Data Catalog, users can add a data object to the default project or select a project from the drop-down list.

To learn more about Projects and how to use them, please visit: Projects Deep Dive

Adding Data Objects to Access Cart 

When the Access Cart is set as the default project, it serves as a centralized access request management system. Users with Meta-Read or Meta-Write access can add multiple data objects to the cart and raise access requests for them in bulk. This simplifies managing and tracking access permissions for the selected data objects and makes granting appropriate access more efficient.

Raising Service Requests on Data Objects

The Service Desk acts as an intermediary platform that fosters communication and collaboration between application users and business owners, enabling diverse requests to be handled effectively. Users can raise service requests and monitor their complete lifecycle, from initiation to fulfillment. Within the Data Catalog, business users can submit requests on data objects, either for data access or to modify metadata content, streamlining the management of data-related operations.

To learn more about the Service Desk and how to use it, please visit: Service Desk.

Viewing Data Objects in Query Sheet

The Query Sheet is an interactive interface that allows Author users to build, summarize, save, and run queries as per business requirements. It is a lightweight query tool that lets users without SQL knowledge perform operations such as aggregation, joins, unions, grouping, sorting, and filtering with a single click. From the Data Catalog, a user with at least Data Read access can view a cataloged object in the Query Sheet using the 9-Dots options.

Nine Dots Action Items

The Data Catalog 9-Dots icon provides various options for controlling and organizing data objects, from raising service requests to adding objects to data quality, impact analysis, the watchlist, dashboards, and more. The available actions depend on the user's access permissions.

Note: The abbreviations used below are Databases (DB), Tables (T), Table Columns (TC), Files (F), File Columns (FC), Reports (R), Report Columns (RC), Codes (C), and Application Programming Interfaces (API).

Bulk Execution on Multiple Data Objects

Meta Read

Meta Write

Add/Remove to My Watchlist (DB, T, TC, R, RC, C, API)

Add/Remove Tags (DB, T, TC, F, FC, R, RC, C, API)

Add/Remove Terms (DB, T, TC, F, FC, R, RC, C, API)

Add/Remove to Access Cart (DB, T, TC, F, FC, R, RC, C, API)

Add/Remove to Default Project (DB, T, TC, F, FC, R, RC, C, API)

Service Desk (DB, T, TC, F, FC, R, APIs) report columns

Data Quality (DB, T, TC, F, FC, C, APIs)

Update Governance Roles (T, TC, F, FC, R, RC, C, API)

Change Certification Type (T, TC, F, FC, R, API)

Anomaly Detection Settings (DB, T, TC)

Add Tables to Impact Analysis (T, TC, F, FC, R, RC)

Add Reports to Dashboard (R)

Execution on Specific Data Object

Profile (DB, T, TC, F)

Profile Unprofiled (DB, FC)

Profile & Do Anomaly Detection (DB, T, TC)

Process Upstream / Downstream objects (T, TC, F, FC, R, RC, C)

Add to Impact Analysis (T, TC, F, FC, R, RC)

View in Query Sheet (T, TC)

Add to Watchlist (DB, T, TC, F, FC, R, RC, C)

Send Messages

View in Query Sheet

Update Governance Roles (DB, T, TC, F, FC, R, RC)

Send Messages (T, TC, F, FC, R, RC)

Service Desk (DB, T, TC, F, FC)

Apply Certification (T, TC, F, FC, R, RC)

Configure Search Keywords (DB, T, TC, F, FC, R, RC, C)

Data Quality (T, TC, F, FC, C)

Download (Data, Descriptions) (DB, T, TC, F, FC)

Folder Options

Profile File

Catalog all the subfolders

Catalog all the files

Uncatalog Files

Catalog and Profile all Files

Profile File

Uncatalog File/Folder

Codes

Add New Code

Recommend Lineage & References

Delete Associations

Delete Lineage

Report Data Quality Issue

Delete Code

Other Actions

Endorsement Rating

Add to Projects

Configure Views

For in-depth explanations of the 9-Dots action items, please visit: Managing Data Catalog Objects with Nine Dots Action Menu.

Exploring Data Objects in Detail

Databases


The database (schema) summary page lists the tables in the schema, along with its Queries, Codes, Functions, Views, Procedures, Triggers, Synonyms, and other objects, in separate tabs at the bottom of the page.

Here are additional attributes presented on the schema-level summary page, alongside the fundamental attributes explained earlier.

  • Row Count: The row count at the schema level displays the total number of rows across all tables within that particular schema. 
  • Table Count: Displays the total number of tables in a Schema. 

9-Dots Action Items Specific to This Data Object

  • From the 9-Dots menu on the summary page, Integration Admin users can profile a schema or profile only the unprofiled data in that schema.

Tables

The table summary page features a comprehensive list of table columns associated with the table, including each column's profiling attributes displayed in a convenient list view at the bottom of the screen.

Here are additional attributes presented on the Table-level summary page, alongside the fundamental attributes explained earlier.

  • Row Count: Represents the total number of rows or records in the table.
  • Column Count: Displays the total number of columns in the table.
  • Type: Specifies whether the object is a Table, View, or Materialized View.
  • Has Relation: Indicates whether the table has relationships. Yes implies the presence of relationships; No indicates their absence.
  • Has Lineage: Indicates whether the table has lineage. Yes implies the presence of lineage; No indicates its absence.
  • Density: Indicates the degree to which a table is filled with unique values, i.e., the proportion of distinct values to total values in the table. A higher density suggests more unique values, while a lower density indicates a more repetitive or uniform set of values.
  • Null Density: Displays the ratio of null values to total values in the table.
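
To make Density and Null Density concrete, here is a sketch under stated assumptions: the table is a list of equal-length rows, distinct values are counted per column, and `None` stands for null and is excluded from the distinct count. OvalEdge's exact counting rules are not documented here, and `table_density` is a hypothetical helper.

```python
from typing import Any, Optional


def table_density(rows: list[list[Optional[Any]]]) -> tuple[float, float]:
    """Return (density, null_density) for a table given as a list of rows."""
    total_values = len(rows) * len(rows[0]) if rows else 0
    if total_values == 0:
        return 0.0, 0.0
    distinct = 0
    nulls = 0
    for column in zip(*rows):  # zip(*rows) yields one tuple per column
        non_null = [v for v in column if v is not None]
        distinct += len(set(non_null))  # assumption: distinct counted per column
        nulls += len(column) - len(non_null)
    return distinct / total_values, nulls / total_values


if __name__ == "__main__":
    table = [[1, "a"], [2, "a"], [2, None]]
    print(table_density(table))
```

For the sample table, 3 distinct values out of 6 cells give a density of 0.5, and 1 null out of 6 cells gives a null density of about 0.17.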

 9-Dots Action Items specific to data object

  • From the 9 Dots menu on the Summary Page, users can profile the table or open it in the Query Sheet.

Table Columns

All columns associated with a table are presented in the left panel menu, allowing users to sort and view columns according to their preferences.

Each Table Column has a dedicated summary page with attribute details that highlight its distinctive characteristics. Lineage can also be viewed at the column level, and pattern relationships offer additional insight into how the column connects to other data elements in the ecosystem.

Here are additional attributes presented on the Table Column-level summary page, alongside the fundamental attributes explained earlier.

  • Top Values: Shows the 50 most frequently occurring values in the column.
  • Length: Represents the maximum number of characters or bytes the column can store. This information is crucial for understanding the storage requirements and constraints associated with individual columns.
  • Key: Identifies whether the column is a Primary Key or a Foreign Key.
  • Type: Denotes the column's data type, such as integer, varchar, char, date, time, boolean, decimal, float, or text.
  • Nullable: Whether a column permits null values is configured when the column is created at the remote source. If set to True, the column can contain null values, signifying the absence of a value. If set to False, every record must have a value in the column.
  • Has Relation: Indicates whether the column has relationships. Yes implies the presence of relationships; No indicates their absence.
  • Has Lineage: Indicates whether the column has lineage. Yes implies the presence of lineage; No indicates its absence.
  • Classification: Displays the applied classification, which is usually inherited from the associated term.
  • Distinct Value: Refers to the distinct (non-repeated) values in the column. It helps users understand the uniqueness of the data.
  • Minimum Value: Displays the lowest value across all records in the column.
  • Maximum Value: Displays the highest value across all records in the column.
  • Empty Count: Displays the number of records that are empty in the column.
  • Zero Count: Displays the number of records whose value is ‘0’.
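
The column statistics above can be approximated with the sketch below. It assumes `None` is null, the empty string is an empty value, a zero is the number 0 or the string "0", and the column holds mutually comparable values. `profile_column` is a hypothetical helper, not OvalEdge's profiler.

```python
from collections import Counter
from typing import Any, Optional


def _is_zero(value: Any) -> bool:
    # Treat the number 0 and the string "0" as zeros; booleans are excluded.
    return not isinstance(value, bool) and (value == 0 or value == "0")


def profile_column(values: list[Optional[Any]], top_n: int = 50) -> dict[str, Any]:
    """Compute simple profile statistics for one column's values."""
    non_null = [v for v in values if v is not None]
    non_empty = [v for v in non_null if v != ""]
    counts = Counter(non_null)
    return {
        "top_values": counts.most_common(top_n),  # (value, frequency), most frequent first
        "distinct_count": len(counts),  # number of different non-null values
        "null_count": len(values) - len(non_null),
        "empty_count": len(non_null) - len(non_empty),
        "zero_count": sum(1 for v in non_null if _is_zero(v)),
        "min": min(non_empty) if non_empty else None,
        "max": max(non_empty) if non_empty else None,
    }


if __name__ == "__main__":
    print(profile_column([3, 0, 3, None, 7, 0, 3]))
```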

Files

The Data Catalog directly stores only the first-level folders fetched from remote data sources. Subfolders and files below that level are not cataloged initially; instead, their file structure is stored in the File Manager module, from which users can choose to catalog all subfolders and files to the Data Catalog.
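
The default cataloging scope can be modeled as a simple split over crawled paths. This sketch assumes relative POSIX-style paths paired with a folder flag; `split_catalog_scope` is a hypothetical helper, and routing first-level files to the File Manager is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath


def split_catalog_scope(entries: list[tuple[str, bool]]) -> tuple[list[str], list[str]]:
    """Split (path, is_folder) entries into catalog and File Manager lists."""
    cataloged: list[str] = []
    file_manager: list[str] = []
    for path, is_folder in entries:
        depth = len(PurePosixPath(path).parts)  # "sales" -> 1, "sales/2024" -> 2
        if is_folder and depth == 1:
            cataloged.append(path)  # first-level folder: cataloged directly
        else:
            file_manager.append(path)  # deeper items wait in the File Manager
    return cataloged, file_manager


if __name__ == "__main__":
    crawled = [("sales", True), ("sales/2024", True), ("sales/2024/q1.csv", False), ("hr", True)]
    print(split_catalog_scope(crawled))
```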

Here are additional attributes presented on the File-level summary page, alongside the fundamental attributes explained earlier.

  • Type: Denotes whether the object is a File or a Folder.
  • Null Value Count: Displays the count of null values in a specific column.
  • Maximum Value: Represents the largest or highest value observed in a specific column.
  • Minimum Value: Represents the smallest or lowest value observed in a specific column.
  • File Size: Size of a specific file (not a folder) in bytes.

 9-Dots Action Items specific to data object

  • Uncatalog File/Folder: Removes the file or folder from the Data Catalog. An uncataloged file or folder can be cataloged again only from the File Manager.
  • Catalog All Subfolders: Catalogs all subfolders inside the folder from the File Manager to Data Catalog > Folders.
  • Catalog all the Files: Catalogs all files in the folder.
  • Catalog and profile all Files: Available only for folders; catalogs all files in the folder to the Data Catalog and profiles them in a single step.
  • Run Folder Analysis: Generates a brief set of analytics on the folder.

File Columns

The File Columns tab displays the columns associated with the Files that are cataloged.

  • Classification: Displays the classification applied which is usually inherited from the term associated.

Action Items

  • From the 9 Dots menu, users can profile the File that the File Column belongs to using the Profile File option, or uncatalog that File using the Uncatalog File option. Users can also catalog all subfolders and files to the Data Catalog from the File Manager.

Report

In the Data Catalog, the Reports tab shows all reports crawled from connected source systems. Here are additional attributes presented on the Report-level summary page, alongside the fundamental attributes explained earlier.

  • Column Count: Indicates the number of columns present in the report.
  • Path: Specifies the location of the report, representing the folder path where it is stored in the remote system.
  • Type: Displays the report type, such as datasets, workbooks, dashboards, tiles, views, Webi, CrystalReports, Page, ColumnChart, and map.

 9-Dots Action Items specific to data object

  • Open in Source: Allows users to open a report in its original source.
  • Add to Dashboard: Lets Author Users with Read-Write permissions add reports to OvalEdge Dashboards, so frequently used reports can be reached without searching through the entire report repository.

Report Columns

The Report Columns tab displays the report columns associated with a Report. After the connector is crawled, the tab shows the columns from the Data Source that are used to build the Report.

  • Path: Denotes the report's location, i.e., the folder path where the report is stored in the remote system.
  • Type: Displays the report type, such as datasets, workbooks, dashboards, tiles, views, Webi, CrystalReports, Page, ColumnChart, and map.

Codes

Codes displays queries crawled from connected data sources, queries executed using the Query Sheet, and queries added through the "Data Catalog > Codes > Add New Code" option for a selected Connector/Schema in OvalEdge.

Here are additional attributes presented on the Code summary page, alongside the fundamental attributes explained earlier.

  • Catalog Status: Shows "Yes" if the code is cataloged in the Data Catalog and "No" if it is not.
  • Last Run Date: Represents the date when the code was last executed.
  • Job Type: Denotes the type of job associated with the code, such as View, Function, Procedure, Trigger, workflow, extraction, transformation, or loading.
  • SQL Type: Denotes the SQL statement category, such as DDL, DML, DCL, or another supported format.
  • Lineage Status: Indicates whether lineage building succeeded, failed, or partially succeeded.
  • Parent Query: Helps distinguish parent datasets, particularly when working with codes that have similar names.
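
A SQL Type such as DDL, DML, or DCL can be approximated from a statement's leading keyword. The sketch below is a hypothetical heuristic, not OvalEdge's parser: the keyword sets are assumptions, SELECT is grouped with DML here (some conventions call it DQL), and statements that start with other keywords, such as WITH, fall into OTHER.

```python
import re

# Hypothetical keyword-to-category mapping used only for this illustration.
SQL_CATEGORIES = {
    "DDL": {"CREATE", "ALTER", "DROP", "TRUNCATE", "RENAME", "COMMENT"},
    "DML": {"SELECT", "INSERT", "UPDATE", "DELETE", "MERGE"},
    "DCL": {"GRANT", "REVOKE"},
}

# Leading whitespace plus any run of -- line comments and /* block */ comments.
_LEADING_COMMENTS = re.compile(r"^(\s*(--[^\n]*\n|/\*.*?\*/))*\s*", re.DOTALL)


def sql_type(query: str) -> str:
    """Classify a statement by its first keyword, ignoring leading comments."""
    stripped = _LEADING_COMMENTS.sub("", query, count=1)
    match = re.match(r"[A-Za-z]+", stripped)
    if not match:
        return "UNKNOWN"
    keyword = match.group(0).upper()
    for category, keywords in SQL_CATEGORIES.items():
        if keyword in keywords:
            return category
    return "OTHER"


if __name__ == "__main__":
    print(sql_type("/* nightly */ GRANT SELECT ON sales TO analyst"))
```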

9-Dots Action Items specific to data object

  • Add New Code: Adds a new code or query for a chosen Connector/Schema in OvalEdge. The added query can then be viewed in the Data Catalog.
  • Recommended Lineage and References: Displays the associations between the query and other data objects and helps establish the latest connections and associations. It tracks the lineage of codes, documenting their origin, modifications, and relationships to other data objects.
  • Delete Associations: Removes all connections or links between the query and related data objects.
  • Delete Lineage: Removes the query's lineage information, i.e., the record of how the query was created, modified, and connected to other data objects.
  • Delete Code: Permanently removes the code from the OvalEdge instance, so it no longer appears in Data Catalog > Codes.

APIs 

APIs can be crawled through the API Connector, which supports the Swagger and OpenAPI v3 types. Security measures restrict access to crawled APIs to authorized users only.

API Attributes

API Attributes are the Request and Response Parameters of an API. API Attributes can be crawled into the Data Catalog, where, as with Table Columns, users can assign tags and Business Glossary terms, add classifications, and perform many other metadata-related actions.
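
To show what Request and Response Parameters look like in an OpenAPI v3 document, the sketch below walks a spec already loaded as a Python dict and lists each endpoint's request parameters and top-level request/response body properties. It is a simplified, hypothetical extractor: it ignores `$ref` resolution, nested properties, and Swagger 2.0 documents, and it is not how the OvalEdge API Connector is implemented.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def _schema_properties(content: dict[str, Any]) -> list[str]:
    # Collect top-level property names from every media type's inline schema.
    names: list[str] = []
    for media in content.values():
        names.extend(media.get("schema", {}).get("properties", {}))
    return names


def api_attributes(spec: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (endpoint, "request" | "response", attribute name) tuples."""
    attributes: list[tuple[str, str, str]] = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            endpoint = f"{method.upper()} {path}"
            for param in op.get("parameters", []):
                if "name" in param:  # $ref parameters are not resolved here
                    attributes.append((endpoint, "request", param["name"]))
            body = op.get("requestBody", {}).get("content", {})
            for name in _schema_properties(body):
                attributes.append((endpoint, "request", name))
            for response in op.get("responses", {}).values():
                for name in _schema_properties(response.get("content", {})):
                    attributes.append((endpoint, "response", name))
    return attributes
```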

Adding Manual, Virtual & Temp Objects

Virtual Objects

The Virtual Objects feature allows users to add new data objects, such as tables, table columns, reports, and report columns, to the Data Catalog. Objects created this way become part of the system's lineage and relationships. This is useful for bridging gaps when evolving business requirements call for a data object that does not yet exist. A virtual object can be converted into an actual data object in the catalog by manually adding it to the remote data source and initiating a recrawl.

To learn more about Virtual Object Creation & Management, please visit: Virtual Objects | Deep Dive Article

Manual Objects

Author Users can add tables and table columns to the Data Catalog using the Manual Connector supported by OvalEdge. Manual Objects can be added by selecting the "Create" option from the "Manage Tables" icon in the top-right corner of the Data Catalog > Tables page. A Manual Connector connection must be established before the newly created objects can be saved to and viewed under the Manual Connector.

Temp Objects

Building lineage typically requires the associated source and target objects to be present so that data movement can be displayed.

When the original associated objects are absent, temporary objects are created to build the lineage. These temporary objects contain no actual data but help visualize the lineage effectively; the product refers to them as “Temporary Tables”. Once the original objects are crawled, the temporary objects are merged with them.

For example, if an Oracle data source has 5 schemas but you crawl only 2 of them, objects in the crawled schemas might link to objects in the 3 schemas that were not crawled. In the lineage diagram, temporary objects are generated for these missing objects, and they can be merged with the original data objects only after the other schemas are crawled.

Note: In "Advanced Tools > Temp Lineage Correction", users can replace temporary data objects with the original ones using the “Merge” option.
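
The temporary-object behavior can be sketched with plain lineage edges. In this hypothetical model, any endpoint that has not been crawled is replaced with a placeholder marked by an assumed `TEMP::` prefix, and a later merge step swaps placeholders for the originals once they are crawled, loosely mirroring the Merge option. None of these names come from OvalEdge.

```python
TEMP_PREFIX = "TEMP::"  # hypothetical marker for a temporary (placeholder) object

Edge = tuple[str, str]  # (source object, target object)


def build_lineage(edges: list[Edge], crawled: set[str]) -> list[Edge]:
    """Replace every endpoint that has not been crawled with a temporary object."""
    def resolve(name: str) -> str:
        return name if name in crawled else TEMP_PREFIX + name

    return [(resolve(src), resolve(dst)) for src, dst in edges]


def merge_temp_objects(lineage: list[Edge], crawled: set[str]) -> list[Edge]:
    """Swap temporary objects for their originals once those have been crawled."""
    def merge(name: str) -> str:
        original = name[len(TEMP_PREFIX):] if name.startswith(TEMP_PREFIX) else None
        return original if original is not None and original in crawled else name

    return [(merge(src), merge(dst)) for src, dst in lineage]


if __name__ == "__main__":
    crawled = {"SALES.ORDERS", "SALES.DAILY_TOTALS"}
    edges = [("SALES.ORDERS", "SALES.DAILY_TOTALS"), ("FINANCE.LEDGER", "SALES.DAILY_TOTALS")]
    lineage = build_lineage(edges, crawled)
    print(lineage)  # FINANCE.LEDGER appears as a temporary object
    print(merge_temp_objects(lineage, crawled | {"FINANCE.LEDGER"}))
```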

Advanced Tools

OvalEdge offers a set of Advanced Tools that give Admin & Author Users more insight into, and analysis of, the data objects. These tools help users make the most of the data objects in the Data Catalog and handle data-related operations efficiently.

Data & Metadata Changes 

OvalEdge's Data & Metadata Changes feature allows Author Users to monitor and track alterations to both data and metadata on data objects that are made at the remote data source. This ensures users have a comprehensive understanding of the evolving data landscape.

Changes made in the source system are presented as logs in dedicated data and metadata sections, so users can track and analyze changes, identify trends, and enforce data governance policies effectively.

Compare Schema

Compare Schema finds differences between two versions of a Schema's objects, such as tables and columns, to analyze changes in their structure and attributes over a period of time. It is typically used to track structural changes in databases, which makes it valuable for database version control and for identifying modifications to tables, column counts, and column attributes.
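
A schema comparison reduces to diffing two snapshots. In this sketch, each snapshot is assumed to map table names to `{column: data type}` dictionaries; `compare_schema` is a hypothetical helper that reports added and removed tables plus added, removed, and retyped columns.

```python
Snapshot = dict[str, dict[str, str]]  # table name -> {column name: data type}


def compare_schema(old: Snapshot, new: Snapshot) -> dict[str, list[str]]:
    """Report structural differences between two schema snapshots."""
    diff: dict[str, list[str]] = {
        "added_tables": sorted(new.keys() - old.keys()),
        "removed_tables": sorted(old.keys() - new.keys()),
        "column_changes": [],
    }
    for table in sorted(old.keys() & new.keys()):
        old_cols, new_cols = old[table], new[table]
        for col in sorted(new_cols.keys() - old_cols.keys()):
            diff["column_changes"].append(f"{table}.{col}: added")
        for col in sorted(old_cols.keys() - new_cols.keys()):
            diff["column_changes"].append(f"{table}.{col}: removed")
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:  # same column, different type
                diff["column_changes"].append(f"{table}.{col}: {old_cols[col]} -> {new_cols[col]}")
    return diff
```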

Compare Profile

Compare Profile compares the data within tables or table columns across different profiling sessions, focusing on the content of the data and its characteristics. It helps users track profiled changes over time, compare tables, and check for changes in column names, properties, attributes, statistics, and data characteristics such as null counts, distinct counts, top values, and minimum and maximum values. It is commonly used in data quality monitoring, analytics, and data transformation processes.
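
Compare Profile can be pictured the same way, but over statistics rather than structure. The sketch assumes each profiling session is a mapping from column name to a dictionary of statistics (such as `null_count` or `max`) and reports only the values that changed; `compare_profiles` is hypothetical.

```python
from typing import Any

Profile = dict[str, dict[str, Any]]  # column name -> {statistic name: value}


def compare_profiles(before: Profile, after: Profile) -> dict[str, dict[str, tuple[Any, Any]]]:
    """Return {column: {statistic: (old, new)}} for statistics that changed."""
    changes: dict[str, dict[str, tuple[Any, Any]]] = {}
    for column in sorted(before.keys() | after.keys()):
        old_stats, new_stats = before.get(column, {}), after.get(column, {})
        changed = {
            stat: (old_stats.get(stat), new_stats.get(stat))
            for stat in sorted(old_stats.keys() | new_stats.keys())
            if old_stats.get(stat) != new_stats.get(stat)  # missing stats compare as None
        }
        if changed:
            changes[column] = changed
    return changes
```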

Upload File or Folder

This tool enables users to manually add files and folders from their local system to the Data Catalog. By default, these files and folders are stored in the File Manager module through the NFS Connector. Note that uploaded files and folders are not all saved to the Data Catalog automatically: only the first level of folders is cataloged initially, and the remaining files and subfolders are stored in the File Manager. Users can catalog these files and folders manually by selecting them in the File Manager.

Security on Cataloged Objects

Role-based Access Control

When new users are added to the application, they are assigned specific roles with categorized permissions: "metadata" permissions (Meta-Read and Meta-Write) and "data" permissions (ranging from no access to preview, read, and write). These permissions determine how users can engage with the application, based on their assigned roles and access privileges.

  • With Meta-Read permission, users whose role has been granted access to a particular connector/schema/data object can view the data objects within that connector only. They can view the comprehensive details of those data objects, including descriptions, profiling statistics, associated terms, and other relevant information.
  • With Meta-Write permission, users whose role has access to a specific connector/schema/data object can not only view the data object details but also curate, edit, and refine them. This expanded privilege lets users modify descriptions and custom fields, add tags and terms, and adjust lineage and relationships.

Note: Users and roles without access permissions on a connector cannot see that connector's data objects in the Data Catalog.

Data Catalog Permissions

Meta Permissions

  • Meta Read: View the metadata.
  • Meta Write: View and edit the metadata.

Data Permissions

  • Data No Access: Cannot view data.
  • Data Preview: View sample data and profile statistics.
  • Data Read: View sample data and profile statistics, query the data source system using the Query Sheet (restricted to SELECT statements), and download the data.
  • Data Write: View sample data and profile statistics, query and edit the data source system's data using the Query Sheet, and download the data.

Special Privileges

  • ADM: Admin roles hold Meta Write and Data Write privileges on the data objects, allowing them to administer security settings and make changes.
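
The permission tiers above can be modeled as ordered levels. The sketch below is a hypothetical model, not OvalEdge's enforcement code; in particular, the SELECT check for Data Read is a naive prefix test used only for illustration (it rejects WITH queries and cannot detect every write path).

```python
from enum import IntEnum


class MetaPermission(IntEnum):
    META_READ = 1   # view metadata
    META_WRITE = 2  # view and edit metadata


class DataPermission(IntEnum):
    NO_ACCESS = 0  # cannot view data
    PREVIEW = 1    # sample data and profile statistics
    READ = 2       # adds SELECT queries in the Query Sheet and downloads
    WRITE = 3      # adds data edits through the Query Sheet


# ADM: admin roles hold the highest level of both permission types.
ADMIN = (MetaPermission.META_WRITE, DataPermission.WRITE)


def can_edit_metadata(meta: MetaPermission) -> bool:
    return meta >= MetaPermission.META_WRITE


def can_view_sample_data(data: DataPermission) -> bool:
    return data >= DataPermission.PREVIEW


def can_download(data: DataPermission) -> bool:
    return data >= DataPermission.READ


def can_run_query(data: DataPermission, sql: str) -> bool:
    if data >= DataPermission.WRITE:
        return True
    if data == DataPermission.READ:
        # Naive illustration: only statements that start with SELECT are allowed.
        return sql.lstrip().upper().startswith("SELECT")
    return False  # No Access and Preview cannot query
```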

Governance Roles on Objects

Apart from the Metadata and Data permissions, OvalEdge also assigns dedicated Governance Roles and Teams to the objects. 

Effective governance is crucial for organizations to ensure the quality, integrity, and compliance of cataloged data objects. OvalEdge provides a Governance Framework with three key roles: Steward, Custodian, and Owner. A crucial aspect of this framework is its ability to address and resolve issues that business users raise about data objects.

  • Data Stewards: Users responsible for most data governance functions, such as metadata curation, reporting, and managing data quality issues.
  • Data Owners: Users who own the data of a business unit and often play a major role in defining access privileges.
  • Data Custodians: Techno-functional resources with both business and technical knowledge. With Meta-Write permissions, custodians can be allowed to build lineage, create data quality rules, and so on.

Note: Organizations can customize their Governance Roles and add further roles (Governance Roles 4, 5, and 6).

Impacted Areas

The Data Catalog module impacts more areas of the application than any other module, because its data objects are integrated into functionality throughout OvalEdge.

  • Tags: The Tag Summary View displays the associated Data Catalog objects of that particular Tag.
  • Business Glossary: The Term Summary View of Business Glossary displays the associated cataloged objects of that particular Term. Users can also add objects to that Term using the ‘Add Objects’ widget which showcases all the cataloged objects by their types. 
  • Data Stories: In Data Stories, a user can refer to a Cataloged object while writing a story in the Story Zone. 
  • Dashboards: Within the Dashboards, there is a Data Literacy dashboard that presents the number of Data Objects that have been Crawled and Documented in the Data Catalog. 
  • Projects: The List View of a project shows all associated cataloged objects in separate tabs by type. Users can also add cataloged objects to any project using the ‘Add Objects’ widget.
  • Service Desk: Users can raise service requests on Data Catalog Objects on Data Access Request, Metadata Change Request, and so on. 
  • Governance Catalog: The various modules of the Governance Catalog also impact areas of the Data Catalog.
  • Data Classification: All classified and unclassified data objects of the Data Catalog (Tables, Table Columns, Files, File Columns, Reports, Report Columns) can be filtered by object type. Users can view data objects by classification (confidential / sensitive / PII) or view unclassified data objects to apply a term or classification.
  • Certification Policy: Associated data objects of any Certification Policy can be viewed in the Associated Data tab of the Certification Policy. 
  • Reference Data Management: Table Columns are referred to as Attributes in Reference Data Management, and similar attributes are grouped in one place (a Reference Data Unit).
  • Data Quality: Objects associated with Data Quality Rules are showcased in the Associated Data tab of Data Quality Rules.
  • My Resources:
    • Inbox: Users receive notifications related to Data Catalog objects.
    • My Profile: Users can set notification preferences for the Data Catalog.
    • My Permissions: Displays all objects the user is authorized for, based on governance role, access in OvalEdge, and access in the remote source.
    • My Watchlist: Users can add Data Catalog objects to their watchlist and configure notifications for them.
  • File Manager: All crawled Files and Folders are shown in the File Manager. Users can catalog these files and folders from the File Manager and view analytics on folders using Folder Analysis.
  • Query Sheet: Based on their access permissions, users can access, read, and modify the data of cataloged objects and monitor changes within the source system.
  • Chrome Extension: Users can search for cataloged objects in Chrome Extension as well as raise Data Quality issues using the feature. 
  • Administration
    • Security: Admin users can change permissions on Data Catalog objects for Author and Viewer users.
    • Custom Fields: Text, code, date, and number custom fields can be created for display on the Data Catalog summary page.
    • Audit Trails: It showcases logs documenting changes in metadata details across various types of data objects. These logs include information about the user who made the changes and the timestamp of the modifications.

System Settings Configurations

These settings let users customize parameters that control the Data Catalog's behavior by enabling, disabling, or modifying default values. The configurations can affect how the application integrates with other systems, how it looks, and how it performs certain tasks, so set them carefully to ensure that the application functions properly and meets the requirements of its intended use.

Key

Value

Description

pagination.navigation.lastpage

false or true

Enables or disables last-page navigation in the application, because loading the last page can take a long time.

The default value is False.

If set to True, the last page navigation is not displayed in the application.

If set to False, the last page navigation is displayed.

display.max.topusers

Any number

Sets the number of top users to display.

catalog.views.setdefault.allow

false or true

Controls whether users can select a default view.

1. The default value is True.

2. If set to True, users can select the Visible to All checkbox in the Configure View pop-up window.

3. If set to False, the Visible to All checkbox is not displayed in the Configure View pop-up window.

catalog.views.create.public.allow

false or true

Controls whether non-admin users can create views that are visible to other users.

1. The default value is False.

2. If set to True, non-admin users can create a public view by selecting the Visible to All checkbox in the Configure View pop-up window.

3. If set to False, the Visible to All checkbox is not displayed in the Configure View pop-up window. This is useful when users should not be able to create views visible to all users, or when users do not need to share views with one another.

ovaledge.governancerolesupdate.isempty

false or true

Shows or hides the Cascade when Empty radio button in the Update Governance Roles pop-up window.

1. The default value is False.

2. If set to True, the Cascade when Empty radio button is enabled.

3. If set to False, the Cascade when Empty radio button is disabled.

download.row.limit

50000

Defines the maximum number of rows that can be downloaded on a page. The value must be 500 or greater.

1) The default value is 5000.

2) Enter the value in the field provided.

pagination.row.limit

50

Specifies the maximum number of records to be displayed on a page.

Parameters:

The default value is 50.

Enter the value in the field provided.

is.erd.advancedsettings.enabled

true or false

Enables or disables Advanced options in Entity Relationships.

1) The default value is False.

2) If set to True, the Advanced options are shown.

3) If set to False, the Advanced options are hidden.

config.file.types.to.be.cataloged

csv,conf,env,sh,properties,txt,yaml,xlsx,json,ddl,sql,hql

Specifies the file types that can be uploaded and cataloged from external sources.

The default value is csv,conf,env,sh,properties,txt,yaml,xlsx,json,ddl,sql,hql,parquet.

Enter the file types, separated by commas, in the field provided.

file.skipfullprofile

false or true

Configures whether to skip profiling of Parquet files on HDFS.

The default value is False.

If set to True, profiling is skipped.

If set to False, profiling is performed.


exclude.datacatalog.subreport

false or true

Shows or hides Sub Reports in the UI (Data Catalog > Reports).

The default value is False.

If set to True, Sub Reports are not displayed under Reports.

If set to False, Sub Reports are displayed under Reports (if any exist).

exclude.report.type

Empty or Report type

Configures the report types to exclude from the Data Catalog > Reports tab.

The default value is Empty.

Enter the report types separated by commas in the field provided.
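
If these settings are scripted or reviewed outside the UI, a small validator can catch obvious mistakes. The sketch below assumes the raw values arrive as strings keyed by the names in the table above; the parsing rules (true/false booleans, comma-separated lists, and the 500-row minimum for download.row.limit) follow the descriptions, while `parse_settings` and `should_catalog` are hypothetical helpers, not an OvalEdge API.

```python
from typing import Any

BOOLEAN_KEYS = {
    "pagination.navigation.lastpage",
    "catalog.views.setdefault.allow",
    "catalog.views.create.public.allow",
    "ovaledge.governancerolesupdate.isempty",
    "is.erd.advancedsettings.enabled",
    "file.skipfullprofile",
    "exclude.datacatalog.subreport",
}
LIST_KEYS = {"config.file.types.to.be.cataloged", "exclude.report.type"}
INT_KEYS = {"pagination.row.limit", "display.max.topusers"}


def parse_settings(raw: dict[str, str]) -> dict[str, Any]:
    """Convert raw string settings into typed values, raising ValueError on bad input."""
    parsed: dict[str, Any] = {}
    for key, value in raw.items():
        value = value.strip()
        if key in BOOLEAN_KEYS:
            if value.lower() not in ("true", "false"):
                raise ValueError(f"{key} must be true or false, got {value!r}")
            parsed[key] = value.lower() == "true"
        elif key in LIST_KEYS:
            # Comma-separated values; an empty string yields an empty list.
            parsed[key] = [item.strip() for item in value.split(",") if item.strip()]
        elif key == "download.row.limit":
            limit = int(value)
            if limit < 500:
                raise ValueError("download.row.limit must be 500 or greater")
            parsed[key] = limit
        elif key in INT_KEYS:
            parsed[key] = int(value)
        else:
            parsed[key] = value  # unknown keys pass through unchanged
    return parsed


def should_catalog(filename: str, settings: dict[str, Any]) -> bool:
    """Check a file's extension against config.file.types.to.be.cataloged."""
    allowed = {t.lower() for t in settings.get("config.file.types.to.be.cataloged", [])}
    extension = filename.rsplit(".", 1)[1].lower() if "." in filename else ""
    return extension in allowed
```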