Data Catalog

Files

The Data Catalog Files page lists all the cataloged folders and files from every connected file system. Any root folder can contain multiple subfolders and files. You can select a cataloged folder or file to build metadata around it: associate it with tags and terms, author business descriptions, add source and target lineage, and add references to multiple data objects. The search option lists each file along with its unique File Location. You can select a file or a folder to view its contents. OvalEdge supports file formats such as comma-separated values (.csv), Microsoft Excel (.xlsx), JSON (including deeply nested JSON), pipe-delimited values, AVRO, and Parquet.

File

Description

Type

Data Catalog Files consist of Folders and Files. A folder may contain one or more folders or files. Filtering on the type (FILE or FOLDER) narrows the results accordingly.

File System

Displays the connection name of the File.

File Name

Displays the name of the File.

File Location

Displays the location of the File.

Access Cart

The Add to Access Cart icon adds the file object to the access cart.

Tags

Displays the Tags assigned to the File. The Tags field is editable; hover over a specific tag field to see an edit icon. Click on the edit icon to edit and assign tags to the object.

Term

Displays the Terms assigned to the File. The Terms field is editable; hover over a specific term field to see an edit icon. Click on the edit icon to edit, assign, or remove terms for the object.

Remote Tags

Displays the Remote Tags added to the File.

Business Description 

Displays the Business Description added to the File. It is editable; click on the edit icon to edit the description.

Technical Description 

Displays the Technical Description added to the File. It is editable; click on the edit icon to edit the description.

Metadata

Displays the Metadata information added.

Created Date 

Displays the date on which the File was created.

Last Modified Date

Displays the last modified Date & Time of the File.

Popularity

Displays the number of times users have interacted with this data object by viewing, endorsing, commenting on, tagging, or querying it.

Steward

Displays the name of the steward.

Custodian

Displays the name of the custodian. 

Owner

Displays the name of the owner. 

Certification

You can set the certification of an object to Certify, Caution, Violation, Inactive, or None.

Folders

When you select a folder, the OvalEdge data catalog provides the folder statistics; when you select a file, it provides the file statistics. The folder detail page has a layout similar to that of the database objects in the data catalog module. The File page is further divided into Summary, Sub Files, Data, Cataloged Files, Lineage, and References.

The tabs displayed depend on the type of object selected:

  • For a folder, the Cataloged Files tab is displayed.
  • For a data file (such as .csv), the Data tab is displayed.
  • For files such as XLS and XLSX, the subfile details are displayed.

Folder Summary 

The Folder Summary provides the metadata and statistical details of a specific folder. The metadata includes descriptive information about the data, such as the Folder Title, Business Description, Technical Description, Tags and Terms added, Permissions, when and how the data object was created, and its creator. The statistics cover parameters related to the File or Folder: Row count, Column count, Service request count, Access, Quality Index, Popularity, Profiled, Profile Date, Importance, and Size; modification details such as Last Catalog date, Last populated date, Last modified date, and Certified date; and other information such as Profile Status, Type, and Dashboard.

Cataloged Files

The Cataloged Files tab lists all the files and folders cataloged under the selected folder, as retrieved from the connection. Cataloging is performed at both the folder level and the sub-folder level.

Cataloged Files

Description

Position

Shows the position of the file within the folder.

Type

Indicates whether the item is a File or a Folder.

File Name

Name of the file.

Description

Short description of the file.

Extension

Extension of the file, such as CSV.

Status

Sample

Row count 

Displays the total number of rows.

Dashboard 

DQR score with dimension

Lineage 

Displays the source and target lineage of the selected folder. Data lineage is a visual representation of where the data originates, the path it takes, how it reaches the target, and all the transformations it undergoes in its lifecycle. The lineage of a file shows the upstream and downstream objects associated with a single File object. It lets you track, understand, record, and visualize data transformations along the path from source to destination.

References

File References lists all references made to the File from other data objects (database tables or the File itself) in the application. The benefit of having solid reference data is that you can confidently drill into subsets of your data to gain business insights.

Files

Data Catalog Files also lets you look for individual files. Use the Type filter to search by file type, then click a particular file.

When you select a file, you can view the details of that file. The file detail page consists of the following tabs:

File Tabs

Descriptions 

File Summary

File descriptions, file statistics, and the Tags and Terms associated with the file.

Data 

Displays the data of the file.

Sub Files

Displays the subfile details for files such as XLS and XLSX.

Lineage

The source and target lineage of the selected file.

References

List of all objects that reference this file.

Column Details

Column statistics, Column Lineage, and Column References.

File Summary 

Files Summary 

Description

Business Description

A business description provides a clear understanding of the data objects (Tables/Files/Reports) and their function. It is descriptive information about the data object and its fields that is helpful for business users. By default, the description box is empty, and the user can update it as needed. Click on the edit icon to edit the Business Description. Note: Only users with Meta Write (Read-Write) access can edit Business Descriptions.

Technical description

Displays the technical parameters or comments defined at the data source. It is editable; click on the edit icon to modify the existing technical description.

Terms 

Displays the terms associated with the file. The Terms field is editable; click on the edit icon to edit, assign, or remove a Term to or from the file.

Tags

Displays the tags associated with the file. The Tags field is editable; click on the edit icon to edit and assign tags to the file.

Last Populated Date

Displays the date and time when the data object was last modified.

Service request count

Displays the number of service requests raised on the data catalog files. Request types include Request Content Change, Request Access, and Report Data Quality.

Access 

The Instruction button is used to view the instructions added to the Access Instructions field through Administration > Crawler > Settings > Access Instructions. Crawler administration will benefit from the valuable information it provides. 

Quality Index

The Data Quality Index gives an overall idea of the quality of the data object based on the Service Tickets raised and resolved. If multiple issues remain unresolved, the data quality is considered poor.

Popularity

The Popularity Score displays the number of times users have interacted with this data object by viewing, endorsing, commenting on, tagging, or querying it. The total view count is displayed to show how popular the data asset is relative to other assets in the application.

Profiled

The Profiled field on the object summary page displays “Yes” or “No”.

Profiled Date

Displays the latest date and time at which the data object was profiled to compute statistics on new data. An empty value indicates that the object has not been profiled.

Importance

The Importance score shows how vital a File object is across the database based on the lineage (downstream objects) associated with the File. 

Catalog Date

The Catalog Date is the date on which the file information was saved in the OvalEdge application.

Profile Status

The Status field displays the profiling state of the data object.

  • Profiled - Full profiling is complete on the data object.
  • NULL/Not Profiled - The data object is not profiled.
  • Sample Profiled - Sample profiling has been run and completed on the data object.
  • Partially Profiled - The data object is partially profiled.
  • Profile Failed - Profiling failed.

Type

Displays whether the object is a folder or a file.

Profiling options for folders and files

  • You can use the “Profile All Files” option to profile the files inside a folder.
  • You can use the “profiling a folder assuming same content” option to profile only the folders at a specified level. For example, specifying the 3rd level profiles only the 3rd-level folders and skips the 1st and 2nd levels.
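
The level-based selection described above can be illustrated with a short Python sketch. This is not OvalEdge code; the function name folders_at_level is hypothetical, and it assumes that level 1 means the immediate subfolders of the selected folder. It only shows the idea of picking folders at one depth and skipping the shallower ones.

```python
import os

def folders_at_level(root, level):
    """Return sorted paths of folders exactly `level` levels below `root`.

    Assumption (not from the product docs): level 1 is the set of
    immediate subfolders of `root`.
    """
    if level < 1:
        raise ValueError("level must be 1 or greater")
    root = os.path.abspath(root)
    selected = []
    for current, subdirs, _files in os.walk(root):
        rel = os.path.relpath(current, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth == level:
            selected.append(current)
            subdirs[:] = []  # stop descending below the requested level
    return sorted(selected)
```

With a tree such as a/b/c and a/b/d under the selected folder, calling folders_at_level with level 3 returns only the c and d folders; the folders a and a/b at levels 1 and 2 are traversed but not selected.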

    Note: A job is submitted when a File/Folder is profiled. You can see the statistics values at the bottom of the summary page of the file or folder.

    File Data

    The Data tab is displayed based on the File type selected. After the file is crawled, the tab displays the file's top 100 records (rows) in a grid.
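
To make the "top 100 records" behavior concrete, here is a minimal Python sketch of a row-limited preview of a CSV file. It is an illustration only, not how OvalEdge reads files; the function name preview_rows is hypothetical, and it assumes the first row of the file is a header.

```python
import csv
import io
from itertools import islice

def preview_rows(text_stream, limit=100):
    """Return the header and at most `limit` data rows from a CSV text stream.

    Assumption: the first row is a header row.
    """
    reader = csv.reader(text_stream)
    header = next(reader, [])           # empty list if the stream is empty
    rows = list(islice(reader, limit))  # stops reading after `limit` rows
    return header, rows

# Usage: preview an in-memory CSV with 250 data rows.
sample = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(250))
header, rows = preview_rows(io.StringIO(sample))
```

Because islice stops consuming the reader after the limit is reached, only the start of the file has to be read, which keeps a preview cheap even for large files.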

    Lineage

    The lineage graph displays the data movement and data flow from the source to the destination of file objects.

    Reference

    The File References tab lists all data objects (such as database tables and the File itself) that reference the File.

    Column Details 

    By default, the Column Details tab is inactive. When you profile a file or folder, the Column Details tab becomes active and provides the Column Summary, Column Lineage, and Column References tabs for the profiled file or folder.

    For more information, see the Data Catalog File Columns document. 

    User Actions

    The following User Actions can be performed on Files, Folders, and File Columns using the Nine Dots option. For more information, refer to User Action.

    Nine Dots options

    Description

    Add Tag

    It adds the Tags to the selected File(s).

    Remove Tag

    It removes the Tags applied to the selected File(s).

    Add Term

    It adds the Terms to the selected File(s).

    Remove Term

    It removes the Terms applied to the selected File(s).

    Add to Default Project / Add to My Access Cart

    It adds the selected File(s) to the default Project set in the Project module. The Add to My Access Cart option is displayed if the Default Project is set to My Access Cart Project.

    Service Desk

    It is used to report a data quality issue or request a content change on files and file columns.

    Remove from Default Project

    It removes the selected File(s) from the default Project.

    Update Governance Roles

    It helps you to change or update the governance roles (Owner / Custodian / Steward and other roles) for the selected File(s).

    Change Certification Type

    It helps to change the certification type for the selected File(s). The data certification is a stamp of approval to ensure the data is consistent, timely, and correct. It lets you filter reports based on their certification status.

    • Certify: A data object marked as Certify complies with the policies assigned for certifications.
    • Caution: A data object containing conflicting information is marked as Caution.
    • Violation: A data object that violates associated data quality rules is marked as Violation.
    • Inactive: A data object that has not been used for an extended time is marked as Inactive.
    • None: A data object is marked as None when any of the above certifications needs to be removed.

    Add Files to Impact Analysis

    It helps to see the impact of changes made to the File(s) on other data objects upstream or downstream. The Impacted objects can be viewed in Advanced Tools > Impact Analysis. 

    Quick Tips

    It provides a few insights about the File(s).


    Copyright © 2019, OvalEdge LLC, Peachtree Corners GA USA