
File Manager - Data Lake OvalSight

Data Lake OvalSight simplifies the management of complex data lakes through a single interface that enables users to analyze data composition and characteristics at the folder and cumulative levels. It helps prioritize data management tasks and optimize folder structure.

Accessing Data Lake & Folder OvalSight

Author Users:

  • Data Lake OvalSight: Access the main analysis through the Data Lake OvalSight sub-module within the File Manager. This sub-module displays only supported Data Lakes such as Amazon S3, Azure Data Lake, CIFS, and NFS.
  • File Explorer: Access Data Lake analysis directly from the File Explorer's landing page. Click the OvalSight icon in the Data Lake OvalSight column to be redirected to the sub-module.
    • Folder OvalSight: Analyze a specific folder within a supported Data Lake by clicking the OvalSight icon in the Folder OvalSight column for that folder.

Viewer Users:

  • Folder OvalSight: Access the analysis of a cataloged folder in the Data Catalog. OvalEdge administrators can enable the Folder OvalSight tab for folders by setting the "enable.folder.ovalsight" key value to 'True' in the System Settings (Others tab). Once enabled, the Folder OvalSight tab will appear next to the Lineage tab for each folder.

Running Folder OvalSight

Author users with MetaWrite access can run Folder OvalSight on a selected folder from either of two places:

  • File Explorer: From the File Explorer's 9-dot menu, choose "Run Folder OvalSight."
  • Data Catalog: Within the Data Catalog's File Summary 9-dot menu, select "Run Folder OvalSight."

Note: Running Folder OvalSight on individual folders contributes to the overall Data Lake OvalSight analysis. No separate job is needed for Data Lake OvalSight itself.

Selecting a Data Lake OvalSight

The Data Lake OvalSight sub-module lists all available file connections with the following details:

  • Connector Name
  • Connector Type (NFS, S3, etc.)
  • Created By (username)
  • Last Modified On
  • Last OvalSight Scan

Author Users can search for specific connections by name or filter by type using the icons in the respective columns.

Exploring Folder OvalSight

Clicking a connection name in the Data Lake OvalSight opens a dedicated dashboard for that connection, with detailed statistics on the connector and its folders, subfolders, and files. The dashboard contains the following:

OvalSight Summary

The Folder OvalSight Summary offers users comprehensive insights into the structure and content of specific folders. This Summary provides an in-depth analysis of nested levels within a folder, detailing all subfolders and files up to the last level.

Folder Tiles

The Folder OvalSight Summary gives users a quick overview of a folder's structure and data content. It displays key details like:

  • Folder level within the Data Lake
  • Depth of nested subfolders
  • Total number of subfolders (including empty ones)
  • Total number of files

This summary helps users gauge a folder's complexity and data volume at a glance, so they can make informed decisions about data organization and access permissions.
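To make the four tile values concrete, here is a minimal Python sketch of how they could be derived from a listing of folder and file paths, such as object keys from a storage connection. It is an illustration only, not OvalEdge's implementation: the function name `folder_tiles`, the relative slash-separated paths, and the choice to count subfolders and files across all nested levels are assumptions. The level numbering follows the Folder Level convention used in the List View (a top-level folder is level 1). Empty subfolders are counted because they appear in the folder listing even though no file path points into them.

```python
from pathlib import PurePosixPath


def folder_tiles(root: str, folders: list, files: list) -> dict:
    """Hypothetical helper: compute Folder Tile values for `root`.

    Paths are assumed to be relative and slash-separated, e.g. "sales/2024".
    """
    root_path = PurePosixPath(root)
    # Level 1 = a top-level folder, level 2 = its child, and so on.
    level = len(root_path.parts)

    # Every folder strictly below root, including empty ones.
    subfolders = [
        PurePosixPath(f) for f in folders
        if root_path in PurePosixPath(f).parents
    ]
    # Depth = how many levels the deepest subfolder sits below root.
    depth = max((len(p.parts) - level for p in subfolders), default=0)
    # Files anywhere below root, across all nested levels.
    file_count = sum(1 for f in files if root_path in PurePosixPath(f).parents)

    return {
        "folder_level": level,
        "nested_depth": depth,
        "subfolder_count": len(subfolders),
        "file_count": file_count,
    }
```

For example, with folders `sales`, `sales/2024`, `sales/2024/q1`, `sales/empty` and files `sales/a.csv`, `sales/2024/q1/b.csv`, the `sales` folder would report level 1, a nested depth of 2, three subfolders (the empty one included), and two files.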

Top 10 File Formats

The Summary uses interactive charts that make it easy to analyze the file formats and sizes within a folder.

  • Donut Chart: Shows the 10 most common file formats and their prevalence. Hovering over a segment reveals the format name, the number of files, and their percentage of the total.
  • Total Formats: A number in the center of the chart shows the total count of unique file formats, highlighting format diversity.
  • File Types Exploration: Clicking a format segment in the chart lists all files of that format.

View All: Users can explore all file formats, not just the top 10, by clicking the "View All" button. This comprehensive list shows each format, file count, and percentage. Clicking on a format here displays all associated files.
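The counts and percentages behind the donut chart and the View All list can be sketched in a few lines of Python. This is a hypothetical illustration, not OvalEdge's logic: it assumes a file's format is its lowercase extension, groups files without an extension under "(none)", and rounds percentages to one decimal place. Passing `top_n=None` returns every format, mirroring View All.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Optional


def format_breakdown(file_names: list, top_n: Optional[int] = 10):
    """Hypothetical helper: count files per format (extension).

    Returns (unique_format_count, rows), where each row is
    (format, file_count, percent_of_total), highest count first.
    """
    # Assumption: format = lowercase extension; no extension -> "(none)".
    formats = [PurePosixPath(n).suffix.lower() or "(none)" for n in file_names]
    counts = Counter(formats)
    total = len(formats)
    ranked = counts.most_common()  # sorted by count, descending
    rows = [
        (fmt, n, round(100 * n / total, 1))
        for fmt, n in ranked[:top_n]  # [:None] keeps every format
    ]
    return len(counts), rows
```

The first return value corresponds to the Total Formats number in the center of the chart; the rows correspond to the chart segments.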

File Size Ranges

The Summary also helps users understand how storage space is used within the folder. A "File Size Range Analysis" chart breaks down files into five size categories:

  • Less than 100KB
  • 100KB - 1MB
  • 1MB - 10MB
  • 10MB - 100MB
  • Over 100MB

Hovering over a section reveals how many files fall into each size range.
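As a rough sketch of how files could be assigned to the five size ranges, the Python below buckets raw byte sizes. The unit base (1 KB = 1024 bytes) and the rule that each range includes its lower bound but not its upper bound are assumptions; the document does not say how boundary values are handled. Empty ranges are kept so every chart segment is present.

```python
from collections import Counter

# Assumption: binary units (1 KB = 1024 bytes); the product may use decimal units.
KB = 1024
MB = 1024 * KB

# (label, exclusive upper bound in bytes); each range is assumed to include
# its lower bound, so a file of exactly 1 MB lands in "1MB - 10MB".
SIZE_RANGES = [
    ("Less than 100KB", 100 * KB),
    ("100KB - 1MB", 1 * MB),
    ("1MB - 10MB", 10 * MB),
    ("10MB - 100MB", 100 * MB),
    ("Over 100MB", float("inf")),
]


def size_range(size_bytes: int) -> str:
    """Return the label of the range that contains `size_bytes`."""
    for label, upper in SIZE_RANGES:
        if size_bytes < upper:
            return label
    return SIZE_RANGES[-1][0]  # not reached for finite sizes


def size_range_counts(sizes: list) -> dict:
    """Count files per range, keeping empty ranges so every segment is present."""
    counts = Counter(size_range(s) for s in sizes)
    return {label: counts[label] for label, _ in SIZE_RANGES}
```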

Users can click on a specific size segment to see a detailed list of all files within that range. This helps identify large or potentially redundant files that might be good candidates for archiving, compression, or deletion.

Folder & File Modification Summary

The Folder OvalSight Summary shows how data usage changes over time. It uses bar graphs to track folder and file modifications by quarter within a chosen year. This helps users identify trends and patterns in data activity, such as:

  • A spike in folder changes during a quarter might indicate a project phase or data migration.
  • Increased file modifications could signal periods of busy workflow or data intake.

By understanding these trends, users can plan for future needs and take proactive steps to manage data effectively. Hovering over a bar reveals the number of folders or files modified in that quarter for more details. Clicking on a bar dives deeper, showing all folders or files modified during that period. This allows users to investigate specific changes or take action if needed.
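The quarterly bars amount to grouping modification timestamps by quarter within the chosen year. The hypothetical Python helper below shows that grouping, assuming calendar quarters (January to March is Q1) rather than a fiscal calendar and naive `datetime` values.

```python
from datetime import datetime


def quarterly_modifications(modified: list, year: int) -> dict:
    """Hypothetical helper: count modifications per calendar quarter of `year`."""
    counts = {"Q1": 0, "Q2": 0, "Q3": 0, "Q4": 0}
    for ts in modified:
        if ts.year == year:
            # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4.
            counts[f"Q{(ts.month - 1) // 3 + 1}"] += 1
    return counts


stamps = [datetime(2024, 1, 5), datetime(2024, 3, 31), datetime(2024, 4, 1)]
print(quarterly_modifications(stamps, 2024))
```

The same function can be run once over folder modification times and once over file modification times to produce the two sets of bars.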

List View of Folder OvalSight

The List View provides a detailed look at a folder's immediate contents. Unlike the Summary, which offers a high-level overview, the List View analyzes the files within the selected folder and its direct subfolders. 

The List View displays metadata for every folder in the Data Lake, with the following details:

  • Folder Name: Names of all folders and subfolders in a Data Lake or a File System.
  • OvalSight Summary: Clicking the OvalSight icon displays the OvalSight Summary of that folder.
  • File OvalSight: Clicking the OvalSight icon displays details about the files directly inside that folder.
  • Folder Level: Displays the folder's level in the hierarchy. A top-level folder is assigned Folder Level '1', its subfolders '2', and so on.
  • Folder Type: Indicates the category the folder falls into, with five defined categories:
    • RFWF - Regular Folder with Files
    • RFNF - Regular Folder with No Files
    • SFWF - Structured Folder with Files
    • SSFWF - Semi-Structured Folder with Files
    • UFWF - Unstructured Folder with Files
  • Catalog: Shows the option for Author Users to catalog a folder.
  • Subfolder Count: Displays the number of subfolders that are inside any particular folder.
  • Number of Files: Displays the number of files in that particular folder, excluding files in its subfolders.
  • Folder Size: Displays the size of the folder in kilobytes (KB).
  • File Sizes: Displays the number of files below 100 KB, between 100 KB and 200 KB, and above 200 KB.
  • File Types: Displays the file types found in the folder, such as .ddl, .yaml, .parquet, and .csv. Clicking the eye icon that appears on hover opens a pop-up list of all the file types.
  • Sample Files: Displays the 50 most recently modified files in the folder as samples. Clicking the eye icon that appears on hover opens a pop-up list of these 50 files.
  • Last Modified Date: Indicates the date on which the folder was last modified at the source, based on the most recent modification date of any file within it.
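Several List View columns are simple aggregates over a folder's files. The Python sketch below derives Number of Files, Folder Size, File Sizes, File Types, Sample Files, and Last Modified Date from the files directly inside one folder. The `FileInfo` record and `list_view_row` function are hypothetical, and the treatment of sizes of exactly 100 KB and 200 KB, computing Folder Size from immediate files only, and treating a file type as its lowercase extension are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical record for one file directly inside a folder."""
    path: str
    size_kb: float
    modified: datetime


def list_view_row(files: list, sample_limit: int = 50) -> dict:
    """Hypothetical helper: derive List View columns from a folder's immediate files."""
    # File Sizes buckets; the handling of exactly 100 KB and 200 KB is assumed.
    buckets = {"< 100 KB": 0, "100-200 KB": 0, "> 200 KB": 0}
    for f in files:
        if f.size_kb < 100:
            buckets["< 100 KB"] += 1
        elif f.size_kb <= 200:
            buckets["100-200 KB"] += 1
        else:
            buckets["> 200 KB"] += 1

    # Newest first, so the head of the list gives both the sample files
    # and the Last Modified Date.
    newest_first = sorted(files, key=lambda f: f.modified, reverse=True)
    last_modified: Optional[datetime] = newest_first[0].modified if files else None

    return {
        "number_of_files": len(files),  # subfolder contents excluded
        "folder_size_kb": sum(f.size_kb for f in files),
        "file_sizes": buckets,
        "file_types": sorted({PurePosixPath(f.path).suffix.lower()
                              for f in files if PurePosixPath(f.path).suffix}),
        "sample_files": [f.path for f in newest_first[:sample_limit]],
        "last_modified": last_modified,
    }
```

Sorting once by modification time serves two columns: the first `sample_limit` entries become the sample files, and the first entry supplies the folder's Last Modified Date.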

Quick & Easy Discovery of Folder OvalSight

The Folder OvalSight List View simplifies data exploration with a user-friendly table format. Users can leverage filters, search options, and sorting capabilities to find the data they need quickly.

  • Filters: Each column offers a drop-down menu with predefined filters. Users can further narrow down results by searching within the filter options.
  • Search: Users can use the search bar to locate specific data objects and leverage the conditional search function (represented by the eight dots icon) to include or exclude keywords for more refined searches.
  • Sort: Organize data by clicking column headers. Users can sort by multiple columns to create a customized view.
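The include/exclude search and multi-column sort described above can be illustrated with a short Python sketch over table rows represented as dictionaries. The function names and row shape are hypothetical, and the substring, case-insensitive matching is an assumed interpretation of the conditional search, not a description of OvalEdge internals. The sort relies on Python's stable sort: sorting by the lowest-priority column first and the highest-priority column last yields the combined order.

```python
def conditional_search(rows: list, column: str,
                       include: tuple = (), exclude: tuple = ()) -> list:
    """Keep rows whose `column` contains every include keyword and no exclude keyword."""
    def keep(row: dict) -> bool:
        text = str(row[column]).lower()
        return (all(k.lower() in text for k in include)
                and not any(k.lower() in text for k in exclude))
    return [r for r in rows if keep(r)]


def multi_column_sort(rows: list, sort_keys: list) -> list:
    """Sort by several columns; sort_keys is [(column, descending), ...], highest priority first."""
    result = list(rows)
    # Stable sorts applied from lowest to highest priority combine correctly.
    for column, descending in reversed(sort_keys):
        result.sort(key=lambda r: r[column], reverse=descending)
    return result


rows = [
    {"name": "sales_2024", "level": 2, "files": 5},
    {"name": "sales_archive", "level": 1, "files": 5},
    {"name": "hr_2024", "level": 2, "files": 9},
]
print(conditional_search(rows, "name", include=("2024",), exclude=("hr",)))
print(multi_column_sort(rows, [("files", True), ("level", False)]))
```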

File OvalSight

The File OvalSight column gives users a quick look at the analysis of files within a folder. Clicking the OvalSight icon in this column opens a pop-up window displaying detailed information about those files.

  • File Name: Displays the names of all files in that folder.
  • File Type: Shows the type of file, such as .json, .csv, or .zip.
  • Catalog: Shows the option for Author Users to catalog a file.
  • Size: Displays the size of the file in kilobytes (KB).
  • Last Modified Date: Shows the latest date on which the file was modified at the source system. Changes to this date are reflected only after an administrator re-catalogs the Data Lake.

Tree View of Folder OvalSight

The Folder OvalSight Tree View offers a dynamic way for users to explore their data structure. It allows users to navigate through folders and sub-folders, displaying detailed Folder OvalSight information for each level, just like the List View.

  • Users can access this view by clicking the "Tree" tab within the Folder OvalSight.
  • Users can quickly drill up through the folder hierarchy.

The Tree View provides search, sort, and filter options similar to the List View. This allows users to efficiently manage and analyze their metadata, even within complex folder structures.

Data Lake Search

Data Lake Search allows users to search for files and folders within their entire Data Lake connection.

Users can search by file or folder names, and a pop-up window will appear, providing detailed information about the search results, such as:

  • Folder/File Name: Identifies the searched Folder or File.
  • Folder/File Level: Indicates the hierarchical position of the Folder or File.
  • Type: Distinguishes between folders and files.
  • Catalog: Shows the option for Author Users to catalog a folder or file.
  • Size: Displays the folder or file size in kilobytes (KB).
  • Last Modified Date: Shows the latest date on which the folder or file was modified at the source system. Changes to this date are reflected only after an administrator re-catalogs the Data Lake.
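A connection-wide name search can be sketched as a case-insensitive substring match on the last path segment of every entry. In this hypothetical Python helper, the `(path, is_folder, size_kb)` entry shape, the substring matching, and counting a file's level the same way as a folder's (top level = 1) are all assumptions.

```python
from pathlib import PurePosixPath


def data_lake_search(entries, query: str) -> list:
    """Hypothetical helper: case-insensitive name search across a connection.

    `entries` is an iterable of (path, is_folder, size_kb) tuples.
    """
    needle = query.lower()
    results = []
    for path, is_folder, size_kb in entries:
        p = PurePosixPath(path)
        # Match only the folder or file name, not its parent folders.
        if needle in p.name.lower():
            results.append({
                "name": p.name,
                "level": len(p.parts),  # top-level entry = 1
                "type": "Folder" if is_folder else "File",
                "size_kb": size_kb,
            })
    return results
```

Matching on the name alone keeps a query such as "sales" from returning every file that merely sits inside a `sales` folder.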

Copyright © 2024, OvalEdge LLC, Peachtree Corners GA USA