Profiling

File Profiling

Profiling a file collects statistics for each column in the file.

A file must be cataloged before it can be profiled.

When cataloged files are spread across a root folder and its subfolders, you can profile all of them in a single click using the "profile all files" option.

Once the files are profiled, the file columns and their statistics are listed on the summary page. OvalEdge can then use this data to run its artificial intelligence algorithm for file column tagging and to discover the lineage of files and folders.
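
As a rough illustration of what "column statistics" can mean, the following Python sketch computes a few simple figures for each column of a CSV file. It is not OvalEdge's implementation: the function name profile_columns and the chosen statistics (row count, empty-value count, distinct count, most frequent value) are assumptions made for this example only.

```python
import csv
import io
from collections import Counter


def profile_columns(csv_text):
    """Return simple per-column statistics for CSV text that has a header row.

    Hypothetical helper for illustration; the statistics chosen here are
    an assumption, not the set that OvalEdge reports.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    counters = {name: Counter() for name in fieldnames}
    row_count = 0
    null_counts = {name: 0 for name in fieldnames}

    for row in reader:
        row_count += 1
        for name in fieldnames:
            value = row.get(name)
            # Treat missing or blank cells as empty values.
            if value is None or value.strip() == "":
                null_counts[name] += 1
            else:
                counters[name][value] += 1

    stats = {}
    for name in fieldnames:
        most_common = counters[name].most_common(1)
        stats[name] = {
            "row_count": row_count,
            "null_count": null_counts[name],
            "distinct_count": len(counters[name]),
            "top_value": most_common[0][0] if most_common else None,
        }
    return stats


if __name__ == "__main__":
    sample = "id,city\n1,Austin\n2,\n3,Austin\n"
    for column, column_stats in profile_columns(sample).items():
        print(column, column_stats)
```

For the sample above, the city column has three rows, one empty value, one distinct value, and "Austin" as its most frequent value.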

To profile the files in a cataloged folder and its subfolders:

  1. Navigate to the Data Catalog --> Files tab.
  2. Select a root folder, click the nine dots icon, and then select "profile all files".
    A pop-up appears where you enter the level up to which the files should be profiled.
  3. Select a level, for example, level 2.
    The root folder is at level 0, so level 2 reaches subfolders up to two levels below the root.
  4. Click Ok to profile all the files up to the selected level.
  5. Navigate to a profiled file and click the Data tab to view its contents.

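
The level setting in the steps above can be pictured as a depth-limited folder walk. The Python sketch below lists the files that a given level would reach on a local file system. It is only an illustration under stated assumptions: the function name files_up_to_level is hypothetical, a file is assumed to take the level of the folder that contains it, and OvalEdge's own traversal may behave differently.

```python
import os


def files_up_to_level(root, max_level):
    """Yield (level, path) for every file at most max_level folders below root.

    Assumption for this sketch: the root folder is level 0 and a file
    has the same level as the folder that contains it.
    """
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if level >= max_level:
            # Clearing dirnames in place stops os.walk from going deeper.
            dirnames[:] = []
        for name in sorted(filenames):
            yield level, os.path.join(dirpath, name)


if __name__ == "__main__":
    # Lists files in the current folder and up to two levels of subfolders.
    for level, path in files_up_to_level(".", 2):
        print(level, path)
```

With max_level set to 2, files in the root folder, its subfolders, and their subfolders are listed, while anything deeper is skipped.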

Before profiling files in a subfolder, catalog the subfolder and the files in it. As with the database profiler settings, certain parameters must be configured before file profiling can run.

Ask your OvalEdge administrator, or refer to the Configuration article, to validate the settings required to profile a file.

OvalEdge supports profiling for the following file types:

  • XLS
  • XLSX
  • CSV
  • JSON (deeply nested)
  • Parquet
  • AVRO
  • ORC
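
If you want to check a batch of file names against this list before cataloging them, a small script along the following lines may help. It is a sketch only: the mapping from file types to extensions and the function name is_profilable are assumptions for illustration, and OvalEdge may identify file types differently.

```python
from pathlib import Path

# Assumed extensions for the file types listed above (not an OvalEdge setting).
SUPPORTED_EXTENSIONS = {
    ".xls", ".xlsx", ".csv", ".json", ".parquet", ".avro", ".orc",
}


def is_profilable(filename):
    """Return True if the file name ends in one of the assumed extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["sales.CSV", "events.parquet", "notes.txt"]:
        print(name, is_profilable(name))
```

The comparison is case-insensitive, so sales.CSV and sales.csv are treated the same way.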