File Manager

The File Manager module enables users to manage files and folders. OvalEdge connects to file systems and quickly organizes all the folders and files stored in the user's local Network File System (NFS) server, Hadoop Distributed File System (HDFS), and cloud storage such as Amazon S3 or Google Cloud Storage.

Users can catalog each folder and file in the File Manager. Cataloged files are profiled in the Data Catalog module to collect file statistics. A cataloged file also provides information about its folder location and file type.

OvalEdge supports different data lake systems (Hadoop, Amazon S3, Google Cloud Storage), which store large volumes of raw data in its native format.

Connect your File System

To connect your file system, add a File connection in the OvalEdge application. You can then view its files in the File Manager module.

  1. Go to Administration > Crawler.
  2. Click the + icon (New connection) and enter the file system connector name (NFS/S3/HDFS/M3 Azure/Google Drive) in the Manage Connection pop-up window.
  3. Enter the connection credentials for the selected connection.
  4. Click Validate, and then click Save to store the connection details.
  5. After validating the connection details, you can instead click Save & Configure to establish the connection and configure its settings directly.
    Note: The Save & Configure button is displayed only for connectors that require settings configuration.
  6. Clicking Save & Configure opens the Connection Settings pop-up, where you configure the settings for the selected connector. The newly added connection is then listed on the Data lake page.
  7. After saving the connection, go to Administration > Crawler and click Crawl/Profile to crawl the file connection. Once crawling/profiling succeeds, the files are displayed in the File Manager module.

Note: All files and folders in the file connection are displayed in the File Manager. However, only first-level files and folders are displayed in Data Catalog > Files. Second-level files and folders are displayed only in the File Manager; to view them in the Data Catalog, you must catalog them.
Example: The S3 file connection has a folder named Adithya123 that contains the duckdb_test.jar file, the s3rdamclient.war file, and the Aditya folder. When you crawl the S3 connection, all of these files and folders are displayed in the File Manager. However, in Data Catalog > Files, only the Adithya123 folder is visible. After you catalog duckdb_test.jar, s3rdamclient.war, and the Aditya folder, they are also visible in Data Catalog > Files.
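The visibility rule in the example above can be modeled with a short Python sketch. It assumes S3-style object keys that use "/" as the folder separator, and the helper names file_manager_items and catalog_files_view are hypothetical; it illustrates which items appear where, not how OvalEdge implements crawling.

```python
from __future__ import annotations

from typing import Iterable


def file_manager_items(keys: Iterable[str]) -> set[str]:
    """Every file and folder path implied by the object keys, at all levels."""
    items: set[str] = set()
    for key in keys:
        parts = key.strip("/").split("/")
        # Add each ancestor folder as well as the item itself.
        for depth in range(1, len(parts) + 1):
            items.add("/".join(parts[:depth]))
    return items


def catalog_files_view(keys: Iterable[str], manually_cataloged: Iterable[str] = ()) -> set[str]:
    """First-level items (cataloged on crawl) plus anything cataloged by hand."""
    first_level = {item for item in file_manager_items(keys) if "/" not in item}
    return first_level | set(manually_cataloged)


# Hypothetical keys mirroring the Adithya123 example.
keys = [
    "Adithya123/duckdb_test.jar",
    "Adithya123/s3rdamclient.war",
    "Adithya123/Aditya/",
]
print(sorted(file_manager_items(keys)))   # everything the File Manager lists
print(sorted(catalog_files_view(keys)))   # only the first-level folder
```

Passing the second-level paths as manually_cataloged models the effect of cataloging them by hand: they then join Adithya123 in the Data Catalog view.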

Note: A file can also be added to the application manually via the File Manager. It is visible in Data Catalog > Files only after it is cataloged.

Select your Data lake

Clicking the File Manager module opens the Select Your Datalake page.

The Select Your Datalake page lists the available File connections and their connection types. Use the search and filter icons on the respective columns to search by connection name or filter by connection type.

To access the Data Lake:

  1. Navigate to the File Manager module.
  2. Select the Connection Name from the existing connection list.
  3. All files and folders within the selected connection are displayed on the page.

Connection Name Details

In OvalEdge, the File Manager collects and displays the following information for each connection.

  • Type: An icon to identify a file or a folder.
  • Name of file/folders: The name of the file/ folder from the connection.
  • File Type: The type of file (.csv, .xlsx, etc.).
  • Catalog sign: OvalEdge requires users to catalog each file or folder to organize its metadata. A file must be cataloged before it can be profiled (similar to crawling a database connection before profiling it).
    • Cataloging a File:
      • First-level files and folders are cataloged automatically when the connection is created.
      • Second-level files and folders must be cataloged manually by clicking the + sign for each file or folder in the File Manager module. Alternatively, users can catalog multiple files or folders in the Data Catalog by using the Nine Dots options.
    • Note:
      • The + sign changes to a tick sign to indicate that the file or folder is cataloged successfully.
      • When a file connection is crawled, the files/folders are automatically cataloged and displayed under the Data Catalog > Files module.
      • When users add a file or folder manually, they must load it through the Load Metadata process and then catalog it manually using the + sign, as described above.
  • Size of file/folders: It is the physical size of the file/folder in the system.
  • Last Modified Date: The date on which the file/folder was last modified in the source system.
  • Preview Link: A link that lets users open the file from the Data Catalog by pasting the URL into a new browser tab.

Note: The count of files or folders in the file connection is displayed at the top of the selected connection page. You can control the number of rows displayed on one page of the selected connection with the filemanager.pagination.row.limit configuration.

Example: If you set the value of filemanager.pagination.row.limit to 19, up to 19 files/folders are displayed on each page.
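The number of pages follows from simple ceiling division. The Python sketch below models that arithmetic, assuming the File Manager fills each page up to the configured row limit before starting the next; the function names page_count and page_slice are hypothetical.

```python
import math


def page_count(total_items: int, row_limit: int) -> int:
    """Pages needed to show total_items rows at row_limit rows per page."""
    if row_limit <= 0:
        raise ValueError("row_limit must be a positive integer")
    return math.ceil(total_items / row_limit)


def page_slice(items: list, page: int, row_limit: int) -> list:
    """Rows shown on a 1-based page number."""
    start = (page - 1) * row_limit
    return items[start:start + row_limit]


rows = [f"file_{i}.csv" for i in range(45)]
limit = 19  # value of filemanager.pagination.row.limit in this example
print(page_count(len(rows), limit))       # 3 pages
print(len(page_slice(rows, 3, limit)))    # the last page holds the remaining 7
```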

File Manager User Actions

You can perform additional tasks on files and folders. Click the Nine Dots icon in the top-right corner to view the available options.


  1. View File: Preview the contents of a file (.csv or .xlsx) in raw or tabular form. Two views are available:
    • Raw View: Displays the data in an unstructured format.
    • Table View: Displays the data in a structured format.
  2. Download File: Download a file uploaded to the File Manager.
  3. Delete File: Delete uploaded files from OvalEdge. Deleted files cannot be retrieved or rolled back.
  4. Upload File: Upload a file to the File Manager by selecting the Upload File/Folder button.
  5. Catalog Files/Folders: Cataloging organizes files and folders in the Data Catalog and gives users easy access to them.
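The difference between the Raw View and Table View of a CSV file can be illustrated with Python's standard csv module. This is a sketch of the two presentations only, not OvalEdge's viewer; raw_view and table_view are hypothetical names, and the sample data is made up.

```python
from __future__ import annotations

import csv
import io


def raw_view(text: str) -> list[str]:
    """Raw View: each line exactly as stored, with no parsing."""
    return text.splitlines()


def table_view(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Table View: parse rows into records keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


sample = 'id,name\n1,"Smith, J"\n2,Lee\n'
for line in raw_view(sample):
    print(line)  # quotes and the embedded comma remain part of the raw text
for record in table_view(sample):
    print(record["id"], record["name"])  # the quoted field is one column value
```

The quoted value shows why a tabular view needs a real CSV parser: splitting the raw line on every comma would wrongly break "Smith, J" into two columns.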

Upload Files/Folders For NFS

To process data from a source in OvalEdge, it is recommended that you first upload it to OvalEdge. Follow the steps below to upload a file or folder from your desktop to the OvalEdge data lake.

To upload a file/folder:

  1. Go to the File Manager module and select the data lake (NFS connection).
  2. Click the Nine Dots icon and select Upload File. The Upload File or Folder page is displayed.
  3. On the Upload File or Folder page, file upload is selected by default. Click the toggle button to upload a folder instead.
  4. Click Select from your computer, browse to the directory where the file resides, and select the file you want to upload.
    Note: You can create a new directory via Select your Directory > Nine Dots > Create Directory.
  5. Once the file is uploaded successfully, it appears on the screen in green. Click Finish.

Catalog Files/ Folders

When a File connection is crawled on the Administration > Crawler page, its files and folders are cataloged automatically.

You can also catalog files and folders that were added manually through the Load Metadata process.

To catalog a file/folder:

  1. Navigate to the File Manager module.
  2. Click the + icon in the catalog section of the file or folder.
  3. A pop-up window to confirm the catalog action is displayed. Click Confirm.
  4. The + sign changes to a tick sign, indicating that the file is cataloged successfully.
  5. Verify the cataloged file in the Data Catalog > Files module.

Data Formats Supported by OvalEdge

OvalEdge supports the following data formats (file extension, format name, and description):

  • CSV (Comma-separated values): A CSV file stores tabular data in plain text. Each line of the file is a data record, and each record consists of one or more fields separated by commas. The extensions are .csv and .txt.
  • XLS (Microsoft Office Excel): An XLS file is a spreadsheet made up of rows and columns of cells. Each cell can contain text, numbers, or formulas that calculate values dynamically. XLS spreadsheets can also contain tables and charts that present selected data.
  • JSON (JavaScript Object Notation): A JSON file stores simple data structures and objects in JavaScript Object Notation format, a standard data interchange format. It is primarily used to transmit data between a web application and a server.
  • Parquet (Apache Parquet): A free and open-source, column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to the other columnar storage file formats available in Hadoop, RCFile and ORC.
  • TSV (Tab-separated values): A simple text format for storing tabular data, such as database table or spreadsheet data, and for exchanging information between databases. Each record in the table is one line of the text file, and fields are separated by tab characters.
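As an illustration of how these formats differ when read, the Python sketch below maps file extensions to the formats listed above and parses the plain-text ones with the standard library. The extension table is an assumption derived from the list (".parquet" in particular is not stated there), and detect_format and parse_text are hypothetical names; this is not how OvalEdge detects formats.

```python
from __future__ import annotations

import csv
import io
import json
from pathlib import PurePosixPath

# Hypothetical extension-to-format table built from the list above.
# ".parquet" is an assumed extension; the list does not state one.
FORMATS = {
    ".csv": "CSV",
    ".txt": "CSV",  # listed above as a CSV extension
    ".tsv": "TSV",
    ".json": "JSON",
    ".xls": "XLS",
    ".parquet": "Parquet",
}


def detect_format(filename: str) -> str | None:
    """Map a file name to one of the listed formats, or None if unknown."""
    return FORMATS.get(PurePosixPath(filename).suffix.lower())


def parse_text(filename: str, text: str):
    """Parse the plain-text formats; XLS and Parquet are binary and not handled."""
    fmt = detect_format(filename)
    if fmt == "CSV":
        return list(csv.reader(io.StringIO(text)))
    if fmt == "TSV":
        # Same record-per-line structure as CSV, with tab as the separator.
        return list(csv.reader(io.StringIO(text), delimiter="\t"))
    if fmt == "JSON":
        return json.loads(text)
    raise ValueError(f"{filename}: not a plain-text format handled by this sketch")


print(detect_format("Sales.CSV"))
print(parse_text("people.tsv", "id\tname\n1\tLee\n"))
```

XLS and Parquet are binary formats, so this sketch only identifies them; reading them would require a spreadsheet or Parquet library.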


Copyright © 2019, OvalEdge LLC, Peachtree Corners GA USA