Crawling

Introduction to OvalEdge Crawling

Crawling is the process by which OvalEdge collects metadata from various sources, such as databases, data lakes, visualization systems, and reporting systems. The crawler records the connection between OvalEdge and third-party databases so that users with the right permissions can view the metadata and data.

OvalEdge includes a module called the crawler, which connects to a data source, then collects and catalogs all of its data elements as metadata stored in the OvalEdge data repository.

An index is created for every stored data element, which can later be used for data exploration within the OvalEdge Data Catalog. OvalEdge crawlers can be scheduled to scan the databases regularly, so the index of data elements always stays up to date. A minimal sketch of this crawl-and-index flow follows.
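
OvalEdge's crawler internals are not public, so the sketch below is only a conceptual model: the `crawl` function, the SQLite stand-in source, and the index layout are illustrative assumptions, not the product's actual API. It models a crawl as "connect, read metadata, store one index entry per data element":

```python
# Conceptual sketch only: names and storage layout are assumptions,
# not OvalEdge's real implementation.
import sqlite3
from datetime import datetime, timezone

def crawl(db_path: str) -> dict[str, dict]:
    """Connect to a source, collect its table names, and index each one."""
    conn = sqlite3.connect(db_path)  # stand-in for any supported source
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    # One index entry per data element; in OvalEdge this metadata would
    # land in the data repository and power catalog search.
    return {
        f"{db_path}.{name}": {
            "type": "table",
            "crawled_at": datetime.now(timezone.utc).isoformat(),
        }
        for (name,) in rows
    }
```

Re-running this function on a schedule (for example, from a daily cron job) is what keeps the index current between crawls.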

Crawling achieves the following:

  • Connections to source systems
  • Retrieval of metadata from source systems
  • The ability to detect any changes in the metadata after the first crawl completes (see the change-detection sketch after this list).
For example: the first crawl runs on Jan 1st and further crawls are scheduled daily. If a new table is added to the database on Jan 10th, OvalEdge flags the new table when it completes the next scheduled crawl on Jan 11th.
  • The option to exclude metadata based on regular expression (regex) filters (see the regex sketch after this list).
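
To make the change-detection example concrete, here is a minimal sketch that compares the element names from two crawl snapshots and flags additions and removals. The function and snapshot names are hypothetical, not OvalEdge's actual mechanism:

```python
# Hypothetical sketch: diff the element names from two crawls to flag
# what was added or removed since the previous run.
def diff_crawls(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    return {"added": current - previous, "removed": previous - current}

jan_10_crawl = {"sales.orders", "sales.customers"}
jan_11_crawl = {"sales.orders", "sales.customers", "sales.refunds"}
print(diff_crawls(jan_10_crawl, jan_11_crawl))
# {'added': {'sales.refunds'}, 'removed': set()}
```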
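
Regex-based exclusion can likewise be sketched as a filter applied to element names before they are cataloged; `filter_elements` and the patterns below are assumptions for illustration only:

```python
# Hypothetical sketch: drop any element whose name matches a
# user-supplied exclusion pattern before it is cataloged.
import re

def filter_elements(names: list[str], exclude_patterns: list[str]) -> list[str]:
    compiled = [re.compile(p) for p in exclude_patterns]
    return [n for n in names if not any(rx.search(n) for rx in compiled)]

tables = ["orders", "tmp_load_2024", "customers", "staging_users"]
print(filter_elements(tables, [r"^tmp_", r"^staging_"]))
# ['orders', 'customers']
```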