
Global Search

Introduction

Global Search is a search engine that finds results matching the keywords entered. It is located in the OvalEdge application header, so it is accessible from any page within OvalEdge.

When a keyword is entered, the search functionality retrieves relevant information based on the metadata within the OvalEdge platform. Users can search for data objects, including Tables, Files, Reports, Codes, APIs, Terms, Tags, Data Stories, Projects, Service Desk, and Inbox Notification messages.

Global Search is powered by Elasticsearch, a search and analytics engine. Results are ranked by a Relevance Score, which takes into account multiple attributes, including Business/Technical Descriptions, object names, Terms, Tags, Synonyms, Custom fields, and Endorsement messages. Advanced search functionality lets users further narrow down the results and access pertinent information.

Searching in OvalEdge

Search results are ranked by how closely they match what the user is looking for, as determined by the rules described below. The following sections examine these rules and conditions, along with additional tools such as filters and Advanced Search that help users narrow down their results.

Search engine parameters/logic

Global Search implements the following parameters and logic:

  1. Inverted Index
  2. White Space tokenizer
  3. Normalization
  4. Special characters
  5. Logical Operators
  6. Relevance Score

Inverted Index - Exact Match

An inverted index is a fundamental data structure used in Elasticsearch. It serves as an efficient mechanism for indexing and searching textual data.

In OvalEdge, the metadata crawled from source systems is stored as documents. When the documents are indexed, their textual content is analyzed and split into individual terms, or tokens, which represent the words within each document.

The inverted index is created by taking all the unique terms from the indexed documents and organizing them in a sorted manner. For each term, the inverted index maintains a list of documents that contain that term. This list is often referred to as a posting list.

For example, consider the following three documents:

  • Document 1: "My cat’s name is Jeffry"
  • Document 2: "Harry is late for the party"
  • Document 3: "Where is the party"

The inverted index for these documents is shown in the image below.

  • Terms represent the individual words or tokens in the documents.
  • Frequency denotes how many times a term appears within a document. 
  • Documents refer to the pieces of content being indexed and associated with the terms they contain. 

These components work together in the inverted index to facilitate efficient searching and retrieval of relevant information. When a search is executed for the term ‘party’, Elasticsearch looks up that term in the index and retrieves the documents associated with it (Documents 2 and 3).

By using the inverted index, Elasticsearch can find exact matches for a search term by looking up the documents that contain it, without scanning every document in the dataset.
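The lookup described above can be sketched in a few lines of Python. This is a simplified illustration rather than Elasticsearch's actual implementation: it assumes whitespace tokenization and lowercasing, stores only document IDs (no term frequencies), and the function names are hypothetical.

```python
from collections import defaultdict

# The three example documents from the text above.
documents = {
    1: "My cat's name is Jeffry",
    2: "Harry is late for the party",
    3: "Where is the party",
}

def build_inverted_index(docs):
    """Map each lowercased whitespace token to a sorted posting list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    # Terms are kept in sorted order, and each posting list is sorted by document ID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

def search(index, term):
    """Exact-match lookup: a single dictionary access instead of scanning every document."""
    return index.get(term.lower(), [])

index = build_inverted_index(documents)
print(search(index, "party"))  # [2, 3]
print(search(index, "is"))     # [1, 2, 3]
```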

White space tokenizer

Global Search uses a whitespace analyzer. The whitespace analyzer uses a tokenizer that divides a string (the input text being analyzed, usually the search keywords) into tokens wherever it encounters whitespace. Each run of one or more whitespace characters acts as a delimiter that separates one token from the next.

Suppose the input text is "customer table inventory works". Here is how a whitespace-based tokenizer would tokenize it:

  1. "customer" - The tokenizer reads characters until it reaches the first space, so "customer" is the first token.
  2. "table" - The space after "customer" is treated as a delimiter and discarded, so "table" is the second token.
  3. "inventory" - The space after "table" is a delimiter, so "inventory" is the next token.
  4. "works" - Following the space after "inventory", "works" is the final token.

In this example, the spaces only mark the boundaries between tokens; they are not emitted as tokens themselves. The resulting tokens are "customer", "table", "inventory", and "works". These tokens can then be used for indexing, searching, or further analysis.

Because only whitespace acts as a delimiter, every other character, including underscores and punctuation, stays inside its token. This gives precise control over how search text is analyzed, which matters in cases where the boundaries between words are significant.
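The tokenization above roughly corresponds to Python's str.split() with no arguments, which splits on any run of whitespace and discards the whitespace itself. The sketch below is an approximation for illustration; Elasticsearch's whitespace tokenizer has details it does not model (such as a maximum token length), and the function name is hypothetical.

```python
def whitespace_tokenize(text):
    """Split text into tokens on whitespace; the whitespace itself is never emitted as a token."""
    # str.split() with no argument treats any run of spaces, tabs, or newlines as one
    # separator and ignores leading and trailing whitespace.
    return text.split()

print(whitespace_tokenize("customer table inventory works"))
# ['customer', 'table', 'inventory', 'works']
print(whitespace_tokenize("  customer   table\tinventory works "))
# ['customer', 'table', 'inventory', 'works']
print(whitespace_tokenize("Customer_Orders v2.0"))
# ['Customer_Orders', 'v2.0']  (underscores and dots stay inside tokens)
```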

Normalization

Normalization transforms tokens into a consistent format for better matching. Common normalization techniques include converting all tokens to lowercase and removing punctuation or diacritics. Global Search is case-insensitive: uppercase and lowercase letters are treated the same, so results do not depend on letter casing.
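A minimal sketch of token normalization in Python. The case folding reflects the case-insensitive behavior described above; the diacritic stripping illustrates the other common technique mentioned and is an assumption, since this document does not state whether OvalEdge removes diacritics. The function name is hypothetical.

```python
import unicodedata

def normalize(token):
    """Case-fold a token and strip diacritics so equivalent spellings compare equal."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    folded = token.casefold()
    # NFKD decomposes characters such as 'é' into 'e' plus a combining accent mark...
    decomposed = unicodedata.normalize("NFKD", folded)
    # ...and dropping the combining marks leaves only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(normalize("CUSTOMER"))                           # customer
print(normalize("Café"))                               # cafe
print(normalize("Customer") == normalize("cUSTOMER"))  # True
```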

Special characters

Global Search can find text that contains special characters. For instance, if a table is named "Customer_Orders", searching for the exact keyword "Customer_Orders" returns the exact matches. However, omitting the special character, for instance by typing "CustomerOrders" without the underscore, may return different search results.
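The example below illustrates why "CustomerOrders" may not match "Customer_Orders". It is a hypothetical sketch that assumes only whitespace tokenization and lowercasing; the object names and IDs are invented, and real Elasticsearch queries may apply additional analysis that this does not model.

```python
def analyze(text):
    """Whitespace-tokenize and lowercase, mirroring the analysis steps described above."""
    return [token.lower() for token in text.split()]

# Hypothetical objects (ID -> name), indexed with the same analysis.
object_names = {101: "Customer_Orders", 102: "Customer Orders Archive"}
index = {}
for object_id, name in object_names.items():
    for token in analyze(name):
        index.setdefault(token, set()).add(object_id)

def exact_match(query):
    """Return sorted IDs of objects whose names contain every analyzed query token."""
    token_sets = [index.get(token, set()) for token in analyze(query)]
    return sorted(set.intersection(*token_sets)) if token_sets else []

print(exact_match("Customer_Orders"))  # [101]: the underscore is part of the token
print(exact_match("CustomerOrders"))   # []: no indexed token equals "customerorders"
print(exact_match("Customer Orders"))  # [102]: two separate tokens
```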

Logical Operators

Global Search lets you combine keywords and search for exact matches by using the following qualifiers.

Search Qualifier

Character Used

Description

AND

AND or & or && or +

Let's say you're looking for data objects related to "sales" and "marketing". To view search results for objects containing both keywords, you could enter "sales && marketing" or "sales + marketing" in the search field. This would return results for data objects that contain both the "sales" and "marketing" keywords.

OR

OR or | or ||

If you're interested in results for either "sales" or "marketing", you could enter "sales || marketing" or "sales | marketing" in the search field. This would show results for data objects containing either the "sales" or the "marketing" keyword.

Users can fetch more accurate search results for the keywords entered by including qualifiers (& for AND, | for OR) in the search bar.
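The AND and OR qualifiers correspond to set intersection and set union over posting lists. The sketch below is a hypothetical evaluator, not OvalEdge's parser. It assumes that operators are separated from keywords by spaces, that evaluation runs strictly left to right without operator precedence or parentheses, and that the posting lists are the invented ones shown.

```python
AND_OPERATORS = {"AND", "&", "&&", "+"}
OR_OPERATORS = {"OR", "|", "||"}

# Hypothetical posting lists: keyword -> IDs of data objects containing it.
postings = {
    "sales": {1, 2, 3},
    "marketing": {2, 3, 4},
    "finance": {5},
}

def evaluate(query):
    """Evaluate 'keyword OP keyword OP keyword ...' strictly from left to right."""
    tokens = query.split()
    result = set(postings.get(tokens[0].lower(), set()))  # copy, so postings are never mutated
    # Walk the remaining tokens as (operator, keyword) pairs.
    for op, keyword in zip(tokens[1::2], tokens[2::2]):
        matches = postings.get(keyword.lower(), set())
        if op.upper() in AND_OPERATORS:
            result &= matches  # AND: the object must contain both keywords
        elif op.upper() in OR_OPERATORS:
            result |= matches  # OR: the object may contain either keyword
        else:
            raise ValueError(f"unsupported operator: {op}")
    return sorted(result)

print(evaluate("sales && marketing"))  # [2, 3]
print(evaluate("sales + marketing"))   # [2, 3]
print(evaluate("sales || marketing"))  # [1, 2, 3, 4]
```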

Recent Search

Clicking on the OvalEdge search bar displays the user's five most recent searches, allowing users to quickly revisit previously searched items. Selecting any of these recent searches navigates directly to the corresponding item's page.
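One way to model the recent-search list is a bounded deque that keeps the newest entry at the front. This is an illustrative sketch. The class name is hypothetical, and moving a repeated query to the front instead of listing it twice is an assumption that this document does not state.

```python
from collections import deque

class RecentSearches:
    """Illustrative per-user history of the five most recent searches, newest first."""

    def __init__(self, limit=5):
        # A deque with maxlen discards items from the opposite end once it is full,
        # so appendleft() drops the oldest search automatically.
        self._items = deque(maxlen=limit)

    def record(self, query):
        # Assumption: a repeated query moves to the front rather than appearing twice.
        if query in self._items:
            self._items.remove(query)
        self._items.appendleft(query)

    def items(self):
        return list(self._items)

recent = RecentSearches()
for query in ["sales", "orders", "customer", "inventory", "warehouse", "finance"]:
    recent.record(query)
print(recent.items())  # ['finance', 'warehouse', 'inventory', 'customer', 'orders']
```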

Quick Search

As text is entered into the search bar, up to five best matches are suggested based on their relevance scores. This speeds up searching and reduces the need for extensive typing, which can help users who are not certain which keywords or terms to use.
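Selecting the five best matches is a top-k problem. The sketch below uses Python's heapq.nlargest on hypothetical (name, relevance score) pairs. This document does not describe how OvalEdge actually retrieves quick-search suggestions.

```python
import heapq

def quick_search(candidates, limit=5):
    """Return up to `limit` (name, score) pairs with the highest relevance scores, best first."""
    # nlargest avoids fully sorting a long candidate list when only a few results are needed.
    return heapq.nlargest(limit, candidates, key=lambda item: item[1])

# Hypothetical candidates with hypothetical relevance scores.
candidates = [
    ("customer", 8.1), ("customer_orders", 9.4), ("orders", 3.2),
    ("customer_address", 7.7), ("cust_dim", 5.0), ("customer_notes", 6.3),
    ("invoice", 1.1),
]
for name, score in quick_search(candidates):
    print(f"{name}: {score}")
```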

Searching for specific object types

  • Users can use the "All" dropdown or the separate tabs across the top of the Global Search results page to refine search results by object type.
  • After the initial results appear, selecting the tab for an object type narrows the results to that type, so users can focus on the specific objects they are interested in.


The different Object types that can be chosen are listed below:

Data Catalog

  1. Schema/Databases
  2. Tables
  3. Table Columns
  4. Files
  5. File Columns
  6. Reports
  7. Report Columns
  8. Codes
  9. API
  10. API Attributes

Governance Catalog

  1. Business Glossary Terms
  2. Tags

Others

  1. Data Stories
  2. Projects
  3. Service Requests
  4. Inbox

For Service Requests, only requests that are created and in the approval phase will be displayed in the search results.

Relevance Score

As previously mentioned, search results are ordered by Relevance Score, with the most relevant items shown first. The item that best matches the search term receives the highest relevance score.

The relevance score is a combination of three different scores: the ElasticSearch Score, the Synonym Score, and the Popularity Score.

The ElasticSearch Score is always included in the relevance score. Whether the Synonym and Popularity scores are also included can be configured in the System Settings (globalsearch.score.use.synonym and globalsearch.score.use.popularity).

When both the Synonym and Popularity scores are enabled, the relevance score is calculated as:

Relevance Score = (Elasticsearch Score + Synonym Score) * Popularity Score.
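The formula can be expressed as a small Python function with switches for the two optional components, mirroring the globalsearch.score.use.synonym and globalsearch.score.use.popularity settings. The function name is hypothetical. Treating a disabled synonym score as 0 and a disabled popularity score as 1 is an assumption about how the components are neutralized.

```python
def relevance_score(es_score, synonym_score, popularity_score,
                    use_synonym=True, use_popularity=True):
    """Combine component scores as (Elasticsearch + Synonym) * Popularity.

    Assumption: a disabled synonym score contributes 0 (additive identity) and a
    disabled popularity score contributes 1 (multiplicative identity).
    """
    total = es_score + (synonym_score if use_synonym else 0)
    return total * (popularity_score if use_popularity else 1)

print(relevance_score(4.0, 3, 2))                        # (4.0 + 3) * 2 = 14.0
print(relevance_score(4.0, 3, 2, use_popularity=False))  # 4.0 + 3 = 7.0
print(relevance_score(4.0, 3, 2, use_synonym=False))     # 4.0 * 2 = 8.0
```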


Synonym score calculation

The Synonym Score applies to items that have configured search keywords. For instance, if a table named "Warehouse" is configured with the search keyword "Inventory", the "Warehouse" table appears in the results when users search for "Inventory". This happens even if "Inventory" is not mentioned in the table's description or anywhere else, because of the search keyword configured on that data object.

  • When a governance role user creates a configured search keyword, the synonym score for that word is 3. If any other user creates a configured search keyword, the synonym score for that word is only 1.
  • Similarly, when a governance role user upvotes a configured search keyword, the synonym score increases by 3 for each vote.
  • If other users upvote a configured search keyword, the synonym score only increases by 1.
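The synonym-score rules above can be sketched as follows for a single configured search keyword. The function and parameter names are hypothetical; only the weights (3 for governance-role users, 1 for other users) come from the rules above.

```python
GOVERNANCE_WEIGHT = 3  # keyword created or upvoted by a governance-role user
OTHER_WEIGHT = 1       # keyword created or upvoted by any other user

def weight(is_governance_role):
    return GOVERNANCE_WEIGHT if is_governance_role else OTHER_WEIGHT

def synonym_score(creator_is_governance, upvoter_roles):
    """Score for one configured search keyword.

    creator_is_governance: True if a governance-role user created the keyword.
    upvoter_roles: one bool per upvote, True if that upvote came from a governance-role user.
    """
    return weight(creator_is_governance) + sum(weight(role) for role in upvoter_roles)

print(synonym_score(True, []))                     # 3
print(synonym_score(False, [True, False, False]))  # 1 + 3 + 1 + 1 = 6
```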

Popularity Score 

The Popularity Score reflects how much users have interacted with a data object by viewing, endorsing, commenting on, tagging, editing the Wiki of, or querying it.


The resulting score indicates how popular the data asset is relative to other assets in the application. The higher the popularity score, the more likely the asset is to be recommended.

Popularity scores are calculated based on the following actions:

  • Views a data object (Schema/table/table column/file/file column/Report/Report column/Codes) - the score increases by 1.
  • Endorses a data object - the score increases by the endorsement rating.
  • Comments on a data object - the score increases by 5.
  • Changes the Wiki of an object - the score increases by 3.
  • Tags a data object - the score increases by 4.
  • Queries a data object - the score increases by 3.
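A sketch of the popularity calculation as a sum over recorded interactions. The point values come from the list above; the event encoding (an action name plus an optional rating) and the names used are hypothetical.

```python
# Points per interaction, taken from the list above; "endorse" adds the rating itself.
POINTS = {"view": 1, "comment": 5, "wiki_edit": 3, "tag": 4, "query": 3}

def popularity_score(events):
    """Sum points for (action, value) events; value is the rating for 'endorse', else ignored."""
    score = 0
    for action, value in events:
        if action == "endorse":
            score += value  # endorsing increases the score by the rating given
        else:
            score += POINTS[action]
    return score

events = [("view", None), ("view", None), ("endorse", 4), ("comment", None), ("tag", None)]
print(popularity_score(events))  # 1 + 1 + 4 + 5 + 4 = 15
```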

ElasticSearch score variations:

When the keyword “customer order” is searched,

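  1. Items matching the whole phrase “customer order” get a higher ElasticSearch score than items matching only “customer” or only “order”, which score equally with each other.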
  2. Items titled "customer order" get a higher ElasticSearch score compared to those where "customer order" is just mentioned in their business or technical descriptions. Moreover, items that have keywords matching their business or technical descriptions rank higher than those with keywords in their tags, terms, custom fields, or configured search keywords.

Viewing Search Results

Upon searching for keywords and pressing “Enter”, the system displays suggested results ordered by relevance score. Each displayed item includes the Object Name, Object Title, Technical Description, and Business Description (if applicable).

Additionally, for every object shown in the search results, users can view Certifications (if applied), Endorsement Rating (if applied), and Relevance Scores, conveniently located at the right end of the object listing. This provides users with comprehensive information about each result, allowing for better-informed decision-making during the search process.

Actions Performed on Search Results

Raise a Service Request

After search results are displayed, if the user does not find the object they were looking for, they can click on the “Raise a SR” button, which will direct them to the Service Desk page, where they can raise a service request.

Question Wall

After search results are displayed, if the user does not find the object they were looking for, they can click on the “Ask a Question” button, which will direct them to the Question Wall. The Question Wall is a space within the My Resources module where users can ask questions or raise issues by mentioning (using the @ symbol in OvalEdge) the user directly responsible, who can help address the concern.

Quick view objects

After search results are displayed, hovering over a Data Catalog or Business Glossary object shows a ‘Quick View’ icon. The Quick View provides a summary of the metadata details for a search item or data object without navigating to the object's full details.

Add to Projects/ Access Cart

After search results are displayed, hovering over a Data Catalog or Business Glossary object shows either the ‘Add to Projects’ icon or the ‘Add to Access Cart’ icon. Clicking the icon adds the object to the Default Project or the Access Cart.

Navigate to the Object/item summary page 

When users locate the data object they searched for, they can click on it to open the data object's summary page, which provides comprehensive details and information about the object.

Left Panel Filter for different object types

Based on the selected data object type, the left panel displays options for refining the search results by attribute. These attributes include connection names (for data objects), domains (for business glossary terms), associated tags, endorsement ratings, popularity scores, and more.

Selecting the relevant checkboxes and clicking the “Apply Filter” button makes the search results more specific. This lets users tailor their search criteria so that the results align with their preferences or requirements.

For instance, if users opt for the "Certified" filter, only items certified as "Certify" will be shown. To remove applied filters, users can click on "Clear All."

The following table lists the common filters available for each object type, along with filters specific to certain types.

Object Type

Common Filters Available 

Data Catalog Objects (Tables, Table Columns, Files, File Columns, Reports, Report Columns, Codes, API, API Attributes)

Certification types

Connection Name

Schema name

Owner, Steward, Custodian, Other Governance Roles

Status

Terms name

Endorsement Rating

Tags name

Types

Popularity

Additional Fields based on code custom fields

Business Glossary Terms

Domain

Category

Owner, Steward, Custodian

Status

Tags name

Endorsement Rating

Popularity

Additional Fields based on code custom fields

Tags

Created By

Create Date

Additional Fields based on code custom fields

Projects

Created By

Created date

Data Stories

Created By

Endorsement rating

Story Zone

Tags name

Service Requests 

Different Approval Status

Different fulfillment status

Created date

(Note: For object types such as Tables, Table Columns, Files, File Columns, Reports, Report Columns, API, and API Attributes, there are two additional filters: Certification types and Schema name.)

Advanced Search

The Advanced Search filter is available in the top-right corner of object-type results pages. Users can perform a detailed search by applying additional filters on top of the initial search results, which improves search precision and helps users narrow down their results to the most relevant data objects.

(Note: The Advanced Search icon is not shown when results for all object types are displayed.)

When performing an advanced search, users can choose how the search results should match the keyword:

  1. Exactly matches the keyword given in Advanced Search
  2. Starts with the keyword given in Advanced Search
  3. Ends with the keyword given in Advanced Search
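The three Advanced Search match modes map naturally onto string comparisons. This sketch assumes matching is case-insensitive, consistent with the normalization described earlier; the enum and function names are hypothetical.

```python
from enum import Enum

class MatchMode(Enum):
    EXACT = "exact"
    STARTS_WITH = "starts_with"
    ENDS_WITH = "ends_with"

def matches(value, keyword, mode):
    """Case-insensitive check of a field value against an Advanced Search keyword."""
    value, keyword = value.lower(), keyword.lower()
    if mode is MatchMode.EXACT:
        return value == keyword
    if mode is MatchMode.STARTS_WITH:
        return value.startswith(keyword)
    return value.endswith(keyword)

names = ["customer_orders", "orders", "customer", "orders_archive"]
print([n for n in names if matches(n, "orders", MatchMode.EXACT)])        # ['orders']
print([n for n in names if matches(n, "orders", MatchMode.STARTS_WITH)])  # ['orders', 'orders_archive']
print([n for n in names if matches(n, "orders", MatchMode.ENDS_WITH)])    # ['customer_orders', 'orders']
```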

Advanced Filter Options

Object Type

Common Filters Available 

Data Catalog Objects (Tables, Table Columns, Files, File Columns, Reports, Report Columns, API, API Attributes, Codes)

Connection Name

Schema Name / File Folder / Report Name 

Owner

Steward

Custodian

Governance Roles

Tags name

Terms Name

Search Keyword

Business Description

Technical Description

Title



Certification

Table/Report/File name

Column Name

Business Glossary Terms

Term Name

Domain Name

Category Name

Sub Category Name

Owner

Steward

Custodian

Governance Roles

Tags Name

Search Keyword

Detail Description

Business Description

Tags

Tag Name

Search Keyword

Tag Description

Projects

Project Name

Search Keyword

Project Description

(Note: Advanced Search options are not available for Data Stories, Service Requests and Inbox.)


In the Global Search, users can perform a detailed search using Advanced Search, which goes beyond filters. Users can only use Advanced Search for certain object types, and not for all search results. When users search for a specific object type, the Advanced Search icon will show up on the right side of the search bar.


System Settings

The Global Search system settings give administrators the flexibility to configure the behavior and display of Global Search: they control the visibility of certain features and fine-tune search parameters to meet specific requirements. These settings can be configured from Administration > System Settings > Others tab.

Key

Description

globalsearch.objectscore.display

Show/hide the relevance score for data objects in the Global Search results.

Parameters:

The default value is False.

If set to true, the relevance score is displayed in search results.

If set to false, the relevance score will not be displayed.

globalsearch.es.objecttabs.display

Show/hide the different object tabs (Tables/Files/Reports/Business Glossary) on the Global Search page.

Parameters:

The default value is set to "ALL,glossary,oetable,oetag,oeschema,oecolumn,oefile,oefilecolumn,oechart,chartchild,oestory,oequery,project".

Enter the selected object names in the field provided. If left blank, only the All tab is displayed.
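The globalsearch.es.objecttabs.display value is a comma-separated list of tab keys. The sketch below shows one way to interpret it, including the documented rule that a blank value leaves only the All tab. Trimming spaces around keys and the function name are assumptions.

```python
DEFAULT_TABS = ("ALL,glossary,oetable,oetag,oeschema,oecolumn,oefile,"
                "oefilecolumn,oechart,chartchild,oestory,oequery,project")

def visible_tabs(setting_value):
    """Turn a comma-separated globalsearch.es.objecttabs.display value into a tab list."""
    # Assumption: surrounding spaces are ignored and empty entries are skipped.
    tabs = [tab.strip() for tab in setting_value.split(",") if tab.strip()]
    # Documented behavior: a blank value leaves only the All tab.
    return tabs if tabs else ["ALL"]

print(visible_tabs(DEFAULT_TABS)[:3])        # ['ALL', 'glossary', 'oetable']
print(visible_tabs("ALL, oetable, oefile"))  # ['ALL', 'oetable', 'oefile']
print(visible_tabs(""))                      # ['ALL']
```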

globalsearch.es.pagination.count

Specify the maximum number of records to be processed per page while the advanced job fetches data from the local database into Elasticsearch (supported range: min 3000, max 8000).

Parameters:

The default value is 10000.

Enter the value in the field provided.

globalsearch.es.activedetails

Show/hide the active/inactive objects in the global search results.

Parameters:

The default value is Active.

  • If set to Active, then only active objects are displayed in the global search results.
  • If set to Inactive, inactive objects are displayed in the global search results.
  • If the field is empty, then both the active and inactive data object results are displayed in the Global search results.

globalsearch.es.datamovement.display

To enable search results for source codes (queries) in Global Search results.

Parameters:

The default value is set to false

If set to true, the search results for codes are displayed.

If set to False, the search results are not displayed for codes.


elastic.search.highlight.fragments

Configure the number of highlighted values shown in the search results for a search keyword entered.

Parameters:

The default value is 15.

Enter any value in the field provided.


oe.globalsearch.searchorder

Control the display of tabs in the Global Search by enabling or disabling them. This applies to both the tabs across the top of the page and the filter left panel.

Parameters:

Enter the tab names in the field provided.

All - ALL, Databases - schema, Tables - oetable, Tags - oetag, Table Columns - oecolumn, Files - oefile, File Columns - oefilecolumn, Reports - oechart, Report Columns - chartchild, Codes - oequery, Business Glossary - glossary, Data Stories - oestory, Projects - project, Service Requests - servicedesk


globalsearch.fulltext.search

To search for full-text information with or without highlights based on the configured character size.

Parameters:

If set to True, the full-text search will be performed without highlights.

If set to False, the search considers a character length of 50,000: matches within the initial 50,000 characters of the text are displayed with highlights, and matches in the remaining portion of the text are displayed without highlights.


globalsearch.max.analyzed.offset

To configure the maximum number of characters analyzed for highlighting. Adjusting this value controls how much of the text can be highlighted for easier identification or reference.

Parameters:

The default setting is 200,000 characters.

Enter the desired value in the field provided.

globalsearch.score.use.synonym

Configure the Synonym (Configure Search Keyword) Score in the Relevance score formula to determine the most relevant search results. The relevance score is calculated based on three components: the Elasticsearch score, the popularity score, and the synonym score (if configured).

Parameters:

If set to True, the search results calculation includes the Synonym score.

If set to False, the search results calculation excludes the Synonym score. The relevance score calculation then depends solely on the Elasticsearch score and the settings configured for the Popularity score.


globalsearch.score.use.popularity

Configure the Popularity Score in the Relevance score formula to determine the most relevant search results. The relevance score is calculated based on three components: the Elasticsearch score, the popularity score, and the synonym score (if configured).

Parameters:

If set to True, the search results calculation includes the popularity score.

If set to False, the search results calculation excludes the popularity score. The relevance score calculation depends solely on the Elasticsearch score and the settings configured for synonym score.


globalSearch.objectMatch.weightage

Adjust how search results are prioritized and displayed by considering matching search keywords in the names and titles of data objects.

 This adjustment depends on the "globalSearch.objectDescriptionsMatch.weightage" and "globalSearch.objectOtherAttributesMatch.weightage" system settings, influencing results related to descriptions and other attributes respectively. The displayed search results are determined by the greatest value among the three specified system settings.

Parameters:

The default weight is set at 2.

Enter the value in the field provided.


globalSearch.objectDescriptionsMatch.weightage

Adjust how search results are prioritized and displayed by considering matching search keywords in the descriptions (Business, Technical and Source Descriptions) of data objects. This adjustment depends on the "globalSearch.objectMatch.weightage" and "globalSearch.objectOtherAttributesMatch.weightage" system settings, influencing results related to Names/Title and other attributes respectively. The displayed search results are determined by the greatest value among the three specified system settings.

Parameters:

The default weight is set at 3.

Enter the value in the field provided.


globalSearch.objectOtherAttributesMatch.weightage

Adjust how search results are prioritized and displayed by considering other attributes of data objects (Tags, Terms, Custom Fields, etc.). This adjustment depends on the "globalSearch.objectMatch.weightage" and "globalSearch.objectDescriptionsMatch.weightage" system settings, influencing results related to Names/Title and descriptions respectively. The displayed search results are determined by the greatest value among the three specified system settings.

Parameters:

The default weight is set at 3.

Enter the value in the field provided.
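The three weightage settings behave like per-field boosts. The sketch below expresses them as an Elasticsearch multi_match query, using the standard field^boost syntax and the best_fields type, which scores a document by its single best-matching field (one possible reading of "the greatest value among the three specified system settings"). This is an assumption about the query shape: the index field names are hypothetical, and this document does not describe OvalEdge's actual query.

```python
import json

# Default values of the three weightage settings described above.
DEFAULT_WEIGHTS = {
    "globalSearch.objectMatch.weightage": 2,
    "globalSearch.objectDescriptionsMatch.weightage": 3,
    "globalSearch.objectOtherAttributesMatch.weightage": 3,
}

# Hypothetical index field names, grouped by the setting that weights them.
FIELD_GROUPS = {
    "globalSearch.objectMatch.weightage": ["name", "title"],
    "globalSearch.objectDescriptionsMatch.weightage": [
        "business_description", "technical_description", "source_description"],
    "globalSearch.objectOtherAttributesMatch.weightage": ["tags", "terms", "custom_fields"],
}

def build_query(keyword, weights=DEFAULT_WEIGHTS):
    """Build a multi_match query body that boosts each field by its group's weight."""
    fields = [f"{field}^{weights[setting]}"
              for setting, group in FIELD_GROUPS.items()
              for field in group]
    # best_fields uses the score of the best-matching field for each document.
    return {"query": {"multi_match": {"query": keyword, "type": "best_fields", "fields": fields}}}

print(json.dumps(build_query("customer order"), indent=2))
```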

Advanced Jobs


The OvalEdge System Administrator is required to execute an Advanced Job named "Advanced Job for Indexing into ElasticSearch" during the initial installation of the tool. Subsequently, as new data is crawled into the system, the indexing into Elasticsearch is automated.