Curation by AI - A Deep Dive

This article explains how we have combined OpenAI functionality with human assistance to curate descriptions for data objects and business glossary terms.

Human-assisted AI

Human-assisted AI, often called hybrid intelligence or human-in-the-loop AI, is a collaborative approach where human expertise and machine intelligence complement each other to achieve superior results. In this framework, humans provide guidance, feedback, and validation to AI systems, enhancing their capabilities and addressing their limitations.

Prompts to AI 

Prompts to AI are instructions or queries provided to artificial intelligence systems to generate specific outputs or responses. These prompts serve as input to AI models, guiding them to produce desired outcomes based on the context and requirements provided.

Problem Statement

OpenAI was integrated to generate descriptions for Data Catalog objects and Business Glossary terms. Only basic metadata, such as domain names, categories, tables, and columns, was provided to OpenAI. As a result, some generated descriptions were inaccurate, especially for objects with vague names such as "OE_Table."

Solution 

To improve accuracy, OpenAI needs more detailed information about each object. Administrators set up questions on the AI Prompt Questions page. These questions are visible to stewards, who can assign them to the relevant stakeholders for answers. The answers gather precise details about the object, and once they are collected, they are fed back into OpenAI to generate more accurate descriptions, helping users manage their data effectively.

Curation by AI Questions

Curation by AI Questions are simple queries that gather specific details about data objects or business terms. These questions provide information that AI systems, like OpenAI, use to generate accurate descriptions. 

For example, a question might ask about the purpose of a data table or the category of a business term. Answering these questions improves the quality of the AI-generated descriptions, making data easier to understand and manage. The repository of AI prompt questions is available on the AI Prompt Questions page. To view the questions, select the following:

  • Object Type 
  • Description Type
  • Connector or Global Type
  1. Object Type: Select the object type from the following list:
    1. Schema
    2. Tables
    3. Table Columns
    4. Files
    5. File Columns
    6. Reports
    7. Report Columns
    8. APIs
    9. API Attributes
    10. Codes
    11. Business Glossary

  2. Description Type: The available description types depend on the selected object type:
    1. Data Catalog
      If a Data Catalog object type is selected, the available description types are Business Description and Technical Description.
    2. Business Glossary
      If Business Glossary is selected as the object type, the available description types are Business Description and Detailed Description.
  3. Connector or Global Type: After selecting the Object Type and Description Type, view either the Global questions or the Connector-specific questions.
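Browsing the repository can be pictured as a lookup keyed by object type, description type, and scope (Global or a connector). The sketch below is hypothetical: the `repo` dictionary layout, the `GLOBAL` marker, and the function names are illustrative assumptions rather than OvalEdge's internal design; only the object types and description types come from the lists above.

```python
from typing import Dict, List, Tuple

# Hypothetical model of the AI Prompt Questions repository (illustrative only).
GLOBAL = "Global"  # assumed scope marker for questions that apply to all connectors

# Data Catalog object types from the Object Type list above.
DATA_CATALOG_TYPES = {
    "Schema", "Tables", "Table Columns", "Files", "File Columns",
    "Reports", "Report Columns", "APIs", "API Attributes", "Codes",
}

# (object type, description type, scope) -> list of questions
Repo = Dict[Tuple[str, str, str], List[str]]


def description_types(object_type: str) -> Tuple[str, str]:
    """Return the description types offered for an object type."""
    if object_type == "Business Glossary":
        return ("Business Description", "Detailed Description")
    if object_type in DATA_CATALOG_TYPES:
        return ("Business Description", "Technical Description")
    raise ValueError(f"Unknown object type: {object_type}")


def view_questions(repo: Repo, object_type: str, description_type: str,
                   scope: str = GLOBAL) -> List[str]:
    """Return the questions stored for one object type, description type, and scope."""
    if description_type not in description_types(object_type):
        raise ValueError(f"{description_type} is not available for {object_type}")
    return repo.get((object_type, description_type, scope), [])
```

For example, with `repo = {("Tables", "Business Description", GLOBAL): ["What is the purpose of this table?"]}`, calling `view_questions(repo, "Tables", "Business Description")` returns that single Global question, while requesting a Technical Description for a Business Glossary term raises an error.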

Security

  • Users with an Author license can view the prompt questions for all object types and description types.
  • Only Super Admins (users with the ovaledge.role.admin configuration) can add, edit, or delete prompt questions on the AI Prompt Questions page.

Adding Questions

  • After selecting the object and description types, add questions either for all connectors (Global) or for a specific connector.
  • Each question is limited to 1000 characters.
  • A maximum of 100 questions can be added at the Global level or at the connector level.
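The two limits above could be enforced with a check like the following. This is a minimal sketch, assuming each scope (Global or a single connector) keeps its questions in a plain list and that both limits are counted per scope; `add_question` is a hypothetical name, not an OvalEdge API.

```python
from typing import List

MAX_QUESTION_LENGTH = 1000     # characters allowed per question
MAX_QUESTIONS_PER_SCOPE = 100  # questions allowed at the Global level or for one connector


def add_question(scope_questions: List[str], text: str) -> None:
    """Append a question to one scope after checking both limits (hypothetical helper)."""
    if len(text) > MAX_QUESTION_LENGTH:
        raise ValueError(f"Question exceeds {MAX_QUESTION_LENGTH} characters")
    if len(scope_questions) >= MAX_QUESTIONS_PER_SCOPE:
        raise ValueError(f"This scope already has {MAX_QUESTIONS_PER_SCOPE} questions")
    scope_questions.append(text)
```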

Editing/Deleting Questions

Questions in the AI Prompt Questions repository can be edited or deleted. This refines the prompt process and ensures that the questions capture the information needed to generate descriptions.

For example, if a question is unclear or redundant, edit or delete it to streamline the prompt experience. This feature enables continuous improvement by adjusting questions based on feedback and evolving requirements.

To edit a question, select it and choose "Edit Question" from the nine-dot menu. A pop-up window opens in which the question can be edited. The edit option is disabled when multiple questions are selected.

Multiple questions can be selected and deleted at once. Deleting a question from the repository removes the question and its answers for all objects.
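The effect of deleting questions from the repository can be illustrated with a small in-memory model. The sketch assumes answers are stored per object and keyed by question text; the `PromptStore` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PromptStore:
    """Hypothetical in-memory model of the repository and the answers given on objects."""
    repository: Set[str] = field(default_factory=set)
    # answers[object_name][question] -> answer text
    answers: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def delete_from_repository(self, questions: List[str]) -> None:
        """Delete one or more questions and remove their answers from every object."""
        for question in questions:
            self.repository.discard(question)
            for object_answers in self.answers.values():
                object_answers.pop(question, None)
```

Because the deletion loops over every object's answers, a single bulk delete clears the selected questions everywhere at once.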

Generating Descriptions through AI

Object descriptions, such as business, technical, or detailed descriptions, can be generated through Generative AI based on the answers provided to the prompt questions. Descriptions can also be generated without answering the prompt questions, but descriptions generated this way may be incomplete.

Different Objects that can be Curated through AI

The different objects that can be curated through AI are:

    1. Databases
    2. Tables
    3. Table Columns
    4. Files
    5. File Columns
    6. Reports
    7. Report Columns
    8. APIs
    9. API Attributes
    10. Codes
    11. Business Glossary

Security  

The following table lists the operations that users can perform with each permission level.

Operations                   Meta-Read   Meta-Write   Governance Roles/Super Admins
View AI curation generator   No          Yes          Yes
Answer questions             No          Yes          Yes
Change Assignee              No          No           Yes
Delete questions             No          No           Yes
Add questions                No          No           Yes
Generate description         No          Yes          Yes
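The permissions in the table above can be encoded as a simple lookup from each operation to the roles allowed to perform it. This is a minimal sketch, assuming each column maps to a single role label and that operations missing from the table are denied; the constants and `can_perform` are hypothetical names, not OvalEdge's permission model.

```python
from typing import Dict, Set

# Role labels taken from the table columns.
META_READ = "Meta-Read"
META_WRITE = "Meta-Write"
GOVERNANCE = "Governance Roles/Super Admins"

# Operation -> roles allowed to perform it, as listed in the table.
PERMISSIONS: Dict[str, Set[str]] = {
    "View AI curation generator": {META_WRITE, GOVERNANCE},
    "Answer questions": {META_WRITE, GOVERNANCE},
    "Change Assignee": {GOVERNANCE},
    "Delete questions": {GOVERNANCE},
    "Add questions": {GOVERNANCE},
    "Generate description": {META_WRITE, GOVERNANCE},
}


def can_perform(role: str, operation: str) -> bool:
    """Return True if the role may perform the operation; unknown operations are denied."""
    return role in PERMISSIONS.get(operation, set())
```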

Questions from AI Prompt Questions

Users with Meta-Write (MW) permission on an object can generate its description through AI while editing it. When the AI description generator is opened, it displays the first five questions from the AI Prompt Questions page for that object by default.

Connector-specific questions are given priority over the Global questions.

Example

  • If there are 5 Connector-specific questions and 5 Global questions for a particular object type, only the Connector-specific questions are displayed by default.
  • If there are 3 Connector-specific questions and 5 Global questions for a particular object type, 3 Connector-specific questions and 2 Global questions are displayed by default.
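The default selection rule can be expressed in a few lines. This sketch assumes each group is already in the order shown on the AI Prompt Questions page and that "first five" means the first five in that order; `default_questions` is a hypothetical name.

```python
from typing import List

DEFAULT_QUESTION_COUNT = 5  # number of questions shown when the generator opens


def default_questions(connector_questions: List[str], global_questions: List[str],
                      limit: int = DEFAULT_QUESTION_COUNT) -> List[str]:
    """Fill the default slots with Connector-specific questions first, then Global ones."""
    selected = connector_questions[:limit]
    remaining = limit - len(selected)
    return selected + global_questions[:remaining]
```

Applied to the two scenarios above, five Connector-specific questions fill all five slots, while three Connector-specific questions leave two slots for the first two Global questions.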

Adding More Questions

Click the “Add more questions” button to add more prompt questions. You can add the remaining questions from the repository, or click “Create a New Question” to add a question that is not in the repository. Custom questions created for an object apply only to that object.

Delete Questions

To delete a question from an object, click the three-dot menu next to the assignee. Deleting a question from an object removes it only from that object, not from the AI Prompt Questions page.

Generate Description

Users with Meta Write permission on the object can use AI to generate a description. The AI generates the description based on the Connector Name, Title, and prompt question answers given as input.

Descriptions can be generated through AI at any time. Answering all prompt questions is not necessary to generate a description.
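One way the inputs could be combined into a single prompt is sketched below. The prompt wording, the answer format, and the `build_prompt` name are assumptions; the source states only that the Connector Name, Title, and prompt question answers are used as input. Unanswered questions are skipped, which is consistent with generation being possible before every question is answered.

```python
from typing import Dict, Optional


def build_prompt(connector_name: str, title: str,
                 answers: Dict[str, Optional[str]]) -> str:
    """Assemble a description prompt from the connector name, title, and answered questions."""
    lines = [f"Write a description for '{title}' from the connector '{connector_name}'."]
    # Keep only questions that have a non-empty answer.
    answered = {question: answer for question, answer in answers.items() if answer}
    if answered:
        lines.append("Use the following details:")
        for question, answer in answered.items():
            lines.append(f"- Q: {question} A: {answer}")
    return "\n".join(lines)
```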

Additional Prompt

After a description is generated, additional prompts can be entered to refine it. The AI revises the description based on the additional prompt provided.

Regenerate Description

To regenerate the description, click the Regenerate button. The AI will generate a different description based on the input.

Answering Questions through the Question Wall

Curation by AI

Question Walls has a separate room, Curation by AI, for answering AI prompt questions.

Assigning a Prompt Question

Assigning a prompt question adds it to the “Curation by AI” room, on the wall that corresponds to the object type. This room contains two walls:

  • Data Catalog
  • Business Glossary

Assigned questions appear in My Walls > Assigned. When a question is answered, the system automatically saves the response at the object level. The latest response to a question becomes the answer. 
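Saving responses at the object level so that the latest response becomes the answer could look like this. The sketch is hypothetical: it assumes each response carries a timestamp and that "latest" means the most recent timestamp, so an older response that arrives late does not overwrite a newer one.

```python
from datetime import datetime
from typing import Dict, Tuple

# answers[object_name][question] -> (response, answered_at)
Answers = Dict[str, Dict[str, Tuple[str, datetime]]]


def record_response(answers: Answers, object_name: str, question: str,
                    response: str, answered_at: datetime) -> None:
    """Store a Question Wall response on the object, keeping only the latest one."""
    object_answers = answers.setdefault(object_name, {})
    current = object_answers.get(question)
    if current is None or answered_at >= current[1]:
        object_answers[question] = (response, answered_at)
```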

To learn more about Question Walls, refer to the Question Walls documentation.

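The following table lists the operations that each type of user can perform on AI curation questions in the Question Wall.

Operations                   Non-assignee   Assignee   Governance Roles/OE_Admin users
View AI curation questions   Yes            Yes        Yes
Answer questions             No             Yes        Yes
Change Assignee              No             Yes        Yes
Delete questions             No             Yes        Yes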

Automatically Generate Description

When the automatic generation checkbox in the object's AI description generator is selected, AI automatically generates a description once all questions assigned for that object are answered in the Question Wall. If the checkbox is cleared, automatic generation is disabled.
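The trigger condition amounts to two checks. This is a sketch under assumptions: `generate` stands in for whatever call produces the description, and an object with no assigned questions is treated as not triggering automatic generation, which the source does not specify.

```python
from typing import Callable, Dict, Optional


def maybe_auto_generate(auto_generate_enabled: bool,
                        assigned_answers: Dict[str, Optional[str]],
                        generate: Callable[[], str]) -> Optional[str]:
    """Generate a description only if the checkbox is selected and every assigned question is answered."""
    if not auto_generate_enabled or not assigned_answers:
        return None
    if all(answer for answer in assigned_answers.values()):
        return generate()
    return None
```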

Notifications

Assigning a question sends a notification to the assignee. Likewise, deleting a question notifies the assignee that it has been deleted.

System Settings

Key

Description

 

This token establishes a secure connection to the ChatGPT service and enables it. The key authenticates OvalEdge so that it can integrate and communicate with ChatGPT.

Parameters:

The ChatGPT token must be entered in the specified field.

AI Prompt (Audit Trails) 

End users may be concerned about which data or metadata is exposed to AI. The AI Prompt audit trail shows the metadata sent to OpenAI, the results received, and the number of tokens consumed during each OpenAI execution.

AI Engine: The AI engine used to generate the response.

AI Model: The AI model used to generate the response.

Object Type: The type of data object for which the execution was triggered.

Object Name: The data object for which the execution was triggered.

Field: The field of the data object (Business Description, Technical Description, or Detailed Description) from which the execution was triggered.

Prompt to AI: The prompt sent from OvalEdge to the AI model, including the metadata given as input.

Instruction: The additional prompts and instructions given to the AI engine to help it generate a better description.

AI Response: The response returned by the engine after it received the inputs and prompts.

Error message: Any error that occurred while the description was being generated.

Total Tokens: The number of tokens consumed during the execution.

Created By: The user who triggered the execution.

Created On: The date on which the execution was triggered.
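A single audit trail row can be modeled as a record whose fields mirror the columns above, which makes questions such as "how many tokens has each user consumed?" easy to answer. This is a hypothetical sketch: the `AIPromptAudit` class, its field names, and `tokens_by_user` are illustrative assumptions, not an OvalEdge export format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class AIPromptAudit:
    """One row of the AI Prompt audit trail; fields mirror the documented columns."""
    ai_engine: str
    ai_model: str
    object_type: str
    object_name: str
    field_name: str              # Business, Technical, or Detailed Description
    prompt_to_ai: str
    instruction: str
    ai_response: str
    error_message: Optional[str]
    total_tokens: int
    created_by: str
    created_on: date


def tokens_by_user(rows: List[AIPromptAudit]) -> Dict[str, int]:
    """Sum the tokens consumed by the executions each user triggered."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row.created_by] += row.total_tokens
    return dict(totals)
```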