Knowledge Base

Setup Data Catalog

Relationships

Introduction to OvalEdge Relationships

Summary

Relationships help to combine data from two different tables. The relationship between tables is created using the primary and foreign key relationship that exists between the tables.

In OvalEdge, Relationships are calculated based on a connection between two tables that contain the same column information. OvalEdge supports one-to-one, one-to-many, and many-to-one relationships.

To get relationships, we need to crawl the schema through crawler settings by check-in the Relationship box, then we can see the PK (primary key) & FK (foreign key) relations in ER (Entity Relationship) diagrams and tabular view.

Types of Entity Relationship diagrams generated

1. Graphical View

Relationships displayed graphically at table-level.

Manual relationships & pattern relationships can be viewed in a graphical view. We can see all the relations of a particular table. Maximum of 5 relations can be seen in graphical view. If we have more than 5 relations, then it automatically can be seen in Tabular view.

2. Tabular View

Relationships displayed in tables & columns. Pattern relations are displayed in the tabular view. If we need to delete any pattern relations, then it must be deleted in both graphical & tabular views.

Understand ER diagram through various scores

In OvalEdge, we have two types of objects namely

Main Object:
Each dot in the relationship diagram represents a table. When a table (dot) is selected, it is displayed as the main object, and details about the related tables (dots) to this main object are displayed as the Linked object.
Example: Assume TABLE A is connected to TABLE B by a common column.
Linked Object: Each table (dot) connected to the main object is called a “linked object”. Details like similarity score and join score are displayed next to each linked object.

This gives a good understanding of how strong the relationship of this table is to the main object (Join score) and how many rows of data matches the main object (Similarity score).

Details of what data fields are matching between main and linked tables are provided when we hover over the similarity score of the linked objects. Similarly, details of how strong the relationship is between main and linked tables are provided when we hover over the join score of the linked objects.

Note: To discover the primary and foreign key relationships between the existing tables automatically and to build the entity-relationship (ER) diagram.

OvalEdge provides two main statistics for each relation that exists between the main and the linked objects:

1. Similarity Score

2. Join Score

Similarity Score

A similarity score is a measure that is calculated based on how many rows of data are matching between the two tables. A high similarity score means the two tables compared have more duplicated rows of data between them.

Join Score

When there is a strong relationship with the other table (Primary key-foreign key) then the join score is higher.

See this article How to build relationships? to know more about the steps involved in building relationships in OvalEdge.

The relationship window provides certain statistics to understand the strength and quality of the relationship that exists between the primary and secondary tables.

For example,

Table A: EMPLOYEE has unique rows that store all employee data.

Table C: EMPLOYEE_DEPT stores the department details associated with each employee.

The ID column from the EMPLOYEE table(primary key) is related to the EMPLOYEE_ID column in the secondary table. One employee belongs to multiple departments.

When a column from a primary table is related to a matching column in the secondary table, it is important to know the following scores.

Matching Count

In OvalEdge, the matching count displays the number of rows in the secondary column that have a matching value in the primary column with respect to the total rows in the secondary table including duplicate rows(if any). Additionally, it represents the percentage.

In the above example, all the records (Employee_ID) in Table C have matching records in Table A(ID). So 100% of records from Table C are matched.

Base Count

In OvalEdge, the base match count displays the number of row counts in the primary column that have a matching value in the secondary column.

In the above example, only one row (ID) in Table A have a matching record in Table C(Employee_ID)

Unique Matching Count

The unique match displays how many unique rows in the Employee_ID column have a matching record in the ID column of Table A.

If the Percentage of Unique matching count is 100%, it means all the rows have matches and are unique(no duplicates).
If the Percentage of the Unique matching count is 99%, it means 1% of the rows in the secondary table have duplicate rows.

In the above example, there is only 1 unique value EMPLOYEE_ID out of the total of 3 rows of data in Table C that has a matching value in Table A (ID).

i.e. 33%(⅓) of records have a unique match in Table C and 5% of records have a unique match in Table A.

Medium

This column information, denotes how the column relationship is created.

Manual: There are three ways when the column relationships are created manually. By adjusting relationships manually, adding any pattern relationship to actual column relationships and relationships calculated using advanced jobs.
Metadata Sync: When a PK-FK relationship is captured from the remote server during the crawling stage.
Query: If the query has JOIN conditions, then those joined columns are picked as relation.

For Example:

SELECT NAME, ADDRESS, MOBILE 

FROM EMPLOYEE E 

JOIN ADDRESS A ON A.EMPID = E.EMPID

Employee table's EMPID has relation with Address table's EMPID as these two columns are in join condition in query.

Status

The status of the column relationship is based on how the table columns are profiled.

Exact: When all the rows of the tables related in the column relation are fully profiled.
Approx.: When sample profiling is done on one or both of the tables in relation to i.e top 50 values that we get while profiling.
Not Calculated: Not calculated means, relationship counts/scores calculation has not yet been performed on that relationship.

Relation Score

Relationship score is a recommended score that displays the correlation strength that exists between two compared columns.

See this article How to calculate strength? to know more about calculating strength in OvalEdge.

Relation type

This displays the different types of relationships that exist between two entities. They are:

Relation Type = “1-1”
- If unique match count > 0 and
- unique match count == left match count == right match count
Relation Type = “1-N”
- If left match count == unique match count &&
- left match count<= left row count &&
- left match count > right match count
Relation Type = “N-1”
- If right match count == unique match count &&
- right match count <= right table row count &&
- right match count > left match count

If all the above conditions are not satisfied then Relation Type = “N-N”

Note: Relation Type = "UNDEF", when profiling is not done.

Algorithm Type

We use algorithm type in Entity-relationship(tabular view). Algorithm Type is used to know in which process, we built the relationship i.e. whether crawling job / advanced jobs / from patterns/manual etc.

User can identify each algorithm type as follows:

Building Relationships using	Algorithm Type
Discover Primary and foreign key relationships auto	PK & FK
Discover relationships automatically	Relationship Job
Get Relationships with Column Names	Column name
For pattern relationship	From Pattern
Schema crawling	Physical
Query Parsing	FromQuery
Manual building from UI	Manual