Pentaho Connector

An out-of-the-box connector is available for Pentaho. It supports crawling data objects (Dataflows and Datasets) and building lineage.

OvalEdge supports five types of Pentaho integration:

  • File Repository type
  • Server (API) type
  • Repository (Database extract) type
  • GitLab
  • GitLab RestAPI

Currently, crawling and lineage building are available for the File Repository, Server (API), GitLab, and GitLab RestAPI types.

To work with the File Repository type, you need to specify a path to the Pentaho server file repository where the Pentaho files are located.
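Before entering the path in OvalEdge, it can help to confirm that the folder actually contains Pentaho files. The following is a minimal sketch, not part of OvalEdge: it assumes the standard Pentaho Data Integration extensions (.kjb for jobs, .ktr for transformations), and the function name find_pentaho_files is hypothetical.

```python
from pathlib import Path

# Standard Pentaho Data Integration file extensions:
# .kjb = job, .ktr = transformation.
PENTAHO_SUFFIXES = {".kjb": "job", ".ktr": "transformation"}


def find_pentaho_files(repo_path):
    """Return (relative_path, kind) for each job or transformation file under repo_path."""
    root = Path(repo_path)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {repo_path}")
    found = []
    # Walk the whole tree; sorting keeps the output stable between runs.
    for path in sorted(root.rglob("*")):
        kind = PENTAHO_SUFFIXES.get(path.suffix.lower())
        if path.is_file() and kind:
            found.append((path.relative_to(root).as_posix(), kind))
    return found
```

An empty result suggests that the path does not point to the file repository, or that the account running the check cannot see the files.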


User Permissions 

The following are the minimum permissions required for OvalEdge to validate the Pentaho connection. 

Permission: USAGE

Roles: Crawler Admin 

Super User: OE ADMIN

If crawling from a file path, the user needs read access to that folder.

If crawling from GitLab, the user needs read access to the project that contains the Pentaho files.
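One way to check GitLab read access outside OvalEdge is to request the repository tree of the project through the GitLab REST API (v4). The sketch below only builds the request URL; the host and project path are placeholders, the function name gitlab_tree_url is hypothetical, and how OvalEdge itself authenticates against GitLab is not described here.

```python
from urllib.parse import quote, urlencode


def gitlab_tree_url(base_url, project_path, ref="main"):
    """Build the GitLab v4 URL that lists a project's repository tree."""
    # GitLab accepts the URL-encoded "namespace/project" path as the project id.
    project_id = quote(project_path, safe="")
    query = urlencode({"ref": ref, "recursive": "true", "per_page": 100})
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/repository/tree?{query}"


# Placeholder host and project path, for illustration only.
print(gitlab_tree_url("https://gitlab.example.com", "data/pentaho-files"))
```

Sending a GET request to this URL with the user's credentials (for example, a personal access token in the PRIVATE-TOKEN header) should return a file listing when the user has read access to the project.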

Technical Specifications

Crawling 

Feature: Crawling

Supported Objects: Pentaho projects are kept as schemas. Job files and transformation files are read from the specified path.

Remarks: The files are provided as datasets and source code.

Lineage 

The following lineage entities are supported:

  • Table - File Lineage: Supported
  • File - Table Lineage: Supported
  • Column Lineage - File Column Lineage: Supported
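To illustrate what table-to-file lineage means for a transformation, the sketch below reads the hops of a simplified, hand-written .ktr sample and pairs each source step with its target step. This is an illustration only, not how OvalEdge builds lineage; the sample XML is trimmed to the few elements used here, and the step names are made up.

```python
import xml.etree.ElementTree as ET

# Simplified sample transformation (assumption: real .ktr files carry many more elements).
SAMPLE_KTR = """
<transformation>
  <order>
    <hop><from>Read customers</from><to>Write customers file</to><enabled>Y</enabled></hop>
    <hop><from>Read customers</from><to>Unused step</to><enabled>N</enabled></hop>
  </order>
  <step><name>Read customers</name><type>TableInput</type></step>
  <step><name>Write customers file</name><type>TextFileOutput</type></step>
  <step><name>Unused step</name><type>Dummy</type></step>
</transformation>
"""


def step_hops(ktr_xml):
    """Return (source, source_type, target, target_type) for each enabled hop."""
    root = ET.fromstring(ktr_xml)
    # Map every step name to its step type (for example TableInput).
    types = {s.findtext("name"): s.findtext("type") for s in root.iter("step")}
    hops = []
    for hop in root.iter("hop"):
        if hop.findtext("enabled", "Y") != "Y":
            continue  # skip disabled hops
        src, dst = hop.findtext("from"), hop.findtext("to")
        hops.append((src, types.get(src), dst, types.get(dst)))
    return hops


for src, src_type, dst, dst_type in step_hops(SAMPLE_KTR):
    print(f"{src} ({src_type}) -> {dst} ({dst_type})")
```

In this sample, a table input step feeding a text file output step corresponds to one Table - File lineage edge.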

Connection Details

To add a Pentaho connection in OvalEdge:

  1. Log in to the OvalEdge application.
  2. Navigate to Administration > Crawler module.
  3. Click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
  4. Select the connection type as PENTAHO. A pop-up window is displayed.



Depending on the Crawl From option, provide the following:

  1. File system: provide the File Path where the Pentaho files are located.


  2. GitLab or GitlabRestApi: provide the following details:
  • GitLab username
  • GitLab password
  • GitLab URL


3. The following are the field attributes required for the connection.

  • Connection Type: Pentaho
  • License Type: Standard, Lineage
  • Connection Name: Enter a reference name that makes the Pentaho connection easy to identify in OvalEdge. Example: Pentaho Connection1
  • Crawl from: Select one of the following:
      1. File system (provide the path to the Pentaho files)
      2. Gitlab or GitlabRestApi (provide the GitLab authentication details)
  • GitLab Url: URL of the GitLab location of the Pentaho files (on-premises or cloud-based)
  • Path (if File System): Path where the Pentaho files are located
  • Context params: Path of the folder that contains a file named contextparams.txt. Dynamic values are added from this file.
  • Gitlab Username: User account login credential (required only when crawling from Gitlab or GitlabRestApi)
  • Gitlab Password: Password for the user account (required only when crawling from Gitlab or GitlabRestApi)

4. Once connectivity is established, crawling is enabled.


5. Click Crawl and Profile to list the projects where the Pentaho files are located.

6. Select the required project or schema, then start crawling to get the Pentaho jobs and transformations.
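The dependency between the Crawl From option and the other fields can be summarized in a small check. The sketch below is purely illustrative: the dictionary keys (crawl_from, connection_name, path, gitlab_url, and so on) are hypothetical names chosen for this example and are not OvalEdge identifiers.

```python
# Fields that must be filled in for each Crawl From option (hypothetical key names).
GITLAB_FIELDS = ["gitlab_url", "gitlab_username", "gitlab_password"]
REQUIRED_FIELDS = {
    "File system": ["path"],
    "Gitlab": GITLAB_FIELDS,
    "GitlabRestApi": GITLAB_FIELDS,
}


def missing_fields(settings):
    """Return the names of required settings that are empty or absent."""
    crawl_from = settings.get("crawl_from")
    if crawl_from not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown Crawl From option: {crawl_from!r}")
    required = ["connection_name"] + REQUIRED_FIELDS[crawl_from]
    return [name for name in required if not settings.get(name)]


# Example: a File system connection without a path.
print(missing_fields({"crawl_from": "File system",
                      "connection_name": "Pentaho Connection1"}))
```

A check like this, run before saving the form values, makes it clear that the GitLab credentials are only needed for the two GitLab options and the path only for the File system option.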

How to Validate the Lineage

  1. Click Lineage to list all the job files from Pentaho (ObjectType = Job).

  2. Select the required job to build the lineage for the selected source code. If the lineage builds successfully, the Lineage Status shows a success status.

  3. Click the dataset name to open the dataset for which the lineage was built. The associations of the selected job list all the job steps.

  4. Click an associated object of type Transformation (Associated Object Type = Transformation). You are redirected to the transformation, where the actual lineage is built.

  5. Click the Associations tab of the transformation to see all its steps.

  6. Click any associated object (a table or a file), and then click Lineage.

  7. The lineage is displayed.
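The job-to-transformation association followed in the steps above mirrors how a Pentaho job file references its transformations. The sketch below lists the transformation entries of a simplified, hand-written .kjb sample. It is an illustration only, not OvalEdge code; the sample assumes that transformation entries carry the type TRANS and a filename element, and all names in it are made up.

```python
import xml.etree.ElementTree as ET

# Simplified sample job (assumption: real .kjb files carry many more elements).
SAMPLE_KJB = """
<job>
  <name>load_customers</name>
  <entries>
    <entry><name>START</name><type>SPECIAL</type></entry>
    <entry>
      <name>Load customers</name>
      <type>TRANS</type>
      <filename>customers.ktr</filename>
    </entry>
  </entries>
</job>
"""


def job_transformations(kjb_xml):
    """Return (entry_name, transformation_file) for each transformation entry in a job."""
    root = ET.fromstring(kjb_xml)
    return [
        (entry.findtext("name"), entry.findtext("filename"))
        for entry in root.iter("entry")
        if entry.findtext("type") == "TRANS"
    ]


for name, filename in job_transformations(SAMPLE_KJB):
    print(f"{name} -> {filename}")
```

Each pair corresponds to one associated object of type Transformation under the job, which is where the table and file lineage is attached.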