OvalEdge provides an out-of-the-box connector for Pentaho. The connector supports crawling datasets (dataflows and datasets) and building lineage.
OvalEdge supports five types of Pentaho integration:
- File Repository type
- Server (API) type
- Repository (Database extract) type
- GitLab
- GitLab RestAPI
Currently, crawling and lineage building are supported for the File Repository, Server (API), GitLab, and GitLab RestAPI types.
To work with the File Repository type, you need to specify a path to the Pentaho server file repository where the Pentaho files are located.
User Permissions
The following are the minimum permissions required for OvalEdge to validate the Pentaho connection.
| Requirement | Details |
|---|---|
| Permission | USAGE |
| Role | Crawler Admin |
| Super user | OE ADMIN |
| File System | The user needs access to the folder that contains the Pentaho files. |
| GitLab | The user needs read access to the GitLab project that contains the Pentaho files. |
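For the File System type, a quick way to confirm the folder requirement before creating the connection is to test read access from the machine and OS account that will read the path. The sketch below is a hypothetical pre-check, not part of OvalEdge; the function name `can_read_pentaho_folder` is illustrative, and it assumes POSIX semantics, where listing a directory needs both read and traverse (execute) permission.

```python
import os


def can_read_pentaho_folder(path: str) -> bool:
    """Hypothetical pre-check: can the current OS user list and read `path`?

    On POSIX systems, listing a directory needs read (R_OK) and
    traverse (X_OK) permission on that directory.
    """
    if not os.path.isdir(path):
        # A missing path, or a path that is a file, cannot serve as the Pentaho folder.
        return False
    return os.access(path, os.R_OK | os.X_OK)
```

Run it as the same account that is expected to read the Pentaho files; a result of `False` usually means the path is wrong or the folder permissions need to be granted.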
Technical Specifications
Crawling
| Feature | Supported Objects | Remarks |
|---|---|---|
| Crawling | Pentaho projects are kept as schemas. Job files and transformation files are retrieved from the specified path. | The files are provided as datasets and as source code. |
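To illustrate the mapping in the table above (projects as schemas, job and transformation files as objects), the following sketch groups Pentaho job files (`.kjb`) and transformation files (`.ktr`) by project folder. It is a hypothetical illustration, not OvalEdge's crawler; the assumption that each top-level folder under the path is one project, and the `(root)` bucket for loose files, are choices made for the example.

```python
from collections import defaultdict
from pathlib import Path


def collect_pentaho_files(root: str) -> dict[str, dict[str, list[str]]]:
    """Group Pentaho job (.kjb) and transformation (.ktr) files by project.

    Hypothetical layout assumption: each top-level folder under `root` is a
    project (shown as a schema); files directly under `root` go to "(root)".
    """
    root_path = Path(root)
    projects: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"jobs": [], "transformations": []}
    )
    for file in sorted(root_path.rglob("*")):
        if not file.is_file():
            continue
        suffix = file.suffix.lower()
        if suffix not in (".kjb", ".ktr"):
            continue  # Only Pentaho jobs and transformations are of interest.
        rel = file.relative_to(root_path)
        project = rel.parts[0] if len(rel.parts) > 1 else "(root)"
        kind = "jobs" if suffix == ".kjb" else "transformations"
        projects[project][kind].append(rel.as_posix())
    return dict(projects)
```

Grouping by the first path component keeps the example simple; a real repository may nest projects more deeply.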
Lineage
| Lineage entities | Details |
|---|---|
| Table - File Lineage | Supported |
| File - Table Lineage | Supported |
| Column Lineage - File Column Lineage | Supported |
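Table-to-file and file-to-table lineage ultimately comes from the data flow inside a transformation: steps that read or write tables and files, connected by hops. The sketch below extracts that flow from a `.ktr` file; it assumes the usual Kettle XML layout (`<step>` elements with `<name>` and `<type>` under the root, and `<hop>` elements with `<from>`, `<to>`, and `<enabled>` under `<order>`). It shows the kind of information lineage is built from, not OvalEdge's lineage algorithm, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def transformation_flow(ktr_xml: str) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Return ({step name: step type}, [(from step, to step), ...]) for a .ktr document.

    Assumes the usual Kettle layout; hops marked <enabled>N</enabled> are skipped.
    """
    root = ET.fromstring(ktr_xml)
    steps = {
        step.findtext("name", default=""): step.findtext("type", default="")
        for step in root.findall("step")
    }
    hops = [
        (hop.findtext("from", default=""), hop.findtext("to", default=""))
        for hop in root.findall("order/hop")
        if hop.findtext("enabled", default="Y") == "Y"
    ]
    return steps, hops
```

For example, a hop from a `TableInput` step to a `TextFileOutput` step corresponds to a Table - File lineage edge; which table or file each step touches is stored in step-specific elements not parsed here.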
Connection Details
To connect to Pentaho, add the connection settings as follows:
- Log in to the OvalEdge application.
- Navigate to Administration > Crawler module.
- Click the + icon. The Manage Connection with Search Connector pop-up window is displayed.
- Select the connection type as PENTAHO. A pop-up window is displayed.
1. If Crawl From is File System, provide the File Path where the Pentaho files are located.
2. If Crawl From is GitLab or GitLab RestAPI, provide the following details:
   - GitLab username
   - GitLab password
   - GitLab URL
3. The following are the field attributes required for the connection.
| Property | Details |
|---|---|
| Connection Type | Pentaho |
| License Type | Standard, Lineage |
| Connection Name | Enter a connection name. This reference name identifies your Pentaho connection in OvalEdge. Example: Pentaho Connection1 |
| Crawl From | 1. File System (provide the path to the Pentaho files) 2. GitLab or GitLab RestAPI (provide the GitLab authentication details) |
| GitLab URL | URL of the GitLab repository that contains the Pentaho files (on-premises or cloud-based) |
| Path (if File System) | Path where the Pentaho files are located |
| Context params | Path of the folder that contains a file named contextparams.txt. Dynamic values are read from this file. |
| GitLab Username | User account login credential (required only when Crawl From is GitLab or GitLab RestAPI) |
| GitLab Password | Password (required only when Crawl From is GitLab or GitLab RestAPI) |
4. Once connectivity is established, crawling is enabled.
5. Click Crawl and Profile to list the projects where the Pentaho files are located.
6. Select the required project (schema), then start crawling to retrieve the Pentaho jobs and transformations.
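The Context params property points to a folder containing `contextparams.txt`, whose values fill in dynamic parts of the Pentaho files. This document does not specify the file format, so the sketch below assumes one `key=value` pair per line with `#` comments, and substitutes values into Pentaho's `${NAME}` variable syntax. The functions `load_context_params` and `resolve` are hypothetical and only illustrate the idea of dynamic values.

```python
import re
from pathlib import Path

# Pentaho variables are written as ${NAME}.
_VAR = re.compile(r"\$\{([^}]+)\}")


def load_context_params(folder: str) -> dict[str, str]:
    """Read contextparams.txt from `folder`, ASSUMING one key=value pair per line."""
    params: dict[str, str] = {}
    text = (Path(folder) / "contextparams.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # Skip blanks, comments, and lines without a key=value pair.
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def resolve(value: str, params: dict[str, str]) -> str:
    """Replace ${NAME} placeholders with known values; unknown ones are left as-is."""
    return _VAR.sub(lambda m: params.get(m.group(1), m.group(0)), value)
```

Leaving unknown placeholders untouched makes missing parameters easy to spot instead of silently producing an empty table or file name.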
How to Validate the Lineage
1. Click Lineage to list all the job files from Pentaho (Object Type = Job).
2. Select the required job to build lineage for the selected source code. If the lineage builds successfully, the Lineage Status shows success lineageBuild.
3. Open the dataset for which lineage was built by clicking its name. The associations of the selected job list all of its job steps.
4. Click an associated object of type Transformation (Associated Object Type = Transformation) to open the transformation, where the actual lineage is built.
5. Open the Associations tab of the transformation to see all of its steps.
6. Click any associated object (a table or a file), and then click Lineage.
7. The lineage is displayed.
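The navigation above follows the structure of the Pentaho files: a job references transformations, and the transformations contain the steps that produce lineage. The sketch below lists the transformations a `.kjb` job references, assuming the usual Kettle job layout (`<entries>/<entry>` elements whose `<type>` is `TRANS` and whose `<filename>` points at the `.ktr` file). It is an illustration of that relationship, not how OvalEdge resolves associations; the function name is hypothetical, and repository-based jobs may reference transformations differently.

```python
import xml.etree.ElementTree as ET


def job_transformations(kjb_xml: str) -> list[tuple[str, str]]:
    """List (entry name, transformation file) for each transformation entry in a .kjb.

    Assumes the usual Kettle job layout; entries of other types
    (START, nested jobs, and so on) are ignored.
    """
    root = ET.fromstring(kjb_xml)
    return [
        (entry.findtext("name", default=""), entry.findtext("filename", default=""))
        for entry in root.findall("entries/entry")
        if entry.findtext("type", default="") == "TRANS"
    ]
```

The returned file paths often contain Pentaho variables such as `${Internal.Entry.Current.Directory}`, which must be resolved before the referenced `.ktr` file can be opened.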