pasobgiga.blogg.se

Pentaho Data Integration API









In Pentaho Data Integration 6.0, we released a great new capability to collect data lineage for PDI transformations and jobs. Data lineage is an oft-overloaded term, but for the purposes of this blog I will be talking about the flow of data from external sources into steps/entries and possibly out to other external targets. Basically we keep track of all fields as they are created, split, aggregated, transformed, etc. over the course of a transformation or job. When jobs or transformations call other jobs or transformations, that relationship is also captured. So in that sense you can follow your data all the way through your PDI process.

The code for the current data lineage capability is entirely open source and is available on GitHub. You may see the term "metaverse" listed throughout the code and documentation (including the project name itself). The term was a pet name for what we envisioned the end product to be: a universe of metadata and relationships between all the artifacts and concepts in the Pentaho stack. Whether that vision is realized the same way depends on the roadmap; it is very possible the needs of Pentaho's customers will drive the data lineage capabilities in a different direction.

Collecting lineage information for PDI is non-trivial. It may seem like the fields, steps, and operations are readily available, such that the relationships could easily be discovered. However, PDI is a very flexible and powerful tool. That flexibility extends to its APIs, which are more general than uniform, as flexibility has seemed a more important goal than introspection.

For example, getting the list of fields that are output from a transformation step involves calling the getStepFields() API method. This lets the step report its own outgoing fields, and many times the step needs to know the incoming fields before it can properly report the output fields. So the step in turn calls the previous steps' getStepFields() methods. In the case of a Table Input step, the query is actually executed so that the metadata of the ResultSet is available to determine the output fields. This requires a valid database connection and introduces extra latency into the lineage collection.
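To make the getStepFields() chaining concrete, here is a minimal sketch of the idea in plain Java. It is a toy model and not PDI code: the `Step` class, the `tableInput`, `addField` and `removeField` helpers, and the `lookups` counter are all hypothetical names made up for illustration, and the "Table Input" step simply returns hard-coded column names instead of running a query. It only shows the shape of the problem: a step cannot report its output fields until it has asked the step before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model only: Step and its helpers are hypothetical, not PDI classes.
class StepFieldsSketch {

    static class Step {
        final String name;
        final Step previous;                          // null for an input step
        final UnaryOperator<List<String>> transform;  // incoming fields -> outgoing fields
        int lookups = 0;                              // times this step was asked for its fields

        Step(String name, Step previous, UnaryOperator<List<String>> transform) {
            this.name = name;
            this.previous = previous;
            this.transform = transform;
        }

        // A step needs its incoming fields before it can report its outgoing
        // fields, so it asks the previous step first, which asks its own
        // previous step, and so on back to the input.
        List<String> getStepFields() {
            lookups++;
            List<String> incoming =
                (previous == null) ? new ArrayList<>() : previous.getStepFields();
            return transform.apply(incoming);
        }
    }

    // Stand-in for a Table Input step. Assumption: the columns are hard-coded
    // here; the post says the real step runs the query and reads the
    // ResultSet metadata instead.
    static Step tableInput(String name, List<String> columns) {
        return new Step(name, null, in -> new ArrayList<>(columns));
    }

    // Hypothetical step that appends one field to whatever comes in.
    static Step addField(String name, Step previous, String field) {
        return new Step(name, previous, in -> {
            List<String> out = new ArrayList<>(in);
            out.add(field);
            return out;
        });
    }

    // Hypothetical step that drops one field from whatever comes in.
    static Step removeField(String name, Step previous, String field) {
        return new Step(name, previous, in -> {
            List<String> out = new ArrayList<>(in);
            out.remove(field);
            return out;
        });
    }

    public static void main(String[] args) {
        Step input = tableInput("Table Input", List.of("id", "name"));
        Step calc = addField("Calculator", input, "name_length");
        Step select = removeField("Select values", calc, "id");

        System.out.println(calc.getStepFields());   // [id, name, name_length]
        System.out.println(select.getStepFields()); // [name, name_length]
        System.out.println(input.lookups);          // 2
    }
}
```

Note that in this sketch the input step is consulted once for every downstream request, because nothing is cached. If that upstream lookup were a real query against a database, each of those calls would need a live connection and would add latency, which is the cost described above. Whether and how PDI itself avoids repeated lookups is not something this sketch tries to model.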








