Data Lake input functions
This list explains the functions:
- queryAll - Query Object from Data Lake v1 streambyfilter
This function is selected by default for each new Data Lake Input step and is considered the best practice for moving large volumes of data from Data Lake payloads using a filter set against Data Lake properties. It is the only function that adds the dlDocumentDate property to the output properties. It uses an in-transformation table input step to get the maximum dlDocumentDate from the on-premises table for incremental processing.
- query - Query Object from Data Lake Compass v1 APIs
This function is available for Data Lake extractions that require complex joins, filters, and column transformations. It requires a second transformation to get the maximum timestamp path value to include in the incremental processing.
- queryAllCsv - Query Object from Data Lake v1 payloads
This function is specific to extracting CSV payloads from Data Lake. It requires the use of the original QueryString with the DATALAKE_HOURS environment variable, set in the kettle.properties file, for incremental processing.
Query string example:
dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
- queryAllOld - Query Object from Data Lake v1 payloads
This function uses streambyid APIs to extract payloads from Data Lake. It requires the use of the original QueryString with the DATALAKE_HOURS environment variable, set in the kettle.properties file, for incremental processing.
Query string example:
dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
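
The functions above rely on two different ways of choosing the incremental cutoff: a fixed look-back window driven by DATALAKE_HOURS, and a watermark read from the highest dlDocumentDate already loaded into the target table. The following Python sketch illustrates the difference only; it is not the step's actual implementation. Several details are assumptions: that $time.addHours adds a signed number of hours to the current time, that timestamps use the ISO-8601 format shown, and that a watermark can be applied through a filter of the same shape as the query string example. The table name stage_orders and the helper names are hypothetical, and an in-memory SQLite database stands in for the on-premises table.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed timestamp format; the source does not state the exact format.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def cutoff_from_hours(hours_offset: int, now: datetime) -> str:
    """Look-back window: shift `now` by a signed number of hours.

    Mirrors the assumed behavior of $time.addHours(${DATALAKE_HOURS}),
    where DATALAKE_HOURS would typically be negative (for example -24).
    """
    return (now + timedelta(hours=hours_offset)).strftime(TIMESTAMP_FORMAT)


def cutoff_from_table(conn: sqlite3.Connection, table: str, default: str) -> str:
    """Watermark: highest dlDocumentDate already loaded, or `default` if empty."""
    # The table name is interpolated for brevity; pass trusted identifiers only.
    row = conn.execute(f'SELECT MAX(dlDocumentDate) FROM "{table}"').fetchone()
    return row[0] if row[0] is not None else default


def build_filter(cutoff: str) -> str:
    """Build a filter in the same shape as the query string example."""
    return f"dl_document_date ge '{cutoff}'"


if __name__ == "__main__":
    # Hypothetical staging table standing in for the on-premises table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stage_orders (id INTEGER, dlDocumentDate TEXT)")
    conn.executemany(
        "INSERT INTO stage_orders VALUES (?, ?)",
        [(1, "2024-01-01T08:00:00Z"), (2, "2024-01-01T10:30:00Z")],
    )

    now = datetime(2024, 1, 2, 12, 0, 0, tzinfo=timezone.utc)
    print(build_filter(cutoff_from_hours(-24, now)))
    print(build_filter(cutoff_from_table(conn, "stage_orders", "1970-01-01T00:00:00Z")))
```

The look-back window can re-read payloads that were already loaded, or miss payloads if a run is skipped for longer than the window. The watermark resumes from the last loaded document date regardless of when the previous run happened.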