Data Lake input functions
This list explains the functions:
- queryAll - Retrieve data objects (newline-delimited JSON) from Data Lake by filter
This function is the default for each new Data Lake Input step and is considered the best practice for moving large volumes of data from Data Lake payloads, using a filter set against Data Lake properties. It is the only function that adds dlDocumentDate to the output properties. It uses an in-transformation table input step to get the maximum dlDocumentDate from the on-premises table for incremental processing. Both the filter against the Data Lake properties and the value stored in the dlDocumentDate property use the Indexed Date object property, dl_document_indexed_date.
Note: The use of the Indexed Date is available only from the 2022-08 release of the ETL Client onward. Older versions of the ETL Client use the Stored Date, dl_document_date.
- query - Query objects from Data Lake Compass
This function is available for Data Lake extractions that require complex joins, filters, and column transformations. It requires a second transformation to get the maximum timestamp path value to include in the incremental processing.
Note: The query function cannot be used with Infor Government Solutions (IGS) because it depends on an API that is not available in IGS.
- queryAllCsv - Retrieve data objects (CSV) from Data Lake by filter
This function is specific to extracting CSV payloads from Data Lake. It requires the use of the original QueryString, utilizing the DATALAKE_HOURS environment variable set in kettle.properties, for incremental processing.
Query string example:
dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
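As a rough illustration of what the query string above expresses, the following Python sketch builds an equivalent filter by adding an hour offset to the current time. It is not the ETL Client's implementation: the function name build_query_string, the negative offset value, and the ISO 8601 UTC timestamp format are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def build_query_string(datalake_hours: int, now: datetime) -> str:
    """Build a filter of the form dl_document_date ge '<cutoff>'.

    datalake_hours stands in for the DATALAKE_HOURS variable; a negative
    value (assumed here) moves the cutoff into the past.
    """
    cutoff = now + timedelta(hours=datalake_hours)
    # Assumed timestamp format; the real format is decided by the ETL Client.
    stamp = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"dl_document_date ge '{stamp}'"


# Fixed "now" so the result is reproducible.
now = datetime(2022, 8, 15, 12, 0, 0, tzinfo=timezone.utc)
print(build_query_string(-24, now))
# dl_document_date ge '2022-08-14T12:00:00Z'
```

In this sketch, each run re-reads a fixed window of hours rather than tracking the last loaded document, so the window has to be at least as long as the interval between runs.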
- queryAllOld (deprecated) - Query objects from Data Lake v1 payloads
This function uses streambyid APIs to extract payloads from Data Lake. It requires the use of the original QueryString, utilizing the DATALAKE_HOURS environment variable set in kettle.properties, for incremental processing.
Query string example:
dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
Note: As of the 2022-08 release, queryAllOld is deprecated. Use the queryAll option instead.
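The incremental pattern described for queryAll (read the maximum dlDocumentDate from the on-premises table, then filter Data Lake on the Indexed Date) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ETL Client's actual logic: the table name stg_orders, the helper names, the use of SQLite as a stand-in for the on-premises database, and the choice of the ge operator are all hypothetical.

```python
import sqlite3
from typing import Optional


def max_document_date(conn: sqlite3.Connection, table: str) -> Optional[str]:
    """Return the highest dlDocumentDate already loaded, or None if the table is empty."""
    row = conn.execute(f'SELECT MAX(dlDocumentDate) FROM "{table}"').fetchone()
    return row[0]


def incremental_filter(high_water_mark: Optional[str]) -> Optional[str]:
    """Build a filter on the Indexed Date; None means no filter (full load)."""
    if high_water_mark is None:
        return None
    # "ge" re-reads the boundary document, so the load should tolerate duplicates.
    return f"dl_document_indexed_date ge '{high_water_mark}'"


# In-memory stand-in for the on-premises table (hypothetical name and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (id INTEGER, dlDocumentDate TEXT)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?)",
    [(1, "2022-08-01T10:00:00Z"), (2, "2022-08-03T09:30:00Z")],
)

print(incremental_filter(max_document_date(conn, "stg_orders")))
# dl_document_indexed_date ge '2022-08-03T09:30:00Z'
```

Unlike a fixed DATALAKE_HOURS window, this approach derives the cutoff from what has already been loaded, so a missed or delayed run does not leave a gap.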