Data Lake input functions
This list explains the functions:
- queryAll - Retrieve data objects (newline-delimited JSON) from Data Lake by filter
This function is the default for each new Data Lake Input step and is considered the best practice for moving large volumes of data from Data Lake payloads using a filter set against Data Lake properties. It is the only function that adds dlDocumentDate to the output properties. It uses an in-transformation table input step to get the maximum dlDocumentDate from the on-premises table for incremental processing. Both the filter against the Data Lake properties and the value stored in the dlDocumentDate property use the Indexed Date object property, dl_document_indexed_date.
Note: The use of the Indexed Date is available only from the 2022-08 release of the ETL Client onward. Older versions of the ETL Client use the Stored Date, dl_document_date.
- query - Query objects from Data Lake Compass
This function is available for Data Lake extractions that require complex joins, filters, and column transformations. It requires a second transformation to obtain the maximum timestamp path value to include in the incremental processing.
Note: The query function cannot be used with Infor Government Solutions (IGS) because it depends on an API that is not available in IGS.
- queryAllCsv - Retrieve data objects (CSV) from Data Lake by filter
This function is specific to extracting CSV payloads from Data Lake. For incremental processing, it requires the original QueryString, which uses the DATALAKE_HOURS environment variable set in kettle.properties.
Query string example:
dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
- queryAllOld (deprecated) - Query objects from Data Lake v1 payloads
This function uses the streambyid APIs to extract payloads from Data Lake. For incremental processing, it requires the original QueryString, which uses the DATALAKE_HOURS environment variable set in kettle.properties.
Query string example:
dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
Note: As of the 2022.08 release, queryAllOld is deprecated. Use the queryAll option instead.
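
The two incremental-processing approaches described in this list can be illustrated with a minimal Python sketch. It is not ETL Client code. The helper names (max_document_date, watermark_filter, lookback_filter), the timestamp layout, and the SQLite table standing in for the on-premises table are all hypothetical. The sketch also assumes that a negative DATALAKE_HOURS value means "look back that many hours", which this page does not state.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout; the source does not specify one.
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def max_document_date(conn, table):
    """Return the largest dlDocumentDate in the local table, or None if it is empty."""
    row = conn.execute(f'SELECT MAX(dlDocumentDate) FROM "{table}"').fetchone()
    return row[0]


def watermark_filter(max_date, prop="dl_document_indexed_date"):
    """Build a filter from a stored watermark (queryAll-style increment)."""
    if max_date is None:
        return ""  # no watermark yet, so no date restriction (full load)
    # 'ge' is the only operator shown on this page; boundary rows are read again.
    return f"{prop} ge '{max_date}'"


def lookback_filter(hours, now=None, prop="dl_document_date"):
    """Build a filter from a fixed hour offset (DATALAKE_HOURS-style increment)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now + timedelta(hours=hours)  # assumption: negative hours look back
    return f"{prop} ge '{cutoff.strftime(TS_FORMAT)}'"
```

The watermark approach reads only what arrived after the last successful load, while the fixed offset re-reads a sliding window on every run regardless of what is already stored. For example, lookback_filter(-24) builds a filter covering the previous 24 hours under the assumptions above.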