Data Lake input functions

There are four functions available in the Data Lake Input step.

This list explains the functions:

  • queryAll - Retrieve data objects (newline-delimited JSON) from Data Lake by filter

    This function is the default for each new Data Lake Input step and is considered the best practice for moving large volumes of data from Data Lake payloads by using a filter set against Data Lake properties. It is the only function that adds dlDocumentDate to the output properties, and it uses an in-transformation table input step to get the maximum dlDocumentDate from the on-premises table for incremental processing. Both the filter against the Data Lake properties and the value stored in the dlDocumentDate property use the Indexed Date object property, dl_document_indexed_date.

    Note: The use of the Indexed Date is available only from the 2022-08 release of the ETL Client onward. Older versions of the ETL Client use the Stored Date, dl_document_date.
  • query - Query objects from Data Lake Compass

    This function is intended for Data Lake extractions that require complex joins, filters, and column transformations. It requires a second transformation to obtain the max timestamp path value to include in the incremental processing.

    Note: The query function cannot be used with Infor Government Solutions (IGS) because it depends on an API that is not available in IGS.
  • queryAllCsv - Retrieve data objects (CSV) from Data Lake by filter

    This function is specific to extracting CSV payloads from Data Lake. For incremental processing, it requires the original QueryString, which uses the DATALAKE_HOURS environment variable set in the kettle.properties file.

    Query string example:

    dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
  • queryAllOld (deprecated) - Query objects from Data Lake v1 payloads

    This function uses the streambyid APIs to extract payloads from Data Lake. For incremental processing, it requires the original QueryString, which uses the DATALAKE_HOURS environment variable set in the kettle.properties file.

    Query string example:

    dl_document_date ge '$time.addHours(${DATALAKE_HOURS})'
    Note: As of the 2022.08 release, queryAllOld is deprecated. Use the queryAll function instead.
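
The two incremental filter styles described in this list can be illustrated with a minimal Python sketch. It is not ETL Client code. The helper names (format_timestamp, filter_from_max_date, filter_from_hours) are hypothetical, the UTC ISO 8601 timestamp layout is an assumption, and the sketch assumes that $time.addHours adds a signed number of hours to the current time, so that a negative DATALAKE_HOURS value looks backward. Confirm these details against your ETL Client version before relying on them.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(value: datetime) -> str:
    """Render an aware datetime as UTC with millisecond precision.

    Assumption: the filter accepts an ISO 8601 style UTC timestamp.
    This topic does not state the exact format.
    """
    if value.tzinfo is None:
        raise ValueError("value must be timezone-aware")
    utc = value.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def filter_from_max_date(max_date: datetime,
                         prop: str = "dl_document_indexed_date") -> str:
    """queryAll style: filter on the max dlDocumentDate already loaded."""
    return f"{prop} ge '{format_timestamp(max_date)}'"


def filter_from_hours(now: datetime, hours: int,
                      prop: str = "dl_document_date") -> str:
    """queryAllCsv / queryAllOld style: filter on a rolling hour window.

    Assumption: hours is signed, so -24 means "the last 24 hours".
    """
    return f"{prop} ge '{format_timestamp(now + timedelta(hours=hours))}'"


if __name__ == "__main__":
    last_loaded = datetime(2022, 8, 1, 12, 0, 0, 500000, tzinfo=timezone.utc)
    print(filter_from_max_date(last_loaded))
    print(filter_from_hours(datetime.now(timezone.utc), -24))
```

The first helper mirrors the queryAll approach, where the lower bound comes from data that is already loaded. The second mirrors the DATALAKE_HOURS approach, where the lower bound is a fixed window relative to the run time and can therefore re-read or miss objects if runs are irregular.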