Query Data Lake Activity

With the Query Data Lake activity, you can schedule the retrieval of a custom subset of your data from Data Lake by using a graphical SQL modeler. You can select specific fields or columns, apply filters, join multiple objects or tables, and so on.

  1. Select Connect > Data Flows.
  2. Click Add and select Data Lake Flow.
  3. Drag and drop the Query activity from the toolbar to the Data Lake flow.
    The Query activity can be placed only as the first activity in the flow.
  4. On the Properties tab, specify the name and, optionally, the description.
  5. On the Queries tab, click Add.
  6. In the Data Lake Query Modeler, complete these substeps:
    1. Model a query in the same way as in the AnySQL modeler.

      Only DSV and newline-delimited JSON Data Lake objects can be used in the modeler.

    2. In Settings, you can select a newline-delimited JSON or conventional JSON output format.

      Unlike in the regular AnySQL modeler, you cannot select a specific column in the incremental configuration. The Data Lake storage time information is used automatically.

    3. Define the output format.
    4. Generate metadata for the output document.
    5. Save the model.
    6. Click BACK to return to the Data Lake flow.

      For more information about AnySQL modeling, see the Infor ION Technology Connectors Administration Guide.
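
The two JSON output formats offered in Settings differ only in how the records are laid out. The following Python sketch illustrates the difference. It assumes that the conventional format is a single JSON array of records. The sample records and field names are hypothetical and are not taken from any Data Lake object.

```python
import json

# Hypothetical sample records; the field names are for illustration only.
records = [
    {"orderId": 1001, "status": "OPEN"},
    {"orderId": 1002, "status": "CLOSED"},
]


def to_ndjson(rows):
    """Newline-delimited JSON: one complete JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


def to_conventional_json(rows):
    """Conventional JSON, assumed here to be one array that holds all objects."""
    return json.dumps(rows)


def from_ndjson(text):
    """Parse newline-delimited JSON line by line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


ndjson_text = to_ndjson(records)
array_text = to_conventional_json(records)

print(ndjson_text)
print(array_text)
```

A consumer can process newline-delimited output one line at a time, whereas a conventional JSON document must be parsed as a whole before any record is available.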

  7. Repeat steps 5 and 6 to add all required queries or documents.
  8. On the Scheduler tab, specify how often the modeled queries must be run.

    If more than 10 documents are defined in a single Query Data Lake activity, only the first 10 run on schedule. The remaining documents wait until one of the slots becomes available.

  9. On the Filter tab, you can exclude data older than the specified date.

    This option works only when an incremental table is selected in the modeler.

    The incremental keys can be rewound to a previous time point with the Rewind incremental option from the Active Document flows page.

    For more information, see Data Lake retrieval activities.

    You cannot rewind to a time point older than the limit specified on the Filter tab.
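
The interaction between the Filter tab limit and the Rewind incremental option can be pictured as a simple date check. The following Python sketch only illustrates the rule described above. The function name, its parameters, and the choice to raise an error are hypothetical and do not represent the product's API or its actual behavior.

```python
from datetime import datetime, timezone


def validate_rewind(requested, filter_limit, current_key):
    """Return the requested rewind point if the rule allows it.

    requested    -- time point to rewind the incremental key to
    filter_limit -- oldest date allowed by the Filter tab (None means no limit)
    current_key  -- current position of the incremental key
    """
    if requested > current_key:
        raise ValueError("A rewind must move to a previous time point.")
    if filter_limit is not None and requested < filter_limit:
        raise ValueError("Cannot rewind past the Filter tab limit.")
    return requested


# Hypothetical dates for illustration.
limit = datetime(2024, 1, 1, tzinfo=timezone.utc)
current = datetime(2024, 6, 1, tzinfo=timezone.utc)

# Allowed: the requested point lies between the limit and the current key.
print(validate_rewind(datetime(2024, 3, 1, tzinfo=timezone.utc), limit, current))

# Rejected: the requested point is older than the Filter tab limit.
try:
    validate_rewind(datetime(2023, 12, 1, tzinfo=timezone.utc), limit, current)
except ValueError as err:
    print(err)
```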