Batch production

Batch production is used for larger datasets when results are not required in real time. The results are ingested back to Data Lake.

Select the Batch Production tab to access the batch production quest canvas. This tab becomes active after you export an optimize model activity configuration from the design quest that you intend to use in the batch mode pipeline.

After the batch production quest is created, you can continue to modify the pipeline if necessary. When the batch quest is initially created, the input dataset or datasets are assumed to have the same schema as the corresponding dataset or datasets in the Dataset Collection activity marked in the design quest. Having the same schema means having the same variables of the same types. In batch production, only scripting and Ingest to Data Lake activities can be edited; all other activities are read-only.

You can add one or more scripting or Ingest to Data Lake activities. A scripting activity formats the end results according to business needs, and an Ingest to Data Lake activity writes them back to Data Lake.
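The platform's scripting interface is not documented in this section, so the following is only a rough sketch of what a scripting activity might do before ingestion. All file contents, column names, and the `format_results` helper are hypothetical, and standard-library CSV handling stands in for whatever data access the platform actually provides.

```python
import csv
import io

def format_results(raw_csv: str, column_map: dict) -> str:
    """Rename columns of raw model output to match the schema expected
    downstream (illustrative only; the real scripting API may differ)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(column_map.values()))
    writer.writeheader()
    for row in reader:
        # Map each internal column name to its business-facing name.
        writer.writerow({new: row[old] for old, new in column_map.items()})
    return out.getvalue()

# Hypothetical raw model output with internal column names.
raw = "var_1,var_2\n0.91,12\n0.87,15\n"
formatted = format_results(raw, {"var_1": "score", "var_2": "quantity"})
print(formatted)
```

The formatted output would then be handed to an Ingest to Data Lake activity, whose schema could match the renamed columns registered in the data catalog.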

After the scripting activities are configured, save and run the batch production quest. A successful run produces the same optimal results as the design quest. The output of the scripting activity can be used to define the schema in the data catalog, and the same configuration is ingested back to Data Lake.

For custom algorithms, the scripting activity is optional: the output results of the setup model activity can be ingested into Data Lake directly. A custom algorithm can generate single or multiple output files, as defined by the output file definitions in the custom algorithm code. A maximum of 10 files can be specified. Refer to the instructions available on the Custom Algorithm Detail page.
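As a minimal sketch of the multiple-output-file pattern described above: the `write_outputs` helper, the file names, and the result data are all hypothetical, since the actual output file definitions are set in the custom algorithm code per the Custom Algorithm Detail page. Only the 10-file cap comes from this section.

```python
import csv
import os
import tempfile

MAX_OUTPUT_FILES = 10  # the platform caps custom algorithm outputs at 10 files

def write_outputs(results_by_name: dict, out_dir: str) -> list:
    """Write each named result set to its own CSV file, enforcing the
    10-file limit (names and layout are illustrative only)."""
    if len(results_by_name) > MAX_OUTPUT_FILES:
        raise ValueError(f"at most {MAX_OUTPUT_FILES} output files are allowed")
    paths = []
    for name, rows in results_by_name.items():
        path = os.path.join(out_dir, f"{name}.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as d:
    paths = write_outputs(
        {"summary": [["id", "score"], [1, 0.9]],
         "details": [["id", "feature"], [1, "x"]]},
        d,
    )
    names = [os.path.basename(p) for p in paths]
    print(names)
```

Each output file produced this way could then be ingested into Data Lake without an intervening scripting activity.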