Micro-batching of streamed data
When you send data events with the Streaming Ingestion method, Data Fabric automatically micro-batches the data and stores it in Data Lake.
Batching groups individual records into larger data objects for efficient data storage. For Streaming Ingestion, the batching process is handled by Data Fabric, so the client application does not need to batch records itself.
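For illustration only, this is a minimal sketch of a client that sends individual data events and performs no batching of its own; the endpoint URL, token handling, and payload envelope shown here are assumptions for the sketch, not the documented Streaming Ingestion API.

```python
import json
from datetime import datetime, timezone

import requests  # third-party HTTP client, used only for illustration

# Hypothetical endpoint and token; substitute the actual Streaming Ingestion
# URL and credentials for your environment.
STREAMING_INGESTION_URL = "https://example.invalid/streaming/v1/events"
API_TOKEN = "<access-token>"

def send_event(record: dict) -> None:
    """Send one data event as-is; Data Fabric performs the micro-batching."""
    event = {
        "objectName": "SalesOrder",
        "fromLogicalId": "lid://example.erp.1",
        "sourcePublicationDate": datetime.now(timezone.utc).isoformat(),
        "payload": record,
    }
    response = requests.post(
        STREAMING_INGESTION_URL,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        data=json.dumps(event),
    )
    response.raise_for_status()
```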
Data events are grouped based on their Streaming Ingestion JSON message properties, such as objectName, fromLogicalId, and sourcePublicationDate. Events are grouped into data objects at fixed intervals of 15 minutes according to the sourcePublicationDate property. Additionally, the system determines a cut-off point for each data object based on an arrival time window or the maximum uncompressed size of an object, whichever occurs first.
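As a rough illustration of the grouping rule described above, the following sketch derives a batching key from the three message properties, truncating sourcePublicationDate to its 15-minute interval. The key format is an assumption made for illustration, not the internal representation used by Data Fabric.

```python
from datetime import datetime

def batch_key(object_name: str, from_logical_id: str,
              source_publication_date: str) -> tuple:
    """Group events by objectName, fromLogicalId, and the 15-minute
    interval that contains sourcePublicationDate."""
    ts = datetime.fromisoformat(source_publication_date.replace("Z", "+00:00"))
    # Truncate the timestamp to the start of its 15-minute interval.
    bucket = ts.replace(minute=(ts.minute // 15) * 15, second=0, microsecond=0)
    return (object_name, from_logical_id, bucket)

# Both events fall into the same 15-minute interval and would therefore be
# grouped into the same data object.
k1 = batch_key("SalesOrder", "lid://example.erp.1", "2023-05-01T10:02:30Z")
k2 = batch_key("SalesOrder", "lid://example.erp.1", "2023-05-01T10:13:05Z")
assert k1 == k2
```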
This table shows cut-off points in the micro-batching of a data object:
| Cut-off point | Description | Value |
|---|---|---|
| Arrival time window | When the Streaming Ingestion service receives the first record, or the next record after a data object has been stored, a time window is opened. Within that window, any incoming records that have the same Streaming Ingestion properties are grouped into a single data object. After the time window expires, the data object is stored in Data Lake. The arrival of a new record starts another time window, and a new data object is created. | Estimated 300 seconds |
| Data object size (uncompressed) | When the total size of records within a data object reaches a predetermined file size of 5 MB, the data object is stored in Data Lake. | Estimated 5,242,880 bytes |
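To make the interplay of the two cut-off points concrete, here is a hedged sketch of the flush logic: a data object is stored when roughly 300 seconds have elapsed since its arrival time window opened, or when its uncompressed size reaches roughly 5 MB (5,242,880 bytes), whichever occurs first. The class and method names are illustrative and are not part of Data Fabric.

```python
import time

ARRIVAL_WINDOW_SECONDS = 300              # estimated arrival time window
MAX_UNCOMPRESSED_BYTES = 5 * 1024 * 1024  # estimated 5242880 bytes (5 MB)

class MicroBatch:
    """Illustrative batch buffer that mimics the documented cut-off points."""

    def __init__(self):
        self.records = []
        self.size_bytes = 0
        self.window_opened_at = None

    def add(self, record: bytes) -> bool:
        """Add a record; return True when the data object should be stored."""
        if self.window_opened_at is None:
            # The first record (or the first record after the previous data
            # object was stored) opens a new arrival time window.
            self.window_opened_at = time.monotonic()
        self.records.append(record)
        self.size_bytes += len(record)
        return self.should_flush()

    def should_flush(self) -> bool:
        window_expired = (time.monotonic() - self.window_opened_at
                          >= ARRIVAL_WINDOW_SECONDS)
        size_reached = self.size_bytes >= MAX_UNCOMPRESSED_BYTES
        # Whichever cut-off point occurs first triggers storage in Data Lake.
        return window_expired or size_reached
```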