Micro-batching of streamed data

When you send data events with the Streaming Ingestion method, Data Fabric automatically performs micro-batching of the data and stores it in Data Lake.

Batching groups individual records into larger data objects so that the data can be stored efficiently. For Streaming Ingestion, Data Fabric handles the batching process; the client application does not need to batch records itself.

Data events are grouped based on their JSON message properties, such as objectName and fromLogicalId. The cut-off point is determined by the system for each data object based on a time window or the maximum uncompressed size of the object, whichever occurs first.
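
For illustration only, this minimal sketch shows how records could be grouped by their message properties. The helper function, the exact property set, and the example values (SalesOrder, Invoice, the lid:// identifiers) are assumptions made for the example; Data Fabric performs this grouping for you.

# Illustrative sketch: records with identical message properties
# (here, objectName and fromLogicalId) belong to the same data object.
def grouping_key(message_properties: dict) -> tuple:
    return (
        message_properties.get("objectName"),
        message_properties.get("fromLogicalId"),
    )

# Hypothetical example events, not actual Data Fabric payloads.
event_a = {"objectName": "SalesOrder", "fromLogicalId": "lid://infor.erp.1"}
event_b = {"objectName": "SalesOrder", "fromLogicalId": "lid://infor.erp.1"}
event_c = {"objectName": "Invoice", "fromLogicalId": "lid://infor.erp.1"}

assert grouping_key(event_a) == grouping_key(event_b)  # batched together
assert grouping_key(event_a) != grouping_key(event_c)  # separate data object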

These are the cut-off points in the micro-batching of a data object:

Cut-off point: Arrival time window
Value: 600 seconds
Description: When the Streaming Ingestion service receives the first record, or the next record after a data object has been stored, a time window is opened. During that window, any incoming records that have the same metadata (message properties) are grouped into a single data object. After the time window expires, the data object is saved and stored in Data Lake. The arrival of a new record starts another time window, and a new data object is created.

Cut-off point: Data object size (uncompressed)
Value: 5242880 bytes (5 MB)
Description: When the total uncompressed size of the records within a data object reaches 5 MB, the data object is stored in Data Lake.
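
As a rough illustration of how the two cut-off conditions interact, the sketch below models a single data object that is flushed when the 600-second arrival window has expired or when the uncompressed size reaches 5242880 bytes, whichever occurs first. It is a simplified model of behavior that Data Fabric performs on the server side; among other simplifications, it evaluates the time window only when a new record arrives, whereas the service stores the data object as soon as the window expires.

import time

# Cut-off values listed above.
WINDOW_SECONDS = 600            # arrival time window
MAX_OBJECT_BYTES = 5_242_880    # 5 MB, uncompressed

class MicroBatch:
    """Simplified, client-side model of one data object being assembled."""

    def __init__(self):
        self.records = []
        self.size_bytes = 0
        self.window_opened_at = None

    def add(self, record: bytes) -> bool:
        """Adds a record and reports whether the data object should be stored."""
        now = time.monotonic()
        if self.window_opened_at is None:
            # The first record (or the first after a flush) opens the window.
            self.window_opened_at = now
        self.records.append(record)
        self.size_bytes += len(record)
        window_expired = (now - self.window_opened_at) >= WINDOW_SECONDS
        size_reached = self.size_bytes >= MAX_OBJECT_BYTES
        return window_expired or size_reached

    def flush(self):
        """Stands in for storing the completed data object in Data Lake."""
        stored = self.records
        self.records, self.size_bytes, self.window_opened_at = [], 0, None
        return stored

Because Data Fabric applies these cut-offs itself, a model like this is only useful for estimating how many data objects a given event stream produces, not something the client application must implement.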