Micro-batching of streamed data
When you send data events with the Streaming Ingestion method, Data Fabric automatically micro-batches the data and stores it in Data Lake.
Batching groups individual records into larger data objects for efficient data storage. With Streaming Ingestion, the batching process is handled by Data Fabric, so the client application does not need to batch records itself.
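Because batching happens on the Data Fabric side, a client can send each event as soon as it is produced. The following minimal sketch shows that pattern; the endpoint URL, access token, and the use of HTTP headers to carry the `objectName` and `fromLogicalId` message properties are placeholder assumptions for illustration, not the documented Streaming Ingestion API.

```python
import json
import requests  # assumed HTTP client; any client works

# Hypothetical endpoint and credentials; substitute your tenant's actual
# Streaming Ingestion URL and access token.
STREAMING_URL = "https://example.api.host/streaming/messages"
TOKEN = "<access-token>"

def send_event(record: dict, object_name: str, from_logical_id: str) -> None:
    """Send a single data event; Data Fabric groups records into data objects
    server-side, so the client does not accumulate them into larger payloads."""
    response = requests.post(
        STREAMING_URL,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
            # Message properties used for grouping (property names from this article);
            # how they are actually conveyed depends on the real API.
            "objectName": object_name,
            "fromLogicalId": from_logical_id,
        },
        data=json.dumps(record),
        timeout=30,
    )
    response.raise_for_status()

# Each event is sent individually; no client-side batching is required.
send_event({"orderId": 1001, "status": "NEW"}, "SalesOrder", "lid://example.app.1")
```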
Data events are grouped based on their JSON message properties, such as `objectName` and `fromLogicalId`. The cut-off point is determined by the system for each data object based on a time window or the maximum uncompressed size of the object, whichever occurs first.
This table shows cut-off points in the micro-batching of a data object:
| Cut-off point | Description | Value |
|---|---|---|
| Arrival time window | When the Streaming Ingestion service receives the first record, or the next record after a data object has been stored, a time window is opened. Within that window, any incoming records that have the same metadata (message properties) are grouped into a single data object. When the time window expires, the data object is saved and stored in Data Lake. The arrival of a new record starts another time window, and a new data object is created. | 600 seconds |
| Data object size (uncompressed) | When the total size of the records within a data object reaches the predetermined file size of 5 MB, the data object is stored in Data Lake. | 5242880 bytes |
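The following sketch makes the interaction of the two cut-off points concrete. It is not Data Fabric's internal implementation; the names `MicroBatch`, `ingest`, `store_in_data_lake`, and `flush_expired_windows` are illustrative assumptions, while the 600-second window and 5242880-byte limit come from the table above.

```python
import time
from dataclasses import dataclass, field

WINDOW_SECONDS = 600          # arrival time window from the table above
MAX_OBJECT_BYTES = 5_242_880  # 5 MB maximum uncompressed size

@dataclass
class MicroBatch:
    """Records sharing the same message properties, accumulated into one data object."""
    opened_at: float
    size_bytes: int = 0
    records: list[bytes] = field(default_factory=list)

# Open batches keyed by the grouping properties (objectName, fromLogicalId).
open_batches: dict[tuple[str, str], MicroBatch] = {}

def store_in_data_lake(key: tuple[str, str], batch: MicroBatch) -> None:
    """Placeholder for persisting the finished data object to Data Lake."""
    print(f"storing {len(batch.records)} records ({batch.size_bytes} bytes) for {key}")

def ingest(object_name: str, from_logical_id: str, record: bytes) -> None:
    """Add one record; the first record after a flush opens a new time window."""
    key = (object_name, from_logical_id)
    batch = open_batches.setdefault(key, MicroBatch(opened_at=time.time()))
    batch.records.append(record)
    batch.size_bytes += len(record)
    # Size cut-off: store the object as soon as it reaches 5 MB uncompressed.
    if batch.size_bytes >= MAX_OBJECT_BYTES:
        store_in_data_lake(key, open_batches.pop(key))

def flush_expired_windows() -> None:
    """Time cut-off: called periodically to store objects whose window has expired."""
    now = time.time()
    expired = [k for k, b in open_batches.items() if now - b.opened_at >= WINDOW_SECONDS]
    for key in expired:
        store_in_data_lake(key, open_batches.pop(key))
```

Whichever condition is met first closes the data object; the next record with the same `objectName` and `fromLogicalId` then opens a new time window.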