Micro-batching of streamed data
When you send data events with the Streaming Ingestion method, Data Fabric automatically micro-batches the data and stores it in Data Lake.
Batching groups individual records into larger data objects for efficient data storage. For Streaming Ingestion, the batching process is handled by Data Fabric, so the client application does not need to batch records itself.
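For illustration only, this is a minimal sketch of a client that sends individual data events and performs no batching of its own; the endpoint URL, token handling, and payload envelope shown here are assumptions for the sketch, not the documented Streaming Ingestion API.

```python
import json
from datetime import datetime, timezone

import requests  # third-party HTTP client, used only for illustration

# Hypothetical endpoint and token; substitute the actual Streaming Ingestion
# URL and credentials for your environment.
STREAMING_INGESTION_URL = "https://example.invalid/streaming/v1/events"
API_TOKEN = "<access-token>"

def send_event(record: dict) -> None:
    """Send one data event as-is; Data Fabric performs the micro-batching."""
    event = {
        "objectName": "SalesOrder",
        "fromLogicalId": "lid://example.erp.1",
        "sourcePublicationDate": datetime.now(timezone.utc).isoformat(),
        "payload": record,
    }
    response = requests.post(
        STREAMING_INGESTION_URL,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        data=json.dumps(event),
    )
    response.raise_for_status()
```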
Data events are grouped based on their Streaming Ingestion JSON message properties, such as objectName, fromLogicalId, and sourcePublicationDate. Events are grouped into data objects at fixed intervals of 15 minutes according to the sourcePublicationDate property. Additionally, the system determines a cut-off point for each data object based on an arrival time window or the maximum uncompressed size of an object, whichever occurs first.
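As a rough illustration of the grouping rule described above, the following sketch derives a batching key from the three message properties, truncating sourcePublicationDate to its 15-minute interval. The key format is an assumption made for illustration, not the internal representation used by Data Fabric.

```python
from datetime import datetime

def batch_key(object_name: str, from_logical_id: str,
              source_publication_date: str) -> tuple:
    """Group events by objectName, fromLogicalId, and the 15-minute
    interval that contains sourcePublicationDate."""
    ts = datetime.fromisoformat(source_publication_date.replace("Z", "+00:00"))
    # Truncate the timestamp to the start of its 15-minute interval.
    bucket = ts.replace(minute=(ts.minute // 15) * 15, second=0, microsecond=0)
    return (object_name, from_logical_id, bucket)

# Both events fall into the same 15-minute interval and would therefore be
# grouped into the same data object.
k1 = batch_key("SalesOrder", "lid://example.erp.1", "2023-05-01T10:02:30Z")
k2 = batch_key("SalesOrder", "lid://example.erp.1", "2023-05-01T10:13:05Z")
assert k1 == k2
```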
This table shows cut-off points in the micro-batching of a data object:
| Cut-off point | Description | Value |
|---|---|---|
| Arrival time window | When the Streaming Ingestion service receives the first record, or the next record after a data object has been stored, a time window is opened. Within that window, any incoming records that have the same Streaming Ingestion properties are grouped into a single data object. After the time window expires, the data object is stored in Data Lake. The arrival of a new record starts another time window, and a new data object is created. | Estimated 300 seconds |
| Data object size (uncompressed) | When the total size of records within a data object reaches a predetermined file size of 5 MB, the data object is stored in Data Lake. | Estimated 5,242,880 bytes |
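To make the interplay of the two cut-off points concrete, here is a hedged sketch of the flush logic: a data object is stored when roughly 300 seconds have elapsed since its arrival time window opened, or when its uncompressed size reaches roughly 5 MB (5,242,880 bytes), whichever occurs first. The class and method names are illustrative and are not part of Data Fabric.

```python
import time

ARRIVAL_WINDOW_SECONDS = 300              # estimated arrival time window
MAX_UNCOMPRESSED_BYTES = 5 * 1024 * 1024  # estimated 5242880 bytes (5 MB)

class MicroBatch:
    """Illustrative batch buffer that mimics the documented cut-off points."""

    def __init__(self):
        self.records = []
        self.size_bytes = 0
        self.window_opened_at = None

    def add(self, record: bytes) -> bool:
        """Add a record; return True when the data object should be stored."""
        if self.window_opened_at is None:
            # The first record (or the first record after the previous data
            # object was stored) opens a new arrival time window.
            self.window_opened_at = time.monotonic()
        self.records.append(record)
        self.size_bytes += len(record)
        return self.should_flush()

    def should_flush(self) -> bool:
        window_expired = (time.monotonic() - self.window_opened_at
                          >= ARRIVAL_WINDOW_SECONDS)
        size_reached = self.size_bytes >= MAX_UNCOMPRESSED_BYTES
        # Whichever cut-off point occurs first triggers storage in Data Lake.
        return window_expired or size_reached
```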