Data streaming

With the Data Streaming Ingestion method, you send JSON messages in real time and receive acknowledgment responses that indicate whether each message was successfully received or failed.

This table shows the Streaming Ingestion properties to include in a JSON message:

Property        Description
objectName      A data object name from the Data Catalog. The string can contain a maximum of 250 characters. Cannot be null or an empty string.
correlationId   GUID of a streamed JSON message that can be correlated in a response message. The string can contain a maximum of 250 characters.
fromLogicalId   An identifier of a source application in this format: lid://<provider>.<part2>.<part3>. Do not use infor in the logical ID fields.
payload         A single, complete record that is encoded in Base64. Cannot be null or an empty string.

Streaming Ingestion has these limitations:

  • A payload record must be in the Newline-delimited JSON (NDJSON) format.
  • The payload size cannot exceed 4.5 megabytes (4718592 bytes).
  • The maximum ingestion throughput is 100 messages per second. This limit is imposed by the API Gateway streaming ingestion endpoint.
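
The properties and limits above can be checked on the client before a message is sent. The following is a minimal Python sketch, not an official client: the build_message function, the regular expression for the logical ID, and the reading of the "infor" rule (no field equals infor) are assumptions made for illustration. The documentation does not state whether the size limit applies before or after Base64 encoding, so the sketch checks the encoded string, which is the stricter interpretation.

```python
import base64
import json
import re
import uuid

MAX_FIELD_CHARS = 250          # limit for objectName, correlationId, fromLogicalId
MAX_PAYLOAD_BYTES = 4_718_592  # 4.5 MB, the value reported by PayloadTooLarge
# Assumed pattern for lid://<provider>.<part2>.<part3>; the service's rule may differ.
LOGICAL_ID_RE = re.compile(r"^lid://[^./\s]+\.[^./\s]+\.[^./\s]+$")


def build_message(object_name, from_logical_id, record, correlation_id=None):
    """Return a dict that can be serialized and sent as one streaming message."""
    correlation_id = correlation_id or str(uuid.uuid4())
    if not object_name:
        raise ValueError("objectName cannot be null or empty")
    for name, value in (("objectName", object_name),
                        ("correlationId", correlation_id),
                        ("fromLogicalId", from_logical_id)):
        if len(value) > MAX_FIELD_CHARS:
            raise ValueError(f"{name} exceeds {MAX_FIELD_CHARS} characters")
    if not LOGICAL_ID_RE.match(from_logical_id):
        raise ValueError("fromLogicalId is not in a valid format")
    # Assumed reading of the rule: none of the three fields may be "infor".
    fields = from_logical_id[len("lid://"):].split(".")
    if any(field.lower() == "infor" for field in fields):
        raise ValueError("fromLogicalId must not use infor")
    # One NDJSON record is a single line of compact JSON.
    line = json.dumps(record, separators=(",", ":"))
    payload = base64.b64encode(line.encode("utf-8")).decode("ascii")
    if len(payload) > MAX_PAYLOAD_BYTES:  # checked after encoding (assumption)
        raise ValueError(f"payload exceeds {MAX_PAYLOAD_BYTES} bytes")
    return {
        "objectName": object_name,
        "correlationId": correlation_id,
        "fromLogicalId": from_logical_id,
        "payload": payload,
    }
```

Calling json.dumps on the returned dict produces a request body with the four properties listed in the table. If no correlation ID is passed, the sketch generates a GUID.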

After you send a message to the Streaming Ingestion service, an acknowledgment response is expected. Only a positive acknowledgment response that contains an OK message guarantees that the message is eventually stored in Data Lake. You can use the correlationId property to match requests with their corresponding responses.

An acknowledgment response is sent for every message that was successfully received. If an acknowledgment response is not received within 5 minutes, we recommend that you resend the message.

The Streaming Ingestion service responds with an error result and error code if a message was not successfully received.

Note: In some cases, the Streaming Ingestion service does not send an error response for a message that was not successfully received. Only a positive acknowledgment guarantees that the data is processed and stored in Data Lake.
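
Because only a positive acknowledgment is a guarantee, a client typically needs to remember what it has sent until the matching response arrives. The sketch below shows one way to do that in Python. The PendingAcks class and its method names are hypothetical, and treating an error response as "still pending" is a design assumption, not documented service behavior.

```python
import time

RESEND_AFTER_SECONDS = 5 * 60  # resend interval recommended in the text above


class PendingAcks:
    """Tracks sent messages until a positive acknowledgment arrives."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = {}  # correlationId -> (message, time it was sent)

    def sent(self, message):
        """Record a message (a dict with a correlationId) as awaiting acknowledgment."""
        self._pending[message["correlationId"]] = (message, self._clock())

    def on_response(self, response):
        """Return True if the response positively acknowledges a pending message."""
        if response.get("result") != "ok":
            return False  # errors leave the message pending; the caller decides
        return self._pending.pop(response.get("correlationId"), None) is not None

    def due_for_resend(self):
        """Return messages that have waited at least 5 minutes without an OK."""
        now = self._clock()
        return [message for message, sent_at in self._pending.values()
                if now - sent_at >= RESEND_AFTER_SECONDS]
```

A sender would call sent() after each request, on_response() for each parsed response, and periodically resend whatever due_for_resend() returns, calling sent() again to restart the timer.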

This example shows a JSON message request to send to Streaming Ingestion:

{
    "objectName": "streamTest",
    "correlationId": "1625044086",
    "fromLogicalId": "lid://provider.myapplication.client1",
    "payload":"eyJuYW1lIjoiSm9obiIsICJhZ2UiOjMwLCAidmFyYXRpb24iOiAxfQ=="
}

This example shows a successful acknowledgment response from the Streaming Ingestion service:

{"result":"ok","correlationId":"1625044086"}

This example shows an error response from the Streaming Ingestion service:

{"result":"error","code":"InvalidMessageFormat","message":"Property payload is not base64 encoded.", "correlationId":"1625044086"}

This table shows the error codes and messages that the Streaming Ingestion service can return when a message is not successfully received:

Error                      Message
InvalidMessageFormat       These messages are possible:
                             • A required property is missing in the message
                             • Payload property value is not base64 encoded
                             • A property value exceeds the maximum length of 250 characters (objectName, fromLogicalId, correlationId)
                             • fromLogicalId property is not in valid format
JSONDeserializationError   Invalid JSON message format.
TooManyRequests            The throughput rate limit was exceeded.
AcknowledgmentTimeout      Your message cannot be acknowledged within the 55 seconds timeframe.
PayloadTooLarge            Payload size exceeds the maximum size of 4718592 bytes
UnknownError               Unknown Error
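
A client can branch on the result and code properties of each response. The Python sketch below is illustrative only: the classify_response function is hypothetical, and the split into codes worth resending and codes that require a corrected message is an assumption that the documentation does not state.

```python
import json

# Assumption: these codes describe transient conditions, so resending the
# same message later may succeed. Every other code is treated as a problem
# with the message itself that must be fixed before resending.
RETRYABLE_CODES = {"TooManyRequests", "AcknowledgmentTimeout", "UnknownError"}


def classify_response(raw):
    """Return 'acknowledged', 'retry', or 'fix' for a raw JSON response string."""
    response = json.loads(raw)
    if response.get("result") == "ok":
        return "acknowledged"
    if response.get("code") in RETRYABLE_CODES:
        return "retry"
    return "fix"
```

Applied to the two sample responses shown earlier, the function returns 'acknowledged' for the OK response and 'fix' for the InvalidMessageFormat response.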

For optimal performance, we recommend that you implement a flow control mechanism to regulate the rate at which you stream messages. Flow control lets you maximize the number of messages that are processed per second while avoiding the TooManyRequests error, so that you keep throughput high without compromising system stability.
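
One common flow control technique is a token bucket, sketched below in Python. This is an example of the general idea and not a mechanism prescribed by the service. The TokenBucket class is hypothetical, and the default capacity of 1 (no bursts, so messages are spaced at least 1/rate seconds apart) is an assumption chosen to stay within the limit of 100 messages per second.

```python
import time


class TokenBucket:
    """Client-side rate limiter: one token is consumed per message sent."""

    def __init__(self, rate=100.0, capacity=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock
        self._sleep = sleep
        self._tokens = capacity
        self._last = clock()

    def acquire(self):
        """Block until one message may be sent."""
        while True:
            now = self._clock()
            # Refill in proportion to the time elapsed, up to the capacity.
            elapsed = now - self._last
            self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
            self._last = now
            if self._tokens >= 1:
                self._tokens -= 1
                return
            # Wait roughly long enough for the missing fraction of a token.
            self._sleep((1 - self._tokens) / self.rate)
```

A sender would call acquire() immediately before each send. If TooManyRequests errors still occur, for example because several clients share the endpoint, lowering rate is the simplest adjustment.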