Data object processing concepts

Data Lake uses an eventual-consistency model and reflects source system data as of a previous point in time. As a result, Data Lake may not always reflect the current state of the data. For real-time data delivery, integrate Stream Pipelines into your solution architecture.

Compass SQL is the preferred way to interact with data in Data Lake. However, you may need to retrieve an original, raw data object. When you interact with the Data Lake storage layer directly, we recommend that you follow these best practices:

  • Use indexed timestamps, such as dl_document_indexed_date, for incremental retrieval.
  • In each call, start the retrieval window 5 seconds before the highest indexed timestamp of the data objects returned by the previous extraction. This lag interval helps ensure comprehensive retrieval and reduces the chance of overlooking data objects. Because consecutive windows overlap, the same data object can be returned by more than one extraction.
  • In each API call, sort data objects in ascending order of indexed timestamp so that the last object's timestamp can be used as the reference point for the next extraction.
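The practices above can be sketched as a small incremental-extraction step. This is a minimal illustration under stated assumptions, not the Data Lake API. `fetch` is a hypothetical stand-in for whatever call returns data objects indexed at or after a given timestamp. `DataObject` and its `object_id` and `indexed_at` fields are illustrative names, with `indexed_at` playing the role of dl_document_indexed_date. The sketch sorts locally in place of requesting ascending order in the API call. It skips objects that were already processed, because the 5-second overlap can return them again.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Lag interval applied to the previous high-water mark (5 seconds, per the guidance).
LAG = timedelta(seconds=5)


@dataclass(frozen=True)
class DataObject:
    """Hypothetical data object; indexed_at stands in for dl_document_indexed_date."""
    object_id: str
    indexed_at: datetime


def next_window_start(last_max: Optional[datetime], lag: timedelta = LAG) -> Optional[datetime]:
    """Return the lower bound for the next call: previous highest indexed timestamp minus the lag."""
    if last_max is None:
        return None  # First extraction: no lower bound.
    return last_max - lag


def extract_increment(
    fetch: Callable[[Optional[datetime]], Iterable[DataObject]],  # hypothetical retrieval call
    last_max: Optional[datetime],
    seen: Set[Tuple[str, datetime]],
) -> Tuple[List[DataObject], Optional[datetime]]:
    """Run one incremental extraction and return (new objects, new high-water mark)."""
    start = next_window_start(last_max)
    # Ascending order by indexed timestamp, so the last object carries the highest timestamp.
    batch = sorted(fetch(start), key=lambda o: o.indexed_at)
    # The overlapping window can return objects processed earlier; skip those.
    new_objects = [o for o in batch if (o.object_id, o.indexed_at) not in seen]
    seen.update((o.object_id, o.indexed_at) for o in new_objects)
    # Advance the high-water mark only if this batch reached beyond it.
    if batch and (last_max is None or batch[-1].indexed_at > last_max):
        new_max = batch[-1].indexed_at
    else:
        new_max = last_max
    return new_objects, new_max
```

Persist `last_max` and the set of processed keys between runs so that each extraction starts from the previous high-water mark. Keying on both the object identifier and its indexed timestamp keeps a re-indexed version of an object from being discarded as a duplicate. This is a design choice of the sketch, not documented Data Lake behavior.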