Data Ledger
Data Ledger is a troubleshooting tool that indicates data alignment or data drift between Data Lake and the applications that send data to it.
The application periodically publishes a snapshot that summarizes replication statistics. Depending on the ingestion method, the snapshot may include the following information:
- For data that is sent from the source application in batches:
  - Data objects for a given data object name sent during a defined interval
  - Summarized instances across those data objects
- For streamed data that is sent from the source application:
  - Summarized instances that were streamed for an object during a defined interval
  - Total checksum calculation for all instances that were streamed for an object during a defined interval
- For all ingestion methods:
  - Records in a particular table at a defined point of time
Based on these statistics, Data Ledger retrieves information about a data object from the source system and from Data Lake, and then reports a potential match or mismatch between the two systems.
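The match or mismatch check can be pictured as a comparison of the value published by the source application with the value published by Data Lake for the same data object and measurement. The following is a minimal sketch of that idea only, not Data Ledger's actual implementation: the `Measurement` class, the `compare` function, the data object name `SalesOrder`, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One published value for a data object (hypothetical structure)."""
    data_object: str   # name of the data object, e.g. "SalesOrder"
    name: str          # measurement name, e.g. "Instance Count"
    value: int         # the published number for the interval or point in time


def compare(sent: Measurement, ingested: Measurement) -> str:
    """Return 'match' or 'mismatch' for two values of the same measurement."""
    if (sent.data_object, sent.name) != (ingested.data_object, ingested.name):
        raise ValueError("measurements describe different objects or metrics")
    return "match" if sent.value == ingested.value else "mismatch"


# Illustrative values only: the application reports 1200 instances sent,
# Data Lake reports 1187 instances ingested for the same interval.
sent = Measurement("SalesOrder", "Instance Count", 1200)
ingested = Measurement("SalesOrder", "Instance Count", 1187)
print(compare(sent, ingested))  # mismatch
```

In this sketch a mismatch only signals that the two sides disagree for the interval; it does not identify which instances are missing.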
Data Ledger contains two types of information, covering four different measurements. These measurements are published both by the application that sends data to Data Lake and by Data Lake itself.
This table shows the types, the measurements, and the ingestion methods for which each measurement is published:
Type | Measurement | Application (Sent) | Data Lake (Ingested) | Ingestion method |
---|---|---|---|---|
Flow | Data Object | The number of data objects sent during a defined interval. | The number of data objects ingested during a defined interval. | Batch API IMS |
Flow | Instance Count | The total number of instances sent during a defined interval. | The total number of instances ingested during a defined interval. | Batch API IMS, Streaming Ingestion |
Flow | Checksum | Summarized checksum of instances sent during a defined interval. | Summarized checksum of instances stored in Data Lake during a defined interval. | Streaming Ingestion |
State | Row Count | The number of records in a specific table at the defined time. | The number of unique records for a specific data object at the defined time. | Batch API IMS, Streaming Ingestion |
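
To see which measurements to expect for a given ingestion method, the table can be expressed as a simple lookup. This is an illustrative sketch that only restates the table; the names `PUBLISHED_FOR` and `measurements_for` are hypothetical, and the ingestion-method labels are copied verbatim from the table.

```python
# Ingestion-method labels, copied verbatim from the table.
BATCH = "Batch API IMS"
STREAMING = "Streaming Ingestion"

# (type, measurement) -> ingestion methods for which it is published.
PUBLISHED_FOR = {
    ("Flow", "Data Object"): {BATCH},
    ("Flow", "Instance Count"): {BATCH, STREAMING},
    ("Flow", "Checksum"): {STREAMING},
    ("State", "Row Count"): {BATCH, STREAMING},
}


def measurements_for(method: str) -> list[str]:
    """List the measurements published for one ingestion method."""
    return [
        name
        for (_type, name), methods in PUBLISHED_FOR.items()
        if method in methods
    ]


print(measurements_for(STREAMING))  # ['Instance Count', 'Checksum', 'Row Count']
```

For example, a checksum is only available for streamed data, so a checksum comparison cannot be expected for objects that arrive in batches.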