Data Ledger
Data Ledger is a troubleshooting tool that indicates data alignment or data drift between Data Lake and the applications that send data to it.
The application periodically publishes a snapshot that summarizes replication statistics. Depending on the ingestion method, the snapshot can include this information:
For data that is sent from the source application in batches:
- Number of data objects sent for a given data object name during a defined interval
- Total number of instances across those data objects
For all ingestion methods:
- Number of records in a particular table at a defined point in time
Based on the provided statistics, Data Ledger retrieves information about a data object from the source system and from Data Lake. Using this information, Data Ledger reports a potential match or mismatch between the two systems for objects that are sent in batches. If data was sent record by record, the published reconciliation shows an unknown classification until you run a Compass query from Data Ledger.
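The reconciliation described above can be sketched roughly as follows. This is an illustration only, not Data Ledger's actual implementation: the `FlowStats` type, the `classify` function, and the rule that a match requires both counts to be equal are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Classification(Enum):
    MATCH = "match"
    MISMATCH = "mismatch"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class FlowStats:
    """Hypothetical per-interval statistics for one data object name."""
    data_objects: int    # data objects sent or ingested during the interval
    instance_count: int  # total instances across those data objects


def classify(sent: Optional[FlowStats],
             ingested: Optional[FlowStats]) -> Classification:
    """Compare what the application sent with what Data Lake ingested."""
    # Assumption: data sent record by record has no batch statistics to
    # compare, so it stays UNKNOWN until a query is run against the data.
    if sent is None or ingested is None:
        return Classification.UNKNOWN
    # Assumption: a match means both measurements are equal on each side.
    if sent == ingested:
        return Classification.MATCH
    return Classification.MISMATCH
```

For example, `classify(FlowStats(10, 250), FlowStats(10, 240))` reports a mismatch because the instance counts differ, even though the same number of data objects arrived.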
Data Ledger contains two types of information, covering three different measurements. These measurements are published both by the application that sends data to Data Lake and by Data Lake itself.
This table shows the types, the measurements, and the ingestion methods for which each measurement is published:
Type | Measurement | Application (Sent) | Data Lake (Ingested) | Ingestion method |
---|---|---|---|---|
Flow | Data Object | The number of data objects sent during a defined interval. | The number of data objects ingested during a defined interval. | Batch API, IMS |
Flow | Instance Count | The total number of instances for sent data objects. | The total number of instances for ingested data objects. | Batch API, IMS |
State | Row Count | The number of records in a specific table at a defined time. | The number of unique records for a specific data object at a defined time. | Batch API, IMS, Streaming Ingestion |
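
As a rough sketch, the table can be expressed as a lookup that answers which measurements to expect for a given ingestion method. The `MEASUREMENTS` mapping and the `measurements_for` helper are hypothetical names for this example, and the method names are taken from the last column of the table, treated here as separate methods.

```python
# Measurement name -> (type, ingestion methods that publish it).
# Hypothetical structure mirroring the table above; not a Data Ledger API.
MEASUREMENTS = {
    "Data Object": ("Flow", {"Batch API", "IMS"}),
    "Instance Count": ("Flow", {"Batch API", "IMS"}),
    "Row Count": ("State", {"Batch API", "IMS", "Streaming Ingestion"}),
}


def measurements_for(method: str) -> list[str]:
    """Return the measurements published for an ingestion method, in table order."""
    return [name for name, (_type, methods) in MEASUREMENTS.items()
            if method in methods]


print(measurements_for("Streaming Ingestion"))  # ['Row Count']
print(measurements_for("IMS"))  # ['Data Object', 'Instance Count', 'Row Count']
```

Under this reading, only the Row Count measurement is available for Streaming Ingestion, so flow-based match or mismatch reporting does not apply to it.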