Variation handling

Variation handling is based on the Data Object’s metadata, specifically on these additional properties: identifierpaths, variationpath, deleteindicator, and archiveindicator.

The identifierpaths property defines the property or properties that comprise the primary key of a data object. For example, a Products data object may have Company and ProductID defined as its identifierpaths properties, which signifies that each product is uniquely identified by the combination of company and product ID.

The variationpath is the property that defines the sequence of lower and higher variations. Lower variations signify earlier states, or versions, of a record, and higher variations signify later states as updates are made to the record. Variations are generally integers. For example, the variation of a product is 1 when the product is created, 2 when the product is updated, and 3 when the product is closed. Any update to the product triggers a new record, or variation, to be sent to Data Lake.

The deleteindicator is a property that signifies that the record is physically or logically deleted from the source. For example, if a product is deleted, the deleted flag is set to true.

The archiveindicator is a property that signifies that the record is archived at the source. An archived record is no longer active but is retained for historical storage or for legal or tax reasons.

Example: Products

This table shows the product records that are stored in Data Lake:

Company           ProductID         Description  Price  Variation        DeletedFlag        ArchivedFlag
(IdentifierPath)  (IdentifierPath)                      (VariationPath)  (DeleteIndicator)  (ArchiveIndicator)
001               991041            Bike         300    1                false              false
001               991041            Bike         350    2                true               false
002               222333            Scooter      129    1                false              false
002               222333            Scooter      132    2                false              false
003               12345             Skateboard   175    1                false              false
003               12345             Skateboard   195    2                true               false
003               12345             Skateboard   205    3                false              false

The identifier path, or primary key, of the records is Company+ProductID. The Variation column holds the variation, or version, associated with each primary key. Notice that Data Lake stores multiple variations of each record, covering both the historic and the current states. A DeletedFlag value of true indicates that the record was physically or logically deleted from the source, but the record still exists in Data Lake.

The ArchivedFlag value of true indicates that the record was archived in the source, but the record exists as an archived record in Data Lake.
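To make the relationship between the identifier path and the variations concrete, here is a minimal Python sketch that groups the sample Products rows by Company+ProductID and lists the variations stored for each key. The tuple layout and the helper name variations_by_key are assumptions made for illustration; they are not part of Data Lake.

```python
from collections import defaultdict

# Sample rows copied from the Products example. Assumed tuple layout:
# (Company, ProductID, Description, Price, Variation, DeletedFlag, ArchivedFlag)
ROWS = [
    ("001", "991041", "Bike", 300, 1, False, False),
    ("001", "991041", "Bike", 350, 2, True, False),
    ("002", "222333", "Scooter", 129, 1, False, False),
    ("002", "222333", "Scooter", 132, 2, False, False),
    ("003", "12345", "Skateboard", 175, 1, False, False),
    ("003", "12345", "Skateboard", 195, 2, True, False),
    ("003", "12345", "Skateboard", 205, 3, False, False),
]


def variations_by_key(rows):
    """Map each (Company, ProductID) key to its sorted variation numbers."""
    grouped = defaultdict(list)
    for company, product_id, _desc, _price, variation, _deleted, _archived in rows:
        grouped[(company, product_id)].append(variation)
    return {key: sorted(values) for key, values in grouped.items()}


history = variations_by_key(ROWS)
for key, variations in history.items():
    print(key, variations)
```

Each key maps to the full list of variations that Data Lake retains for that record, so the highest number in each list identifies the current state.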

There are several methods to query Data Lake data for variations.

  • Select the maximum variation of each record and exclude deleted and archived records.
  • Select all variations of each record excluding archived records.
  • Select the maximum variation of each record and include records in which the maximum variation is deleted. This method excludes archived records.
  • Select the maximum variation of only archived records.
  • Select all variations of archived records.
  • Select the maximum variation of each record and include records in which the maximum variation is deleted or archived.
  • Select all variations of each record.
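Three of the methods above can be sketched in Python against the sample Products rows. This is an illustration only, not the Data Lake query interface: the helper names latest_variations and select and their parameters are hypothetical, and the sketch assumes that the delete and archive flags are evaluated on the maximum variation of each record.

```python
# Columns mirror the Products example; this in-memory layout is an assumption.
COLUMNS = ("Company", "ProductID", "Description", "Price",
           "Variation", "DeletedFlag", "ArchivedFlag")
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("001", "991041", "Bike", 300, 1, False, False),
    ("001", "991041", "Bike", 350, 2, True, False),
    ("002", "222333", "Scooter", 129, 1, False, False),
    ("002", "222333", "Scooter", 132, 2, False, False),
    ("003", "12345", "Skateboard", 175, 1, False, False),
    ("003", "12345", "Skateboard", 195, 2, True, False),
    ("003", "12345", "Skateboard", 205, 3, False, False),
]]


def latest_variations(rows):
    """Keep the row with the highest Variation for each Company+ProductID."""
    latest = {}
    for row in rows:
        key = (row["Company"], row["ProductID"])
        if key not in latest or row["Variation"] > latest[key]["Variation"]:
            latest[key] = row
    return list(latest.values())


def select(rows, all_variations=False, include_deleted=False,
           include_archived=False):
    """Hypothetical helper: pick variations, then filter on the two flags."""
    candidates = rows if all_variations else latest_variations(rows)
    return [
        row for row in candidates
        if (include_deleted or not row["DeletedFlag"])
        and (include_archived or not row["ArchivedFlag"])
    ]


# Maximum variation of each record, excluding deleted and archived records.
active = select(ROWS)

# Maximum variation of each record, including deleted but excluding archived.
latest_with_deleted = select(ROWS, include_deleted=True)

# All variations of each record.
everything = select(ROWS, all_variations=True,
                    include_deleted=True, include_archived=True)
```

With the sample rows, the first call returns only Scooter (variation 2) and Skateboard (variation 3), because the maximum variation of the Bike record is flagged as deleted. The second call also returns Bike variation 2, and the third returns all seven rows.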