Variation handling

Variation handling is based on the data object’s metadata, specifically the additional properties identifierpaths, variationpath, and deleteindicator.

The identifierpaths property defines the property or properties that make up the primary key of a data object. For example, a Products data object may have Company and ProductID defined as the identifierpaths properties, to signify that each product is uniquely identified by the combination of its company and product ID.

The variationpath property defines the sequence that orders lower and higher variations. Lower variations represent earlier states, or versions, of a record, and higher variations represent later states as updates are made to the record. Variations are generally integers. For example, the variation of a product is 1 when the product is created, 2 when the product is updated, and 3 when the product is closed. Any update to the product causes a new record, or variation, to be sent to the Data Lake.

The deleteindicator property signifies that the record was physically or logically deleted from the source. For example, if a product is deleted, the deleted flag is set to true.

Example: Products

This table shows the product records that are stored in the Data Lake:

Company           ProductID         Description  Price  Variation        DeletedFlag
(IdentifierPath)  (IdentifierPath)                      (VariationPath)  (DeleteIndicator)
001               991041            Bike         300    1                false
001               991041            Bike         350    2                true
002               222333            Scooter      129    1                false
002               222333            Scooter      132    2                false
003               12345             Skateboard   175    1                false
003               12345             Skateboard   195    2                true
003               12345             Skateboard   205    3                false

The identifier path, or primary key, of the records is Company+ProductID. The Variation column is the variation, or version, associated with each primary key. Notice that the Data Lake stores multiple variations of each record, covering both the historic and the current states. A DeletedFlag value of true indicates that the record was physically or logically deleted from the source at that variation, but the record still exists in the Data Lake.

There are three methods to query Data Lake data for variations.

  • Select the maximum variation of each record and exclude deleted records.
  • Select all variations of each record.
  • Select the maximum variation of each record and include records in which the maximum variation is deleted.
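
The three methods can be illustrated with a minimal Python sketch over the Products example records. This is not the Data Lake query API. The constants and function names (IDENTIFIER_PATHS, latest_variations, latest_active, all_variations) are hypothetical and chosen only for illustration. The sketch also assumes that the first method checks the delete indicator on the maximum variation only, so an earlier deleted variation does not hide a later, undeleted one.

```python
# Records copied from the Products example; keys mirror the table columns.
RECORDS = [
    {"Company": "001", "ProductID": "991041", "Description": "Bike", "Price": 300, "Variation": 1, "DeletedFlag": False},
    {"Company": "001", "ProductID": "991041", "Description": "Bike", "Price": 350, "Variation": 2, "DeletedFlag": True},
    {"Company": "002", "ProductID": "222333", "Description": "Scooter", "Price": 129, "Variation": 1, "DeletedFlag": False},
    {"Company": "002", "ProductID": "222333", "Description": "Scooter", "Price": 132, "Variation": 2, "DeletedFlag": False},
    {"Company": "003", "ProductID": "12345", "Description": "Skateboard", "Price": 175, "Variation": 1, "DeletedFlag": False},
    {"Company": "003", "ProductID": "12345", "Description": "Skateboard", "Price": 195, "Variation": 2, "DeletedFlag": True},
    {"Company": "003", "ProductID": "12345", "Description": "Skateboard", "Price": 205, "Variation": 3, "DeletedFlag": False},
]

# Hypothetical stand-ins for the data object's metadata properties.
IDENTIFIER_PATHS = ("Company", "ProductID")
VARIATION_PATH = "Variation"
DELETE_INDICATOR = "DeletedFlag"


def latest_variations(records):
    """Method 3: maximum variation per primary key, deleted or not."""
    latest = {}
    for rec in records:
        key = tuple(rec[path] for path in IDENTIFIER_PATHS)
        if key not in latest or rec[VARIATION_PATH] > latest[key][VARIATION_PATH]:
            latest[key] = rec
    return list(latest.values())


def latest_active(records):
    """Method 1: maximum variation per primary key, dropping keys whose
    maximum variation is flagged as deleted (assumed interpretation)."""
    return [rec for rec in latest_variations(records) if not rec[DELETE_INDICATOR]]


def all_variations(records):
    """Method 2: every stored variation of every record."""
    return list(records)


if __name__ == "__main__":
    for rec in latest_active(RECORDS):
        print(rec["Description"], rec[VARIATION_PATH], rec["Price"])
```

Under these assumptions, the first method returns Scooter (variation 2) and Skateboard (variation 3), the second returns all seven records, and the third additionally returns Bike at variation 2 with DeletedFlag set to true.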