Columnar replication validation
The replication validation functionality enables you to validate business class data replicated to columnar. The validation process compares what is in the columnar store with what is in the business class rowstore, and produces a report of mismatches.
Several restrictions and requirements apply to the validation of the columnar store data:
General requirements
- The validation is only available for columnar replication of a business class's current data.
- The replication being validated must be complete, not actively running or failed.
Condition requirements
If the replication set has a condition and contains references to fields other than persistent fields that are resident on the business class, such as derived fields, user fields, related fields, etc., overriding the replication set condition is required. It can be overridden to no condition or a condition that meets the requirements.
Checksum field requirements
Checksum fields are used for the initial checking for mismatches.
- Checksum fields must be numeric type fields.
- You can specify multiple checksum fields.
-
Checksum fields must either be the same field in the columnar table and the rowstore table or be a field in columnar that can be mapped to a delivered, persistent rowstore field. Conditions can be applied to the rowstore field matchup.
- Example: Columnar has derived fields ‘Credit’ and ‘Debit’ in the replication set. The derived fields are based on a persistent delivered field called ‘Amount’. The ‘Credit’ derived field returns ‘Amount’ when < 0. ‘Debit’ returns ‘Amount’ when > 0. You could add checksum fields for ‘Credit’ and ‘Debit’ and specify a rowstore field of ‘Amount' for each, adding a condition to each to apply to the Row Store field to make it match.
- Example: Columnar has the ‘Amount’ field. You could add two checksum fields, one with a condition < 0 and another > 0. The rowstore field must be persistent, delivered (meaning not a user field), cannot be a user field and cannot be a related field.
- If the checksum field can contain both negative and positive values, you need to add two fields, one with a condition < 0 and another > 0 (see above) since the values are added to together to compute the checksum value.
- Checksum field conditions can only be a single condition and contain only persistent fields that are resident on the business class. Related fields and user fields are not allowed.
Recommendations
- Infor recommends that the replication set business class is one that contains a
delivered (meaning not a user field) ‘Date’ field to use for date chunking. This should be
a field that is always populated (non-blank) and doesn’t change after replication.
- Since this field is used to compute date chunks, records with blank values will be excluded from validation.
- If this value is subject to change and replication is not up to date when the validation is run, this may cause false mismatches the validation process will not be able to detect.
- If you do not use a date field or the business class has no date field, you can still run the validation by selecting the Validate All Data option. However, the validation will fail if the number of records to process exceeds the setting for the maximum number of records to use for chunking (the default is 1.5 million).
- It is best to run the validation when the replication is up to date before other changes have been made. If the replication is not up to date, the validation will attempt to account for this where possible. This is dependent on checksum fields being audited but this also affects performance.
- It is preferred (but not required) that you run validations for business classes that have a unique ID field and that field is included in the replication set.
-
Validation is accomplished by first breaking data into date chunks so the number of records to be read and processed is limited to a manageable number. Dates will be grouped together in ranges up to the maximum record count. Smaller date ranges are suggested. The range is always the specified date, or computed date based on the offset option chosen, to current.
The date field used for chunking should be included in a non-virtual, non-conditional set. The process will attempt to find a set with this field to use for rowstore queries. If it is in more than one set, the process will select the set where this field is nearest the top.
Validation process details
- For each date chunk, an instance count and the checksum field aggregation values are retrieved from columnar. If any checksum values have conditions applied to columnar, additional aggregation calls are made for each unique condition.
- The rowstore is then queried.
- If the update stamp is after the validation start time and the business class is audited, then the record is looked up using the Object Persistence Framework (OPF) and the as-of date, so values and deleted state can be determined when possible. This has a performance impact. For this reason, Infor recommends that, when possible, validation be done shorty after a replication completes and before data changes are made. A few records would have minimal impact. Large volumes of changed records may have a significant impact.
- Checksum values are computed and the record's UID or keys are hashed. If all checksum values match, then the chunk is valid and the process moves to the next date chunk.
- If there are any mismatches, then it will query non-aggregated records from columnar. If it doesn’t find a match in the rowstore, a mismatch record is created. If a match is found and there were any conditional checksum values, it will look up the record using OPF with the validation start time as an as-of date.
- If there was a replication set condition and it was overridden, a check is done to see if the record would have been excluded had the condition been applied. If so, a note is made. Other checks are made for deleted state and notes made of any discrepancies. If an indicator is used, checks are done on that as well and discrepancies noted.
- Finally, the rowstore map is checked for any remaining records that had no columnar match. If the replication set has a condition and it has been overridden, the rowstore record will be looked up using OPF with the validation start time as an as-of date, and a check is done to see if the record would have been excluded had the condition been applied. If so, this is a false positive and the record is not considered a mismatch.