Data preparation
Data must be transformed so that the solver can understand and consume it. Data generated by business applications is often incomplete (missing attribute values, missing certain attributes of interest, or containing only aggregate data), noisy (containing errors or outliers), and inconsistent (containing discrepancies in codes or names).
The data must be formatted, cleaned, and organized before it is fed into the model. You can also prepare multiple datasets together by joining two or more of them in the same prepare data activity. Currently, at most three datasets can be joined and prepared together.
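To make the idea of joining datasets concrete, here is a minimal sketch in plain Python. It is not the product's own join mechanism: the helper `inner_join`, the three sample datasets, and all column names (`order_id`, `product_id`, `plant_id`, and so on) are hypothetical and chosen only for illustration. It assumes an inner join on a shared key column, so rows without a match are dropped.

```python
def inner_join(left, right, key):
    """Inner-join two lists of row dicts on a shared key column (hypothetical helper)."""
    # Index the right-hand rows by key value for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Keep only left rows that have at least one match on the right.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Three illustrative datasets (assumed data, not from any real application).
orders = [
    {"order_id": 1, "product_id": "P1", "qty": 5},
    {"order_id": 2, "product_id": "P2", "qty": 3},
    {"order_id": 3, "product_id": "P9", "qty": 1},  # no matching product
]
products = [
    {"product_id": "P1", "plant_id": "A"},
    {"product_id": "P2", "plant_id": "B"},
]
plants = [
    {"plant_id": "A", "capacity": 100},
    {"plant_id": "B", "capacity": 80},
]

# Join orders to products, then the result to plants: three datasets in total.
prepared = inner_join(inner_join(orders, products, "product_id"), plants, "plant_id")

for row in prepared:
    print(row)
```

Order 3 refers to a product that does not exist in `products`, so it does not appear in `prepared`. Unmatched rows like this are one kind of inconsistency worth checking for before the data reaches the model.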
Answer this question: What are the inconsistencies and defects in the data that need to be resolved?
These are common pre-processing practices:
- Cleaning: fill in or remove missing values, smooth noisy data, identify or remove outliers.
- Transformation: normalize and aggregate data.
- Discretization: replace numerical attributes with nominal ones.
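The practices above can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions, not the behavior of the prepare data activity itself: the sample values, the choice of median imputation, the 1.5 × IQR outlier rule, min-max normalization, and the three equal-width bins labeled `low`, `medium`, and `high` are all assumptions made for the example. Aggregation is not shown.

```python
from statistics import median, quantiles

# Assumed sample column: None marks a missing value, 400.0 is an obvious outlier.
raw = [12.0, None, 15.0, 14.0, 400.0, 13.0, None, 16.0]

# Cleaning, step 1: fill in missing values with the median of the observed values.
observed = [v for v in raw if v is not None]
fill_value = median(observed)
filled = [fill_value if v is None else v for v in raw]

# Cleaning, step 2: remove outliers using the 1.5 * IQR rule (one common convention).
q1, _, q3 = quantiles(filled, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [v for v in filled if lower <= v <= upper]

# Transformation: min-max normalization to the range [0, 1].
lo, hi = min(cleaned), max(cleaned)
normalized = [(v - lo) / (hi - lo) for v in cleaned]


# Discretization: replace each numerical value with a nominal label.
def to_label(x):
    if x < 1 / 3:
        return "low"
    if x < 2 / 3:
        return "medium"
    return "high"


labels = [to_label(x) for x in normalized]

print(cleaned)
print(normalized)
print(labels)
```

The order of the steps matters here: normalizing before removing the outlier would compress every other value into a narrow band near zero, and the discretized labels would carry almost no information.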
After each transformation, you can save the resulting dataset so that you can reuse it the next time you run a prepare data activity within the same design quest of the optimization module.