Best practices: Optimization
This is the optimization product life cycle: define the business case, fetch and prepare the data, design, test, and tune the model, then deploy and maintain it.
Components overview
These are the components:
- Dataset collection: A collection of one or more datasets. A dataset contains the data that is used as input to the model in a quest. Tying datasets together in a group lets you feed one or more datasets to the model in a single pass.
- Quests: A quest is the flow of activities that builds the optimization model.
- Endpoints: An endpoint is the deployed model that provides the optimal solution.
Business case definition
The first step towards a successful project is to define the business problem.
You must understand the domain and elements such as sets, indices, constants, decision variables, constraints, and objective functions for your project. Determine all the possible benefits that the project can provide.
- What is the current process related to the business case?
- Would an optimization model specifically help with this issue? If so, how?
- Will the project contribute to these objectives?
  - Save time: time that could be reinvested into other initiatives.
  - Save money: improved accuracy and efficiency. For example, an optimal solution can save money by minimizing inventory costs or improving employee scheduling.
Fetching data
One or more datasets are tied together and uploaded to the optimization quest.
A dataset contains the data that is used as input. A dataset can be uploaded from your local system or imported from Data Lake.
When uploading from a file, the datatypes in the metadata section are estimated automatically based on a sample of the data. Check the estimates and adjust them if there are discrepancies.
There are cases when the Date/Timestamp datatypes from the file are not recognized correctly by the metadata guesser because there are too many format variants. For these cases, you can manually specify the Date/Timestamp format of the respective dataset.
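When a date format is not recognized, you can specify it explicitly during preparation. A minimal sketch using pandas (an assumption; the file and column names here are hypothetical):

```python
import pandas as pd

# Hypothetical dataset whose timestamp column the metadata guesser
# might misread because of an unusual format.
df = pd.read_csv("demand_history.csv")

# Specify the Date/Timestamp format explicitly instead of relying on
# automatic inference; "%d.%m.%Y %H:%M" is only an example format.
df["order_date"] = pd.to_datetime(df["order_date"], format="%d.%m.%Y %H:%M")

print(df.dtypes)  # confirm the column is now datetime64[ns]
```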
Creating a group is a required step: a group consists of one or more datasets to be fed to the optimization model. You create the group by selecting the datasets from the list of available datasets.
Data preparation
You must have a good understanding of the dataset structure.
Data must be transformed so that it can be understood and consumed by the solver. The data generated in business applications is generally incomplete: it may lack attribute values or attributes of interest, contain only aggregate data, be noisy (contain errors or outliers), and be inconsistent (contain discrepancies in codes or names).
The data must be formatted, cleaned, and organized before feeding it into the model. Multiple datasets can also be prepared by joining two or more datasets together in the same prepare data activity; currently, at most three datasets can be joined and prepared together.
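Conceptually, joining datasets in a prepare data activity works like a relational join. A minimal sketch with pandas (an assumption; the dataset names and keys are hypothetical):

```python
import pandas as pd

# Three hypothetical input datasets sharing a product_id key.
orders = pd.read_csv("orders.csv")      # order_id, product_id, quantity
products = pd.read_csv("products.csv")  # product_id, unit_cost
plants = pd.read_csv("plants.csv")      # product_id, plant_id, capacity

# Join the three datasets on the shared key, mirroring a prepare data
# activity that combines up to three datasets in one step.
prepared = (
    orders
    .merge(products, on="product_id", how="left")
    .merge(plants, on="product_id", how="left")
)
```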
Answer this question: What are the inconsistencies and defects in the data that need to be resolved?
These are common pre-processing practices, illustrated in the sketch after this list:
- Cleaning: fill in or remove missing values, smooth noisy data, identify or remove outliers.
- Transformation: normalization and aggregation.
- Discretization: replace numerical attributes with nominal ones.
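A minimal sketch of these practices using pandas (an assumption; the column names, thresholds, and bin labels are hypothetical):

```python
import pandas as pd

# Hypothetical raw dataset.
df = pd.read_csv("raw_inputs.csv")

# Cleaning: fill in missing values and drop extreme outliers.
df["demand"] = df["demand"].fillna(df["demand"].median())
low, high = df["demand"].quantile([0.01, 0.99])
df = df[df["demand"].between(low, high)]

# Transformation: normalize a numeric attribute to the [0, 1] range.
cost = df["unit_cost"]
df["unit_cost_norm"] = (cost - cost.min()) / (cost.max() - cost.min())

# Discretization: replace a numerical attribute with a nominal one.
df["demand_level"] = pd.cut(df["demand"], bins=3,
                            labels=["low", "medium", "high"])
```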
After each transformation, you can save the resulting dataset to reuse in the next prepare data activity within the same design quest of the optimization module.
Design and test the model
Consider these questions when designing and testing the model; a minimal example follows the list. Use the Solver Quick Reference to assist in making these decisions.
- What is the model that you are building, linear or nonlinear?
- Is the mathematical formulation linear, linear mixed integer, nonlinear, nonlinear integer, or nonlinear mixed integer?
- Which solver should be applied, based on the problem type?
- Which modeling approach is suitable for your use case?
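To make the distinction concrete, here is a minimal linear formulation sketched with PuLP (an assumption; the product's own solvers and modeling interface may differ, and all coefficients are hypothetical):

```python
from pulp import LpMaximize, LpProblem, LpStatus, LpVariable, value

# A linear model: every term in the objective and constraints is
# linear in the decision variables.
prob = LpProblem("production_plan", LpMaximize)

# Decision variables: units of each product to make.
x = LpVariable("product_a", lowBound=0)
y = LpVariable("product_b", lowBound=0)

# Objective function: maximize total profit.
prob += 40 * x + 30 * y

# Constraint: shared machine time is limited.
prob += 2 * x + 1 * y <= 100

prob.solve()
print(LpStatus[prob.status], value(x), value(y))
```

A product of two decision variables in the objective or a constraint would make the formulation nonlinear and call for a different solver class.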
If the datasets are very large, the execution process may take a long time.
Design and tune the model
Generating the optimal solution from the model requires data, a solver, and properly defined elements such as sets, indices, constants, decision variables, conditional constraints, and objective functions.
Once you understand the model structures and approaches that work for your problem, you can build the model and improve its performance by adjusting the elements within the boundaries of the business goal.
These are strategies to improve the performance of your optimization model:
- Review the data source. Is this the best data source for your model? Are there any drawbacks in the selected dataset? If so, clean the data or try to get cleaner data.
- Adjust the elements and the formulation of the equations.
- Try different solvers and compare the results, as sketched after this list.
- Try different model setup configurations by changing the elements, equations, and solver altogether.
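One way to compare solvers on the same formulation, sketched with SciPy's linprog (an assumption; the product may expose a different set of solvers, and the coefficients are hypothetical):

```python
import numpy as np
from scipy.optimize import linprog

# Minimize 4x + 3y subject to 2x + y >= 10, x, y >= 0.
c = np.array([4.0, 3.0])
A_ub = np.array([[-2.0, -1.0]])  # linprog expects <=, so negate the >= row
b_ub = np.array([-10.0])

# Compare a simplex backend against an interior-point backend.
for method in ("highs-ds", "highs-ipm"):
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2,
                  method=method)
    print(method, res.fun, res.x)
```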
Repeat the process until you reach a satisfactory level of performance.
Model deployment
After the model is built and the solver results are satisfactory, you can deploy that configuration to production.
You can choose where you want to deploy the model.
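The exact interface depends on the deployment target. A hypothetical sketch of calling a deployed endpoint over HTTP (the URL, payload fields, and response shape are all assumptions):

```python
import requests

# Hypothetical endpoint URL and input payload.
url = "https://example.com/optimization/endpoints/production-plan"
payload = {"orders": [{"product_id": "A", "quantity": 120}]}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

# The endpoint returns the optimal solution for the supplied input.
print(response.json())
```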
Model maintenance
Get feedback from real-world interactions and redefine the goals for the next iteration of deployment.
Eliminate unnecessary features. Regularly evaluate the effect of removing individual features from a given model, because unimportant features add noise to your feature space. A model's feature space should only contain relevant and important features for the given task.
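One possible way to run such a check is a leave-one-out ablation. A minimal sketch on synthetic data, assuming scikit-learn is available (the model, data, and scoring here are all illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                          # four features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)  # only two matter

baseline = cross_val_score(RandomForestRegressor(random_state=0), X, y).mean()

# Drop each feature in turn and re-evaluate; features whose removal
# does not hurt (or even improves) the score are candidates to drop.
for i in range(X.shape[1]):
    reduced = np.delete(X, i, axis=1)
    score = cross_val_score(RandomForestRegressor(random_state=0),
                            reduced, y).mean()
    print(f"without feature {i}: {score:.3f} (baseline {baseline:.3f})")
```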
Over time, as the input data distribution changes, the model's performance may degrade.
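A simple way to watch for such drift is to compare the distribution of recent inputs against the data the model was built on, for example with a two-sample Kolmogorov-Smirnov test (one possible approach; the file names and threshold are assumptions):

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical single-column files with historical and recent inputs.
training_demand = np.loadtxt("training_demand.csv")
recent_demand = np.loadtxt("recent_demand.csv")

stat, p_value = ks_2samp(training_demand, recent_demand)
if p_value < 0.01:
    print("Input distribution has shifted; consider retuning the model.")
```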