Apply algorithm activity catalog

After the dataset has been transformed using preprocessing activities, the next phase is the application of a machine learning algorithm. Feeding the transformed dataset and the selected algorithm into the Train Model activity produces the trained model.

Different algorithms accomplish different tasks. Each algorithm examines the data and determines the model that most closely fits it.

The Coleman AI Algorithms activity catalog provides supervised and unsupervised algorithms:
  • Supervised algorithms are for regression, classification, or forecasting problems.
    • XGBoost
    • Linear Learner
    • Random Forest
    • Decision Tree
    • Extra Trees
    • Multilayer Perceptron
    • DeepAR: forecasting
  • Unsupervised algorithms are for clustering, association analysis, and dimensionality reduction problems.
    • K-Means
    • PCA: Principal Component Analysis
  • Custom algorithms are for packaging and deploying custom algorithm code to the Coleman AI Platform and using that code for model training.

XGBoost

Use this gradient boosted trees algorithm to provide an accurate prediction of a target variable by combining the estimates of a set of simpler, weaker models. It uses a gradient descent algorithm to minimize the loss when adding new models.

XGBoost minimizes a regularized objective function that combines a convex loss function, based on the difference between the predicted and target outputs, and a penalty term for model complexity (that is, the complexity of the regression tree functions).
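
The same idea can be sketched with the open-source xgboost library; this is an illustration on synthetic data, not the Coleman Train Model activity, and the parameter values are arbitrary assumptions.

```python
# Illustrative sketch using the open-source xgboost library, not the
# Coleman activity. Data and hyperparameters are invented.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                  # 500 rows, 4 numeric features
y = X @ [1.5, -2.0, 0.5, 0.0] + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(
    n_estimators=100,    # number of boosted trees (weak models)
    learning_rate=0.1,   # shrinkage applied to each new tree
    reg_lambda=1.0,      # L2 penalty term on model complexity
)
model.fit(X, y)
print(model.predict(X[:5]))
```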

Linear Learner

Use the Linear Learner algorithm to explore a large number of models and choose the best one: the model that optimizes either a continuous objective, such as mean square error, cross-entropy loss, or absolute error, or a discrete objective suited for classification, such as F1 measure, precision and recall, or accuracy.

When compared with methods that provide solutions for only continuous objectives, the implementation delivers a significant increase in speed over naive hyperparameter optimization techniques.
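
The core idea, training many linear models and keeping the one that scores best on a chosen objective, can be approximated with scikit-learn as a stand-in; this is not the Coleman or Linear Learner API, and the search grid is an invented example.

```python
# Minimal sketch of the Linear Learner idea using scikit-learn:
# fit several linear models and keep the best by a discrete objective.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

search = GridSearchCV(
    SGDClassifier(loss="log_loss", random_state=0),
    param_grid={"alpha": [1e-4, 1e-3, 1e-2], "penalty": ["l1", "l2"]},
    scoring="f1",   # discrete objective: F1 measure
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```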

Random Forest

Use the Random Forest algorithm to construct and combine multiple decision trees for a more accurate prediction. Unlike the Decision Tree algorithm, Random Forest randomly selects observations and features, builds several decision trees, and then averages the results.
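
A minimal sketch of that behavior with scikit-learn, on synthetic data rather than the Coleman activity:

```python
# Random forest sketch: many trees, each built on random rows and
# random feature subsets, with predictions averaged.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=0.2, random_state=0)

forest = RandomForestRegressor(
    n_estimators=200,     # number of decision trees to build
    max_features="sqrt",  # random subset of features considered per split
    bootstrap=True,       # random subset of observations per tree
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:3]))  # averaged prediction across all trees
```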

Decision Tree

Use the Decision Tree algorithm to repeatedly split the dataset according to a given parameter, forming a decision tree. The tree has two main entities: decision nodes, where the data is split, and leaves, which are the outcomes.
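
The split/leaf structure can be made concrete with a small scikit-learn sketch on synthetic data (again, an illustration, not the Coleman activity):

```python
# Single decision tree: print its decision nodes and leaves.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Decision nodes appear as "feature <= threshold" tests; leaves carry
# the predicted class.
print(export_text(tree))
```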

Extra Trees

The Extra Trees algorithm implements an estimator that fits many randomized decision trees, also called extra trees, on various sub-samples of the dataset. It uses averaging to improve predictive accuracy and control overfitting.
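
A scikit-learn sketch of the same estimator, on synthetic data:

```python
# Extra Trees sketch: like a random forest, but split thresholds are
# also chosen at random; the ensemble result is averaged.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

extra = ExtraTreesClassifier(
    n_estimators=200,  # number of randomized decision trees
    random_state=0,
)
extra.fit(X, y)
print(extra.score(X, y))  # accuracy of the averaged ensemble
```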

Multilayer Perceptron

Use the Multilayer Perceptron (MLP) algorithm to train on a set of input-output pairs and learn to model the correlation between them. Training involves adjusting the model's parameters to minimize error and finding the balance that prevents overfitting or underfitting.

The Multilayer Perceptron can be thought of as a deep artificial neural network. The input layer receives the signal, and the output layer makes a decision or prediction about the input. Between the input and output layers are hidden layers, the true computational engine, which combine basic attributes into higher-level concepts.
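
A minimal scikit-learn sketch of such a network; the layer sizes and regularization strength below are arbitrary assumptions, and this is not the Coleman activity:

```python
# MLP sketch: hidden layers between input and output, trained to
# minimize error on input-output pairs.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # two hidden layers
    alpha=1e-4,                   # L2 regularization to curb overfitting
    max_iter=500,
    random_state=0,
)
mlp.fit(X, y)
print(mlp.predict(X[:5]))
```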

DeepAR

Use the DeepAR algorithm to train models on data with time-dependent patterns. The algorithm forecasts scalar (one-dimensional) time series using recurrent neural networks (RNNs).

Use case examples include forecasting groups of related time series, such as product demand, server loads, and webpage requests.
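
DeepAR training itself runs inside the platform, but the following sketch shows a common JSON Lines layout for scalar time series that DeepAR-style trainers consume, one series per line; the field names follow the SageMaker DeepAR convention and the values are invented.

```python
# Write scalar time series (e.g., per-product demand) as JSON Lines.
import json

series = [
    {"start": "2024-01-01 00:00:00", "target": [112.0, 118.5, 121.3, 119.8]},
    {"start": "2024-01-01 00:00:00", "target": [40.2, 38.7, 41.9, 44.0]},
]

with open("train.jsonl", "w") as f:
    for ts in series:
        f.write(json.dumps(ts) + "\n")  # one time series per line
```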

K-Means

Use the K-Means algorithm to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups. You define the attributes that the algorithm uses to determine similarity. The implementation scales to massive datasets and delivers improvements in training time by streaming mini-batches (small, random subsets) of the training data.
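
The mini-batch behavior can be sketched with scikit-learn's MiniBatchKMeans on synthetic data (an illustration, not the Coleman activity):

```python
# Mini-batch k-means: each update uses a small, random subset of rows.
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=0)

km = MiniBatchKMeans(
    n_clusters=5,     # number of groups to find
    batch_size=256,   # size of each random mini-batch
    random_state=0,
)
km.fit(X)
print(km.cluster_centers_[:2])
```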

PCA (Principal Component Analysis)

Use PCA to reduce the number of features (the dimensionality) of a dataset while retaining as much information as possible.
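
A short scikit-learn sketch of that reduction, keeping enough components to preserve most of the variance; synthetic data, not the Coleman activity:

```python
# PCA sketch: project a 10-feature dataset onto fewer components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=500, n_features=10, random_state=0)

pca = PCA(n_components=0.95)  # keep components explaining 95% of variance
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_)
```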

Custom Algorithm

Custom algorithms that are packaged and deployed in the Custom Algorithms section are available in the Apply Algorithm activity catalog.

See Custom algorithms.