Troubleshoot Machine learning

This section describes how to troubleshoot common machine learning issues.

Datasets

Use this troubleshooting information when creating and using Datasets.

UNABLE TO LOAD DATA from the Data Lake

Solution: Navigate to the Compass user interface in Data Lake and validate that the query returns data as expected. See the Compass documentation for additional information.

Warning related to MALFORMED ROWS when uploading a file

Cause: The data type selected for a field does not match the values in some rows, or a row has an incorrect number of columns.

Solution: Validate the delimiter-separated file (for example, a .csv file) before uploading it.
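A minimal sketch of such a pre-upload check is shown here, assuming the file is comma-separated and that you know the expected type of each column. It reports rows with the wrong number of columns and values that cannot be converted to the declared type. The function name and the use of Python converters (int, float, str) as type declarations are illustrative assumptions, not part of the product.

```python
import csv
import io


def find_malformed_rows(text, column_types, delimiter=",", skip_header=False):
    """Return (row_number, reason) pairs for rows that do not match the expected layout.

    column_types is an illustrative list of converters (int, float, str, ...),
    one per expected column; a value conforms if its converter accepts it.
    """
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    start = 1
    if skip_header:
        next(reader, None)  # ignore the header row
        start = 2
    for row_number, row in enumerate(reader, start=start):
        # Wrong number of columns: report it and skip the per-value type checks.
        if len(row) != len(column_types):
            problems.append((row_number, f"expected {len(column_types)} columns, got {len(row)}"))
            continue
        # Correct number of columns: check that each value converts to its declared type.
        for index, (value, convert) in enumerate(zip(row, column_types)):
            try:
                convert(value)
            except ValueError:
                problems.append((row_number, f"column {index}: {value!r} is not {convert.__name__}"))
    return problems


sample = "1,2.5,a\n2,x,b\n3,4.0\n"
for row_number, reason in find_malformed_rows(sample, [int, float, str]):
    print(row_number, reason)
```

In the sample, row 2 has a non-numeric value in a float column and row 3 is missing a column. Both are the kinds of rows that may trigger the malformed-rows warning.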

The DATE / TIMESTAMP DATATYPES from the dataset file are not recognized correctly in the metadata section

Solution: When uploading a dataset from a file, specify the date and timestamp formats used in that file on the Details page.
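Before entering a format on the Details page, you can check which format actually matches the values in a column. The sketch below uses Python's `datetime.strptime` directives (such as `%d/%m/%Y`) for this check. That notation is an assumption for illustration only, and the Details page may expect a different format syntax.

```python
from datetime import datetime


def unparseable_values(values, fmt):
    """Return the values that do not match the strptime format fmt."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad.append(value)
    return bad


def matching_formats(values, candidates):
    """Return the candidate formats that parse every value in the column."""
    return [fmt for fmt in candidates if not unparseable_values(values, fmt)]


column = ["01/02/2023", "13/02/2023"]
print(matching_formats(column, ["%m/%d/%Y", "%d/%m/%Y"]))
```

The value "01/02/2023" alone is ambiguous. Only day-first parsing accepts "13/02/2023", which is why a format should be checked against the whole column rather than against a single value.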

Quests

Use this troubleshooting information when creating and using Quests.

The quest has a FAILED status

Solution: Identify the failing activity and check its error log, which provides information to help pinpoint the issue.

The dataset contains MISSING VALUES

Cause: Most algorithms cannot process missing data.

Solution: Check the dataset and handle the missing values by applying the Handle Missing Data activity before training the model.
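The following sketch shows what checking for and handling missing values means. It represents rows as Python dictionaries, with `None` marking a missing value, and fills a numeric column with its mean. Both the representation and mean imputation are assumptions for illustration. The Handle Missing Data activity may detect missing values differently and offer other strategies.

```python
from statistics import mean


def count_missing(rows, columns):
    """Count missing (None) values per column in a list of dict rows."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def impute_mean(rows, column):
    """Return new rows where missing values in a numeric column are replaced by the column mean."""
    present = [row[column] for row in rows if row.get(column) is not None]
    fill = mean(present)
    # Build new dicts so the original rows are left unchanged.
    return [{**row, column: fill if row.get(column) is None else row[column]} for row in rows]


rows = [
    {"age": 30, "city": "Oslo"},
    {"age": None, "city": "Rome"},
    {"age": 40, "city": None},
]
print(count_missing(rows, ["age", "city"]))
print([row["age"] for row in impute_mean(rows, "age")])
```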

The dataset contains features with STRING data type

Cause: Most algorithms cannot process string values.

Solution: Check the dataset and convert the string features into numeric features by applying the Index Data or One Hot Encoder activity before training the model.
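The sketch below illustrates the two kinds of conversion in general terms. Index encoding maps each distinct string to an integer. One-hot encoding creates a 0/1 position for each distinct string. It is not the product's implementation. For example, the Index Data activity may assign indices in a different order, such as by frequency rather than by first appearance.

```python
def index_encode(values):
    """Map each distinct string to an integer index, in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 vector per value, with one position per distinct (sorted) category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


colors = ["red", "green", "red", "blue"]
print(index_encode(colors))
print(one_hot_encode(colors))
```

Index encoding keeps a single column but implies an order between categories. One-hot encoding avoids that implied order at the cost of one column per category.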

The LABEL is NOT SPECIFIED before applying a supervised algorithm

Solution: Apply the Edit Metadata activity and specify the label column.

MULTIPLE columns are specified as LABELS

Cause: Only one label can be specified.

Solution: Apply the Edit Metadata activity and make sure that exactly one column is specified as the label.
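Both label problems above reduce to the same rule: a supervised algorithm needs exactly one label column. The sketch below checks that rule against a hypothetical metadata mapping from column name to role ("label" or "feature"). The mapping is an illustrative assumption, not the format the Edit Metadata activity uses.

```python
def check_label(metadata):
    """Return an error message unless exactly one column has the role "label"; else None.

    metadata is a hypothetical mapping of column name -> role, e.g. "label" or "feature".
    """
    labels = [name for name, role in metadata.items() if role == "label"]
    if not labels:
        return "no label specified"
    if len(labels) > 1:
        return f"multiple labels specified: {', '.join(labels)}"
    return None


print(check_label({"age": "feature", "churn": "label"}))
print(check_label({"age": "feature"}))
```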

The Compare Model activity takes INCOMPARABLE MODELS as inputs

Cause: Models must be of the same type (all classification or all regression) to produce comparable results.

Solution: Confirm that all models passed to the Compare Model activity are of the same type.
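To see why mixed model types cannot be compared, note that classification and regression models are scored with different metrics. The sketch below uses accuracy and root mean squared error (RMSE) as common examples. These are general metrics, and the metrics the Compare Model activity actually reports are not specified here. The `comparable` helper and its type strings are hypothetical.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of exactly matching class predictions (classification; higher is better)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error (regression; lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def comparable(model_types):
    """Hypothetical guard: models are comparable only if they all share one type."""
    return len(set(model_types)) == 1


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(rmse([2.0, 4.0], [3.0, 5.0]))
print(comparable(["classification", "regression"]))
```

An accuracy of 0.75 and an RMSE of 1.0 are on different scales and point in opposite directions, so ranking one model against the other is meaningless.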

The SCRIPTING activity fails

Solution: Confirm that the output variable is defined in the code as an output of the activity.

Errors in the CONFIGURATION of the ALGORITHM hyperparameters

Solution: Make sure that none of the following misconfigurations is present:

XGBoost: the num.class parameter is not equal to the number of unique classes in the label column.
XGBoost binary: a binary classification objective is used on a multiclass label column.
Linear Learner (multiclass classifier): the num.class parameter is not equal to the number of unique classes in the label column.
Linear Learner (regressor): log not in [0,1] content label column.
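The first two checks can be automated before training. The sketch below assumes an XGBoost-style parameter dictionary. It uses XGBoost's own parameter name `num_class`, which this guide writes as num.class, and objective names such as `binary:logistic` and `multi:softmax`. How the product exposes these parameters is an assumption. The same num.class check applies to the Linear Learner multiclass classifier.

```python
def check_xgboost_config(params, labels):
    """Return configuration problems for an XGBoost-style parameter dict and label column."""
    problems = []
    n_classes = len(set(labels))
    objective = params.get("objective", "")
    # A binary objective cannot learn a label column with more than two classes.
    if objective.startswith("binary:") and n_classes > 2:
        problems.append(f"binary objective {objective!r} used with {n_classes} classes")
    # Multiclass objectives need num_class to equal the number of unique labels.
    if objective.startswith("multi:") and params.get("num_class") != n_classes:
        problems.append(f"num_class={params.get('num_class')} but label has {n_classes} unique classes")
    return problems


labels = ["cat", "dog", "bird", "cat"]
print(check_xgboost_config({"objective": "multi:softmax", "num_class": 2}, labels))
print(check_xgboost_config({"objective": "binary:logistic"}, labels))
```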

Training failed with NO ERROR LOG

Cause: For security reasons an error log could not be provided.

Solution: This may indicate an application error that needs to be resolved.

The error log indicates that a TIMEOUT has occurred on the quest

Solution: Stopping and restarting the quest may resolve the issue.

WHICH ALGORITHM / OBJECTIVE to use

Solution: See the Algorithm Quick Reference in the main menu.

Long execution times

Use this troubleshooting information when experiencing long execution times.

There is a One-Hot Encoder activity present and running

Cause: The One-Hot Encoder creates one new column for each unique value of an encoded feature, so features with many unique values produce a very large number of columns. This may cause long computation times.
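The column growth can be estimated in advance by summing the number of unique values in each column to be encoded. The sketch assumes that every unique value becomes its own column, with no category dropped. The actual activity may differ in that detail.

```python
def one_hot_column_count(columns):
    """Number of output columns if every distinct value of every column becomes a column.

    columns maps column name -> list of values (illustrative in-memory representation).
    """
    return sum(len(set(values)) for values in columns.values())


data = {
    "country": ["NO", "IT", "NO"],
    "customer_id": ["c1", "c2", "c3"],
}
print(one_hot_column_count(data))
```

An identifier-like column such as `customer_id` contributes one column per row. Encoding it on a large dataset therefore multiplies the data width and the computation time.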

There is a user-defined SCRIPT or SQL activity present and running

Cause: Users can define their own scripts or SQL queries, and inefficient code or queries can result in long-running activities.

The DATA volume is LARGE

Cause: Pre-processing and training on large datasets can take time.

Endpoints

Use this troubleshooting information when creating and using Endpoints.

UNABLE TO DEPLOY ENDPOINT - button is disabled

Cause: The button is enabled only after the production quest has been saved and run and has a status of Finished.

Solution: Save and run the production quest. When its status is Finished, the button is enabled.

ENDPOINT TESTING RETURNS AN ERROR

Solution:

Validate that the endpoint has a status value of Active.
Validate that the request payload is well-formed.
Validate that the request datatypes correspond to the endpoint schema.
If testing with .csv file input:
- Make sure the .csv file does not contain a header row.
- Validate that the request payload variables come in the same sequence as in the endpoint schema.
If testing with JSON input:
- Make sure there are no null values in the JSON.
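Several of these checks can be run locally before testing the endpoint. The sketch below checks a JSON payload for missing fields, null values, and type mismatches. It also checks a .csv payload for the value count and for values that do not convert to the schema types. As a side effect, the .csv check flags a header row. The schema (`SCHEMA`) and its field names are hypothetical stand-ins for your endpoint schema.

```python
import csv
import io
import json

# Hypothetical endpoint schema: ordered (field name, expected Python type) pairs.
SCHEMA = [("age", int), ("income", float), ("segment", str)]


def validate_json_payload(text, schema=SCHEMA):
    """Check a JSON object payload for missing fields, nulls, and type mismatches."""
    record = json.loads(text)
    problems = []
    for name, expected in schema:
        # json.loads returns int for numbers written without a decimal point,
        # so ints are accepted for float fields.
        allowed = (int, float) if expected is float else expected
        if name not in record:
            problems.append(f"missing field {name}")
        elif record[name] is None:
            problems.append(f"null value for {name}")
        elif not isinstance(record[name], allowed):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


def validate_csv_payload(text, schema=SCHEMA):
    """Check each CSV row for the value count and for values that match the schema order."""
    problems = []
    for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != len(schema):
            problems.append(f"row {row_number}: expected {len(schema)} values, got {len(row)}")
            continue
        for (name, expected), value in zip(schema, row):
            try:
                expected(value)
            except ValueError:
                # A header row or out-of-order values typically fail here.
                problems.append(f"row {row_number}: {value!r} is not a valid {expected.__name__} for {name}")
    return problems


print(validate_json_payload('{"age": 42, "income": null, "segment": "A"}'))
print(validate_csv_payload("age,income,segment\n42,50000.0,A\n"))
```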

INVOKING the endpoint for PREDICTIONS through API Gateway FAILS

Solution: Validate that the input values are sent in the same order as the fields in the endpoint schema definition.
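One way to avoid ordering mistakes is to build each request row from named values and let the schema's field order determine the sequence. The field list `SCHEMA_ORDER` and the helper functions below are hypothetical. The sketch only prepares the payload and does not call API Gateway.

```python
# Hypothetical field order, copied from the endpoint schema definition.
SCHEMA_ORDER = ["age", "income", "segment"]


def to_ordered_row(record, field_order=SCHEMA_ORDER):
    """Arrange a record's values in schema order, rejecting missing or unexpected fields."""
    missing = [f for f in field_order if f not in record]
    extra = [f for f in record if f not in field_order]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    return [record[f] for f in field_order]


def to_csv_line(record, field_order=SCHEMA_ORDER):
    """Render one record as a header-less CSV line in schema order."""
    return ",".join(str(v) for v in to_ordered_row(record, field_order))


print(to_csv_line({"segment": "A", "age": 42, "income": 50000.0}))
```

Because the order comes from one list, changing the schema means updating `SCHEMA_ORDER` rather than every place that builds a request.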