Anomaly Detection

Anomaly Detection identifies and handles anomalies in time series and regression data. It uses outlier detection techniques, including statistical methods, clustering algorithms, and machine learning approaches, to reduce the impact of anomalies and improve forecast accuracy.

Anomaly Detection includes grouping features that let you organize data based on specific attributes. This makes it easier to detect unusual patterns within particular groups or categories. It is especially handy when analyzing trends across different segments.

On top of that, the explainability features help you understand what’s causing these anomalies, making the results clearer and easier to trust.

You can define these hyperparameters:

  • Input Keys
    • Specifies a comma-separated list of features or columns that must be present in the input data. The model uses these features to index and retrieve results. Ensure that the specified columns are correctly named and formatted in your dataset.

      Example: Date

  • Target Variable
    • Defines the name of the column in the original input data representing the target variable. This is the value you want to detect outliers for. Make sure the target variable is numeric and properly preprocessed to avoid any inconsistencies.

      Example: Price

  • Detection Method
    • Specifies the technique or algorithm used to identify anomalies in the dataset. Choose the method that best suits the characteristics of your data and the type of anomalies you expect to detect.

      The available options are: two_sided_moving_median, one_sided_moving_median, distribution_based

      Example: two_sided_moving_median

  • User Tagged
    • Defines the name of the column that flags values the user has tagged as non-outliers. A value of 1 means the value is not an outlier. Ensure that this column is binary (0 or 1) and accurately reflects the tagging.

      Example: user_tagged

  • Group By
    • Specifies a comma-separated list of fields used for grouping within the dataset. It determines the level at which anomalies are detected. Grouping helps identify anomalies within specific segments or categories of the data.

      Example: LocationId

  • Date Column
    • Specifies the name of the column in the input data that contains date information. Dates must be in YYYY-MM-DD format. Ensure that the date column is correctly formatted and free of missing values.

      Example: Date

  • Handling Method
    • Specifies how detected anomalies are treated. Choose the method that aligns with your data handling strategy and the impact of anomalies on your analysis.

      The available options are: smooth, median, remove

      Example: smooth

  • Advanced
    • Provides additional parameters for advanced usage of the engine. Most parameters are set internally to keep the interface simple, so you only need this field for settings tailored to specific use cases or scenarios, such as the window size. It is used by only two of the three detection methods, so it is not always required.

      Example: window_size:'14'
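
To show how these hyperparameters can work together, here is a minimal Python sketch of per-group detection with a two-sided (centered) moving median, followed by median handling. It is an illustration only, not the engine's actual implementation. The CONFIG dictionary, the detect_and_handle function, the threshold argument, and the outlier rule (a Hampel-style test on the median absolute deviation within the window) are all assumptions made for this example, as is passing the window size as an integer.

```python
from collections import defaultdict
from statistics import median

# Hypothetical settings that mirror the hyperparameters described above.
CONFIG = {
    "target_variable": "Price",
    "detection_method": "two_sided_moving_median",
    "user_tagged": "user_tagged",
    "group_by": "LocationId",
    "date_column": "Date",
    "handling_method": "median",
    "advanced": {"window_size": 5},
}


def detect_and_handle(rows, config, threshold=3.0):
    """Return copies of rows with an 'is_outlier' flag and a handled target value.

    Assumed rule (not the engine's): a point is an outlier when it deviates
    from its window median by more than threshold * scaled MAD of the window.
    """
    target = config["target_variable"]
    group_cols = [c.strip() for c in config["group_by"].split(",")]
    half = config["advanced"]["window_size"] // 2

    # Split rows by group so each segment is analyzed on its own.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_cols)].append(row)

    result = []
    for group_rows in groups.values():
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        ordered = sorted(group_rows, key=lambda r: r[config["date_column"]])
        values = [r[target] for r in ordered]
        for i, row in enumerate(ordered):
            # Two-sided window: up to `half` points on each side of point i.
            window = values[max(0, i - half): i + half + 1]
            center = median(window)
            # Median absolute deviation, scaled to approximate a standard
            # deviation for normally distributed data.
            spread = 1.4826 * median([abs(v - center) for v in window])
            # Rows the user tagged with 1 are never treated as outliers.
            user_ok = row.get(config["user_tagged"], 0) == 1
            is_outlier = (not user_ok) and abs(row[target] - center) > threshold * spread
            handled = dict(row, is_outlier=is_outlier)
            if is_outlier:
                handled[target] = center  # "median" handling: use the window median
            result.append(handled)
    return result


# Sample data: (price, user_tagged) per day for two locations.
SERIES = {
    "A": [(10.0, 0), (11.0, 0), (10.0, 0), (50.0, 0), (12.0, 0), (11.0, 0), (10.0, 0)],
    "B": [(100.0, 0), (101.0, 0), (99.0, 0), (300.0, 1), (100.0, 0)],
}
ROWS = [
    {"Date": f"2024-01-{day:02d}", "LocationId": loc, "Price": price, "user_tagged": tag}
    for loc, points in SERIES.items()
    for day, (price, tag) in enumerate(points, start=1)
]

if __name__ == "__main__":
    for r in detect_and_handle(ROWS, CONFIG):
        print(r["LocationId"], r["Date"], r["Price"], r["is_outlier"])
```

With this sample data, only the 50.0 reading in group A is flagged and replaced by its window median. The 300.0 reading in group B is left untouched because its user_tagged value is 1. Grouping keeps the two locations separate, so the higher price level in group B does not affect detection in group A.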