Guardrails

Guardrails are AI safety mechanisms that guide and control how large language models (LLMs) handle and generate content.

Their purpose is to ensure that AI interactions remain safe, ethical, and compliant with policy and regulatory requirements.

Rather than simply blocking responses, guardrails can filter, adapt, or flag content that falls into sensitive categories. This ensures that generated output is appropriate for the intended audience and context.
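The filter/adapt/flag distinction above can be sketched as a small decision function. This is purely illustrative: the `Action` names, severity scores, and thresholds below are assumptions for the sketch, not part of the product API.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"    # content passes through unchanged
    FLAG = "flag"      # content passes through but is marked for review
    ADAPT = "adapt"    # content is rewritten for the audience/context
    FILTER = "filter"  # content is removed or blocked

def choose_action(severity: float) -> Action:
    """Map a moderation severity score in [0, 1] to a guardrail action.

    The thresholds here are hypothetical; a real deployment would tune
    them per category and per policy.
    """
    if severity >= 0.8:
        return Action.FILTER
    if severity >= 0.5:
        return Action.ADAPT
    if severity >= 0.2:
        return Action.FLAG
    return Action.ALLOW
```

For example, a mildly sensitive passage (severity 0.3) would be flagged for review rather than blocked outright.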

Currently, these guardrail types are supported:

  • Content Filtering: Blocks content in specific categories such as Hate, Insults, Sexual, Violence, and Misconduct.
  • Custom Guardrails: Admin-defined moderation criteria that block content from being processed by the GenAI services.

Note: This is currently supported only in the /prompt and /message APIs.
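A content-filtering check of this kind can be sketched as follows. The category names come from the list above; the function name, score format, and threshold are assumptions for illustration, not the actual service interface.

```python
# Categories listed for the Content Filtering guardrail type.
CATEGORIES = {"Hate", "Insults", "Sexual", "Violence", "Misconduct"}

def apply_content_filter(scores: dict[str, float],
                         threshold: float = 0.5) -> dict:
    """Block content when any supported category scores at or above
    the threshold.

    `scores` maps category names to moderation confidence in [0, 1];
    the shape of this input is a hypothetical placeholder for whatever
    the moderation backend actually returns.
    """
    flagged = {c: s for c, s in scores.items()
               if c in CATEGORIES and s >= threshold}
    return {"blocked": bool(flagged),
            "flagged_categories": sorted(flagged)}
```

For example, `apply_content_filter({"Violence": 0.9, "Insults": 0.1})` would block the content and report `["Violence"]` as the flagged category.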