Configuring Moderation Operations

Configure the [Toxicity] Detection by Text operation.

Configure the [Toxicity] Detection by Text Operation

The [Toxicity] Detection by Text operation classifies and scores harmful content in text from either the user or the LLM.

Apply the [Toxicity] Detection by Text operation in scenarios such as:

  • Toxic Input Detection

    Detect and block toxic user input before it is sent to the LLM.

  • Harmful Response Detection

    Filter out LLM responses that users could consider toxic or offensive.

To configure the [Toxicity] Detection by Text operation:

  1. Select the operation on the Anypoint Code Builder or Studio canvas.

  2. In the General properties tab for the operation, enter these values:

    • Text

      Text to check for harmful content.

This is the XML for this operation:

<ms-inference:toxicity-detection-text
  doc:name="Toxicity detection text"
  doc:id="b5770a5b-d3f9-47ba-acec-ab0bd41e4188"
  config-ref="OpenAIConfig">
    <ms-inference:text>
      <![CDATA[You are fat]]>
    </ms-inference:text>
</ms-inference:toxicity-detection-text>
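
In a real flow, the Text value is usually not a hard-coded string. As a minimal sketch, assuming the incoming message payload is JSON with a userMessage field (a hypothetical name) and that the Text parameter accepts a DataWeave expression, as most Mule operation parameters do, the text can be passed dynamically:

<!-- Sketch only: the userMessage field name is an assumption about the incoming payload. -->
<ms-inference:toxicity-detection-text
  doc:name="Toxicity detection text"
  config-ref="OpenAIConfig">
    <ms-inference:text>
      <![CDATA[#[payload.userMessage]]]>
    </ms-inference:text>
</ms-inference:toxicity-detection-text>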

Output Configuration

This operation returns a JSON payload containing the overall toxicity flag and a score for each category. This is an example response:

{
  "payload": {
    "flagged": true,
    "categories": [
      {
        "illicit/violent": 0.0000025466403947055455,
        "self-harm/instructions": 0.00023480495744356635,
        "harassment": 0.9798945372458964,
        "violence/graphic": 0.000005920916517463734,
        "illicit": 0.000013552078562406772,
        "self-harm/intent": 0.0002233150331012493,
        "hate/threatening": 0.0000012029639084557005,
        "sexual/minors": 0.0000024300240743279605,
        "harassment/threatening": 0.0007499928075102617,
        "hate": 0.00720390551996062,
        "self-harm": 0.0004822186797755494,
        "sexual": 0.00012644219446392274,
        "violence": 0.0004960569708019355
      }
    ]
  }
}
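
As a sketch of the toxic input scenario described earlier, the flagged field can drive routing in the flow. This example assumes the JSON response shown above is the current message payload and that the flag is reachable as payload.flagged; verify the exact path against your actual output. The APP:TOXIC_INPUT error type and the logger message are illustrative only:

<choice doc:name="Block toxic input">
    <!-- Assumes the toxicity result is the current payload; adjust the path if your structure differs. -->
    <when expression="#[payload.flagged == true]">
        <raise-error type="APP:TOXIC_INPUT"
                     description="Input was flagged as toxic and is not sent to the LLM."/>
    </when>
    <otherwise>
        <logger level="INFO" doc:name="Moderation passed"
                message="Text passed the toxicity check; continuing to the LLM call."/>
    </otherwise>
</choice>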