Content Moderation APIs
When and how to use OpenAI Moderation, Azure Content Safety, and AWS Comprehend for AI output screening. Includes Python integration examples and a cost/latency comparison.
Why Use Moderation APIs?
Building a custom safety classifier requires data, training, and maintenance. For most use cases, a moderation API gives you a production-ready classifier in one API call.
Use moderation APIs for:
- Screening user inputs before they reach your LLM
- Screening LLM outputs before they reach users
- Fast iteration without a machine learning team
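Both screening points can be wired around a single model call. The sketch below is minimal and makes two assumptions. `is_allowed` is a hypothetical async callable: any moderation check that returns True for acceptable text. `generate` is a hypothetical stand-in for your LLM call. Neither belongs to a specific SDK.

```python
from typing import Awaitable, Callable

# Hypothetical signatures: a moderation check returning True when text is allowed,
# and an async LLM call that maps a prompt to a completion.
IsAllowed = Callable[[str], Awaitable[bool]]
Generate = Callable[[str], Awaitable[str]]

REFUSAL = "Sorry, I can't help with that."


async def guarded_completion(prompt: str, is_allowed: IsAllowed, generate: Generate) -> str:
    # 1. Screen the user input before it reaches the LLM.
    if not await is_allowed(prompt):
        return REFUSAL
    # 2. Call the model only for inputs that passed.
    output = await generate(prompt)
    # 3. Screen the model output before it reaches the user.
    if not await is_allowed(output):
        return REFUSAL
    return output
```

The `check_user_input` helper in the OpenAI section fits the `is_allowed` slot once its client is bound, for example with `functools.partial(check_user_input, client=client)`.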
OpenAI Moderation API
The OpenAI Moderation API is free and fast, and it covers the most common categories of harmful content.
Categories detected:
- harassment and harassment/threatening
- hate and hate/threatening
- self-harm, self-harm/intent, and self-harm/instructions
- sexual and sexual/minors
- violence and violence/graphic
- illicit and illicit/violent
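Every category comes with a score between 0 and 1, so you are not limited to a single global threshold. You can apply stricter cutoffs to the highest-risk categories. The sketch below assumes the scores arrive as a dict keyed by the slash-style names above; the openai Python SDK produces these names with `category_scores.model_dump(by_alias=True)`. The threshold values are hypothetical and are not OpenAI recommendations.

```python
# Hypothetical policy: stricter cutoffs for the highest-risk categories.
# Categories not listed here fall back to DEFAULT_THRESHOLD.
DEFAULT_THRESHOLD = 0.5
CATEGORY_THRESHOLDS = {
    "self-harm/intent": 0.2,
    "self-harm/instructions": 0.2,
    "sexual/minors": 0.1,
}


def categories_over_threshold(scores: dict[str, float]) -> list[str]:
    """Return categories whose score meets or exceeds that category's own threshold."""
    return [
        category
        for category, score in scores.items()
        if score >= CATEGORY_THRESHOLDS.get(category, DEFAULT_THRESHOLD)
    ]
```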
```python
# moderation/openai_mod.py
from dataclasses import dataclass

from openai import AsyncOpenAI


@dataclass
class ModerationResult:
    flagged: bool
    categories: list[str]
    scores: dict[str, float]


async def moderate_with_openai(
    text: str,
    client: AsyncOpenAI,
    threshold: float = 0.5,
) -> ModerationResult:
    """Check text against OpenAI content policy categories."""
    response = await client.moderations.create(
        input=text,
        model="omni-moderation-latest",
    )
    result = response.results[0]

    # by_alias=True keeps the API's category names (e.g. "hate/threatening")
    scores = result.category_scores.model_dump(by_alias=True)
    flags = result.categories.model_dump(by_alias=True)

    # Categories the API flagged itself, plus any scoring at or above our own threshold
    flagged_categories = [
        category
        for category, score in scores.items()
        if flags.get(category) or score >= threshold
    ]

    return ModerationResult(
        flagged=result.flagged or len(flagged_categories) > 0,
        categories=flagged_categories,
        scores=scores,
    )


# Example usage:
async def check_user_input(text: str, client: AsyncOpenAI) -> bool:
    result = await moderate_with_openai(text, client)
    if result.flagged:
        return False  # Block this input
    return True


# Latency: 100-200ms
# Cost: Free (as of 2026)
# Best for: general content policy violations
```

Azure Content Safety
Azure Content Safety provides category scores with severity levels (0, 2, 4, 6) across four categories. It's designed for enterprise use and supports custom blocklists.
Categories:
- Hate: hate speech
- Violence: violent content
- Sexual: sexual content
- SelfHarm: self-harm content
Each category returns a severity score from 0 (safe) to 6 (high severity).
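The severity scale maps to four named levels: safe (0), low (2), medium (4), and high (6). The small sketch below shows how a `max_severity` limit of 2 translates into block decisions. It assumes the severities arrive as a dict mapping category name to integer.

```python
# Named levels for Azure Content Safety's default four-level severity output.
SEVERITY_LABELS = {0: "safe", 2: "low", 4: "medium", 6: "high"}


def describe_severities(severities: dict[str, int], max_severity: int = 2) -> dict[str, str]:
    """Label each category's severity and mark those above `max_severity` as blocked."""
    described = {}
    for category, severity in severities.items():
        label = SEVERITY_LABELS.get(severity, f"level {severity}")
        if severity > max_severity:
            label += " (blocked)"
        described[category] = label
    return described
```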
```python
# moderation/azure_content_safety.py
from dataclasses import dataclass

from azure.ai.contentsafety import BlocklistClient, ContentSafetyClient
from azure.ai.contentsafety.models import (
    AddOrUpdateTextBlocklistItemsOptions,
    AnalyzeTextOptions,
    TextBlocklistItem,
    TextCategory,
)
from azure.core.credentials import AzureKeyCredential


@dataclass
class AzureSafetyResult:
    is_safe: bool
    hate_severity: int
    violence_severity: int
    sexual_severity: int
    self_harm_severity: int


class AzureContentSafetyModerator:
    def __init__(self, endpoint: str, api_key: str):
        credential = AzureKeyCredential(api_key)
        self.client = ContentSafetyClient(endpoint=endpoint, credential=credential)
        # Blocklist management is handled by a separate client in azure-ai-contentsafety
        self.blocklist_client = BlocklistClient(endpoint=endpoint, credential=credential)

    def analyze(
        self,
        text: str,
        max_severity: int = 2,  # Block severity 4+ (medium and above)
    ) -> AzureSafetyResult:
        """Analyze text synchronously. In async code, call it via a thread pool (e.g. asyncio.to_thread)."""
        response = self.client.analyze_text(
            AnalyzeTextOptions(
                text=text,
                categories=[
                    TextCategory.HATE,
                    TextCategory.VIOLENCE,
                    TextCategory.SEXUAL,
                    TextCategory.SELF_HARM,
                ],
            )
        )

        # Map each analyzed category to its severity (0, 2, 4, or 6)
        severities = {
            item.category: item.severity
            for item in response.categories_analysis
        }
        # Missing categories or empty severities are treated as 0 (safe)
        hate = severities.get(TextCategory.HATE) or 0
        violence = severities.get(TextCategory.VIOLENCE) or 0
        sexual = severities.get(TextCategory.SEXUAL) or 0
        self_harm = severities.get(TextCategory.SELF_HARM) or 0

        is_safe = max(hate, violence, sexual, self_harm) <= max_severity

        return AzureSafetyResult(
            is_safe=is_safe,
            hate_severity=hate,
            violence_severity=violence,
            sexual_severity=sexual,
            self_harm_severity=self_harm,
        )

    def add_blocklist_item(self, blocklist_name: str, text: str):
        """Add a custom term (e.g., a competitor drug name) to an existing blocklist."""
        self.blocklist_client.add_or_update_blocklist_items(
            blocklist_name=blocklist_name,
            options=AddOrUpdateTextBlocklistItemsOptions(
                blocklist_items=[TextBlocklistItem(text=text)]
            ),
        )
```

Custom blocklists are a key Azure Content Safety feature. For a pharmaceutical app, you might add:
- Drug names that should never be mentioned (controlled substances without medical context)
- Competitor brand names (if restricted by brand guidelines)
- Internal company terms that shouldn't appear in public responses
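Blocklist terms only take effect when you name the blocklist in the analysis request, via `AnalyzeTextOptions(blocklist_names=[...])`. The blocklist itself must already exist; for example, you can create it with `BlocklistClient.create_or_update_text_blocklist`. The sketch below assumes a `ContentSafetyClient` is passed in. The helper names are illustrative. The SDK import is deferred so that the match-parsing helper can be used on its own.

```python
from dataclasses import dataclass


@dataclass
class BlocklistScreenResult:
    blocked: bool
    matched_terms: list[str]


def summarize_blocklist_matches(blocklists_match) -> BlocklistScreenResult:
    """Convert an analyze_text response's `blocklists_match` list into a simple result."""
    # Each match carries the term that hit in blocklist_item_text; None means no matches.
    terms = [match.blocklist_item_text for match in (blocklists_match or [])]
    return BlocklistScreenResult(blocked=bool(terms), matched_terms=terms)


def screen_with_blocklists(client, text: str, blocklist_names: list[str]) -> BlocklistScreenResult:
    """Analyze text with the named blocklists attached (client: ContentSafetyClient)."""
    # Deferred import: only this function needs the azure-ai-contentsafety package.
    from azure.ai.contentsafety.models import AnalyzeTextOptions

    response = client.analyze_text(
        AnalyzeTextOptions(
            text=text,
            blocklist_names=blocklist_names,
            # Skip harm-category analysis once a blocklist term matches
            halt_on_blocklist_hit=True,
        )
    )
    return summarize_blocklist_matches(response.blocklists_match)
```

With `halt_on_blocklist_hit=True`, a matching request returns no category severities. Set it to False if you need both the blocklist result and the harm scores from the same call.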
AWS Comprehend
AWS Comprehend is primarily an NLP service (sentiment, entities, key phrases) but also provides content classification and PII detection:
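For content classification, Comprehend exposes toxicity detection through the `detect_toxic_content` operation. It scores each text segment both overall (`Toxicity`) and per label (for example `INSULT` or `HATE_SPEECH`). The sketch below is minimal. It assumes a boto3 Comprehend client is passed in and uses a hypothetical 0.5 cutoff. Region and language support for this operation is narrower than for PII detection, so check the Comprehend documentation before relying on it.

```python
from dataclasses import dataclass


@dataclass
class ToxicityResult:
    toxic: bool
    toxicity: float
    labels: dict[str, float]


def detect_toxicity(comprehend_client, text: str, threshold: float = 0.5) -> ToxicityResult:
    """Score one text segment with Comprehend's detect_toxic_content (threshold is hypothetical)."""
    response = comprehend_client.detect_toxic_content(
        TextSegments=[{"Text": text}],
        LanguageCode="en",
    )
    # ResultList has one entry per input segment; we sent exactly one.
    segment = response["ResultList"][0]
    labels = {label["Name"]: label["Score"] for label in segment.get("Labels", [])}
    toxicity = segment.get("Toxicity", 0.0)
    return ToxicityResult(toxic=toxicity >= threshold, toxicity=toxicity, labels=labels)
```

PII detection, the focus of the rest of this section, looks like this: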
```python
# moderation/aws_comprehend.py
from dataclasses import dataclass

import boto3


@dataclass
class ComprehendPIIResult:
    contains_pii: bool
    pii_entities: list[dict]


class AWSComprehendModerator:
    def __init__(self, region: str = "us-east-1"):
        self.client = boto3.client("comprehend", region_name=region)

    def detect_pii(self, text: str, language: str = "en") -> ComprehendPIIResult:
        """Detect PII entities in text."""
        response = self.client.detect_pii_entities(
            Text=text,
            LanguageCode=language,
        )
        entities = response.get("Entities", [])
        pii_found = [
            e for e in entities
            if e.get("Score", 0) >= 0.9  # High confidence only
        ]
        return ComprehendPIIResult(
            contains_pii=len(pii_found) > 0,
            pii_entities=[
                {
                    "type": e["Type"],
                    "start": e["BeginOffset"],
                    "end": e["EndOffset"],
                    "score": e["Score"],
                }
                for e in pii_found
            ],
        )

    def mask_pii(self, text: str) -> str:
        """Replace each detected PII span with its entity type, e.g. [EMAIL]."""
        result = self.detect_pii(text)
        if not result.contains_pii:
            return text
        # Replace from the end of the string backwards so earlier offsets stay valid
        masked = text
        for entity in sorted(result.pii_entities, key=lambda e: e["end"], reverse=True):
            masked = masked[: entity["start"]] + f"[{entity['type']}]" + masked[entity["end"]:]
        return masked
```

Comparison: Which API to Use
| API | Cost | Latency | Best For |
|---|---|---|---|
| OpenAI Moderation | Free | 100-200ms | General content policy (hate, violence, sexual) |
| Azure Content Safety | ~$1/1k calls | 100-300ms | Enterprise, custom blocklists, severity levels |
| AWS Comprehend | ~$0.50-$1/1k | 200-500ms | PII detection, NLP features, AWS ecosystem |
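To turn per-1k prices into a budget, multiply by your call volume. The sketch below is minimal. It assumes the approximate prices in the table and assumes you screen both the input and the output of every request, which means two calls per request. Azure and AWS may bill by text size rather than strictly per call, so treat the result as a rough estimate.

```python
def monthly_moderation_cost(
    requests_per_month: int,
    calls_per_request: int,
    price_per_1k_calls: float,
) -> float:
    """Rough monthly spend in USD: (total calls / 1,000) * price per 1,000 calls."""
    total_calls = requests_per_month * calls_per_request
    return total_calls / 1000 * price_per_1k_calls


# 1M requests/month with input and output both screened, Azure at ~$1 per 1k calls
azure_estimate = monthly_moderation_cost(1_000_000, 2, 1.00)
# Same volume on the free OpenAI endpoint
openai_estimate = monthly_moderation_cost(1_000_000, 2, 0.0)
```

At that volume, the estimate is about $2,000 per month for Azure and $0 for OpenAI Moderation.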
Recommendation for most AI applications:
- Use OpenAI Moderation API for user input screening (free, fast)
- Use Azure Content Safety for output screening with custom blocklists (enterprise control)
- Use AWS Comprehend only if you need PII masking specifically
Combining APIs in Production
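```python
# moderation/combined.py
import asyncio

# Modules from the earlier examples (paths as in their header comments)
from moderation.azure_content_safety import AzureContentSafetyModerator
from moderation.openai_mod import moderate_with_openai


async def full_moderation_pipeline(
    text: str,
    openai_client,
    azure_moderator: AzureContentSafetyModerator,
    max_severity: int = 2,
) -> tuple[bool, list[str]]:
    """Run OpenAI and Azure checks in parallel and collect every block reason."""
    # Azure's client is synchronous, so run it in a worker thread next to the async OpenAI call
    openai_result, azure_result = await asyncio.gather(
        moderate_with_openai(text, openai_client),
        asyncio.to_thread(azure_moderator.analyze, text, max_severity),
    )

    reasons: list[str] = []
    if openai_result.flagged:
        reasons.extend(openai_result.categories)

    if not azure_result.is_safe:
        azure_severities = {
            "hate": azure_result.hate_severity,
            "violence": azure_result.violence_severity,
            "sexual": azure_result.sexual_severity,
            "self-harm": azure_result.self_harm_severity,
        }
        reasons.extend(
            name for name, severity in azure_severities.items()
            if severity > max_severity
        )

    # Decide from the flags themselves, so a block is never lost when no reason was extracted
    is_safe = not openai_result.flagged and azure_result.is_safe
    # Drop duplicates (both APIs may report "hate") while keeping order
    return is_safe, list(dict.fromkeys(reasons))
```

Running both APIs in parallel costs about as much latency as the slower call alone: the total time is roughly max(openai_latency, azure_latency), not the sum.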
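`asyncio.gather` waits for every call. That means the pipeline above always pays the slower API's latency, even when the faster one has already blocked the text. To return on the first block instead, use `asyncio.wait` with `FIRST_COMPLETED`, which lets you react as each check finishes. The sketch below assumes each check is wrapped in a hypothetical adapter that returns a list of block reasons. One caveat: cancelling a task that wraps `asyncio.to_thread` does not stop the worker thread. The abandoned Azure request still runs to completion in the background and may still be billed.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical adapter signature: each check returns block reasons (empty list = safe).
Check = Callable[[str], Awaitable[list[str]]]


async def moderate_fail_fast(text: str, checks: list[Check]) -> tuple[bool, list[str]]:
    """Run all checks concurrently and return as soon as any check blocks."""
    tasks = [asyncio.create_task(check(text)) for check in checks]
    try:
        pending = set(tasks)
        while pending:
            # Wake up whenever at least one check finishes
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                reasons = task.result()  # re-raises if the check itself failed
                if reasons:
                    return False, reasons
        return True, []
    finally:
        # Cancel whatever is still running (a no-op for tasks that already finished)
        for task in tasks:
            task.cancel()
```

For safe text, this behaves like `gather` and waits for every check. The saving only applies to text that ends up blocked.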
When Moderation APIs Are Not Enough
Moderation APIs catch policy violations but not domain-specific safety issues:
- OpenAI Moderation won't catch dangerous drug combination advice (not a policy violation per se)
- Azure Content Safety won't catch medically incorrect dosage information
For domain-specific safety (medical, legal, financial), you need custom classifiers or LLM-as-judge evaluation as described in the output classifiers lesson.
Use moderation APIs as Layer 1 (fast, general), then add domain-specific checks as Layer 2.
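The two layers can run in sequence, so the slower domain-specific check only sees text that has already passed general moderation. The sketch below is minimal and assumes hypothetical async checks that each return a list of block reasons.

```python
from typing import Awaitable, Callable

# Hypothetical check signature: returns block reasons (empty list = pass).
Check = Callable[[str], Awaitable[list[str]]]


async def layered_screen(
    text: str, layer1: Check, layer2: Check
) -> tuple[bool, str, list[str]]:
    """Return (allowed, failing_layer, reasons); Layer 2 runs only if Layer 1 passes."""
    reasons = await layer1(text)  # fast, general moderation API
    if reasons:
        return False, "layer1", reasons
    reasons = await layer2(text)  # slower domain check (custom classifier or LLM-as-judge)
    if reasons:
        return False, "layer2", reasons
    return True, "", []
```

Running the layers sequentially skips the slower and usually costlier domain check for text that Layer 1 already blocked. If latency matters more than cost, you could run both layers concurrently instead.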