Content Moderation: Azure AI Content Safety integration
AI Content Safety
Once your app is open to the world, you are responsible for what the AI says. Content moderation is the safeguard that keeps hate speech, violence, and sexual content out of both the prompts you accept and the responses you return.
1. Pre-filtering vs Post-filtering
A professional safety system has two layers:
- Input Filtering: Checking the user's prompt before it reaches the LLM. If they ask "How do I make a bomb?", the request is blocked immediately and the model is never called.
- Output Filtering: Checking the AI's response before the user sees it. If the response contains harmful content, the system replaces it with a refusal such as "I'm sorry, I cannot answer that."
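The two layers can be sketched as a thin wrapper around the model call. This is a minimal, language-neutral illustration in Python, not the Azure SDK: `moderated_chat`, `toy_is_unsafe`, and `toy_llm` are hypothetical names, and the keyword check is only a stand-in for a real moderation call.

```python
from typing import Callable

REFUSAL = "I'm sorry, I cannot answer that."


def moderated_chat(
    prompt: str,
    is_unsafe: Callable[[str], bool],
    call_llm: Callable[[str], str],
) -> str:
    """Run a prompt through input filtering, the model, then output filtering."""
    # Layer 1: input filtering. A blocked prompt never reaches the LLM.
    if is_unsafe(prompt):
        return REFUSAL

    response = call_llm(prompt)

    # Layer 2: output filtering. A harmful response is replaced, not shown.
    if is_unsafe(response):
        return REFUSAL
    return response


# Toy stand-ins (assumptions for the demo only).
def toy_is_unsafe(text: str) -> bool:
    return "bomb" in text.lower()


def toy_llm(prompt: str) -> str:
    return f"Echo: {prompt}"


if __name__ == "__main__":
    print(moderated_chat("How do I make a bomb?", toy_is_unsafe, toy_llm))
    print(moderated_chat("Hello", toy_is_unsafe, toy_llm))
```

Passing the checker and the model in as functions keeps the pipeline independent of any particular moderation service, so the same shape carries over to a C# implementation.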
2. Azure AI Content Safety
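Azure AI Content Safety is a dedicated moderation service that returns a **Severity Score** for each of four categories: Hate, Self-Harm, Sexual, and Violence. For text, the default output uses the levels 0, 2, 4, and 6 (a finer 0-7 scale is available as an option). Because it classifies meaning rather than matching words, it is generally much more accurate than simple keyword blocking. Its Prompt Shields feature is aimed at detecting "Jailbreak" attempts, including ones hidden in code.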
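One way to act on per-category severity scores is to compare each one against its own threshold. The sketch below assumes the scores have already been fetched and flattened into a plain dictionary. The `scores` shape, the `BLOCK_AT` values, and the `violations` helper are all hypothetical, not part of the service's API.

```python
from typing import Dict, List

CATEGORIES = ("Hate", "SelfHarm", "Sexual", "Violence")

# Hypothetical per-category thresholds; tune these for your own product.
BLOCK_AT: Dict[str, int] = {
    "Hate": 4,
    "SelfHarm": 2,  # stricter here, purely as an example
    "Sexual": 4,
    "Violence": 4,
}


def violations(scores: Dict[str, int]) -> List[str]:
    """Return the categories whose severity meets or exceeds its threshold."""
    # A category missing from the result is treated as severity 0.
    return [c for c in CATEGORIES if scores.get(c, 0) >= BLOCK_AT[c]]


if __name__ == "__main__":
    sample = {"Hate": 0, "SelfHarm": 2, "Sexual": 0, "Violence": 2}
    print(violations(sample))
```

Separate thresholds let a product be stricter in one category than another without changing the blocking logic.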
4. Interview Mastery
Q: "How do you handle 'False Positives' in content moderation?"
Architect Answer: "Content moderation is a balance between safety and utility. We use **Human-in-the-loop** review for borderline cases. If a message is flagged at severity 2 (low risk), we might log it for review but still show it. If it scores severity 6 (high risk), we block it. We also maintain an **Exception List** for internal users or specific technical domains (like medical or legal) where sensitive words might be legitimate."
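The policy in that answer can be sketched as a small triage function. Everything here is illustrative: the `Action` names, the default thresholds, and `EXEMPT_DOMAINS` are hypothetical. Downgrading an exempt high-severity message to human review instead of allowing it silently is one possible reading of an exception list, not a prescribed behavior.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    REVIEW = "show_and_log_for_review"
    BLOCK = "block"


# Hypothetical exception list of domains where sensitive terms are expected.
EXEMPT_DOMAINS = {"medical", "legal"}


def triage(
    max_severity: int,
    domain: Optional[str] = None,
    review_at: int = 2,  # hypothetical cut-off for human review
    block_at: int = 5,   # hypothetical cut-off; at or above this we block
) -> Action:
    """Map the highest category severity for a message to a moderation action."""
    if max_severity >= block_at:
        # Assumption: exempt domains are routed to a human instead of blocked.
        return Action.REVIEW if domain in EXEMPT_DOMAINS else Action.BLOCK
    if max_severity >= review_at:
        return Action.REVIEW
    return Action.ALLOW


if __name__ == "__main__":
    for severity, domain in [(0, None), (2, None), (6, None), (6, "medical")]:
        print(severity, domain, triage(severity, domain).name)
```

Keeping the thresholds as parameters makes it easier to adjust the false-positive rate from review logs without touching the decision logic.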