LLM Security in Production: Prompt Injection Defense Playbook
Defend AI systems against prompt injection, data leakage, insecure tool use, and policy bypass with practical architecture controls.
Prompt injection is not a prompt problem. It is an application security problem in AI interfaces.
Threat Model for LLM Applications
Primary risks:
- instruction hijacking from untrusted content
- data exfiltration via model output
- unauthorized tool invocation
- cross-tenant context leakage
Treat all retrieved or user-provided content as untrusted.
1) Enforce Trust Boundaries
Separate:
- system policy
- user input
- retrieved documents
- tool outputs
Never allow retrieved content to overwrite system-level rules.
System policy (highest trust)
User intent (medium trust)
Retrieved data/tool output (low trust, sanitize)2) Tool Invocation Safeguards
- strict allowlist by route/use case
- schema validation on tool arguments
- deny hidden/implicit arguments
- per-tool auth checks before execution
Example:
ALLOWED_TOOLS = {"search_docs", "get_ticket_status"}
if requested_tool not in ALLOWED_TOOLS:
raise PermissionError("Tool not allowed")3) Output Security Filtering
Inspect output for:
- secrets/API keys
- sensitive PII
- policy-violating instructions
- unsafe links or executable payloads
Block or redact before returning to user/logging.
4) Prompt Injection Detection Heuristics
Flag content patterns like:
- "ignore previous instructions"
- "reveal hidden/system prompt"
- "execute this command regardless of policy"
Use heuristics + classifier + policy rules, not single regex only.
5) Retrieval Hardening for RAG
- metadata ACL filtering before retrieval
- signed or verified document sources
- chunk-level source attribution
- do not retrieve outside user's tenant scope
Security bugs in retrieval are usually authorization bugs.
6) Logging and Audit Requirements
Log securely:
- prompt hash/version (not always raw prompt)
- tool calls + arguments + decision outcomes
- blocked outputs and policy reasons
- user/session/tenant IDs
Keep audit trail immutable for incident review.
7) Red Team Test Cases
Include tests for:
- direct override attempts
- indirect injection in uploaded docs
- cross-tenant data probes
- malicious tool argument suggestions
- jailbreak-like role confusion
Run these tests in CI for every prompt/policy change.
Secure LLM Gateway Pattern
Client -> API Gateway -> Policy Engine -> LLM Orchestrator -> Model/Tools
| |
v v
Audit Log <---- Decision LogPut policy enforcement before and after model calls.
Incident Response Basics for AI Systems
- disable affected tool routes quickly
- rotate leaked secrets
- invalidate risky cached contexts
- replay logs to estimate blast radius
- publish postmortem with control improvements
LLM security maturity comes from layered controls, not one perfect prompt.
Enjoyed this article?
Explore the Security & Compliance learning path for more.
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.