Back to blog
Security & Complianceadvanced

LLM Security in Production: Prompt Injection Defense Playbook

Defend AI systems against prompt injection, data leakage, insecure tool use, and policy bypass with practical architecture controls.

Asma HafeezMay 6, 20263 min read
LLM SecurityPrompt InjectionAI SafetyPolicy EnforcementTool SecurityData Leakage
Share:𝕏

Prompt injection is not a prompt problem. It is an application security problem in AI interfaces.


Threat Model for LLM Applications

Primary risks:

  • instruction hijacking from untrusted content
  • data exfiltration via model output
  • unauthorized tool invocation
  • cross-tenant context leakage

Treat all retrieved or user-provided content as untrusted.


1) Enforce Trust Boundaries

Separate:

  • system policy
  • user input
  • retrieved documents
  • tool outputs

Never allow retrieved content to overwrite system-level rules.

TEXT
System policy (highest trust)
User intent (medium trust)
Retrieved data/tool output (low trust, sanitize)

2) Tool Invocation Safeguards

  • strict allowlist by route/use case
  • schema validation on tool arguments
  • deny hidden/implicit arguments
  • per-tool auth checks before execution

Example:

Python
ALLOWED_TOOLS = {"search_docs", "get_ticket_status"}
if requested_tool not in ALLOWED_TOOLS:
    raise PermissionError("Tool not allowed")

3) Output Security Filtering

Inspect output for:

  • secrets/API keys
  • sensitive PII
  • policy-violating instructions
  • unsafe links or executable payloads

Block or redact before returning to user/logging.


4) Prompt Injection Detection Heuristics

Flag content patterns like:

  • "ignore previous instructions"
  • "reveal hidden/system prompt"
  • "execute this command regardless of policy"

Use heuristics + classifier + policy rules, not single regex only.


5) Retrieval Hardening for RAG

  • metadata ACL filtering before retrieval
  • signed or verified document sources
  • chunk-level source attribution
  • do not retrieve outside user's tenant scope

Security bugs in retrieval are usually authorization bugs.


6) Logging and Audit Requirements

Log securely:

  • prompt hash/version (not always raw prompt)
  • tool calls + arguments + decision outcomes
  • blocked outputs and policy reasons
  • user/session/tenant IDs

Keep audit trail immutable for incident review.


7) Red Team Test Cases

Include tests for:

  • direct override attempts
  • indirect injection in uploaded docs
  • cross-tenant data probes
  • malicious tool argument suggestions
  • jailbreak-like role confusion

Run these tests in CI for every prompt/policy change.


Secure LLM Gateway Pattern

TEXT
Client -> API Gateway -> Policy Engine -> LLM Orchestrator -> Model/Tools
                          |                 |
                          v                 v
                        Audit Log <---- Decision Log

Put policy enforcement before and after model calls.


Incident Response Basics for AI Systems

  • disable affected tool routes quickly
  • rotate leaked secrets
  • invalidate risky cached contexts
  • replay logs to estimate blast radius
  • publish postmortem with control improvements

LLM security maturity comes from layered controls, not one perfect prompt.

Enjoyed this article?

Explore the Security & Compliance learning path for more.

Found this helpful?

Share:𝕏

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.