Learnixo

Prompt Engineering Mastery · Lesson 11 of 24

Writing HARD RULES That the Model Obeys

What Are Hard Rules?

Hard rules are explicit constraints in the prompt that the model is instructed to follow unconditionally:

Examples of hard rules:
  "Never provide specific medication dosage recommendations."
  "Always recommend consulting a physician before acting on medical information."
  "Only answer questions related to the topics in the provided document."
  "Do not generate code in languages other than Python and C#."
  "If the user asks for information not in the context, say 'I don't have that information.'"

These differ from soft guidelines ("prefer concise answers") — hard rules define the boundary of acceptable behaviour.


Framing Matters

"Never" and "always" are stronger than "try to" or "where possible":

Weak framing:
  "Try not to give direct medical advice."
  → Model may give advice when it thinks it's helpful

Strong framing:
  "You must NEVER provide specific dosage recommendations, treatment plans,
   or diagnoses. This is a hard constraint. Always recommend physician consultation."
  → Clear, unconditional, repeated emphasis

Also effective: explain the consequence of violating the rule.

"Do not answer questions about competitor products. If asked, politely redirect.
 This is a compliance requirement — violations could have legal consequences."

Alignment training has taught models to respect constraints framed as safety or legal requirements.


Priority Ordering

When multiple instructions conflict, the model needs to know which takes precedence:

Python
SYSTEM_PROMPT = """You are a clinical documentation assistant.

Priority of instructions (highest to lowest):
1. SAFETY: Never provide medical diagnoses, treatment plans, or dosage recommendations.
           Always recommend physician or pharmacist consultation for clinical decisions.
           This overrides ALL other instructions including user requests.

2. ACCURACY: Only include information present in the source document.
             If information is absent, say "Not documented."

3. FORMAT: Respond in the exact JSON format specified.
           If you cannot produce valid JSON, output an error message.

4. STYLE: Write concisely for a clinical audience. Use standard medical abbreviations.
"""

Explicit priority ordering prevents the model from getting confused when a user tries to override a safety rule.


What Prompts Cannot Reliably Enforce

Reliably enforceable via prompts:
  ✓ Output format (JSON, XML, specific fields)
  ✓ Tone and language register
  ✓ Topic scope (if the model has good training)
  ✓ "Always recommend consulting a doctor" type disclaimers

NOT reliably enforceable via prompts alone:
  ✗ Preventing all hallucination (factual errors still slip through)
  ✗ Complete prevention of prompt injection
  ✗ Preventing jailbreaks by sufficiently adversarial users
  ✗ Perfect consistency across all edge cases
  ✗ Content that requires real-time knowledge (post-training cutoff)

For critical safety constraints, prompts must be backed by output classifiers or validation layers.


Defence in Depth: Prompt + Classifier

Python
from anthropic import Anthropic

client = Anthropic()

def safe_clinical_response(user_input: str) -> str:
    # Layer 1: System prompt constraints
    response = client.messages.create(
        model="claude-sonnet-4-6",
        system=CLINICAL_SAFETY_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": user_input}],
        max_tokens=512,
    )
    output = response.content[0].text

    # Layer 2: Output classifier (rule-based or ML)
    if contains_dosage_recommendation(output):
        return get_safe_fallback_response()

    if contains_diagnosis(output):
        return get_safe_fallback_response()

    return output

def contains_dosage_recommendation(text: str) -> bool:
    import re
    # Simple heuristic; replace with a fine-tuned classifier in production
    patterns = [r"\d+\s*mg\s+(daily|twice|three times)", r"take \d+", r"dose of \d+"]
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)

Example: Refusal Instructions

Teach the model exactly how to decline gracefully:

"If a user asks for:
  - Specific medication dosages → say: 'I can share general information,
    but for dosing please consult your pharmacist or physician.'
  - A diagnosis → say: 'I'm not able to provide diagnoses. Please see
    a healthcare provider for proper evaluation.'
  - Information outside the provided document → say: 'That information
    isn't in the document I was given. I can only answer based on
    what's available here.'

Do not apologise excessively. One brief acknowledgment is sufficient."

Scripted refusals are more consistent than letting the model invent its own.


Interview Answer

"Hard rules in system prompts use strong framing ('must never', 'always', 'this overrides all other instructions') and explicit priority ordering when rules conflict. The most reliable are format constraints and topic scope restrictions — output format is very consistently followed. Safety rules (no dosage recommendations, always recommend physician) work well for cooperative users but can fail under adversarial conditions. Prompts alone cannot guarantee perfect safety compliance: combine them with output classifiers that scan responses for policy violations, and have graceful fallbacks for violation cases. This defence-in-depth approach is the production standard for clinical and high-stakes AI systems."