Negative Prompting: What Not to Do

Why Negative Instructions Work

Models have default behaviors shaped by pretraining and RLHF. Negative instructions override those defaults by explicitly naming the patterns to avoid. Without them, the model falls back on its training prior, which may include:

- Adding excessive disclaimers
- Including unnecessary preambles ("Certainly! I'd be happy to help...")
- Hedging excessively when directness is needed
- Giving generic advice instead of specific guidance

Basic Negative Instructions

Python

from openai import OpenAI

client = OpenAI()

# Without negative constraints — model's default behavior
basic_system = "You are a clinical pharmacology assistant."

# With targeted negative constraints
constrained_system = """You are a clinical pharmacology assistant for hospital pharmacists.

DO NOT:
- Begin responses with "Certainly!", "Of course!", "Great question!", or similar filler phrases
- Add disclaimers that a licensed pharmacist would find patronizing (e.g., "always consult a healthcare provider")
- Hedge every statement with "may", "might", or "could" when clinical evidence is clear
- Repeat the question back before answering
- Pad responses with background context the pharmacist already knows
- Use bullet points for a response that should be a direct sentence

DO:
- Begin immediately with the clinical answer
- State clear recommendations when the evidence supports them
- Acknowledge genuine uncertainty only when it exists"""

question = "What is the warfarin dose adjustment when starting clarithromycin?"

# Compare outputs
for system_name, system in [("Basic", basic_system), ("Constrained", constrained_system)]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    print(f"\n=== {system_name} ===")
    print(response.choices[0].message.content)

Prohibiting Specific Output Patterns

Python

ANTI_HALLUCINATION_PROMPT = """You are a clinical pharmacology expert.

For drug interactions: DO NOT state a severity level (major/moderate/minor) unless you are confident in the classification. If uncertain, say "the interaction exists but I am uncertain of the current classification — verify with Lexicomp."

For dosing: DO NOT provide specific dose recommendations for patient cases you haven't seen. You may provide standard dose ranges from guidelines.

For citations: DO NOT fabricate journal article titles, authors, or DOIs. If you cannot recall a specific reference, say "this is supported by pharmacokinetic principles" or "check the prescribing information."

For new drugs: DO NOT extrapolate data from related drugs without explicitly stating you are doing so. Say "Based on the mechanism, similar to [drug], we might expect..." if actual data is unavailable."""

def ask_with_anti_hallucination(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": ANTI_HALLUCINATION_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Test with a question that commonly triggers hallucinated citations
answer = ask_with_anti_hallucination(
    "What are the key studies supporting DOACs over warfarin in atrial fibrillation? List the main trials."
)
print(answer)
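
The prompt above asks the model not to fabricate citations, but the prompt alone cannot confirm that it complied. A minimal sketch, assuming a regex heuristic is good enough for a first pass, that pulls citation-like strings out of an answer so a person can verify them. The function name `extract_citation_candidates`, both patterns, and the sample text (with its made-up author and DOI) are illustrative assumptions, not part of any library.

```python
import re

# Heuristic patterns (assumptions for illustration, not a complete citation grammar).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ET_AL_PATTERN = re.compile(r"\b[A-Z][a-zA-Z\-]+ et al\.?,? \(?\d{4}\)?")

def extract_citation_candidates(text: str) -> list[str]:
    """Return DOI-like strings and 'Author et al. 2020'-style mentions for manual checking."""
    # Strip trailing punctuation that the DOI pattern tends to swallow.
    dois = [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]
    authors = ET_AL_PATTERN.findall(text)
    return dois + authors

# Made-up sample text; the author and DOI are placeholders, not real references.
sample = "See Smith et al. 2020 (doi: 10.1234/example.5678). Standard ranges apply."
print(extract_citation_candidates(sample))
```

Anything this returns is only a candidate for verification. An empty list does not show the answer is free of invented references, since trial names and journal titles can appear without either pattern.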

Refusing Out-of-Scope Requests

Define what to decline and how:

Python

REFUSAL_TEMPLATE = """You are a drug information assistant for licensed healthcare professionals.

SCOPE LIMITS:
- You answer questions about pharmacology, drug interactions, dosing, and medication safety
- You do NOT provide: general medical advice, diagnosis, treatment plans, or non-pharmacology information

WHEN ASKED SOMETHING OUT OF SCOPE:
DO NOT say: "I can't help with that" (unhelpful)
DO NOT say: "That's outside my expertise" (vague)
DO say: "[Topic] is outside the scope of a drug information service. For [specific need], the appropriate resource is [specific resource]."

Examples:
- Question about diet for diabetes: "Nutritional guidance for diabetes is outside drug information scope. Your hospital's dietitian or the ADA nutrition guidelines would be appropriate resources."
- Question about a medical procedure: "Procedural questions are outside drug information scope. The relevant clinical specialty guidelines or your department's protocol would be more appropriate."

NEVER give a terse refusal. Always explain what IS the appropriate resource."""

def test_refusal_behavior(questions: list[str]) -> None:
    for question in questions:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": REFUSAL_TEMPLATE},
                {"role": "user", "content": question},
            ],
            temperature=0,
        )
        print(f"\nQ: {question}")
        print(f"A: {response.choices[0].message.content[:200]}")

test_refusal_behavior([
    "What is the best diet for a patient with type 2 diabetes?",
    "How do I perform a lumbar puncture?",
    "What is the mechanism of warfarin?",  # In scope — should answer
])
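
Printing the first 200 characters works for a spot check, but a small classifier makes the same test repeatable. The sketch below assumes the model reuses the wording of the refusal template ("outside the scope of a drug information service" or "outside drug information scope"). `classify_refusal` and its marker phrases are hypothetical helpers written for this example.

```python
import re

# Marker phrases assumed from the wording of the refusal template above.
SCOPE_MARKER = re.compile(
    r"outside (the scope of a )?drug information (service|scope)", re.IGNORECASE
)
TERSE_REFUSALS = ("i can't help with that", "that's outside my expertise")

def classify_refusal(response: str) -> str:
    """Label a response as 'terse_refusal', 'templated_refusal', or 'answer'."""
    # Fold curly apostrophes so "can’t" and "can't" compare equal.
    normalized = response.lower().replace("\u2019", "'")
    if any(phrase in normalized for phrase in TERSE_REFUSALS):
        return "terse_refusal"
    if SCOPE_MARKER.search(response):
        return "templated_refusal"
    return "answer"

print(classify_refusal(
    "Procedural questions are outside drug information scope. "
    "Your department's protocol would be more appropriate."
))
```

A model that paraphrases the template will be labeled `answer`, so treat that label as "no refusal marker found" and review those cases by hand.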

Anti-Sycophancy Instructions

Prevent the model from agreeing with wrong user assertions:

Python

ANTI_SYCOPHANCY_SYSTEM = """You are a clinical pharmacology expert.

CRITICAL: Do NOT agree with incorrect medical statements just because the user stated them confidently. If a user says something clinically incorrect, correct it clearly and directly.

Example of what NOT to do:
User: "Warfarin inhibits Factor Xa, right?"
Wrong response: "Yes, that's related to how warfarin works..."

Correct response: "Actually, warfarin does not directly inhibit Factor Xa. Warfarin inhibits VKOR (vitamin K epoxide reductase), which prevents the recycling of vitamin K. Without active vitamin K, the liver cannot synthesize functional clotting factors including factors II, VII, IX, and X. It's indirect inhibition via the vitamin K cycle, not direct Factor Xa inhibition like apixaban or rivaroxaban."

Maintain clinical accuracy even when the user seems to expect validation."""

def test_anti_sycophancy():
    # This contains a common clinical misconception
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": ANTI_SYCOPHANCY_SYSTEM},
            {"role": "user", "content": "Warfarin works by directly blocking Factor Xa, so it and rivaroxaban have the same mechanism, correct?"},
        ],
        temperature=0,
    )
    print(response.choices[0].message.content)

test_anti_sycophancy()
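
Reading the printed response is enough for one test case. For a batch of misconception prompts, a rough automated signal is whether the reply opens by agreeing. The sketch below is a heuristic under that assumption; `opens_with_agreement` and its list of openers are invented for illustration and should be extended from real transcripts.

```python
import re

# Assumed list of agreement openers; not exhaustive.
AGREEMENT_OPENER = re.compile(
    r"^\s*(yes|correct|exactly|that's right|you're right|absolutely)\b",
    re.IGNORECASE,
)

def opens_with_agreement(response: str) -> bool:
    """True if the response starts with an agreement phrase."""
    # Fold curly apostrophes so "that’s right" matches the pattern.
    normalized = response.replace("\u2019", "'")
    return bool(AGREEMENT_OPENER.match(normalized))

print(opens_with_agreement("Yes, that's related to how warfarin works..."))
print(opens_with_agreement("Actually, warfarin does not directly inhibit Factor Xa."))
```

When the user's premise is known to be wrong, a `True` result flags the response for review. It says nothing about whether the rest of the answer is clinically correct.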

Format Negatives: What NOT to Include

Python

def get_format_constrained_prompt(query_type: str) -> str:
    """Return format-specific negative instructions."""

    format_constraints = {
        "drug_fact": """DO NOT:
- Use bullet points for a straightforward factual answer
- Add a "Summary" section at the end restating what you just said
- Include a preamble like "The answer to your question is..."
- Use headers for a response that is two sentences long
START directly with the fact.""",

        "comparison": """DO NOT:
- Write prose paragraphs when a table would be clearer
- List advantages of Option A then advantages of Option B without showing trade-offs
- Repeat information that applies equally to both options in both sections
USE a comparison table for the core comparison, prose only for nuance.""",

        "calculation": """DO NOT:
- Give the final answer without showing the calculation steps
- Round intermediate values (round only the final answer)
- Use approximate values when exact values are available in the formula
SHOW the formula, then each substitution step, then the final result.""",
    }

    return format_constraints.get(query_type, "Be direct and concise.")

# Usage
calculation_system = f"You are a clinical pharmacist. {get_format_constrained_prompt('calculation')}"
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": calculation_system},
        {"role": "user", "content": "Calculate CrCl for a 65-year-old woman, 60kg, creatinine 1.2 mg/dL using Cockcroft-Gault."},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
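
Format constraints make the model show its steps, but they do not guarantee the arithmetic is right. A short reference implementation gives a value to compare against. The sketch uses the standard Cockcroft-Gault equation, CrCl = (140 - age) × weight / (72 × serum creatinine), multiplied by 0.85 for women, with the weight taken exactly as stated in the question. The names `cockcroft_gault` and `matches_reference`, and the 0.5 mL/min tolerance, are assumptions for this example.

```python
def cockcroft_gault(
    age_years: float,
    weight_kg: float,
    serum_creatinine_mg_dl: float,
    female: bool,
) -> float:
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault)."""
    crcl = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    if female:
        crcl *= 0.85  # sex correction factor
    return crcl

def matches_reference(model_value: float, reference: float, tolerance: float = 0.5) -> bool:
    """True if the model's final answer is within `tolerance` mL/min of the reference."""
    return abs(model_value - reference) <= tolerance

reference = cockcroft_gault(65, 60, 1.2, female=True)
print(f"Reference CrCl: {reference:.1f} mL/min")
```

For the example question this evaluates to about 44.3 mL/min, so a model answer that rounds intermediate values and lands far from that figure has violated the "round only the final answer" constraint.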

Detecting When Negative Instructions Fail

Negative instructions aren't perfect. Monitor for violations:

Python

def check_output_violations(output: str, prohibited_patterns: list[str]) -> list[str]:
    """Check if output violates any negative instructions."""
    violations = []
    output_lower = output.lower()

    for pattern in prohibited_patterns:
        if pattern.lower() in output_lower:
            violations.append(f"Found prohibited pattern: '{pattern}'")

    return violations

# Prohibited patterns for our clinical assistant
prohibited = [
    "Certainly!",
    "Great question",
    "Of course!",
    "I'd be happy to",
    "It's important to note",
    "Please consult",
    "Always remember",
    "As always",
]

# Test
output = "Certainly! That's a great question. The interaction between warfarin and clarithromycin is major."
violations = check_output_violations(output, prohibited)
print(f"Violations found: {len(violations)}")
for v in violations:
    print(f"  - {v}")
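
The substring check in `check_output_violations` compares characters exactly, so it can miss "I’d be happy to" when the model emits a typographic apostrophe, or "Please consult" when the phrase is split across a line break. A minimal sketch of a normalizing variant follows. `normalize` and `find_violations` are hypothetical names, and which characters to fold is an assumption.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Lowercase, fold curly quotes to straight ones, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    # NFKC leaves curly quotes untouched, so replace them explicitly.
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()

def find_violations(output: str, prohibited_patterns: list[str]) -> list[str]:
    """Return the prohibited patterns found in `output` after normalization."""
    normalized_output = normalize(output)
    return [p for p in prohibited_patterns if normalize(p) in normalized_output]

sample_output = "I\u2019d be happy to help.  Please\nconsult the label."
print(find_violations(sample_output, ["I'd be happy to", "Please consult", "Certainly!"]))
```

This is still plain substring matching, so a pattern such as "Please consult" will also fire on legitimate sentences that happen to contain it. Review flagged outputs before treating them as failures.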

Negative Prompting vs Positive Prompting

In most cases, positive instructions ("Do X") are stronger than negative ones ("Don't do Y"):

| Approach | Effectiveness | Use when |
|---|---|---|
| Positive only | High | Default behavior matches closely |
| Negative only | Moderate | Specific failure modes to prevent |
| Positive + Negative | Highest | Production system requiring reliability |
| Example-based | Very high | Exact format matters |

Best practice: Use positive instructions to define the desired behavior, then add targeted negatives only for patterns that empirically appear in outputs even with positive instructions.
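
One way to decide which patterns "empirically appear" is to count them across a sample of logged outputs and add a negative instruction only for those above a chosen rate. The sketch below assumes outputs are available as plain strings. `select_negatives`, the 10% default threshold, and the sample outputs are illustrative choices, not recommendations from a specific source.

```python
from collections import Counter

def select_negatives(
    outputs: list[str],
    candidate_patterns: list[str],
    min_rate: float = 0.1,
) -> list[str]:
    """Return candidate patterns that appear in at least `min_rate` of outputs."""
    if not outputs:
        return []
    counts: Counter[str] = Counter()
    for output in outputs:
        lowered = output.lower()
        for pattern in candidate_patterns:
            if pattern.lower() in lowered:
                counts[pattern] += 1  # count each output at most once per pattern
    return [p for p in candidate_patterns if counts[p] / len(outputs) >= min_rate]

# Made-up logged outputs for illustration.
logged = [
    "Certainly! The dose is reduced.",
    "The interaction is major.",
    "Certainly! Reduce the dose and recheck the INR.",
    "It's important to note that monitoring is required.",
]
print(select_negatives(logged, ["Certainly!", "It's important to note", "As always"], min_rate=0.5))
```

Patterns below the threshold stay out of the prompt, which keeps the negative list short and tied to failures that actually occur.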

Python

# Optimal structure: positive framework + targeted negatives.
# The annotations are Python comments, so they are not sent to the model.
OPTIMAL_SYSTEM = "\n\n".join([
    # Positive: role and audience
    "You are a clinical pharmacology expert for hospital pharmacists.",
    # Positive: format
    "Response format: Begin directly with the clinical finding. Use a table for comparisons. "
    "Keep responses under 200 words unless more depth is specifically requested.",
    # Negative: only specific failure modes that positive instructions don't prevent
    "Do not begin responses with filler phrases (Certainly!, Great question!, etc.).\n"
    'Do not fabricate citations; say "verify in Lexicomp" if unsure of a specific reference.',
])
