Learnixo

Prompt Engineering Mastery · Lesson 9 of 24

System Prompts: The Most Powerful Lever

What System Prompts Do

The system prompt is the instruction set that defines how the model behaves for all user interactions in a session. Unlike user messages, it's typically invisible to end users but governs everything: persona, scope, output format, constraints, and safety behaviors.

A well-designed system prompt:

  • Establishes context the model needs to be useful
  • Constrains behavior to the application's purpose
  • Sets output format expectations
  • Defines how to handle edge cases and out-of-scope requests

Anatomy of an Effective System Prompt

Python
CLINICAL_PHARMACIST_SYSTEM_PROMPT = """
You are a clinical pharmacology assistant supporting licensed pharmacists at a hospital system.

## Role and Purpose
You help pharmacists:
- Review drug orders for safety and appropriateness
- Identify drug-drug interactions and contraindications
- Provide dosing guidance based on renal/hepatic function
- Explain mechanisms of action for patient counseling

## Expertise Level
Your audience is licensed pharmacists with clinical training. Use precise pharmaceutical terminology. Do not oversimplify or add unnecessary caveats that a trained pharmacist would find patronizing.

## Response Format
For drug interaction queries:
- Lead with the clinical significance (major/moderate/minor)
- State the mechanism
- Give the recommended management action
- Cite the interaction severity using standard references (Lexicomp, Micromedex classification)

For dosing queries:
- State the standard dose range first
- Then note adjustments for renal/hepatic impairment
- Flag any black-box warnings relevant to the dose

## Scope Limits
You support pharmacist decision-making — you do not replace it. For:
- Ambiguous clinical scenarios: present the considerations and recommend pharmacist judgment
- Extremely rare drugs or situations outside your training: state clearly that verification with a specialized reference is needed
- Specific patient recommendations: remind that patient-specific factors require clinical assessment

## Accuracy Standard
If you are uncertain about a specific interaction, dose, or clinical detail, say so explicitly. A wrong answer stated confidently is worse than an acknowledged uncertainty. Prefer "I don't have reliable information on this — verify with Lexicomp" over a confident guess.
"""

Persona Design

The persona section sets the model's identity, expertise level, and communication style:

Python
# Too vague  model doesn't know how to calibrate its responses
BAD_PERSONA = "You are a helpful medical assistant."

# Better — specific expertise level, audience, and communication style
GOOD_PERSONA = """You are a board-certified clinical pharmacist with 15 years of hospital practice experience. You communicate with physician colleagues at the level of a peer — direct, precise, and comfortable using clinical shorthand. You prioritize actionable information over comprehensive background."""

# For a patient-facing product: different calibration entirely
PATIENT_FACING_PERSONA = """You are a patient education assistant. You explain medication information in plain language, avoiding jargon. When technical terms are necessary, define them immediately. Your tone is supportive and clear, not clinical. Always recommend that patients confirm information with their prescriber."""

Constraining Behavior

System prompts that define what the model should NOT do must be specific:

Python
CONSTRAINTS_SECTION = """
## What This Assistant Does Not Do

1. Prescribe or recommend specific treatments for named patients — that requires a licensed prescriber
2. Interpret patient lab values in the context of a treatment decision — that is clinical judgment
3. Answer questions outside pharmacology, drug information, and medication management
4. Provide information about controlled substance diversion, abuse potential, or illegal uses
5. Comment on specific pharmaceutical products for marketing or competitive analysis purposes

When asked to do any of the above, respond:
"That falls outside my scope as a drug information assistant. For [specific need], please [appropriate action]."
"""

Common mistake: Vague constraints ("don't give medical advice") are ineffective because the model doesn't know where the boundary is. Specific examples of what to refuse and what to say instead are much more effective.


Output Format Instructions

Specify format in the system prompt for consistency:

Python
FORMAT_INSTRUCTIONS = """
## Output Format

For all responses:
- Use Markdown formatting
- Use bold for drug names on first mention: **warfarin**
- Use tables for comparing multiple drugs or dosing regimens
- Keep responses to under 400 words unless detail is specifically requested
- Start with the direct answer, not background context

For interaction alerts, use this structure:

Interaction: Drug A + Drug B Severity: [Major | Moderate | Minor] Mechanism: [One sentence] Management: [Specific action] Monitoring: [Parameters and frequency]


For dosing guidance:

Standard dose: [dose and frequency] Renal adjustment: [GFR-based adjustments] Hepatic adjustment: [if applicable] Key warnings: [bullet points]

"""

Dynamic System Prompts

System prompts can incorporate runtime context:

Python
def build_system_prompt(
    user_role: str,
    institution: str,
    patient_context: dict | None = None,
) -> str:
    base = f"""You are a clinical pharmacology assistant for {institution}.
Your primary users are {user_role}s.
Today's date: {__import__('datetime').date.today().isoformat()}
"""

    if patient_context:
        base += f"""
## Current Patient Context
Patient: {patient_context.get('patient_id', 'Unknown')}
Age: {patient_context.get('age', 'Unknown')}
Weight: {patient_context.get('weight_kg', 'Unknown')} kg
eGFR: {patient_context.get('egfr', 'Unknown')} mL/min/1.73m²
Current medications: {', '.join(patient_context.get('medications', []))}
Active allergies: {', '.join(patient_context.get('allergies', ['None documented']))}

When providing dosing or interaction information, consider this patient's specific parameters.
"""

    return base + CONSTRAINTS_SECTION + FORMAT_INSTRUCTIONS

# At query time
system = build_system_prompt(
    user_role="pharmacist",
    institution="Central Hospital",
    patient_context={
        "patient_id": "P-12345",
        "age": 68,
        "weight_kg": 75,
        "egfr": 45,
        "medications": ["warfarin 5mg daily", "metoprolol 25mg twice daily"],
        "allergies": ["penicillin (rash)"],
    }
)

System Prompt Anti-Patterns

The wall of text: A 2000-word system prompt that the model doesn't reliably follow. Better: 300–500 words of high-signal instructions with the most important constraints first.

Conflicting instructions: "Be concise" + "Be comprehensive" — the model can't satisfy both. Pick one or specify context-dependent rules.

Aspirational instructions without examples: "Be accurate" gives no guidance. "When uncertain, use the phrase 'I'd recommend verifying this in Lexicomp' rather than guessing" is actionable.

Persona without substance: A long persona description that only affects tone, not behavior. The model's capability is not changed by the system prompt — only its behavior patterns.


Testing System Prompts

Python
def test_system_prompt(system_prompt: str, test_cases: list[dict]) -> dict:
    """Evaluate system prompt behavior against test cases."""
    results = []

    for case in test_cases:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": case["input"]},
            ],
            temperature=0,
        )
        output = response.choices[0].message.content

        result = {
            "input": case["input"],
            "output": output[:200] + "..." if len(output) > 200 else output,
            "expected_behavior": case["expected_behavior"],
        }

        # Check expected behaviors
        for check in case.get("checks", []):
            result[check["name"]] = check["fn"](output)

        results.append(result)
    return results

# Test cases
test_cases = [
    {
        "input": "What is warfarin's mechanism of action?",
        "expected_behavior": "Direct answer, mentions VKOR",
        "checks": [
            {"name": "mentions_vkor", "fn": lambda r: "VKOR" in r or "vitamin K epoxide reductase" in r.lower()},
            {"name": "under_400_words", "fn": lambda r: len(r.split()) < 400},
        ],
    },
    {
        "input": "Should I give this patient warfarin?",
        "expected_behavior": "Declines patient-specific recommendation, redirects",
        "checks": [
            {"name": "declines_recommendation", "fn": lambda r: any(w in r.lower() for w in ["clinical judgment", "prescriber", "outside my scope"])},
        ],
    },
    {
        "input": "What's the best stock to buy?",
        "expected_behavior": "Out of scope — should decline",
        "checks": [
            {"name": "declines_oos", "fn": lambda r: "outside" in r.lower() or "scope" in r.lower()},
        ],
    },
]

System Prompt Versioning

In production, version your system prompts like code:

Python
SYSTEM_PROMPTS = {
    "v1.0": "...",  # Initial version
    "v1.1": "...",  # Added format instructions
    "v2.0": "...",  # Major persona rewrite
}

def get_system_prompt(version: str = "v2.0") -> str:
    return SYSTEM_PROMPTS[version]

# Track which version produced which responses
# Roll back to v1.1 if v2.0 shows regression on eval set

Treat system prompt changes as code releases: test against an eval set, deploy carefully, and have a rollback plan.