Context Injection

What Is Context Injection?

Context injection is adding dynamic, runtime information to a prompt — information the model wasn't trained on or that changes per request:

Types of injected context:
  Retrieved documents (RAG)
  User profile or preferences
  Conversation history
  Tool/API call results
  Current date/time
  Session state
  Database query results
  File contents

The model's frozen weights contain general knowledge; injected context provides task-specific, real-time information.

Structure of a Context-Injected Prompt

[System Prompt - static]
  Role, task, rules, output format

[Injected Context - dynamic]
  Retrieved documents, user state, tool outputs

[User Message - dynamic]
  The actual user query

The injected context sits between the fixed system instructions and the user's question — it's the "working memory" for the model.

Formatting Injected Context

Clear delimiters prevent the model from confusing instructions with data:

Python

def build_rag_prompt(context_docs: list[str], user_query: str) -> str:
    docs_formatted = "\n\n".join(
        f"<document index='{i+1}'>\n{doc}\n</document>"
        for i, doc in enumerate(context_docs)
    )

    return f"""<context>
{docs_formatted}
</context>

Based only on the documents above, answer the following question.
If the answer is not in the documents, say "I don't have that information."

Question: {user_query}"""

XML-style tags are recommended by both Anthropic and OpenAI for context delimitation — they resist injection attempts and are clearly parseable.

Context Ordering

The position of injected context affects how much the model attends to it:

Context at the START of the prompt (before user query):
  Higher attention in most models
  Model processes it as background before seeing the question
  Recommended for RAG context

Context at the END (after user query):
  "Lost in the middle" problem: middle context is least attended to
  Avoid putting critical information in the middle of very long contexts

Multiple documents: put the most relevant first and last
  (exploits primacy + recency effects)

Managing Context Length

The context window is limited. Priority order for what to include:

Priority 1: System prompt (safety, format, role)
Priority 2: Direct answer to the query (the most relevant retrieved chunk)
Priority 3: Supporting context (related documents)
Priority 4: Conversation history (most recent turns)
Priority 5: Background context (user profile, preferences)
Trim last when approaching limit.

Token counting:
```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

def fit_context_to_window(
    system_prompt: str,
    context_docs: list[str],
    user_query: str,
    max_tokens: int = 4096,
    reserved_for_output: int = 512
) -> list[str]:
    budget = max_tokens - reserved_for_output
    budget -= count_tokens(system_prompt + user_query)
    
    included = []
    for doc in context_docs:
        doc_tokens = count_tokens(doc)
        if budget - doc_tokens >= 0:
            included.append(doc)
            budget -= doc_tokens
        else:
            break
    return included

Tool Output Injection

When LLMs call tools (APIs, databases), results are injected back:

Python

def run_agent_step(user_query: str, tool_results: dict) -> str:
    tool_context = "\n".join(
        f"<tool name='{name}'>\n{result}\n</tool>"
        for name, result in tool_results.items()
    )

    messages = [
        {
            "role": "user",
            "content": f"""Tool results:
{tool_context}

Based on these results, answer: {user_query}"""
        }
    ]
    return call_llm(messages)

Grounding Instructions

When injecting context for factual queries, always tell the model to cite and stay grounded:

"Use only the information in the documents above to answer.
 Do not use your general knowledge.
 If the answer cannot be found in the provided documents, say exactly:
 'The provided context does not contain this information.'
 When answering, cite the relevant document: 'According to [Document 2]...'"

Without grounding instructions, the model will blend retrieved context with its own knowledge — the primary cause of hallucination in RAG systems.

Interview Answer

"Context injection adds dynamic, request-specific information to prompts — RAG retrieved documents, user state, tool outputs, current time. Structure: static system prompt → injected context → user query. Use XML tags to delimit context from instructions. Context at the start of the prompt receives more attention; information buried in the middle of long contexts is less reliably used (the 'lost in the middle' problem). Always include grounding instructions: 'use only the provided documents' and 'say if information isn't available' — this is the primary defence against hallucination in RAG pipelines. Manage token budget explicitly: trim lower-priority context (background info, old history) to fit within the window."