Prompt Engineering Mastery · Lesson 12 of 24
Injecting Retrieved Context into Prompts
What Is Context Injection?
Context injection is adding dynamic, runtime information to a prompt — information the model wasn't trained on or that changes per request:
Types of injected context:
Retrieved documents (RAG)
User profile or preferences
Conversation history
Tool/API call results
Current date/time
Session state
Database query results
File contentsThe model's frozen weights contain general knowledge; injected context provides task-specific, real-time information.
Structure of a Context-Injected Prompt
[System Prompt - static]
Role, task, rules, output format
[Injected Context - dynamic]
Retrieved documents, user state, tool outputs
[User Message - dynamic]
The actual user queryThe injected context sits between the fixed system instructions and the user's question — it's the "working memory" for the model.
Formatting Injected Context
Clear delimiters prevent the model from confusing instructions with data:
def build_rag_prompt(context_docs: list[str], user_query: str) -> str:
docs_formatted = "\n\n".join(
f"<document index='{i+1}'>\n{doc}\n</document>"
for i, doc in enumerate(context_docs)
)
return f"""<context>
{docs_formatted}
</context>
Based only on the documents above, answer the following question.
If the answer is not in the documents, say "I don't have that information."
Question: {user_query}"""XML-style tags are recommended by both Anthropic and OpenAI for context delimitation — they resist injection attempts and are clearly parseable.
Context Ordering
The position of injected context affects how much the model attends to it:
Context at the START of the prompt (before user query):
Higher attention in most models
Model processes it as background before seeing the question
Recommended for RAG context
Context at the END (after user query):
"Lost in the middle" problem: middle context is least attended to
Avoid putting critical information in the middle of very long contexts
Multiple documents: put the most relevant first and last
(exploits primacy + recency effects)Managing Context Length
The context window is limited. Priority order for what to include:
Priority 1: System prompt (safety, format, role)
Priority 2: Direct answer to the query (the most relevant retrieved chunk)
Priority 3: Supporting context (related documents)
Priority 4: Conversation history (most recent turns)
Priority 5: Background context (user profile, preferences)
Trim last when approaching limit.
Token counting:
```python
import tiktoken
def count_tokens(text: str, model: str = "gpt-4") -> int:
enc = tiktoken.encoding_for_model(model)
return len(enc.encode(text))
def fit_context_to_window(
system_prompt: str,
context_docs: list[str],
user_query: str,
max_tokens: int = 4096,
reserved_for_output: int = 512
) -> list[str]:
budget = max_tokens - reserved_for_output
budget -= count_tokens(system_prompt + user_query)
included = []
for doc in context_docs:
doc_tokens = count_tokens(doc)
if budget - doc_tokens >= 0:
included.append(doc)
budget -= doc_tokens
else:
break
return includedTool Output Injection
When LLMs call tools (APIs, databases), results are injected back:
def run_agent_step(user_query: str, tool_results: dict) -> str:
tool_context = "\n".join(
f"<tool name='{name}'>\n{result}\n</tool>"
for name, result in tool_results.items()
)
messages = [
{
"role": "user",
"content": f"""Tool results:
{tool_context}
Based on these results, answer: {user_query}"""
}
]
return call_llm(messages)Grounding Instructions
When injecting context for factual queries, always tell the model to cite and stay grounded:
"Use only the information in the documents above to answer.
Do not use your general knowledge.
If the answer cannot be found in the provided documents, say exactly:
'The provided context does not contain this information.'
When answering, cite the relevant document: 'According to [Document 2]...'"Without grounding instructions, the model will blend retrieved context with its own knowledge — the primary cause of hallucination in RAG systems.
Interview Answer
"Context injection adds dynamic, request-specific information to prompts — RAG retrieved documents, user state, tool outputs, current time. Structure: static system prompt → injected context → user query. Use XML tags to delimit context from instructions. Context at the start of the prompt receives more attention; information buried in the middle of long contexts is less reliably used (the 'lost in the middle' problem). Always include grounding instructions: 'use only the provided documents' and 'say if information isn't available' — this is the primary defence against hallucination in RAG pipelines. Manage token budget explicitly: trim lower-priority context (background info, old history) to fit within the window."