RAG Retrieval — Finding and Injecting Context into the AI
Build RAG retrieval pipelines in .NET: query embedding, similarity search, context assembly, prompt injection of retrieved documents, re-ranking, hybrid retrieval, and hallucination prevention for clinical AI.
The RAG Retrieval Pipeline
RAG pipeline steps:
1. User asks: "What is the standard INR monitoring frequency for Warfarin?"
2. Embed the query → query vector
3. Search vector store → top-5 relevant document chunks
4. Filter: remove chunks with similarity below threshold (e.g., 0.70)
5. Re-rank: optionally reorder by cross-encoder for precision
6. Assemble context: format retrieved chunks as readable text
7. Build prompt: system message + retrieved context + user question
8. AI generates answer grounded in the retrieved context
9. Output: cite which documents were used (for auditability)
What RAG prevents:
→ Hallucination of clinical facts the AI was not trained on
→ Outdated answers from training data (answers come from your indexed documents, so they are as current as the index)
→ "I don't know" for domain-specific questions (your documents are the source)
What RAG does NOT prevent:
→ Hallucination within the retrieved context (AI misreads a document)
→ Retrieval failure (right answer exists but wrong chunk retrieved)
→ Synthesis errors (AI correctly retrieves but incorrectly combines information)
Query Embedding and Retrieval
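Steps 3 and 4 of the pipeline (search, then filter by threshold) normally happen inside the vector store, but the logic is small enough to sketch in memory. This is a minimal illustration, assuming cosine similarity as the metric and embeddings held as plain float arrays. `InMemorySimilaritySearch`, `CosineSimilarity`, and `TopKAboveThreshold` are hypothetical names for this example only; they are not part of Semantic Kernel or of the `IVectorDocumentStore` used in this section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a vector store's similarity search.
public static class InMemorySimilaritySearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Score every stored vector against the query, drop scores below minScore,
    // and return at most topK matches, highest score first.
    public static IReadOnlyList<(string Id, float Score)> TopKAboveThreshold(
        float[] query,
        IEnumerable<(string Id, float[] Vector)> stored,
        int topK = 5,
        float minScore = 0.70f)
    {
        return stored
            .Select(s => (s.Id, Score: CosineSimilarity(query, s.Vector)))
            .Where(r => r.Score >= minScore)
            .OrderByDescending(r => r.Score)
            .Take(topK)
            .ToList();
    }
}
```

With a query of `[1, 0]` and stored vectors `a = [1, 0]`, `b = [0, 1]`, `c = [1, 1]`, the scores are 1.0, 0.0, and roughly 0.707, so the default 0.70 threshold keeps `a` and `c` and drops `b`. A real vector store does the same thing with an approximate index instead of a full scan.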
// Complete retrieval service: embed → search → assemble
// Note: in Semantic Kernel 1.x the embedding interface is flagged experimental
// (SKEXP0001), so that diagnostic may need to be suppressed.
using System.Text;
using Microsoft.SemanticKernel.Embeddings;

public sealed class ClinicalRagRetrievalService
{
    private readonly ITextEmbeddingGenerationService _embeddings;
    private readonly IVectorDocumentStore _vectorStore;

    public ClinicalRagRetrievalService(
        ITextEmbeddingGenerationService embeddings,
        IVectorDocumentStore vectorStore)
    {
        _embeddings = embeddings;
        _vectorStore = vectorStore;
    }

    public async Task<RetrievalResult> RetrieveContextAsync(
        string query,
        RagRetrievalOptions options,
        CancellationToken ct)
    {
        // Step 1: Embed the query
        var queryEmbedding = await _embeddings.GenerateEmbeddingAsync(query, null, ct);

        // Step 2: Search vector store
        var chunks = await _vectorStore.SearchAsync(
            queryEmbedding: queryEmbedding.ToArray(),
            patientMrn: options.PatientMrn,
            sourceType: options.SourceTypeFilter,
            topK: options.TopK,
            minScore: options.MinSimilarityScore,
            ct: ct);

        if (!chunks.Any())
        {
            return RetrievalResult.NoResults(query);
        }

        // Step 3: Assemble context
        var context = AssembleContext(chunks);

        return new RetrievalResult(
            Query: query,
            RetrievedChunks: chunks,
            AssembledContext: context,
            HasResults: true);
    }

    // Number each chunk so the model can cite it as "Document N"
    private static string AssembleContext(IReadOnlyList<ScoredChunk> chunks)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Retrieved clinical documents:");
        sb.AppendLine();

        for (int i = 0; i < chunks.Count; i++)
        {
            var chunk = chunks[i];
            sb.AppendLine($"[Document {i + 1}] (Source: {chunk.SourceType}, Relevance: {chunk.SimilarityScore:F2})");
            sb.AppendLine(chunk.ChunkText);
            sb.AppendLine();
        }

        return sb.ToString();
    }
}

public sealed record RagRetrievalOptions(
    string? PatientMrn = null,
    string? SourceTypeFilter = null,
    int TopK = 5,
    float MinSimilarityScore = 0.70f);

public sealed record RetrievalResult(
    string Query,
    IReadOnlyList<ScoredChunk> RetrievedChunks,
    string AssembledContext,
    bool HasResults)
{
    public static RetrievalResult NoResults(string query) =>
        new(query, Array.Empty<ScoredChunk>(), string.Empty, false);
}
Prompt Assembly with Retrieved Context
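Step 5 of the pipeline, re-ranking, sits between retrieval and prompt assembly, and none of the services in this article implement it. The sketch below shows only the shape of that step: rescore each (query, chunk) pair with a more precise scorer, reorder, and keep the best few. `ChunkReranker`, `Rerank`, and `TermOverlapScore` are hypothetical names. The term-overlap scorer is a placeholder so the example runs on its own; it is not a cross-encoder, and in practice `scorePair` would wrap a call to a re-ranking model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical re-ranking step: rescore each (query, chunk) pair and reorder.
public static class ChunkReranker
{
    // scorePair stands in for a cross-encoder call; a higher score means more relevant.
    public static IReadOnlyList<string> Rerank(
        string query,
        IReadOnlyList<string> chunkTexts,
        Func<string, string, float> scorePair,
        int keep = 3)
    {
        return chunkTexts
            .Select(text => (Text: text, Score: scorePair(query, text)))
            .OrderByDescending(x => x.Score)   // stable sort: ties keep retrieval order
            .Take(keep)
            .Select(x => x.Text)
            .ToList();
    }

    // Placeholder scorer (NOT a cross-encoder): the fraction of distinct query
    // terms that also occur in the chunk, compared case-insensitively.
    public static float TermOverlapScore(string query, string chunkText)
    {
        var separators = new[] { ' ', ',', '.', '?', ':', ';', '\n' };

        var queryTerms = query
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .ToHashSet(StringComparer.OrdinalIgnoreCase);
        if (queryTerms.Count == 0) return 0f;

        var chunkTerms = chunkText
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .ToHashSet(StringComparer.OrdinalIgnoreCase);

        int shared = queryTerms.Count(t => chunkTerms.Contains(t));
        return (float)shared / queryTerms.Count;
    }
}
```

Passing the scorer in as a delegate keeps the reordering logic testable without a model, and lets the placeholder be swapped for a real cross-encoder later without touching the callers.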
// Build a grounded prompt by injecting retrieved documents into context
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

public sealed class RagClinicalCopilotService
{
    private readonly ClinicalRagRetrievalService _retrieval;
    private readonly IChatCompletionService _chat;
    private readonly Kernel _kernel;

    public RagClinicalCopilotService(
        ClinicalRagRetrievalService retrieval,
        IChatCompletionService chat,
        Kernel kernel)
    {
        _retrieval = retrieval;
        _chat = chat;
        _kernel = kernel;
    }

    public async Task<RagResponse> AnswerAsync(
        string query,
        RagRetrievalOptions retrievalOptions,
        CancellationToken ct)
    {
        // Retrieve relevant context
        var retrieval = await _retrieval.RetrieveContextAsync(query, retrievalOptions, ct);

        // Inject the retrieved context into the system message. If nothing was
        // retrieved, tell the AI there are no documents so it declines to answer.
        var systemPrompt = retrieval.HasResults
            ? BuildGroundedSystemPrompt(retrieval.AssembledContext)
            : NoContextSystemPrompt;

        var history = new ChatHistory(systemPrompt);
        history.AddUserMessage(query);

        // Low temperature keeps the answer close to the retrieved text
        var response = await _chat.GetChatMessageContentAsync(
            history,
            new OpenAIPromptExecutionSettings { Temperature = 0.2 },
            _kernel,
            ct);

        return new RagResponse(
            Answer: response.Content ?? string.Empty,
            SourceDocuments: retrieval.RetrievedChunks,
            WasGrounded: retrieval.HasResults);
    }

    private static string BuildGroundedSystemPrompt(string context) => $"""
        You are a clinical pharmacist assistant with access to clinical documents.
        IMPORTANT: Answer questions using ONLY the information provided in the documents below.
        If the answer is not in the documents, say: "I don't have a document that covers this.
        Please consult the BNF or a clinical pharmacist."
        Do not use your training knowledge for clinical facts; only use the retrieved documents.
        Always reference which document your answer comes from (e.g., "According to Document 2...").
        {context}
        """;

    private const string NoContextSystemPrompt = """
        You are a clinical pharmacist assistant.
        No relevant documents were found for this question.
        Respond: "I don't have documents covering this topic. Please consult the BNF or a senior pharmacist."
        Do not attempt to answer using your general training knowledge.
        """;
}

public sealed record RagResponse(
    string Answer,
    IReadOnlyList<ScoredChunk> SourceDocuments,
    bool WasGrounded);
Hybrid Retrieval: Keyword + Vector
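Reciprocal Rank Fusion, the technique the hybrid service in this section uses to merge its two result lists, is easiest to follow on a toy example with plain string IDs. The sketch below is a standalone illustration, not the service's own code: `RrfDemo` and `Fuse` are hypothetical names, and the alphabetical tie-break is an assumption added to make the output deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfDemo
{
    // Fuse two ranked ID lists: score(id) = sum over lists of 1 / (k + rank),
    // where rank is 1-based within each list. Returns IDs, best first.
    public static IReadOnlyList<string> Fuse(
        IReadOnlyList<string> keywordRanking,
        IReadOnlyList<string> vectorRanking,
        int k = 60)
    {
        var scores = new Dictionary<string, double>();

        foreach (var ranking in new[] { keywordRanking, vectorRanking })
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                scores[ranking[i]] = scores.GetValueOrDefault(ranking[i]) + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)   // deterministic tie-break
            .Select(kv => kv.Key)
            .ToList();
    }
}

// Usage:
//   RrfDemo.Fuse(new[] { "A", "B", "C" }, new[] { "B", "D", "A" })
//   returns B, A, D, C
```

Working through the arithmetic with k = 60: B scores 1/62 + 1/61, A scores 1/61 + 1/63, D scores 1/62, and C scores 1/63. B ends up first even though A tops the keyword list, because B ranks well in both lists. Only rank positions matter, which is why RRF can merge keyword scores and cosine similarities that are not on the same scale.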
// Hybrid retrieval combines vector search with keyword search
// Better for specific clinical terms (drug names, MRNs, exact codes)
using Microsoft.SemanticKernel.Embeddings;

public sealed class HybridClinicalRetrievalService
{
    private readonly ITextEmbeddingGenerationService _embeddings;
    private readonly IFullTextSearchService _keywordSearch;
    private readonly IVectorDocumentStore _vectorStore;

    public HybridClinicalRetrievalService(
        ITextEmbeddingGenerationService embeddings,
        IFullTextSearchService keywordSearch,
        IVectorDocumentStore vectorStore)
    {
        _embeddings = embeddings;
        _keywordSearch = keywordSearch;
        _vectorStore = vectorStore;
    }

    public async Task<IReadOnlyList<ScoredChunk>> HybridSearchAsync(
        string query,
        string? patientMrn,
        CancellationToken ct)
    {
        // Start the keyword search first so it runs while the query is embedded
        // and the vector search executes
        var keywordTask = _keywordSearch.SearchAsync(query, patientMrn, topK: 10, ct);
        var embedding = await _embeddings.GenerateEmbeddingAsync(query, null, ct);
        var vectorTask = _vectorStore.SearchAsync(embedding.ToArray(), patientMrn, topK: 10, ct: ct);
        await Task.WhenAll(keywordTask, vectorTask);

        // Reciprocal Rank Fusion: combine keyword and vector rankings, keep the top 5
        return ReciprocalRankFusion(await keywordTask, await vectorTask, k: 60)
            .Take(5)
            .ToList();
    }

    // RRF score per chunk: sum over both lists of 1 / (k + rank), rank starting at 1.
    // Higher score = better combined rank.
    private static IOrderedEnumerable<ScoredChunk> ReciprocalRankFusion(
        IReadOnlyList<ScoredChunk> keywordResults,
        IReadOnlyList<ScoredChunk> vectorResults,
        int k)
    {
        var scores = new Dictionary<Guid, float>();

        for (int i = 0; i < keywordResults.Count; i++)
            scores[keywordResults[i].Id] =
                scores.GetValueOrDefault(keywordResults[i].Id) + 1f / (i + 1 + k);

        for (int i = 0; i < vectorResults.Count; i++)
            scores[vectorResults[i].Id] =
                scores.GetValueOrDefault(vectorResults[i].Id) + 1f / (i + 1 + k);

        // De-duplicate chunks found by both searches. The keyword copy wins, so the
        // returned SimilarityScore is the original per-search score, not the fused one.
        var allChunks = keywordResults.Concat(vectorResults)
            .GroupBy(c => c.Id)
            .Select(g => g.First())
            .ToDictionary(c => c.Id);

        return allChunks.Values.OrderByDescending(c => scores.GetValueOrDefault(c.Id));
    }
}
Citing Sources in Responses
// For clinical systems, every RAG answer must cite its source documents
// so prescribers can check the answer against the original text
using System.Text;

public sealed class CitationFormattingService
{
    public string FormatResponseWithCitations(
        string aiResponse,
        IReadOnlyList<ScoredChunk> sources)
    {
        if (sources.Count == 0)
            return aiResponse;

        // Numbering matches the "[Document N]" labels used in the assembled context
        var citations = new StringBuilder("\n\n**Sources used:**\n");
        for (int i = 0; i < sources.Count; i++)
        {
            citations.AppendLine(
                $"[Document {i + 1}] {sources[i].SourceType} " +
                $"(relevance: {sources[i].SimilarityScore:P0})");
        }

        return aiResponse + citations.ToString();
    }
}

// API response model with source metadata:
public sealed record RagApiResponse(
    string Answer,
    bool IsGrounded,
    IReadOnlyList<SourceCitation> Sources);

public sealed record SourceCitation(
    string DocumentType,
    float RelevanceScore,
    string Preview); // first 200 characters of the retrieved chunk
Production issue I've seen: A RAG system for clinical guidelines was retrieving the right documents, but the AI was producing answers that contradicted them. The system prompt said: "Answer questions about prescriptions using the documents provided and your medical knowledge." The phrase "and your medical knowledge" allowed the AI to blend retrieved content with training data, and when the two conflicted (e.g., the retrieved guideline had been updated but the AI's training data was older), the AI would sometimes prefer its training data. Removing "and your medical knowledge" and replacing it with "ONLY use the provided documents. If the answer is not in the documents, say so" eliminated the contradictions. RAG system prompts must prohibit the AI from using general training knowledge for clinical facts.
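A stricter prompt can be backed up by a cheap check on the output. The sketch below is my own assumption about how such a guard might look, not something taken from the system described above: it pulls the "Document N" references out of an answer and confirms that each one points at a document that was actually retrieved. `CitationCheck` and its methods are hypothetical names, and the regex assumes the model follows the "According to Document 2..." citation style requested in the system prompt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical post-generation check on "Document N" references in an answer.
public static class CitationCheck
{
    private static readonly Regex DocumentReference =
        new(@"\bDocument\s+(\d+)\b", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the distinct document numbers cited in the answer, ascending.
    // A number that cannot be parsed (e.g. overflow) is reported as -1.
    public static IReadOnlyList<int> CitedDocumentNumbers(string answer) =>
        DocumentReference.Matches(answer)
            .Cast<Match>()
            .Select(m => int.TryParse(m.Groups[1].Value, out var n) ? n : -1)
            .Distinct()
            .OrderBy(n => n)
            .ToList();

    // True when the answer cites at least one document and every cited number
    // falls within 1..retrievedCount (the numbering used in the assembled context).
    public static bool CitesOnlyRetrievedDocuments(string answer, int retrievedCount)
    {
        var cited = CitedDocumentNumbers(answer);
        return cited.Count > 0 && cited.All(n => n >= 1 && n <= retrievedCount);
    }
}
```

This only catches missing or out-of-range citations; it cannot tell whether the cited document really supports the claim. A refusal such as "I don't have a document that covers this" contains no numbered reference and will return false, so callers would need to recognise refusals separately before flagging an answer for review.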
Key Takeaway
RAG retrieval: embed the user query, search the vector store for similar document chunks, assemble a context string, inject it into the system prompt, and instruct the AI to answer only from the retrieved documents. For clinical RAG, always include a "not in the documents: say so" instruction so the AI does not fall back on potentially outdated training data. Hybrid retrieval (keyword + vector with Reciprocal Rank Fusion) tends to beat pure vector search on exact domain-specific terms such as drug names and codes, where embeddings alone can miss literal matches. Always include source citations in responses, because prescribers need to verify AI-generated clinical information.