RAG Retrieval — Finding and Injecting Context into the AI

The RAG Retrieval Pipeline

RAG pipeline steps:

1. User asks: "What is the standard INR monitoring frequency for Warfarin?"
2. Embed the query → query vector
3. Search vector store → top-5 relevant document chunks
4. Filter: remove chunks with similarity below threshold (e.g., 0.70)
5. Re-rank: optionally reorder by cross-encoder for precision
6. Assemble context: format retrieved chunks as readable text
7. Build prompt: system message + retrieved context + user question
8. AI generates answer grounded in the retrieved context
9. Output: cite which documents were used (for auditability)
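
Steps 3 and 4 above can be sketched in a few lines. This is a minimal in-memory illustration, assuming cosine similarity as the score and the 0.70 threshold from the list; `SimilarityFilter`, `Cosine`, and `TopMatches` are hypothetical names, and a real vector store would normally do this scoring server-side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilarityFilter
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    public static float Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        return normA == 0f || normB == 0f
            ? 0f
            : dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }

    // Score every chunk, drop those below minScore, keep the best topK.
    public static IReadOnlyList<(string Text, float Score)> TopMatches(
        float[] query,
        IEnumerable<(string Text, float[] Embedding)> chunks,
        int topK = 5,
        float minScore = 0.70f)
    {
        return chunks
            .Select(c => (c.Text, Score: Cosine(query, c.Embedding)))
            .Where(c => c.Score >= minScore)
            .OrderByDescending(c => c.Score)
            .Take(topK)
            .ToList();
    }
}
```

The right threshold depends on the embedding model, so 0.70 is a starting point to tune against real queries rather than a fixed rule.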

What RAG helps prevent:
  → Hallucination of clinical facts the AI was not trained on
  → Outdated answers from training data (answers come from your indexed documents, which are as current as your last re-index)
  → "I don't know" for domain-specific questions (your documents are the source)

What RAG does NOT prevent:
  → Hallucination within the retrieved context (AI misreads a document)
  → Retrieval failure (right answer exists but wrong chunk retrieved)
  → Synthesis errors (AI correctly retrieves but incorrectly combines information)

Query Embedding and Retrieval

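// Complete retrieval service: embed → search → assemble
// ITextEmbeddingGenerationService comes from Semantic Kernel. IVectorDocumentStore and
// ScoredChunk are not shown here and appear to be application-defined types.

using System.Text;
using Microsoft.SemanticKernel.Embeddings;

public sealed class ClinicalRagRetrievalService
{
    private readonly ITextEmbeddingGenerationService _embeddings;
    private readonly IVectorDocumentStore            _vectorStore;

    // Dependencies are injected so the readonly fields are actually initialized
    public ClinicalRagRetrievalService(
        ITextEmbeddingGenerationService embeddings,
        IVectorDocumentStore vectorStore)
    {
        _embeddings  = embeddings;
        _vectorStore = vectorStore;
    }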

    public async Task<RetrievalResult> RetrieveContextAsync(
        string query,
        RagRetrievalOptions options,
        CancellationToken ct)
    {
        // Step 1: Embed the query
        var queryEmbedding = await _embeddings.GenerateEmbeddingAsync(query, null, ct);

        // Step 2: Search vector store
        var chunks = await _vectorStore.SearchAsync(
            queryEmbedding: queryEmbedding.ToArray(),
            patientMrn:     options.PatientMrn,
            sourceType:     options.SourceTypeFilter,
            topK:           options.TopK,
            minScore:       options.MinSimilarityScore,
            ct:             ct);

        if (!chunks.Any())
        {
            return RetrievalResult.NoResults(query);
        }

        // Step 3: Assemble context
        var context = AssembleContext(chunks);

        return new RetrievalResult(
            Query:            query,
            RetrievedChunks:  chunks,
            AssembledContext: context,
            HasResults:       true);
    }

    private static string AssembleContext(IReadOnlyList<ScoredChunk> chunks)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Retrieved clinical documents:");
        sb.AppendLine();

        for (int i = 0; i < chunks.Count; i++)
        {
            var chunk = chunks[i];
            sb.AppendLine($"[Document {i + 1}] (Source: {chunk.SourceType}, Relevance: {chunk.SimilarityScore:F2})");
            sb.AppendLine(chunk.ChunkText);
            sb.AppendLine();
        }

        return sb.ToString();
    }
}

public sealed record RagRetrievalOptions(
    string? PatientMrn         = null,
    string? SourceTypeFilter   = null,
    int     TopK               = 5,
    float   MinSimilarityScore = 0.70f);

public sealed record RetrievalResult(
    string                   Query,
    IReadOnlyList<ScoredChunk> RetrievedChunks,
    string                   AssembledContext,
    bool                     HasResults)
{
    public static RetrievalResult NoResults(string query) =>
        new(query, Array.Empty<ScoredChunk>(), string.Empty, false);
}

Prompt Assembly with Retrieved Context

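// Build a grounded prompt by injecting retrieved documents into context

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

public sealed class RagClinicalCopilotService
{
    private readonly ClinicalRagRetrievalService _retrieval;
    private readonly IChatCompletionService      _chat;
    private readonly Kernel                      _kernel;

    // Dependencies are injected so the readonly fields are actually initialized
    public RagClinicalCopilotService(
        ClinicalRagRetrievalService retrieval,
        IChatCompletionService chat,
        Kernel kernel)
    {
        _retrieval = retrieval;
        _chat      = chat;
        _kernel    = kernel;
    }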

    public async Task<RagResponse> AnswerAsync(
        string query,
        RagRetrievalOptions retrievalOptions,
        CancellationToken ct)
    {
        // Retrieve relevant context
        var retrieval = await _retrieval.RetrieveContextAsync(query, retrievalOptions, ct);

        // Pick the system prompt: grounded when documents were found,
        // otherwise a refusal prompt that tells the AI there are no documents
        var systemPrompt = retrieval.HasResults
            ? BuildGroundedSystemPrompt(retrieval.AssembledContext)
            : NoContextSystemPrompt;

        var history = new ChatHistory(systemPrompt);
        history.AddUserMessage(query);

        var response = await _chat.GetChatMessageContentAsync(
            history,
            new OpenAIPromptExecutionSettings { Temperature = 0.2 },
            _kernel,
            ct);

        return new RagResponse(
            Answer:          response.Content ?? string.Empty,
            SourceDocuments: retrieval.RetrievedChunks,
            WasGrounded:     retrieval.HasResults);
    }

    private static string BuildGroundedSystemPrompt(string context) => $"""
        You are a clinical pharmacist assistant with access to clinical documents.

        IMPORTANT: Answer questions using ONLY the information provided in the documents below.
        If the answer is not in the documents, say: "I don't have a document that covers this.
        Please consult the BNF or a clinical pharmacist."
        Do not use your training knowledge for clinical facts — only use the retrieved documents.
        Always reference which document your answer comes from (e.g., "According to Document 2...").

        {context}
        """;

    private const string NoContextSystemPrompt = """
        You are a clinical pharmacist assistant.
        No relevant documents were found for this question.
        Respond: "I don't have documents covering this topic. Please consult the BNF or a senior pharmacist."
        Do not attempt to answer using your general training knowledge.
        """;
}

public sealed record RagResponse(
    string                   Answer,
    IReadOnlyList<ScoredChunk> SourceDocuments,
    bool                     WasGrounded);
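
The grounded prompt asks the model to cite "Document N", but nothing above checks that it did. As a hedged sketch, a post-generation check could confirm that every cited number maps to a chunk that was actually retrieved. `CitationCheck` and its methods are hypothetical helpers, not part of Semantic Kernel, and the regex assumes the "Document N" wording used in the prompt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CitationCheck
{
    // Matches "Document 2", "document 12", etc. Capped at 4 digits so int.Parse cannot overflow.
    private static readonly Regex DocumentReference =
        new(@"\bDocument\s+(\d{1,4})\b", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the distinct document numbers mentioned in the answer, in ascending order.
    public static IReadOnlyList<int> CitedDocuments(string answer) =>
        DocumentReference.Matches(answer)
            .Select(m => int.Parse(m.Groups[1].Value))
            .Distinct()
            .OrderBy(n => n)
            .ToList();

    // True when the answer cites at least one document and every cited number
    // falls within 1..retrievedCount.
    public static bool CitationsAreValid(string answer, int retrievedCount)
    {
        var cited = CitedDocuments(answer);
        return cited.Count > 0 && cited.All(n => n >= 1 && n <= retrievedCount);
    }
}
```

For example, `CitationCheck.CitationsAreValid(response.Answer, response.SourceDocuments.Count)` could flag answers for review. The refusal message contains no document number, so it fails this check by design and needs to be handled separately.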

Hybrid Retrieval: Keyword + Vector

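// Hybrid retrieval combines vector search with keyword search
// Better for specific clinical terms (drug names, MRNs, exact codes)

using Microsoft.SemanticKernel.Embeddings;

public sealed class HybridClinicalRetrievalService
{
    private readonly ITextEmbeddingGenerationService _embeddings;
    private readonly IFullTextSearchService          _keywordSearch;
    private readonly IVectorDocumentStore            _vectorStore;

    // Dependencies are injected so the readonly fields are actually initialized
    public HybridClinicalRetrievalService(
        ITextEmbeddingGenerationService embeddings,
        IFullTextSearchService keywordSearch,
        IVectorDocumentStore vectorStore)
    {
        _embeddings    = embeddings;
        _keywordSearch = keywordSearch;
        _vectorStore   = vectorStore;
    }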

    public async Task<IReadOnlyList<ScoredChunk>> HybridSearchAsync(
        string query,
        string? patientMrn,
        CancellationToken ct)
    {
        // Start the keyword search first so it runs while the query is being embedded
        var keywordTask = _keywordSearch.SearchAsync(query, patientMrn, topK: 10, ct);
        var embedding   = await _embeddings.GenerateEmbeddingAsync(query, null, ct);
        var vectorTask  = _vectorStore.SearchAsync(embedding.ToArray(), patientMrn, topK: 10, ct: ct);

        await Task.WhenAll(keywordTask, vectorTask);

        // Reciprocal Rank Fusion: combine keyword and vector rankings
        // (both tasks have completed, so these awaits return immediately)
        return ReciprocalRankFusion(await keywordTask, await vectorTask, k: 60)
            .Take(5)
            .ToList();
    }

    // RRF score per chunk: sum over both result lists of 1 / (k + rank), with rank starting at 1.
    // A higher score means a better combined rank.
    private static IOrderedEnumerable<ScoredChunk> ReciprocalRankFusion(
        IReadOnlyList<ScoredChunk> keywordResults,
        IReadOnlyList<ScoredChunk> vectorResults,
        int k)
    {
        var scores = new Dictionary<Guid, float>();

        for (int i = 0; i < keywordResults.Count; i++)
            scores[keywordResults[i].Id] =
                scores.GetValueOrDefault(keywordResults[i].Id) + 1f / (i + 1 + k);

        for (int i = 0; i < vectorResults.Count; i++)
            scores[vectorResults[i].Id] =
                scores.GetValueOrDefault(vectorResults[i].Id) + 1f / (i + 1 + k);

        var allChunks = keywordResults.Concat(vectorResults)
            .GroupBy(c => c.Id)
            .Select(g => g.First())
            .ToDictionary(c => c.Id);

        return allChunks.Values.OrderByDescending(c => scores.GetValueOrDefault(c.Id));
    }
}
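
To make the fusion arithmetic concrete, here is a self-contained sketch of the same idea on plain string IDs. `RrfExample.Fuse` is a hypothetical helper written for illustration; it uses the same formula as the method above, summing 1 / (k + rank) across lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfExample
{
    // Fuses any number of ranked ID lists. Each list contributes 1 / (k + rank),
    // with rank starting at 1, and contributions for the same ID are summed.
    public static IReadOnlyList<(string Id, double Score)> Fuse(
        int k, params IReadOnlyList<string>[] rankings)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                scores[ranking[i]] = scores.GetValueOrDefault(ranking[i]) + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties are broken by ID so the order is deterministic.
        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => (Id: kv.Key, Score: kv.Value))
            .ToList();
    }
}
```

With keyword results `A, B, C`, vector results `C, A, D`, and k = 60, chunk A scores 1/61 + 1/62 ≈ 0.0325 and chunk C scores 1/63 + 1/61 ≈ 0.0323, while B (1/62) and D (1/63) appear in only one list. The fused order is A, C, B, D: chunks that both searches agree on rise above chunks that only one search found.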

Citing Sources in Responses

// For clinical systems, every RAG answer must cite its source documents
// This enables prescribers to verify the AI's reasoning

public sealed class CitationFormattingService
{
    public string FormatResponseWithCitations(
        string                   aiResponse,
        IReadOnlyList<ScoredChunk> sources)
    {
        if (!sources.Any())
            return aiResponse;

        var citations = new StringBuilder("\n\n**Sources used:**\n");
        for (int i = 0; i < sources.Count; i++)
        {
            citations.AppendLine(
                $"[Document {i + 1}] {sources[i].SourceType} " +
                $"(relevance: {sources[i].SimilarityScore:P0})");
        }

        return aiResponse + citations.ToString();
    }
}

// API response model with source metadata:
public sealed record RagApiResponse(
    string                      Answer,
    bool                        IsGrounded,
    IReadOnlyList<SourceCitation> Sources);

public sealed record SourceCitation(
    string DocumentType,
    float  RelevanceScore,
    string Preview);    // first 200 characters of the retrieved chunk
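
The records above do not show how a retrieved chunk becomes a `SourceCitation`. One possible mapping is sketched below. The `ScoredChunk` record here is a minimal stand-in with only the three properties the article uses, and `SourceCitationMapper` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins so the sketch compiles on its own; the real ScoredChunk likely has more fields.
public sealed record ScoredChunk(string SourceType, float SimilarityScore, string ChunkText);
public sealed record SourceCitation(string DocumentType, float RelevanceScore, string Preview);

public static class SourceCitationMapper
{
    private const int PreviewLength = 200;

    // Maps each retrieved chunk to a citation, preserving retrieval order.
    public static IReadOnlyList<SourceCitation> ToCitations(IEnumerable<ScoredChunk> chunks) =>
        chunks.Select(c => new SourceCitation(
                DocumentType:   c.SourceType,
                RelevanceScore: c.SimilarityScore,
                Preview:        Truncate(c.ChunkText, PreviewLength)))
            .ToList();

    // Keeps at most maxLength characters. This counts UTF-16 code units,
    // so it can split a surrogate pair at the boundary.
    private static string Truncate(string text, int maxLength) =>
        text.Length <= maxLength ? text : text[..maxLength];
}
```

Keeping the citation order identical to the order used in the assembled context matters, because the model's "Document N" references only make sense against that numbering.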

Production issue I've seen: A RAG system for clinical guidelines was retrieving the right documents but the AI was producing answers that contradicted them. The system prompt said: "Answer questions about prescriptions using the documents provided and your medical knowledge." The phrase "and your medical knowledge" allowed the AI to blend retrieved content with training data — and when there was a conflict (e.g., the retrieved guideline was updated but the AI's training data was older), the AI would sometimes prefer its training data. Removing "and your medical knowledge" and replacing it with "ONLY use the provided documents — if the answer is not in the documents, say so" eliminated the contradictions. RAG system prompts must prohibit the AI from using general training knowledge for clinical facts.

Key Takeaway

RAG retrieval: embed the user query, search the vector store for similar document chunks, assemble a context string, inject it into the system prompt, and instruct the AI to answer only from the retrieved documents. For clinical RAG, always include a "not in the documents, say so" instruction to prevent the AI from falling back to potentially outdated training data. Hybrid retrieval (keyword + vector with Reciprocal Rank Fusion) tends to outperform pure vector search on exact clinical terms such as drug names and codes. Always include source citations in responses, because prescribers need to verify AI-generated clinical information.
