Vector Search — Finding Relevant Documents by Meaning

How Vector Search Works

Traditional full-text search:
  Query: "warfarin dose adjustment"
  Finds: documents containing those exact words
  Misses: "Coumadin titration", "anticoagulant dosing protocol" (same meaning, different words)

Vector search:
  Query: "warfarin dose adjustment"
  → embed query → [0.23, -0.41, 0.87, ...]
  → compare against all stored document vectors
  → return documents whose vectors are closest (cosine similarity)
  Finds: documents about warfarin dose adjustment regardless of exact wording
  Also finds: Coumadin titration, anticoagulant dosing, INR-based dose calculation

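Cosine similarity (rough guide; useful thresholds depend on the embedding model):
  Score of 1.0    → vectors point in the same direction (effectively identical meaning)
  Score above 0.8 → highly relevant
  Score 0.7–0.8   → relevant
  Score 0.6–0.7   → borderline; check before relying on it
  Score below 0.6 → likely irrelevant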
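
To make the score concrete, here is a minimal sketch of the cosine similarity calculation itself. The class and method names (VectorMath, CosineSimilarity) are illustrative and do not come from any library; in production the database or search service computes this for you.

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity = dot(a, b) / (|a| * |b|). The result lies in [-1, 1].
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            throw new ArgumentException("Cosine similarity is undefined for a zero vector.");

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Only the direction of the vectors matters, not their length: a vector and the same vector scaled by 2 score 1.0, while perpendicular vectors score 0.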

In RAG:
  1. User asks: "What should I do if INR is above 3?"
  2. Embed the question
  3. Find top-5 document chunks with highest cosine similarity to the question
  4. Inject those chunks into the AI's context
  5. AI answers using only the retrieved chunks (grounded in real documents)
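
Steps 3 and 4 above can be sketched in memory, without any database. This is a toy illustration under stated assumptions: the embeddings are hand-written 3-dimensional vectors rather than real model output, and the Chunk and InMemoryRetriever names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Chunk(string Text, float[] Embedding);

public static class InMemoryRetriever
{
    // Step 3: rank every chunk by cosine similarity to the query and keep the best k.
    public static IReadOnlyList<(Chunk Chunk, double Score)> TopK(
        float[] query, IEnumerable<Chunk> chunks, int k)
        => chunks
            .Select(c => (Chunk: c, Score: Cosine(query, c.Embedding)))
            .OrderByDescending(x => x.Score)
            .Take(k)
            .ToList();

    // Step 4: join the retrieved chunk texts into one context string for the prompt.
    public static string BuildContext(IEnumerable<(Chunk Chunk, double Score)> hits)
        => string.Join("\n---\n", hits.Select(h => h.Chunk.Text));

    // Assumes non-zero vectors of equal length.
    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            na  += (double)a[i] * a[i];
            nb  += (double)b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }
}
```

A brute-force scan like this is fine for a few thousand chunks. The stores in the following sections exist because it does not scale much beyond that.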

SQL Server Vector Search (2025+)

// SQL Server 2025 and Azure SQL support a native VECTOR type
// NuGet: Microsoft.Data.SqlClient, Dapper (provides ExecuteAsync / QueryAsync)

// Schema:
// CREATE TABLE clinical_document_chunks (
//     id           UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
//     patient_mrn  NVARCHAR(20)  NOT NULL,
//     source_type  NVARCHAR(50)  NOT NULL,
//     chunk_text   NVARCHAR(MAX) NOT NULL,
//     embedding    VECTOR(1536)  NOT NULL,
//     indexed_at   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
// );
//
// CREATE INDEX idx_mrn ON clinical_document_chunks (patient_mrn);

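public sealed class SqlServerVectorDocumentStore
{
    private readonly string _connectionString;

    public SqlServerVectorDocumentStore(string connectionString)
        => _connectionString = connectionString;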

    public async Task UpsertAsync(EmbeddedDocument doc, CancellationToken ct)
    {
        const string sql = """
            MERGE clinical_document_chunks AS target
            USING (SELECT @id AS id) AS source ON target.id = source.id
            WHEN MATCHED THEN
                UPDATE SET chunk_text = @text, embedding = CAST(@embedding AS VECTOR(1536)), indexed_at = SYSUTCDATETIME()
            WHEN NOT MATCHED THEN
                INSERT (id, patient_mrn, source_type, chunk_text, embedding)
                VALUES (@id, @mrn, @sourceType, @text, CAST(@embedding AS VECTOR(1536)));
            """;

        await using var conn = new SqlConnection(_connectionString);
        // CommandDefinition passes the cancellation token through to Dapper
        await conn.ExecuteAsync(new CommandDefinition(sql, new
        {
            id         = doc.Id,
            mrn        = doc.PatientMrn,
            sourceType = doc.SourceType,
            text       = doc.Text,
            embedding  = JsonSerializer.Serialize(doc.Embedding)  // pass as JSON string, cast in SQL
        }, cancellationToken: ct));
    }

    public async Task<IReadOnlyList<ScoredChunk>> SearchAsync(
        float[]  queryEmbedding,
        string   patientMrn,
        int      topK = 5,
        float    minScore = 0.75f,
        CancellationToken ct = default)
    {
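        // VECTOR_DISTANCE with cosine metric: lower distance = higher similarity.
        // Column aliases match the ScoredChunkRow property names so Dapper can map them.
        const string sql = """
            SELECT TOP (@topK)
                id          AS Id,
                chunk_text  AS ChunkText,
                source_type AS SourceType,
                1 - VECTOR_DISTANCE('cosine', embedding, CAST(@embedding AS VECTOR(1536))) AS SimilarityScore
            FROM clinical_document_chunks
            WHERE patient_mrn = @mrn
              AND 1 - VECTOR_DISTANCE('cosine', embedding, CAST(@embedding AS VECTOR(1536))) >= @minScore
            ORDER BY SimilarityScore DESC;
            """;

        await using var conn = new SqlConnection(_connectionString);
        // CommandDefinition passes the cancellation token through to Dapper
        var results = await conn.QueryAsync<ScoredChunkRow>(new CommandDefinition(sql, new
        {
            topK      = topK,
            embedding = JsonSerializer.Serialize(queryEmbedding),
            mrn       = patientMrn,
            minScore  = minScore
        }, cancellationToken: ct));

        return results.Select(r => new ScoredChunk(
            r.Id, r.ChunkText, r.SourceType, (float)r.SimilarityScore)).ToList();
    }

    // Row shape returned by the query. VECTOR_DISTANCE returns a SQL float, which maps to double.
    private sealed class ScoredChunkRow
    {
        public Guid   Id              { get; set; }
        public string ChunkText       { get; set; } = default!;
        public string SourceType      { get; set; } = default!;
        public double SimilarityScore { get; set; }
    }
}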

public sealed record ScoredChunk(
    Guid   Id,
    string ChunkText,
    string SourceType,
    float  SimilarityScore);
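
Both methods above pass the embedding as a JSON array string and cast it to VECTOR(1536) in SQL. A small guard before serialization can catch an embedding of the wrong length in application code rather than, presumably, as a database error at the CAST. The sketch below is a hypothetical helper (VectorParameter is not part of any library), and the 1536 default simply mirrors the schema shown earlier.

```csharp
using System;
using System.Text.Json;

public static class VectorParameter
{
    // Matches VECTOR(1536) in the table definition; adjust if the embedding model changes.
    public const int DefaultDimensions = 1536;

    // Validates the dimension count, then serializes to the JSON array form used with CAST(... AS VECTOR).
    public static string ToJson(float[] embedding, int expectedDimensions = DefaultDimensions)
    {
        ArgumentNullException.ThrowIfNull(embedding);

        if (embedding.Length != expectedDimensions)
            throw new ArgumentException(
                $"Expected {expectedDimensions} dimensions but got {embedding.Length}.",
                nameof(embedding));

        return JsonSerializer.Serialize(embedding);
    }
}
```

The call sites would then use VectorParameter.ToJson(doc.Embedding) in place of JsonSerializer.Serialize(doc.Embedding).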

pgvector on PostgreSQL

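// NuGet: Npgsql.EntityFrameworkCore.PostgreSQL, Pgvector.EntityFrameworkCore
// PostgreSQL extension: CREATE EXTENSION IF NOT EXISTS vector;
// Register the vector type mapping when configuring the context:
//   options.UseNpgsql(connectionString, o => o.UseVector());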

// EF Core entity:
public class ClinicalChunkEntity
{
    public Guid   Id          { get; set; }
    public string PatientMrn  { get; set; } = default!;
    public string SourceType  { get; set; } = default!;
    public string ChunkText   { get; set; } = default!;
    [Column(TypeName = "vector(1536)")]                      // System.ComponentModel.DataAnnotations.Schema
    public Vector Embedding   { get; set; } = default!;  // Pgvector.Vector
}

// DbContext:
public class ClinicalVectorDbContext : DbContext
{
    public DbSet<ClinicalChunkEntity> Chunks => Set<ClinicalChunkEntity>();

    protected override void OnModelCreating(ModelBuilder model)
    {
        model.HasPostgresExtension("vector");

        model.Entity<ClinicalChunkEntity>(e =>
        {
            e.HasIndex(c => c.PatientMrn);
            // HNSW index for fast approximate nearest-neighbour search:
            e.HasIndex(c => c.Embedding)
             .HasMethod("hnsw")
             .HasOperators("vector_cosine_ops");
        });
    }
}

// Search query using pgvector cosine distance:
public async Task<IReadOnlyList<ScoredChunk>> SearchAsync(
    float[]  queryEmbedding,
    string   patientMrn,
    int      topK = 5,
    CancellationToken ct = default)
{
    var queryVector = new Vector(queryEmbedding);

    // Fetch the cosine distance in SQL, then convert it to a similarity score in memory.
    var rows = await _context.Chunks
        .Where(c => c.PatientMrn == patientMrn)
        .OrderBy(c => c.Embedding.CosineDistance(queryVector))  // lower = more similar
        .Take(topK)
        .Select(c => new
        {
            c.Id,
            c.ChunkText,
            c.SourceType,
            Distance = c.Embedding.CosineDistance(queryVector)
        })
        .ToListAsync(ct);

    return rows
        .Select(r => new ScoredChunk(r.Id, r.ChunkText, r.SourceType, 1f - (float)r.Distance))
        .ToList();
}
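
The pgvector query above returns the nearest chunks but applies no minimum score, unlike the SQL Server version. pgvector's cosine distance is 1 minus cosine similarity (range 0 to 2), so a minimum similarity translates directly into a maximum distance. A minimal sketch of that conversion, with hypothetical helper names:

```csharp
public static class CosineScore
{
    // pgvector cosine distance = 1 - cosine similarity.
    public static double ToSimilarity(double distance) => 1.0 - distance;

    // A minimum similarity of 0.75 corresponds to a maximum distance of 0.25.
    public static double ToMaxDistance(double minSimilarity) => 1.0 - minSimilarity;
}

// Possible use in the query above (assumes a minScore parameter is added to SearchAsync):
//   var maxDistance = CosineScore.ToMaxDistance(minScore);
//   .Where(c => c.Embedding.CosineDistance(queryVector) <= maxDistance)
```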

Azure AI Search

// Azure AI Search provides managed vector search with hybrid (keyword + vector) support
// NuGet: Azure.Search.Documents, Azure.Identity (for DefaultAzureCredential)

var searchClient = new SearchClient(
    new Uri(config["AzureSearch:Endpoint"]!),
    "clinical-documents",           // index name
    new DefaultAzureCredential());  // token credential (Microsoft Entra ID)

// Index schema (created via Azure portal or Bicep):
// Fields: id, patient_mrn, source_type, chunk_text, embedding (Collection(Edm.Single), dimensions=1536)

// Upload documents:
var batch = IndexDocumentsBatch.Upload(chunks.Select(c => new SearchDocument
{
    ["id"]          = c.Id.ToString(),
    ["patient_mrn"] = c.PatientMrn,
    ["source_type"] = c.SourceType,
    ["chunk_text"]  = c.Text,
    ["embedding"]   = c.Embedding
}));

await searchClient.IndexDocumentsAsync(batch, ct);

// Hybrid search — combines keyword + vector for better retrieval:
var searchOptions = new SearchOptions
{
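    // SearchFilter.Create quotes and escapes the value, so the MRN cannot alter the OData filter
    Filter          = SearchFilter.Create($"patient_mrn eq {patientMrn}"),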
    Size            = 5,
    Select          = { "id", "chunk_text", "source_type" },
    VectorSearch    = new VectorSearchOptions
    {
        Queries =
        {
            new VectorizedQuery(queryEmbedding)
            {
                KNearestNeighborsCount = 10,
                Fields = { "embedding" }
            }
        }
    }
};

var results = await searchClient.SearchAsync<SearchDocument>(
    searchText: query,      // keyword component
    searchOptions,
    ct);

await foreach (var result in results.Value.GetResultsAsync())
{
    Console.WriteLine($"Score: {result.Score:F2} | {result.Document["chunk_text"]}");
}
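
Azure AI Search merges the keyword ranking and the vector ranking of a hybrid query with Reciprocal Rank Fusion (RRF), so the score printed above is a fused rank score rather than a cosine similarity. The sketch below illustrates the RRF idea only. It does not reproduce the service's internals; the class name is hypothetical and k = 60 is the constant commonly used for RRF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReciprocalRankFusion
{
    // score(doc) = sum over all rankings of 1 / (k + rank), with rank starting at 1.
    public static IReadOnlyList<(string Id, double Score)> Fuse(
        IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();

        foreach (var ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                scores.TryGetValue(ranking[i], out var current);  // 0 if not seen yet
                scores[ranking[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)  // deterministic tie-break
            .Select(kv => (Id: kv.Key, Score: kv.Value))
            .ToList();
    }
}
```

A document that appears near the top of both rankings outranks one that is first in only one of them, which is why hybrid search tends to surface chunks that match on both wording and meaning.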

Filtering Vector Search Results

// Always filter by patient when retrieving patient-specific documents
// Never return another patient's documents in the context

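public sealed class FilteredVectorSearch
{
    private readonly IVectorDocumentStore _store;
    // Assumption: Semantic Kernel's ITextEmbeddingGenerationService, whose
    // GenerateEmbeddingAsync(text, kernel, ct) matches the calls below.
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public FilteredVectorSearch(
        IVectorDocumentStore store,
        ITextEmbeddingGenerationService embeddingService)
    {
        _store = store;
        _embeddingService = embeddingService;
    }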

    public async Task<IReadOnlyList<ScoredChunk>> SearchPatientDocumentsAsync(
        string query,
        string patientMrn,
        string? sourceTypeFilter = null,  // "clinical_guideline", "prescription_note", etc.
        CancellationToken ct = default)
    {
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query, null, ct);

        // ALWAYS include patient_mrn filter — no cross-patient retrieval
        return await _store.SearchAsync(
            queryEmbedding: queryEmbedding.ToArray(),
            patientMrn:     patientMrn,       // mandatory isolation filter
            sourceType:     sourceTypeFilter,  // optional scope filter
            topK:           5,
            minScore:       0.75f,
            ct:             ct);
    }

    // For guideline searches (not patient-specific):
    public async Task<IReadOnlyList<ScoredChunk>> SearchGuidelinesAsync(
        string query, CancellationToken ct)
    {
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query, null, ct);

        return await _store.SearchAsync(
            queryEmbedding: queryEmbedding.ToArray(),
            patientMrn:     null,              // no patient filter for guidelines
            sourceType:     "clinical_guideline",
            topK:           5,
            minScore:       0.70f,
            ct:             ct);
    }
}

Production issue I've seen: A RAG system was built to retrieve clinical documents to answer prescriber questions. The vector store was not filtered by patient; it searched across all patients. A pharmacist asked about a warfarin note for patient MRN-001. The system retrieved a warfarin note from patient MRN-047 (its wording was closer to the query, so it scored higher) and presented it as context for the answer about MRN-001. The AI then answered a question about MRN-001 using MRN-047's clinical data. Always apply a patient identifier filter to every vector search that involves patient-specific documents. The filter is not optional: it is a clinical data isolation requirement.
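
One way to make this failure loud rather than silent is a second check after retrieval, before any chunk reaches the prompt. The sketch below is a hypothetical defence-in-depth guard, not a replacement for the query filter. It assumes the retrieved rows carry the patient MRN; the ScoredChunk record above does not, so a separate RetrievedChunk type is used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record RetrievedChunk(Guid Id, string PatientMrn, string ChunkText);

public static class PatientIsolationGuard
{
    // Throws if any retrieved chunk belongs to a different patient than the one requested.
    public static IReadOnlyList<RetrievedChunk> EnsureSinglePatient(
        IEnumerable<RetrievedChunk> chunks, string expectedMrn)
    {
        if (string.IsNullOrWhiteSpace(expectedMrn))
            throw new ArgumentException("A patient MRN is required.", nameof(expectedMrn));

        var list = chunks.ToList();
        var foreign = list.FirstOrDefault(
            c => !string.Equals(c.PatientMrn, expectedMrn, StringComparison.Ordinal));

        if (foreign is not null)
            // The message carries the chunk id only, so no MRN ends up in logs.
            throw new InvalidOperationException(
                $"Chunk {foreign.Id} does not belong to the requested patient.");

        return list;
    }
}
```

With a guard like this in place, the MRN-047 note in the incident above would have raised an exception instead of reaching the prompt.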

Key Takeaway

Vector search finds semantically similar documents using embedding similarity (cosine distance) rather than keyword matching. In .NET: use SQL Server 2025 with native VECTOR type, pgvector with EF Core and HNSW index for approximate nearest-neighbour search, or Azure AI Search for managed hybrid (keyword + vector) search. Always filter vector searches by patient identifier — never retrieve cross-patient documents. Set a minimum similarity threshold (0.70–0.80) to avoid returning irrelevant low-scoring chunks. The quality of your retrieval directly determines the quality of RAG answers — irrelevant chunks produce wrong or misleading AI responses.
