Vector Search — Finding Relevant Documents by Meaning
Implement vector search in .NET RAG systems: SQL Server vector search, pgvector on PostgreSQL, Azure AI Search, similarity metrics (cosine vs dot product), filtering, and performance tuning for clinical document retrieval.
How Vector Search Works
Traditional full-text search:
Query: "warfarin dose adjustment"
Finds: documents containing those exact words
Misses: "Coumadin titration", "anticoagulant dosing protocol" (same meaning, different words)
Vector search:
Query: "warfarin dose adjustment"
→ embed query → [0.23, -0.41, 0.87, ...]
→ compare against all stored document vectors
→ return documents whose vectors are closest (cosine similarity)
Finds: documents about warfarin dose adjustment regardless of exact wording
Also finds: Coumadin titration, anticoagulant dosing, INR-based dose calculation
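Cosine similarity (rule-of-thumb thresholds; actual score ranges depend on the embedding model, so calibrate them against your own data):
Score of 1.0 → vectors point in the same direction (effectively identical meaning)
Score above 0.8 → highly relevant
Score 0.7–0.8 → relevant
Score 0.6–0.7 → borderline
Score below 0.6 → likely irrelevant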
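To make the scoring concrete, here is a minimal sketch of cosine similarity and a brute-force top-K search over in-memory vectors. The names `VectorMath`, `Dot`, `CosineSimilarity`, and `TopK` are illustrative assumptions, not part of any library, and a real vector store replaces the linear scan with an index. For unit-length vectors the cosine similarity equals the dot product, which is why dot product is often used as the cheaper metric when embeddings are already normalized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper; all names here are hypothetical.
public static class VectorMath
{
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        float sum = 0f;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); the result lies between -1 and 1.
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float denominator = MathF.Sqrt(Dot(a, a)) * MathF.Sqrt(Dot(b, b));
        return denominator == 0f ? 0f : Dot(a, b) / denominator;
    }

    // Brute-force search: score every stored vector, drop scores below
    // minScore, then keep the topK highest-scoring entries.
    public static IReadOnlyList<(string Id, float Score)> TopK(
        float[] query,
        IEnumerable<(string Id, float[] Embedding)> stored,
        int topK = 5,
        float minScore = 0.75f)
    {
        return stored
            .Select(s => (s.Id, Score: CosineSimilarity(query, s.Embedding)))
            .Where(r => r.Score >= minScore)
            .OrderByDescending(r => r.Score)
            .Take(topK)
            .ToList();
    }
}
```

Because cosine similarity ignores vector length, a stored vector such as `{ 2f, 0.1f }` still scores close to 1.0 against the query `{ 1f, 0f }`; only the direction matters.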
In RAG:
1. User asks: "What should I do if INR is above 3?"
2. Embed the question
3. Find top-5 document chunks with highest cosine similarity to the question
4. Inject those chunks into the AI's context
5. AI answers using only the retrieved chunks (grounded in real documents)
SQL Server Vector Search (2025+)
// SQL Server 2025 and Azure SQL support a native VECTOR type
// NuGet: Microsoft.Data.SqlClient, Dapper (ExecuteAsync / QueryAsync below are Dapper extension methods)
// Schema:
// CREATE TABLE clinical_document_chunks (
//     id          UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
//     patient_mrn NVARCHAR(20)     NOT NULL,
//     source_type NVARCHAR(50)     NOT NULL,
//     chunk_text  NVARCHAR(MAX)    NOT NULL,
//     embedding   VECTOR(1536)     NOT NULL,
//     indexed_at  DATETIME2        NOT NULL DEFAULT SYSUTCDATETIME()
// );
//
// CREATE INDEX idx_mrn ON clinical_document_chunks (patient_mrn);
public sealed class SqlServerVectorDocumentStore
{
    private readonly string _connectionString;

    public SqlServerVectorDocumentStore(string connectionString)
        => _connectionString = connectionString;

    public async Task UpsertAsync(EmbeddedDocument doc, CancellationToken ct)
    {
        const string sql = """
            MERGE clinical_document_chunks AS target
            USING (SELECT @id AS id) AS source ON target.id = source.id
            WHEN MATCHED THEN
                UPDATE SET chunk_text = @text, embedding = CAST(@embedding AS VECTOR(1536)), indexed_at = SYSUTCDATETIME()
            WHEN NOT MATCHED THEN
                INSERT (id, patient_mrn, source_type, chunk_text, embedding)
                VALUES (@id, @mrn, @sourceType, @text, CAST(@embedding AS VECTOR(1536)));
            """;

        var parameters = new
        {
            id = doc.Id,
            mrn = doc.PatientMrn,
            sourceType = doc.SourceType,
            text = doc.Text,
            embedding = JsonSerializer.Serialize(doc.Embedding) // pass as JSON string, cast in SQL
        };

        await using var conn = new SqlConnection(_connectionString);
        // CommandDefinition lets Dapper honour the cancellation token
        await conn.ExecuteAsync(new CommandDefinition(sql, parameters, cancellationToken: ct));
    }
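    public async Task<IReadOnlyList<ScoredChunk>> SearchAsync(
        float[] queryEmbedding,
        string patientMrn,
        int topK = 5,
        float minScore = 0.75f,
        CancellationToken ct = default)
    {
        // VECTOR_DISTANCE with the cosine metric returns a distance (lower = more similar),
        // so similarity = 1 - distance. Column aliases match the ScoredChunkRow property
        // names so that Dapper can map the snake_case columns.
        const string sql = """
            SELECT TOP (@topK)
                id AS Id,
                chunk_text AS ChunkText,
                source_type AS SourceType,
                1 - VECTOR_DISTANCE('cosine', embedding, CAST(@embedding AS VECTOR(1536))) AS SimilarityScore
            FROM clinical_document_chunks
            WHERE patient_mrn = @mrn
              AND 1 - VECTOR_DISTANCE('cosine', embedding, CAST(@embedding AS VECTOR(1536))) >= @minScore
            ORDER BY SimilarityScore DESC;
            """;

        var parameters = new
        {
            topK,
            embedding = JsonSerializer.Serialize(queryEmbedding),
            mrn = patientMrn,
            minScore
        };

        await using var conn = new SqlConnection(_connectionString);
        var results = await conn.QueryAsync<ScoredChunkRow>(
            new CommandDefinition(sql, parameters, cancellationToken: ct));

        return results.Select(r => new ScoredChunk(
            r.Id, r.ChunkText, r.SourceType, (float)r.SimilarityScore)).ToList();
    }
}

// Row shape for the query above; VECTOR_DISTANCE yields a SQL float, which maps to a C# double.
internal sealed class ScoredChunkRow
{
    public Guid Id { get; set; }
    public string ChunkText { get; set; } = default!;
    public string SourceType { get; set; } = default!;
    public double SimilarityScore { get; set; }
}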
public sealed record ScoredChunk(
    Guid Id,
    string ChunkText,
    string SourceType,
    float SimilarityScore);
pgvector on PostgreSQL
// NuGet: Npgsql.EntityFrameworkCore.PostgreSQL, Pgvector.EntityFrameworkCore
// PostgreSQL extension: CREATE EXTENSION IF NOT EXISTS vector;
// Namespaces: Pgvector, Pgvector.EntityFrameworkCore, System.ComponentModel.DataAnnotations.Schema
// EF Core entity:
public class ClinicalChunkEntity
{
    public Guid Id { get; set; }
    public string PatientMrn { get; set; } = default!;
    public string SourceType { get; set; } = default!;
    public string ChunkText { get; set; } = default!;

    [Column(TypeName = "vector(1536)")] // the column needs a fixed dimension to be indexed
    public Vector Embedding { get; set; } = default!; // Pgvector.Vector, from the Pgvector package
}
// DbContext (register with: options.UseNpgsql(connectionString, o => o.UseVector())):
public class ClinicalVectorDbContext : DbContext
{
    public ClinicalVectorDbContext(DbContextOptions<ClinicalVectorDbContext> options)
        : base(options) { }

    public DbSet<ClinicalChunkEntity> Chunks => Set<ClinicalChunkEntity>();

    protected override void OnModelCreating(ModelBuilder model)
    {
        model.HasPostgresExtension("vector");
        model.Entity<ClinicalChunkEntity>(e =>
        {
            e.HasIndex(c => c.PatientMrn);
            // HNSW index for fast approximate nearest-neighbour search:
            e.HasIndex(c => c.Embedding)
                .HasMethod("hnsw")
                .HasOperators("vector_cosine_ops");
        });
    }
}
// Search query using pgvector cosine distance:
public async Task<IReadOnlyList<ScoredChunk>> SearchAsync(
    float[] queryEmbedding,
    string patientMrn,
    int topK = 5,
    CancellationToken ct = default)
{
    var queryVector = new Vector(queryEmbedding);

    var rows = await _context.Chunks
        .Where(c => c.PatientMrn == patientMrn)
        .OrderBy(c => c.Embedding.CosineDistance(queryVector)) // lower = more similar
        .Take(topK)
        .Select(c => new
        {
            c.Id,
            c.ChunkText,
            c.SourceType,
            Score = 1f - (float)c.Embedding.CosineDistance(queryVector)
        })
        .ToListAsync(ct);

    // Map after awaiting; ContinueWith + Task.Result wraps failures in AggregateException.
    return rows
        .Select(r => new ScoredChunk(r.Id, r.ChunkText, r.SourceType, r.Score))
        .ToList();
}
Azure AI Search
// Azure AI Search provides managed vector search with hybrid (keyword + vector) support
// NuGet: Azure.Search.Documents, Azure.Identity (for DefaultAzureCredential)
var searchClient = new SearchClient(
    new Uri(config["AzureSearch:Endpoint"]!),
    "clinical-documents",           // index name
    new DefaultAzureCredential());  // token credential
// Index schema (created via Azure portal or Bicep):
// Fields: id, patient_mrn, source_type, chunk_text, embedding (Collection(Edm.Single), dimensions=1536)
// Upload documents:
var batch = IndexDocumentsBatch.Upload(chunks.Select(c => new SearchDocument
{
    ["id"] = c.Id.ToString(),
    ["patient_mrn"] = c.PatientMrn,
    ["source_type"] = c.SourceType,
    ["chunk_text"] = c.Text,
    ["embedding"] = c.Embedding
}));
await searchClient.IndexDocumentsAsync(batch, ct);
// Hybrid search: combines keyword + vector matching for better retrieval.
var searchOptions = new SearchOptions
{
    // SearchFilter.Create quotes and escapes the value, so an MRN containing
    // a quote character cannot alter the OData filter.
    Filter = SearchFilter.Create($"patient_mrn eq {patientMrn}"),
    Size = 5,
    Select = { "id", "chunk_text", "source_type" },
    VectorSearch = new VectorSearchOptions
    {
        Queries =
        {
            new VectorizedQuery(queryEmbedding)
            {
                KNearestNeighborsCount = 10,
                Fields = { "embedding" }
            }
        }
    }
};
var results = await searchClient.SearchAsync<SearchDocument>(
    query, // keyword component
    searchOptions,
    ct);
await foreach (var result in results.Value.GetResultsAsync())
{
    Console.WriteLine($"Score: {result.Score:F2} | {result.Document["chunk_text"]}");
}
Filtering Vector Search Results
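// Always filter by patient when retrieving patient-specific documents
// Never return another patient's documents in the context
// Assumes IVectorDocumentStore.SearchAsync accepts a nullable patientMrn and an optional sourceType.
public sealed class FilteredVectorSearch
{
    private readonly IVectorDocumentStore _store;
    // Assumed to be Semantic Kernel's ITextEmbeddingGenerationService, whose
    // GenerateEmbeddingAsync(value, kernel, ct) matches the call shape used below.
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public FilteredVectorSearch(
        IVectorDocumentStore store,
        ITextEmbeddingGenerationService embeddingService)
    {
        _store = store;
        _embeddingService = embeddingService;
    }

    public async Task<IReadOnlyList<ScoredChunk>> SearchPatientDocumentsAsync(
        string query,
        string patientMrn,
        string? sourceTypeFilter = null, // "clinical_guideline", "prescription_note", etc.
        CancellationToken ct = default)
    {
        // Fail fast rather than run an unfiltered search when the MRN is missing
        if (string.IsNullOrWhiteSpace(patientMrn))
            throw new ArgumentException("A patient MRN is required.", nameof(patientMrn));

        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query, null, ct);

        // ALWAYS include the patient_mrn filter: no cross-patient retrieval
        return await _store.SearchAsync(
            queryEmbedding: queryEmbedding.ToArray(),
            patientMrn: patientMrn,       // mandatory isolation filter
            sourceType: sourceTypeFilter, // optional scope filter
            topK: 5,
            minScore: 0.75f,
            ct: ct);
    }

    // For guideline searches (not patient-specific):
    public async Task<IReadOnlyList<ScoredChunk>> SearchGuidelinesAsync(
        string query, CancellationToken ct)
    {
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query, null, ct);

        return await _store.SearchAsync(
            queryEmbedding: queryEmbedding.ToArray(),
            patientMrn: null, // no patient filter for guidelines
            sourceType: "clinical_guideline",
            topK: 5,
            minScore: 0.70f,
            ct: ct);
    }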
}
Production issue I've seen: A RAG system was built to retrieve clinical documents to answer prescriber questions. The vector store was not filtered by patient; it searched across all patients. A pharmacist asked about a warfarin note for patient MRN-001. The system retrieved a warfarin note from patient MRN-047 (higher similarity score because MRN-047's note used more similar wording) and presented it as context for the answer about MRN-001. The AI answered using MRN-047's clinical data for a question about MRN-001. Always apply a patient identifier filter to every vector search that involves patient-specific documents. The filter is not optional: it is a clinical data isolation requirement.
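One way to make this failure harder to reintroduce is a defence-in-depth check that runs after retrieval and before the chunks are injected into the prompt. The sketch below is only an illustration under stated assumptions: `RetrievedChunk` (a chunk that carries the MRN it was stored under) and `PatientIsolationGuard` are hypothetical names, not types from the code in this article, and the store-level filter remains the primary control.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: a retrieved chunk that carries the MRN it was stored under.
public sealed record RetrievedChunk(Guid Id, string PatientMrn, string ChunkText, float Score);

public static class PatientIsolationGuard
{
    // Throws if any chunk belongs to a patient other than the one requested.
    // Failing loudly is deliberate: silently dropping the chunk would hide a broken filter.
    public static IReadOnlyList<RetrievedChunk> EnsureSinglePatient(
        IReadOnlyList<RetrievedChunk> chunks, string expectedMrn)
    {
        if (string.IsNullOrWhiteSpace(expectedMrn))
            throw new ArgumentException("A patient MRN is required.", nameof(expectedMrn));

        var leaked = chunks.FirstOrDefault(c =>
            !string.Equals(c.PatientMrn, expectedMrn, StringComparison.Ordinal));

        if (leaked is not null)
            throw new InvalidOperationException(
                $"Chunk {leaked.Id} does not belong to the requested patient.");

        return chunks;
    }
}
```

In the incident described above, a guard like this would have rejected the MRN-047 note when the request was for MRN-001, turning a silent data leak into a visible error.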
Key Takeaway
Vector search finds semantically similar documents using embedding similarity (cosine distance) rather than keyword matching. In .NET: use SQL Server 2025 with the native VECTOR type, pgvector with EF Core and an HNSW index for approximate nearest-neighbour search, or Azure AI Search for managed hybrid (keyword + vector) search. Always filter vector searches by patient identifier and never retrieve cross-patient documents. Set a minimum similarity threshold (0.70–0.80) to avoid returning irrelevant low-scoring chunks. The quality of your retrieval directly determines the quality of RAG answers: irrelevant chunks produce wrong or misleading AI responses.