Skill 5 — Vector Search: Azure AI Search HNSW + pgvector Hybrid Retrieval
Implement hybrid vector search combining Azure AI Search semantic embeddings with BM25 keyword fallback, plus pgvector as a local development alternative.
Two Types of Search — and Why You Need Both
Dense retrieval (vector search): converts text to embeddings and finds semantically similar chunks. Great for paraphrases — "ibuprofen pain relief" finds "NSAIDs analgesic effect" even though no keywords match.
Sparse retrieval (BM25 keyword): finds documents containing the exact query words. Great for drug names: "warfarin" is a specific term that an embedding model may not weight as strongly as an exact keyword match does.
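Hybrid search runs both retrievers and fuses their rankings, so a chunk that matches the exact drug name and is also semantically close to the query rises to the top. In practice this tends to retrieve better than either method alone. Azure AI Search supports it natively.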
Azure AI Search — Index Schema
The index schema defines how documents are stored and searched:
# pharmabot/rag/search_index.py
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchableField,
    SearchField, SearchFieldDataType, VectorSearch,
    HnswAlgorithmConfiguration, HnswParameters, VectorSearchProfile,
)


def create_drug_index_schema() -> SearchIndex:
    """Build the index definition: keyword-searchable text plus an HNSW vector field."""
    return SearchIndex(
        name="pharmabot-drugs",
        fields=[
            SimpleField(name="id", type=SearchFieldDataType.String, key=True),
            # Full-text (BM25) field, analyzed with the English Lucene analyzer
            SearchableField(name="text", type=SearchFieldDataType.String, analyzer_name="en.lucene"),
            # Metadata fields used for filtering
            SimpleField(name="drug_name", type=SearchFieldDataType.String, filterable=True),
            SimpleField(name="ndc", type=SearchFieldDataType.String, filterable=True),
            SimpleField(name="section", type=SearchFieldDataType.String, filterable=True),
            # Embedding field, linked to the HNSW profile defined below
            SearchField(
                name="content_vector",
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True,
                vector_search_dimensions=1536,  # text-embedding-3-small
                vector_search_profile_name="hnsw-profile",
            ),
        ],
        vector_search=VectorSearch(
            algorithms=[
                HnswAlgorithmConfiguration(
                    name="hnsw-algo",
                    # HnswParameters is the SDK's typed model for these settings
                    parameters=HnswParameters(m=4, ef_construction=400),
                )
            ],
            profiles=[VectorSearchProfile(name="hnsw-profile", algorithm_configuration_name="hnsw-algo")],
        ),
    )
HNSW parameters:
m=4: number of bidirectional links per node. Higher values give better recall but use more memory. Azure AI Search accepts values from 4 to 10, and 4 is the default.
efConstruction=400 (ef_construction in the Python SDK): build-time search depth. Higher values give better index quality but a slower build. 400 is the default.
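Recall here means the fraction of the true nearest neighbors that the approximate index actually returns. A minimal sketch of how it could be measured, assuming you can produce an exact (brute-force) top-k list to compare against; the function name and the chunk IDs are illustrative and not part of the PharmaBot code:

```python
def recall_at_k(approx_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Fraction of the exact top-k neighbors that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    found = exact_top.intersection(approx_ids[:k])
    return len(found) / len(exact_top)


# Hypothetical chunk IDs: the approximate index missed one of three true neighbors.
exact = ["warfarin-07", "warfarin-02", "heparin-11"]
approx = ["warfarin-07", "heparin-11", "aspirin-04"]
print(recall_at_k(approx, exact, k=3))
```

If recall on a sample of real queries is too low, raising m or efConstruction and rebuilding the index is the usual first thing to try.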
The Hybrid Retriever
# pharmabot/rag/retriever.py
import logging

import asyncpg  # pgvector fallback
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.aio import SearchClient
from azure.search.documents.models import VectorizedQuery

from pharmabot.config import settings

logger = logging.getLogger(__name__)


class HybridRetriever:
    async def search(
        self, query: str, query_vector: list[float], top_k: int = 3
    ) -> list[dict]:
        try:
            return await self._azure_search(query, query_vector, top_k)
        except Exception:
            # Fall back to pgvector on any Azure Search failure.
            # The fallback is vector-only: it has no BM25 keyword component.
            logger.warning("Azure Search failed; falling back to pgvector", exc_info=True)
            return await self._pgvector_search(query_vector, top_k)

    async def _azure_search(
        self, query: str, query_vector: list[float], top_k: int
    ) -> list[dict]:
        client = SearchClient(
            endpoint=settings.azure_search_endpoint,
            index_name=settings.azure_search_index,
            credential=AzureKeyCredential(settings.azure_search_api_key),
        )
        async with client:
            results = await client.search(
                search_text=query,  # BM25 keyword component
                vector_queries=[
                    VectorizedQuery(
                        vector=query_vector,  # Dense vector component
                        k_nearest_neighbors=top_k,
                        fields="content_vector",
                    )
                ],
                select=["id", "text", "drug_name", "ndc", "section"],
                top=top_k,
            )
            # Consume the async iterator while the client is still open
            chunks = []
            async for result in results:
                chunks.append({
                    "id": result["id"],
                    "text": result["text"],
                    "drug_name": result["drug_name"],
                    "section": result["section"],
                    "score": result["@search.score"],
                    "source": "azure_search",
                })
            return chunks

    async def _pgvector_search(
        self, query_vector: list[float], top_k: int
    ) -> list[dict]:
        # asyncpg has no built-in codec for the vector type, so send the
        # embedding as a text literal such as "[0.1,0.2,...]" and cast it in SQL.
        vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"
        conn = await asyncpg.connect(settings.database_url)
        try:
            rows = await conn.fetch(
                """
                SELECT id, text, drug_name, section,
                       1 - (content_vector <=> $1::vector) AS score
                FROM drug_chunks
                ORDER BY content_vector <=> $1::vector
                LIMIT $2
                """,
                vector_literal,
                top_k,
            )
        finally:
            # Close the connection even if the query raises
            await conn.close()
        return [
            {**dict(row), "source": "pgvector"}
            for row in rows
        ]
pgvector — Local Development Setup
For local development without Azure AI Search, pgvector provides vector similarity search inside PostgreSQL:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create the drug chunks table
CREATE TABLE drug_chunks (
    id TEXT PRIMARY KEY,
    text TEXT NOT NULL,
    drug_name TEXT NOT NULL,
    ndc TEXT,
    section TEXT,
    chunk_index INTEGER,
    content_vector vector(1536)  -- text-embedding-3-small dimension
);

-- HNSW index on the vector column
-- (pgvector's own defaults are m = 16, ef_construction = 64)
CREATE INDEX ON drug_chunks USING hnsw (content_vector vector_cosine_ops)
    WITH (m = 4, ef_construction = 64);
The retriever automatically falls back to pgvector when Azure Search is unavailable, so you can develop and test the RAG pipeline locally without provisioning Azure AI Search. Keep in mind that the fallback is vector-only: the BM25 keyword component is not reproduced in PostgreSQL.
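The fallback query computes its score as `1 - (content_vector <=> $1::vector)` because pgvector's `<=>` operator returns cosine distance, not similarity. A small pure-Python sketch of that relationship, using made-up three-dimensional vectors in place of real 1536-dimension embeddings (the function names are illustrative):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """The quantity pgvector's <=> operator returns: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Toy vectors, chosen by hand for illustration only.
query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]   # same direction, different length
orthogonal = [0.0, 3.0, 0.0]       # unrelated direction

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    distance = cosine_distance(query, vec)
    print(name, "distance:", distance, "score:", 1.0 - distance)
```

Sorting by distance ascending, as the `ORDER BY` clause does, gives the same order as sorting by score descending.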
Understanding Hybrid Search Scoring
Azure AI Search uses Reciprocal Rank Fusion (RRF) to combine the BM25 and vector results. RRF works on each document's rank in the two result lists, not on the raw scores:
RRF(doc) = 1/(k + rank_bm25) + 1/(k + rank_vector)
Where k=60 by default. Documents that rank highly in both lists get the highest combined score. This is why hybrid tends to outperform either retriever alone: a drug name match plus semantic similarity gives strong evidence.
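To make the formula concrete, here is a minimal sketch of RRF over two ranked lists of chunk IDs. It assumes 1-based ranks and that a document missing from one list contributes nothing for that list; the function name and the IDs are illustrative. Azure AI Search performs this fusion on the server, so this is not code the retriever needs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers
bm25_ranking = ["warfarin-07", "aspirin-04", "warfarin-02"]
vector_ranking = ["warfarin-02", "warfarin-07", "heparin-11"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

With two rankers and k=60, the largest possible fused score is 2/61, roughly 0.033. Hybrid scores are therefore small numbers that cannot be compared directly with a cosine similarity.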
Retrieval Quality Metrics
Test retrieval quality before trusting it in production:
# tests/test_rag.py
import pytest

RETRIEVAL_TESTS = [
    {
        "query": "metformin side effects kidney",
        "expected_drug": "Metformin",
        "expected_section": "WARNINGS",
    },
    {
        "query": "warfarin bleeding risk",
        "expected_drug": "Warfarin",
        "expected_section": "WARNINGS",
    },
]


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
@pytest.mark.parametrize("test", RETRIEVAL_TESTS)
async def test_retrieval_accuracy(test, pipeline):
    # `pipeline` is a fixture expected to be defined elsewhere (e.g. conftest.py)
    results = await pipeline.retrieve(test["query"])
    # Require one chunk that matches both the drug and the section, so the test
    # cannot pass on two unrelated chunks that each match only one of them.
    assert any(
        r["drug_name"] == test["expected_drug"]
        and r["section"] == test["expected_section"]
        for r in results
    )
Checkpoint
Compare Azure Search vs pgvector results:
# Azure AI Search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "warfarin bleeding risk interaction", "top_k": 3}'
# Temporarily set MOCK_AZURE=true in .env, restart, then:
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "warfarin bleeding risk interaction", "top_k": 3}'
Both should return the same top result: the Warfarin WARNINGS section. The scores will differ (Azure RRF vs cosine similarity), but the ranked order should be nearly identical for clear queries.
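Comparing two responses by eye works for a single query; for more, a short script helps. The sketch below assumes each response has already been parsed into a list of dicts with the `id`, `drug_name`, and `section` keys that `HybridRetriever` returns. The function name and the sample results are hypothetical.

```python
def compare_backends(azure_results: list[dict], pgvector_results: list[dict]) -> dict:
    """Summarize how closely two ranked result lists agree."""

    def top_hit(results: list[dict]):
        # (drug, section) of the first result, or None for an empty list
        if not results:
            return None
        return (results[0]["drug_name"], results[0]["section"])

    azure_ids = [r["id"] for r in azure_results]
    pgvector_ids = [r["id"] for r in pgvector_results]
    azure_top = top_hit(azure_results)
    return {
        "same_top_hit": azure_top is not None and azure_top == top_hit(pgvector_results),
        "only_in_azure": [i for i in azure_ids if i not in pgvector_ids],
        "only_in_pgvector": [i for i in pgvector_ids if i not in azure_ids],
    }


# Made-up results for illustration; real ones come from the two curl calls.
azure = [
    {"id": "warfarin-07", "drug_name": "Warfarin", "section": "WARNINGS"},
    {"id": "warfarin-02", "drug_name": "Warfarin", "section": "INTERACTIONS"},
]
pgvector = [
    {"id": "warfarin-07", "drug_name": "Warfarin", "section": "WARNINGS"},
    {"id": "heparin-11", "drug_name": "Heparin", "section": "WARNINGS"},
]
report = compare_backends(azure, pgvector)
print(report)
```

The comparison deliberately ignores the `score` field, since RRF scores and cosine similarities are on different scales.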