AI Systems · Intermediate

Skill 5 — Vector Search: Azure AI Search HNSW + pgvector Hybrid Retrieval

Implement hybrid search that combines Azure AI Search vector (HNSW) retrieval with BM25 keyword scoring, plus pgvector as a local development alternative and fallback.

Asma Hafeez Khan · May 15, 2026 · 4 min read
Tags: Vector Search, Azure AI Search, pgvector, HNSW, Hybrid Search, RAG

Two Types of Search — and Why You Need Both

Dense retrieval (vector search): converts text to embeddings and finds semantically similar chunks. Great for paraphrases — "ibuprofen pain relief" finds "NSAIDs analgesic effect" even though no keywords match.

Sparse retrieval (BM25 keyword): finds documents containing the exact words. Great for drug names: "warfarin" is a specific term that an embedding may not weight as heavily as an exact keyword match does.

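Hybrid search runs both retrievals and fuses their rankings. It typically outperforms either method alone, because exact-term matches and semantic matches fail on different queries. Azure AI Search supports this natively.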
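To make the sparse side concrete, here is a minimal BM25 scorer in plain Python. It is a sketch, not Azure's implementation: it assumes naive whitespace tokenization and the Lucene-style IDF, with k1 = 1.2 and b = 0.75 (the defaults Azure AI Search also documents for BM25). The second query shows the weakness dense retrieval covers: with no shared words, every score is zero.

Python
```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with BM25 (Lucene-style IDF)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact match only: synonyms earn nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores


docs = [
    "warfarin increases bleeding risk with nsaids",
    "nsaids provide analgesic effect",
    "metformin may affect kidney function",
]
print(bm25_scores("warfarin bleeding", docs))      # only the first document scores above zero
print(bm25_scores("ibuprofen pain relief", docs))  # [0.0, 0.0, 0.0]
```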


Azure AI Search — Index Schema

The index schema defines how documents are stored and searched:

Python
# pharmabot/rag/search_index.py
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    HnswParameters,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
)

def create_drug_index_schema() -> SearchIndex:
    return SearchIndex(
        name="pharmabot-drugs",
        fields=[
            SimpleField(name="id",          type=SearchFieldDataType.String, key=True),
            # Full-text (BM25) field, analyzed with the English Lucene analyzer
            SearchableField(name="text",     type=SearchFieldDataType.String, analyzer_name="en.lucene"),
            SimpleField(name="drug_name",    type=SearchFieldDataType.String, filterable=True),
            SimpleField(name="ndc",          type=SearchFieldDataType.String, filterable=True),
            SimpleField(name="section",      type=SearchFieldDataType.String, filterable=True),
            # Dense vector field, searched through the HNSW profile below
            SearchField(
                name="content_vector",
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True,
                vector_search_dimensions=1536,          # text-embedding-3-small
                vector_search_profile_name="hnsw-profile",
            ),
        ],
        vector_search=VectorSearch(
            algorithms=[
                HnswAlgorithmConfiguration(
                    name="hnsw-algo",
                    # Typed parameters model instead of a raw dict
                    parameters=HnswParameters(m=4, ef_construction=400, metric="cosine"),
                )
            ],
            profiles=[VectorSearchProfile(name="hnsw-profile", algorithm_configuration_name="hnsw-algo")],
        ),
    )

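HNSW parameters (both values shown are Azure AI Search's defaults):

  • m=4: number of bidirectional links per node (Azure allows 4–10). Higher means better recall but more memory.
  • efConstruction=400: size of the candidate list used while building the graph (Azure allows 100–1000). Higher means a better-quality index but a slower build.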
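Because HNSW is approximate, the practical way to judge settings like m and efConstruction is to measure recall against exact search on a sample of your own queries. A minimal sketch with toy two-dimensional vectors; `exact_top_k` and `recall_at_k` are hypothetical helpers, and the brute-force ranking stands in for whatever exact baseline you use:

Python
```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force ground truth: rank every stored vector by cosine similarity."""
    ranked = sorted(corpus, key=lambda doc_id: cosine_similarity(query, corpus[doc_id]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Fraction of the true top-k that the approximate index also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
truth = exact_top_k([1.0, 0.05], corpus, k=2)
print(truth)                                # ['a', 'b']
print(recall_at_k(["a", "c"], truth, k=2))  # 0.5
```

Averaging recall@k over a few hundred real queries before and after changing a parameter shows whether extra memory or build time is buying anything.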

The Hybrid Retriever

Python
# pharmabot/rag/retriever.py
import logging

import asyncpg   # pgvector fallback
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.aio import SearchClient
from azure.search.documents.models import VectorizedQuery

from pharmabot.config import settings

logger = logging.getLogger(__name__)


class HybridRetriever:

    async def search(
        self, query: str, query_vector: list[float], top_k: int = 3
    ) -> list[dict]:
        try:
            return await self._azure_search(query, query_vector, top_k)
        except Exception:
            # Fall back to pgvector on any Azure Search failure, but log it
            # so outages are not silently masked
            logger.warning("Azure AI Search failed; falling back to pgvector", exc_info=True)
            return await self._pgvector_search(query_vector, top_k)

    async def _azure_search(
        self, query: str, query_vector: list[float], top_k: int
    ) -> list[dict]:
        client = SearchClient(
            endpoint=settings.azure_search_endpoint,
            index_name=settings.azure_search_index,
            credential=AzureKeyCredential(settings.azure_search_api_key),
        )
        async with client:
            # One request carries both components; Azure fuses them with RRF
            results = await client.search(
                search_text=query,                       # BM25 keyword component
                vector_queries=[
                    VectorizedQuery(
                        vector=query_vector,             # Dense vector component
                        k_nearest_neighbors=top_k,
                        fields="content_vector",
                    )
                ],
                select=["id", "text", "drug_name", "ndc", "section"],
                top=top_k,
            )
            chunks = []
            async for result in results:
                chunks.append({
                    "id":        result["id"],
                    "text":      result["text"],
                    "drug_name": result["drug_name"],
                    "section":   result["section"],
                    "score":     result["@search.score"],   # RRF score for hybrid queries
                    "source":    "azure_search",
                })
            return chunks

    async def _pgvector_search(
        self, query_vector: list[float], top_k: int
    ) -> list[dict]:
        conn = await asyncpg.connect(settings.database_url)
        try:
            rows = await conn.fetch(
                """
                SELECT id, text, drug_name, section,
                       1 - (content_vector <=> $1::vector) AS score   -- cosine similarity
                FROM drug_chunks
                ORDER BY content_vector <=> $1::vector               -- cosine distance
                LIMIT $2
                """,
                # asyncpg has no built-in codec for the vector type, so send
                # pgvector's text format '[x1,x2,...]' and let ::vector cast it
                "[" + ",".join(map(str, query_vector)) + "]",
                top_k,
            )
        finally:
            await conn.close()   # close the connection even if the query fails
        return [
            {**dict(row), "source": "pgvector"}
            for row in rows
        ]
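The `try`/`except` in `search` only falls back when Azure Search raises; a request that hangs would stall the caller. One option is to bound the primary call with `asyncio.wait_for`, so a timeout also triggers the fallback. A minimal sketch using hypothetical `primary`/`fallback` callables and stub backends rather than the project's classes:

Python
```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

SearchFn = Callable[[], Awaitable[list[dict]]]


async def search_with_fallback(primary: SearchFn, fallback: SearchFn, timeout_s: float = 2.0) -> list[dict]:
    """Run the primary search with a deadline; on any error or timeout, use the fallback."""
    try:
        return await asyncio.wait_for(primary(), timeout=timeout_s)
    except Exception:  # the timeout error raised by wait_for is an Exception subclass
        logger.warning("primary search failed or timed out; using fallback", exc_info=True)
        return await fallback()


async def slow_azure() -> list[dict]:
    await asyncio.sleep(10)  # stub: simulates a hung Azure request
    return [{"id": "never", "source": "azure_search"}]


async def local_pgvector() -> list[dict]:
    return [{"id": "warfarin-warnings-0", "source": "pgvector"}]  # stub fallback


results = asyncio.run(search_with_fallback(slow_azure, local_pgvector, timeout_s=0.1))
print(results[0]["source"])  # pgvector
```

Inside `HybridRetriever.search`, the same pattern would wrap `lambda: self._azure_search(query, query_vector, top_k)` as the primary and the pgvector call as the fallback; the 2-second default is an arbitrary placeholder to tune against your latency budget.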

pgvector — Local Development Setup

For local development without Azure AI Search, pgvector provides the dense (vector) half of the search inside PostgreSQL. It has no BM25 component, so its results are based on vector similarity alone:

SQL
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create the drug chunks table
CREATE TABLE drug_chunks (
    id              TEXT PRIMARY KEY,
    text            TEXT NOT NULL,
    drug_name       TEXT NOT NULL,
    ndc             TEXT,
    section         TEXT,
    chunk_index     INTEGER,
    content_vector  vector(1536)   -- text-embedding-3-small dimension
);

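-- HNSW index for cosine distance (requires pgvector 0.5.0+).
-- vector_cosine_ops matches the <=> operator used by the retriever.
-- m = 4 mirrors the Azure setting; pgvector's own default is m = 16.
CREATE INDEX ON drug_chunks USING hnsw (content_vector vector_cosine_ops)
    WITH (m = 4, ef_construction = 64);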

The retriever falls back to pgvector whenever the Azure Search call raises an exception, so you can develop and test the RAG pipeline locally without an Azure AI Search service. Keep in mind that local results come from vector similarity alone.
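asyncpg ships no codec for pgvector's `vector` type. Two common options are registering one with the `pgvector` Python package (`pgvector.asyncpg.register_vector`) or sending the vector in pgvector's text format, `[x1,x2,...]`, and casting it with `::vector`. A minimal sketch of the second option; `to_pgvector_literal` is a hypothetical helper that also guards the `vector(1536)` column:

Python
```python
import math

EMBEDDING_DIM = 1536  # must match vector(1536) in drug_chunks


def to_pgvector_literal(vector: list[float], dim: int = EMBEDDING_DIM) -> str:
    """Format floats as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("pgvector rejects NaN and infinite values")
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


print(to_pgvector_literal([0.1, 0.2, 0.3], dim=3))  # [0.1,0.2,0.3]
```

Validating in Python surfaces a wrong embedding model or a truncated vector with a clear message before the query reaches PostgreSQL.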


Understanding Hybrid Search Scoring

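Azure AI Search uses Reciprocal Rank Fusion (RRF) to merge the BM25 and vector result lists. RRF works on each document's rank in each list rather than on the raw scores, which live on incompatible scales:

RRF(doc) = 1/(k + rank_bm25) + 1/(k + rank_vector)

where k is a smoothing constant, 60 by default. A document that appears in only one list contributes only that one term. Documents that rank highly in both systems get the highest combined score, which is why hybrid tends to beat either method alone: an exact drug-name match plus semantic similarity is strong evidence of relevance.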
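The RRF formula can be checked with a few lines of Python. This is a sketch of textbook RRF with 1-based ranks, not Azure's internal code, and the chunk IDs are made up for illustration:

Python
```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(doc) = sum of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["warfarin-warnings", "warfarin-dosage", "aspirin-warnings"]
vector_ranking = ["nsaid-interactions", "warfarin-warnings", "heparin-warnings"]
for doc_id, score in reciprocal_rank_fusion([bm25_ranking, vector_ranking]):
    print(f"{doc_id:20s} {score:.4f}")
```

"warfarin-warnings" wins even though it is only second in the vector list, because it collects a term from both lists (1/61 + 1/62, about 0.0325). Fused scores of this size are also why hybrid `@search.score` values look small next to cosine similarities.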


Retrieval Quality Metrics

Test retrieval quality before trusting it in production:

Python
# tests/test_rag.py
import pytest

RETRIEVAL_TESTS = [
    {
        "query": "metformin side effects kidney",
        "expected_drug": "Metformin",
        "expected_section": "WARNINGS",
    },
    {
        "query": "warfarin bleeding risk",
        "expected_drug": "Warfarin",
        "expected_section": "WARNINGS",
    },
]

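# Async tests need a plugin such as pytest-asyncio; the `pipeline` fixture is not shown here
@pytest.mark.asyncio
@pytest.mark.parametrize("test", RETRIEVAL_TESTS)
async def test_retrieval_accuracy(test, pipeline):
    results = await pipeline.retrieve(test["query"])
    # The same chunk must match both the expected drug and the expected section
    assert any(
        r["drug_name"] == test["expected_drug"] and r["section"] == test["expected_section"]
        for r in results
    )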
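The test is a pass/fail check per query. To track quality over time it helps to aggregate the same idea into numbers. A minimal sketch of hit rate@k and mean reciprocal rank (MRR), assuming each result dict carries the `drug_name` and `section` keys the retriever returns and that a relevant hit must match both; the helper names are hypothetical:

Python
```python
from typing import Optional


def first_relevant_rank(results: list[dict], drug: str, section: str) -> Optional[int]:
    """1-based rank of the first chunk matching both drug and section, else None."""
    for rank, r in enumerate(results, start=1):
        if r["drug_name"] == drug and r["section"] == section:
            return rank
    return None


def retrieval_metrics(runs: list[tuple[list[dict], str, str]], k: int = 3) -> dict[str, float]:
    """Hit rate@k and MRR over (results, expected_drug, expected_section) triples."""
    hits, reciprocal_ranks = 0, 0.0
    for results, drug, section in runs:
        rank = first_relevant_rank(results[:k], drug, section)
        if rank is not None:
            hits += 1
            reciprocal_ranks += 1.0 / rank
    return {"hit_rate": hits / len(runs), "mrr": reciprocal_ranks / len(runs)}


runs = [
    ([{"drug_name": "Warfarin", "section": "WARNINGS"}], "Warfarin", "WARNINGS"),
    ([{"drug_name": "Metformin", "section": "DOSAGE"},
      {"drug_name": "Metformin", "section": "WARNINGS"}], "Metformin", "WARNINGS"),
]
print(retrieval_metrics(runs))  # {'hit_rate': 1.0, 'mrr': 0.75}
```

Hit rate says whether the right chunk made it into the context at all; MRR also rewards putting it first, which matters when only the top chunk fits in the prompt.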

Checkpoint

Compare Azure Search vs pgvector results:

Bash
# Azure AI Search (hybrid)
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "warfarin bleeding risk interaction", "top_k": 3}'

# pgvector fallback: temporarily set MOCK_AZURE=true in .env, restart, then:
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "warfarin bleeding risk interaction", "top_k": 3}'

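Both should return the Warfarin WARNINGS chunk as the top result. The scores are not comparable (Azure returns an RRF fusion score, pgvector a cosine similarity), and lower-ranked results can differ because the fallback has no keyword component, but for clear queries like this one the ranked order should be nearly identical.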
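Rather than eyeballing the two responses, you can compare them programmatically. A sketch, assuming each response parses to a list of chunks with an `id` field and that both stores use the same chunk IDs (the exact response shape of `/api/search` is not shown here); `compare_rankings` is a hypothetical helper:

Python
```python
def compare_rankings(azure: list[dict], pgvector: list[dict], k: int = 3) -> dict:
    """Compare two result lists by chunk ID: same top hit, and share of top-k IDs in common."""
    azure_ids = [r["id"] for r in azure[:k]]
    pg_ids = [r["id"] for r in pgvector[:k]]
    shared = set(azure_ids) & set(pg_ids)
    return {
        "same_top_1": bool(azure_ids and pg_ids and azure_ids[0] == pg_ids[0]),
        "overlap_at_k": len(shared) / k,
    }


azure_results = [{"id": "warfarin-warnings-0"}, {"id": "warfarin-warnings-1"}, {"id": "aspirin-warnings-0"}]
pg_results = [{"id": "warfarin-warnings-0"}, {"id": "aspirin-warnings-0"}, {"id": "heparin-warnings-0"}]
print(compare_rankings(azure_results, pg_results))
```

Here the top hit agrees and two of the three IDs overlap, which is the "nearly identical" outcome described above.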
