Learnixo

RAG Chatbot in .NET · Lesson 2 of 6

Text Embeddings — Vectors, Similarity, and Models

What Embeddings Are

An embedding is a list of numbers (a vector) that represents the meaning of a piece of text.
Texts with similar meanings produce vectors that point in similar directions, and cosine similarity measures how closely they align.

Example:
  "The patient's INR is 2.4"
  → [0.023, -0.441, 0.187, 0.902, ..., -0.031]  (1536 numbers for ada-002)

  "INR value of 2.4 was recorded"
  → [0.027, -0.438, 0.191, 0.899, ..., -0.029]  (very similar vector)

  "The weather is sunny today"
  → [0.712, 0.304, -0.553, 0.021, ..., 0.441]   (very different vector)

Cosine similarity score (illustrative values; real scores vary by model):
  INR sentence 1 vs INR sentence 2: 0.98  (nearly identical meaning)
  INR sentence 1 vs weather:        0.12  (unrelated)

This is the foundation of RAG (Retrieval-Augmented Generation):
  1. Embed all your documents → store vectors in a database
  2. Embed the user's query → find documents with similar vectors
  3. Pass the matching documents to the AI as context
  4. AI answers from the retrieved documents instead of relying only on its training data
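
Step 2 of the list above comes down to ranking stored vectors by cosine similarity against the query vector. The following is a minimal sketch of that idea, assuming small in-memory data; VectorSearchSketch, CosineSimilarity, and TopK are hypothetical names for illustration and are not part of Semantic Kernel. A real vector database does the same ranking with an index rather than a linear scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Semantic Kernel.
public static class VectorSearchSketch
{
    // Cosine similarity = dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot   += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // A zero vector has no direction, so treat it as unrelated to everything.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Rank every stored vector against the query and keep the k best matches.
    public static IReadOnlyList<(string Text, double Score)> TopK(
        float[] query,
        IReadOnlyList<(string Text, float[] Vector)> documents,
        int k)
    {
        return documents
            .Select(d => (d.Text, Score: CosineSimilarity(query, d.Vector)))
            .OrderByDescending(r => r.Score)
            .Take(k)
            .ToList();
    }
}
```

Calling VectorSearchSketch.TopK(queryVector, documents, 3) returns the three closest texts with their scores, which are the "matching documents" passed to the model in step 3.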

Generating Embeddings with Semantic Kernel

C#
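// NuGet: Microsoft.SemanticKernel.Connectors.AzureOpenAI
// Depending on your Semantic Kernel version, the embedding types may be flagged
// experimental and need a diagnostic suppressed (for example SKEXP0010).
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using Microsoft.SemanticKernel.Embeddings;

// Setup (config is your IConfiguration; ct is a CancellationToken from the caller):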
var embeddingService = new AzureOpenAITextEmbeddingGenerationService(
    deploymentName: "text-embedding-ada-002",   // or text-embedding-3-small
    endpoint:       config["AzureOpenAI:Endpoint"]!,
    apiKey:         config["AzureOpenAI:ApiKey"]!);

// Generate a single embedding:
var embedding = await embeddingService.GenerateEmbeddingAsync(
    "The patient's Warfarin dose is 5mg daily. INR target range: 2.0–3.0.",
    kernel: null,
    ct);

float[] vector = embedding.ToArray();  // 1536 floats for ada-002

// Generate embeddings for multiple texts (batch — more efficient):
var texts = new List<string>
{
    "Patient MRN-001: Warfarin 5mg daily, INR target 2.0–3.0",
    "Patient MRN-002: Apixaban 5mg twice daily, AF indication",
    "Patient MRN-003: Rivaroxaban 20mg daily with evening meal, DVT prophylaxis"
};

var embeddings = await embeddingService.GenerateEmbeddingsAsync(texts, kernel: null, ct);
// Returns IList<ReadOnlyMemory<float>> — one per input text

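// Batching reduces API calls:
// 100 documents → 1 batch call instead of 100 individual calls.
// The per-request input limit (about 2,048 items for ada-002) depends on the
// service and API version, so check the limits for your deployment.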
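
When a corpus is larger than the per-request limit mentioned above, the texts have to be split into several batches before calling the embedding service. Here is a minimal sketch using Enumerable.Chunk (.NET 6 or later); EmbeddingBatchSketch and SplitIntoBatches are hypothetical names, and the default of 2,048 is taken from the figure above rather than from a guaranteed service limit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Semantic Kernel.
public static class EmbeddingBatchSketch
{
    // Splits the inputs into arrays of at most maxBatchSize items, preserving order.
    public static IReadOnlyList<string[]> SplitIntoBatches(
        IEnumerable<string> texts,
        int maxBatchSize = 2048) // assumption: adjust to your deployment's limit
    {
        if (maxBatchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxBatchSize));

        return texts.Chunk(maxBatchSize).ToList();
    }
}
```

Each returned array can then be passed to GenerateEmbeddingsAsync in turn. With 5,000 texts and the default size this yields three batches of 2,048, 2,048, and 904 items.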

Embedding Clinical Documents

C#
// Service to embed and store clinical document chunks

public sealed class ClinicalDocumentEmbeddingService
{
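    private readonly ITextEmbeddingGenerationService _embeddings;
    private readonly IVectorDocumentStore            _vectorStore;

    // IVectorDocumentStore is the application's own vector store abstraction;
    // it is not defined in this lesson.
    public ClinicalDocumentEmbeddingService(
        ITextEmbeddingGenerationService embeddings,
        IVectorDocumentStore            vectorStore)
    {
        _embeddings  = embeddings;
        _vectorStore = vectorStore;
    }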

    public async Task IndexDocumentAsync(
        ClinicalDocument document,
        CancellationToken ct)
    {
        // Chunk the document (see rag-chunking for strategies)
        var chunks = ChunkDocument(document);

        // Embed all chunks in one batch
        var texts      = chunks.Select(c => c.Text).ToList();
        var embeddings = await _embeddings.GenerateEmbeddingsAsync(texts, null, ct);

        // Store each chunk with its embedding
        var documents = chunks.Zip(embeddings, (chunk, embedding) =>
            new EmbeddedDocument(
                Id:           Guid.NewGuid(),
                SourceId:     document.Id,
                SourceType:   document.Type,
                PatientMrn:   document.PatientMrn,
                ChunkIndex:   chunk.Index,
                Text:         chunk.Text,
                Embedding:    embedding.ToArray(),
                IndexedAt:    DateTime.UtcNow))
            .ToList();  // materialize once so each chunk keeps a single Id

        await _vectorStore.UpsertBatchAsync(documents, ct);
    }

    private static IReadOnlyList<DocumentChunk> ChunkDocument(ClinicalDocument doc)
    {
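        // Sliding-window chunking. Words are used as a rough stand-in for tokens:
        // 500 words per chunk with a 50-word overlap. Use a real tokenizer if you
        // need exact token counts.
        const int ChunkSize    = 500;
        const int OverlapWords = 50;

        // Split on any whitespace (spaces, tabs, newlines) and drop empty entries.
        var words  = doc.Content.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<DocumentChunk>();
        var index  = 0;
        var chunkIndex = 0;

        while (index < words.Length)
        {
            var end  = Math.Min(index + ChunkSize, words.Length);
            var text = string.Join(" ", words[index..end]);
            chunks.Add(new DocumentChunk(chunkIndex++, text));

            // Stop once the final word is covered; otherwise the loop would emit
            // a trailing chunk that only repeats the overlap.
            if (end == words.Length) break;
            index += ChunkSize - OverlapWords;
        }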

        return chunks;
    }
}

public sealed record ClinicalDocument(
    Guid   Id,
    string PatientMrn,
    string Type,     // "discharge_letter", "prescription_note", "clinical_guideline"
    string Content);

public sealed record EmbeddedDocument(
    Guid     Id,
    Guid     SourceId,
    string   SourceType,
    string   PatientMrn,
    int      ChunkIndex,
    string   Text,
    float[]  Embedding,
    DateTime IndexedAt);

public sealed record DocumentChunk(int Index, string Text);

Choosing an Embedding Model

Model comparison for clinical .NET applications:

text-embedding-ada-002 (Azure OpenAI):
  Dimensions:  1536
  Max input:   8191 tokens
  Cost:        $0.0001 / 1K tokens
  Quality:     Good for English clinical text
  Best for:    Existing indexes already built on ada-002; for new work see text-embedding-3-small below

text-embedding-3-small (Azure OpenAI):
  Dimensions:  1536 (default) or configurable
  Max input:   8191 tokens
  Cost:        Lower than ada-002 (about $0.00002 / 1K tokens at OpenAI's launch list price; check current Azure pricing)
  Quality:     Better than ada-002, especially for multilingual text
  Best for:    New projects — prefer over ada-002

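text-embedding-3-large (Azure OpenAI):
  Dimensions:  3072 (default) or configurable
  Max input:   8191 tokens
  Quality:     Best available from OpenAI
  Cost:        Higher than ada-002 (about $0.00013 / 1K tokens at OpenAI's launch list price; check current Azure pricing)
  Best for:    High-accuracy RAG where quality matters more than cost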

nomic-embed-text (Ollama — local):
  Dimensions:  768
  Cost:        Free (local compute)
  Quality:     Good for development; lower than cloud models for clinical text
  Best for:    Local development only — do not use for production clinical RAG

Rule of thumb:
  Start with text-embedding-3-small.
  If retrieval quality is insufficient, upgrade to text-embedding-3-large.
  Keep the same model for both indexing and query embedding — mixing models
  produces garbage similarity scores.
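
The per-token prices in the comparison above translate into an indexing budget with simple arithmetic: total tokens divided by 1,000, multiplied by the price per 1K tokens. This is a rough sketch only; EmbeddingCostSketch and EstimateIndexingCostUsd are hypothetical names, and the token count in the note below is an assumed figure, not a measurement.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class EmbeddingCostSketch
{
    // pricePer1KTokensUsd: for example 0.0001m for ada-002, as listed above.
    public static decimal EstimateIndexingCostUsd(long totalTokens, decimal pricePer1KTokensUsd)
    {
        if (totalTokens < 0)
            throw new ArgumentOutOfRangeException(nameof(totalTokens));
        if (pricePer1KTokensUsd < 0)
            throw new ArgumentOutOfRangeException(nameof(pricePer1KTokensUsd));

        // decimal avoids binary floating-point rounding in money calculations.
        return totalTokens / 1000m * pricePer1KTokensUsd;
    }
}
```

For an assumed corpus of 3,000,000 tokens at the ada-002 price of $0.0001 per 1K tokens, EstimateIndexingCostUsd(3_000_000, 0.0001m) gives $0.30. Re-embedding the whole index after a model change costs the same calculation again at the new model's price.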

Embedding Model Registration in .NET

C#
// NuGet: Azure.Identity (for DefaultAzureCredential) and
//        Microsoft.SemanticKernel.Connectors.Ollama (for local development)
// Register exactly one embedding service with DI, chosen by environment:
if (builder.Environment.IsDevelopment())
{
    // Local development with Ollama. nomic-embed-text returns 768-dimension
    // vectors, so keep the development index separate from the Azure one.
    builder.Services.AddSingleton<ITextEmbeddingGenerationService>(_ =>
        new OllamaTextEmbeddingGenerationService(
            modelId:  "nomic-embed-text",
            endpoint: new Uri("http://localhost:11434")));
}
else
{
    builder.Services.AddSingleton<ITextEmbeddingGenerationService>(_ =>
        new AzureOpenAITextEmbeddingGenerationService(
            deploymentName: builder.Configuration["AzureOpenAI:EmbeddingDeployment"]!,
            endpoint:       builder.Configuration["AzureOpenAI:Endpoint"]!,
            credential:     new DefaultAzureCredential()));
}

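// Avoid re-embedding unchanged content by caching on a content hash.
// Usings needed: System.Security.Cryptography, System.Text, System.Text.Json,
// Microsoft.Extensions.Caching.Distributed, Microsoft.SemanticKernel,
// Microsoft.SemanticKernel.Embeddings
public sealed class CachingEmbeddingService(
    ITextEmbeddingGenerationService inner,
    IDistributedCache               cache) : ITextEmbeddingGenerationService
{
    // Required by the service interface; forward the wrapped service's metadata.
    public IReadOnlyDictionary<string, object?> Attributes => inner.Attributes;

    public async Task<IList<ReadOnlyMemory<float>>> GenerateEmbeddingsAsync(
        IList<string> data, Kernel? kernel = null, CancellationToken ct = default)
    {
        var results     = new ReadOnlyMemory<float>[data.Count];
        var missIndexes = new List<int>();

        // Pass 1: serve what we can from the cache and remember the misses.
        for (var i = 0; i < data.Count; i++)
        {
            var cached = await cache.GetStringAsync(CacheKey(data[i]), ct);

            if (cached is not null)
                results[i] = JsonSerializer.Deserialize<float[]>(cached)!;
            else
                missIndexes.Add(i);
        }

        if (missIndexes.Count > 0)
        {
            // Pass 2: embed all misses in one batch call, not one call per text.
            var missTexts = missIndexes.Select(i => data[i]).ToList();
            var fresh     = await inner.GenerateEmbeddingsAsync(missTexts, kernel, ct);

            for (var j = 0; j < missIndexes.Count; j++)
            {
                results[missIndexes[j]] = fresh[j];
                await cache.SetStringAsync(CacheKey(missTexts[j]),
                    JsonSerializer.Serialize(fresh[j].ToArray()),
                    new DistributedCacheEntryOptions { SlidingExpiration = TimeSpan.FromDays(7) },
                    ct);
            }
        }

        return results;
    }

    // The key covers only the text. If you might change embedding models, add the
    // model or deployment name to the key so vectors from the old model are never served.
    private static string CacheKey(string text) => $"embedding:{ComputeHash(text)}";

    private static string ComputeHash(string text)
    {
        var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(text));
        return Convert.ToHexString(bytes)[..16];  // first 64 bits of the SHA-256 hash
    }
}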

Production issue I've seen: A team built a RAG system for clinical guidelines using text-embedding-ada-002 and indexed 2,000 documents. Six months later, they switched to text-embedding-3-small for new documents because the quality was better. The index now contained vectors from two different models, so every query vector was compared against ada-002 vectors from the old documents and text-embedding-3-small vectors from the new ones. Scores across the two models were meaningless: some relevant old documents were never retrieved, and some irrelevant new ones scored highly. When you change embedding models, you must re-embed your entire existing index with the new model. There is no way to convert old vectors, because vectors from different models are incompatible.
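
One way to catch this situation early is to stamp every stored vector with the model that produced it and check the stamps before serving queries. The sketch below assumes such a stamp exists; StampedVector, EmbeddingModelGuardSketch, and FindStale are hypothetical names, not part of any library. Note that a dimension check alone would not have caught the case above, because ada-002 and text-embedding-3-small both default to 1536 dimensions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: a stored vector stamped with the model that produced it.
public sealed record StampedVector(string Id, string EmbeddingModel, float[] Vector);

// Hypothetical helper for illustration only.
public static class EmbeddingModelGuardSketch
{
    // Returns the IDs of stored vectors that must be re-embedded before queries
    // embedded with queryModel can be meaningfully compared against them.
    public static IReadOnlyList<string> FindStale(
        IEnumerable<StampedVector> index,
        string queryModel,
        int expectedDimensions)
    {
        return index
            .Where(v => !string.Equals(v.EmbeddingModel, queryModel, StringComparison.Ordinal)
                     || v.Vector.Length != expectedDimensions)
            .Select(v => v.Id)
            .ToList();
    }
}
```

Running FindStale at startup or in a health check, and refusing to serve (or triggering a re-index) when the list is not empty, turns a silent retrieval-quality problem into a visible one.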


Key Takeaway

Embeddings convert text into vectors where semantic similarity corresponds to mathematical proximity. Use ITextEmbeddingGenerationService in Semantic Kernel to generate embeddings, batch inputs for efficiency, and store vectors alongside the source text in a vector store. Use text-embedding-3-small for most new .NET RAG applications. Always use the same model for both indexing and querying — mixing models produces incorrect similarity scores. Cache embeddings by content hash to avoid re-embedding unchanged documents.