Capstone: Build an AI-Powered Multi-Tenant SaaS

This is the SystemForge capstone. You've studied each pattern in isolation. Now you apply all of them together in a single coherent system: a multi-tenant B2B SaaS platform with AI features, built with the production practices that distinguish senior engineers from junior ones.

What you're building: KnowledgeOS — a B2B knowledge management SaaS where teams can upload documents, ask questions, and get AI-powered answers grounded in their own content.

Tech stack: ASP.NET Core 9, EF Core 9, PostgreSQL 16 + pgvector, Redis, Azure Container Apps, GitHub Actions

System Overview

KnowledgeOS features:
  - Multi-tenant: each company is isolated, with its own documents
  - Document ingestion: PDF, DOCX, Markdown → chunked → embedded → stored
  - AI chat: questions answered from the company's documents only
  - Usage metering: token budget per tenant, per-plan limits
  - Admin portal: tenant management, usage dashboard, billing hooks
  - MCP server: companies can connect Claude Desktop to their knowledge base

Architecture:
  ┌─────────────────────────────────────────────────────────┐
  │  Azure Container Apps                                    │
  │                                                          │
  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
  │  │  API          │  │  Ingestion   │  │  MCP Server  │  │
  │  │  (ASP.NET)    │  │  Worker      │  │  (SSE)       │  │
  │  └──────┬────────┘  └──────┬───────┘  └──────┬───────┘  │
  │         │                  │                  │          │
  │  ┌──────▼──────────────────▼──────────────────▼───────┐  │
  │  │  PostgreSQL (pgvector) + Redis                      │  │
  │  └─────────────────────────────────────────────────────┘  │
  └─────────────────────────────────────────────────────────┘

Phase 1: Multi-Tenant Foundation

Tenant Model

// src/KnowledgeOS.Core/Entities/Tenant.cs
public class Tenant
{
    public int         Id            { get; set; }
    public string      Name          { get; set; } = "";
    public string      Slug          { get; set; } = "";   // URL-safe identifier
    public TenantPlan  Plan          { get; set; } = TenantPlan.Starter;
    public bool        IsActive      { get; set; } = true;
    public DateTime    CreatedAt     { get; set; } = DateTime.UtcNow;

    // Usage limits (per plan)
    public int   MonthlyTokenBudget  { get; set; } = 500_000;    // 500K tokens/month
    public int   MaxDocuments        { get; set; } = 50;
    public int   MaxStorageMb        { get; set; } = 500;

    // Usage tracking (reset monthly)
    public int   TokensUsedThisMonth { get; set; }
    public int   DocumentCount       { get; set; }
}

public enum TenantPlan { Starter, Professional, Enterprise }

// Global query filter — all entities filtered by tenant automatically
public class KnowledgeDbContext(
    DbContextOptions<KnowledgeDbContext> opts,
    ITenantContext tenant)
    : DbContext(opts)
{
    public DbSet<Tenant>        Tenants        => Set<Tenant>();
    public DbSet<Document>      Documents      => Set<Document>();
    public DbSet<DocumentChunk> DocumentChunks => Set<DocumentChunk>();
    public DbSet<Conversation>  Conversations  => Set<Conversation>();
    public DbSet<ApiKey>        ApiKeys        => Set<ApiKey>();

    protected override void OnModelCreating(ModelBuilder model)
    {
        model.HasPostgresExtension("vector");

        // Tenant isolation — applied to all tenant-scoped entities
        model.Entity<Document>().HasQueryFilter(
            d => d.TenantId == tenant.TenantId);
        model.Entity<DocumentChunk>().HasQueryFilter(
            c => c.TenantId == tenant.TenantId);
        model.Entity<Conversation>().HasQueryFilter(
            c => c.TenantId == tenant.TenantId);

        model.Entity<DocumentChunk>(e =>
        {
            e.Property(c => c.Embedding).HasColumnType("vector(1536)");
            e.HasIndex(c => c.Embedding)
             .HasMethod("hnsw")
             .HasOperators("vector_cosine_ops");
        });
    }
}
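
The query filters above depend on ITenantContext, which this capstone never defines. Here is a minimal sketch, assuming the interface exposes a single int TenantId and that the tenant arrives as a tenant_id claim (the claim the API key handler in the next section issues). ClaimsTenantContext is a hypothetical name. It reads the principal lazily, because the auth handler itself needs the DbContext before any tenant is known.

```csharp
using System;
using System.Security.Claims;

// Assumed shape: the article uses this interface but never shows it.
public interface ITenantContext
{
    int TenantId { get; }
}

// Hypothetical implementation that resolves the tenant from the current
// principal's "tenant_id" claim each time TenantId is read.
public sealed class ClaimsTenantContext(Func<ClaimsPrincipal?> currentUser) : ITenantContext
{
    public int TenantId
    {
        get
        {
            var raw = currentUser()?.FindFirst("tenant_id")?.Value;

            // Fail loudly rather than fall back to a default tenant:
            // a silent default would defeat the isolation filter.
            return int.TryParse(raw, out var id)
                ? id
                : throw new InvalidOperationException(
                    "No tenant_id claim on the current principal.");
        }
    }
}
```

In ASP.NET Core you would typically register this as a scoped service and pass a delegate that reads IHttpContextAccessor.HttpContext?.User, so the value reflects whatever the authentication middleware set for the request.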

API Key Authentication

// Tenants authenticate via API keys, not user sessions
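public class ApiKeyAuthHandler(
    IOptionsMonitor<AuthenticationSchemeOptions> options,
    ILoggerFactory loggerFactory,
    UrlEncoder encoder,
    KnowledgeDbContext db,
    ILogger<ApiKeyAuthHandler> logger)
    // The base handler has no parameterless constructor, so pass its dependencies through
    : AuthenticationHandler<AuthenticationSchemeOptions>(options, loggerFactory, encoder)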
{
    protected override async Task<AuthenticateResult> HandleAuthenticateAsync()
    {
        if (!Request.Headers.TryGetValue("X-API-Key", out var keyHeader))
            return AuthenticateResult.NoResult();

        var key       = keyHeader.ToString();
        var keyHash   = Convert.ToHexString(
            System.Security.Cryptography.SHA256.HashData(
                System.Text.Encoding.UTF8.GetBytes(key)));

        // Look up key (ignore tenant filter — this runs before tenant is set)
        var apiKey = await db.ApiKeys.IgnoreQueryFilters()
            .Include(k => k.Tenant)
            .FirstOrDefaultAsync(k => k.KeyHash == keyHash && k.IsActive);

        if (apiKey is null)
        {
            logger.LogWarning("Invalid API key attempted");
            return AuthenticateResult.Fail("Invalid API key");
        }

        if (!apiKey.Tenant.IsActive)
            return AuthenticateResult.Fail("Tenant account is suspended");

        var claims = new[]
        {
            new Claim("tenant_id", apiKey.TenantId.ToString()),
            new Claim("key_id",    apiKey.Id.ToString()),
            new Claim(ClaimTypes.Role, apiKey.Role),
        };

        var identity  = new ClaimsIdentity(claims, Scheme.Name);
        var principal = new ClaimsPrincipal(identity);
        var ticket    = new AuthenticationTicket(principal, Scheme.Name);

        return AuthenticateResult.Success(ticket);
    }
}
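
The handler only ever sees a hash, so the key must be hashed the same way when it is issued. A minimal sketch of that issuing side, assuming a hypothetical ApiKeyFactory helper and an illustrative "kos_" prefix (neither appears in the original design):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for issuing API keys.
public static class ApiKeyFactory
{
    // Returns the raw key (show it to the customer once, then discard it)
    // and the hash to persist in ApiKey.KeyHash.
    public static (string RawKey, string KeyHash) Create()
    {
        // 32 random bytes give 256 bits of entropy; the prefix is only a convention.
        var random = Convert.ToHexString(RandomNumberGenerator.GetBytes(32));
        var rawKey = "kos_" + random.ToLowerInvariant();
        return (rawKey, Hash(rawKey));
    }

    // Same transformation the auth handler applies to the X-API-Key header:
    // SHA-256 over the UTF-8 bytes, rendered as uppercase hex.
    public static string Hash(string rawKey)
        => Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(rawKey)));
}
```

Plain SHA-256 is reasonable here because the key is long and random; it would not be an appropriate choice for user-chosen passwords.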

Phase 2: Document Ingestion Pipeline

// src/KnowledgeOS.Application/Ingestion/IngestDocumentCommandHandler.cs
public class IngestDocumentCommandHandler(
    KnowledgeDbContext db,
    DocumentParser parser,
    SemanticChunker chunker,
    IEmbeddingGenerator<string, Embedding<float>> embedder,
    ITenantContext tenant,
    ILogger<IngestDocumentCommandHandler> logger)
    : IRequestHandler<IngestDocumentCommand, IngestResult>
{
    public async Task<IngestResult> Handle(
        IngestDocumentCommand cmd, CancellationToken ct)
    {
        // Enforce plan limits
        var tenantRecord = await db.Tenants.FindAsync([tenant.TenantId], ct)!;
        if (tenantRecord!.DocumentCount >= tenantRecord.MaxDocuments)
            return IngestResult.Rejected(
                $"Document limit reached ({tenantRecord.MaxDocuments} max on your plan)");

        // Rough estimate: assume ~10 MB per stored document, plus the size of this upload
        var fileSizeMb = new FileInfo(cmd.FilePath).Length / (1024.0 * 1024.0);
        if (tenantRecord.DocumentCount * 10 + fileSizeMb > tenantRecord.MaxStorageMb)
            return IngestResult.Rejected("Storage limit reached");

        // Extract and chunk text
        var rawText     = parser.ExtractText(cmd.FilePath);
        var contentHash = ComputeHash(rawText);

        // Skip if document unchanged
        var existing = await db.Documents.IgnoreQueryFilters()
            .FirstOrDefaultAsync(d => d.TenantId == tenant.TenantId
                                   && d.SourcePath == cmd.FilePath, ct);

        if (existing?.ContentHash == contentHash)
            return IngestResult.Skipped("Document unchanged");

        // Remove old chunks if re-ingesting
        if (existing is not null)
        {
            db.DocumentChunks.RemoveRange(
                db.DocumentChunks.Where(c => c.DocumentId == existing.Id));
            await db.SaveChangesAsync(ct);
        }

        var doc = existing ?? new Document
        {
            TenantId  = tenant.TenantId,
            CreatedAt = DateTime.UtcNow,
        };

        doc.Title        = cmd.Title;
        doc.SourcePath   = cmd.FilePath;
        doc.ContentHash  = contentHash;
        doc.LastIngested = DateTime.UtcNow;

        if (existing is null) db.Documents.Add(doc);
        await db.SaveChangesAsync(ct);

        // Generate embeddings in batches
        var texts  = chunker.Chunk(rawText);
        var chunks = 0;

        for (int i = 0; i < texts.Count; i += 50)
        {
            var batch      = texts.Skip(i).Take(50).ToList();
            var embeddings = await embedder.GenerateAsync(
                batch.Select(t => $"Title: {cmd.Title}\n\n{t}").ToList(),
                cancellationToken: ct);

            db.DocumentChunks.AddRange(batch.Select((text, idx) => new DocumentChunk
            {
                TenantId   = tenant.TenantId,
                DocumentId = doc.Id,
                ChunkIndex = i + idx,
                Text       = text,
                Embedding  = new Vector(embeddings[idx].Vector.ToArray()),
            }));

            await db.SaveChangesAsync(ct);
            chunks += batch.Count;
        }

        // Update document count on tenant (new documents only; re-ingestion replaces in place)
        if (existing is null) tenantRecord.DocumentCount++;
        await db.SaveChangesAsync(ct);

        logger.LogInformation(
            "Tenant {TenantId}: ingested '{Title}' — {Chunks} chunks", tenant.TenantId, cmd.Title, chunks);

        return IngestResult.Succeeded(doc.Id, chunks);
    }

    private static string ComputeHash(string text)
        => Convert.ToHexString(
            System.Security.Cryptography.SHA256.HashData(
                System.Text.Encoding.UTF8.GetBytes(text)));
}
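
The handler calls chunker.Chunk(rawText) but SemanticChunker itself is not shown. As a stand-in, here is a deliberately naive paragraph packer. It is not the author's chunker: ParagraphChunker and the 1200-character default are assumptions, and it budgets by characters rather than tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for SemanticChunker.
public static class ParagraphChunker
{
    // Packs whole paragraphs into chunks of at most maxChars characters.
    // A single paragraph longer than maxChars becomes its own oversized chunk.
    public static List<string> Chunk(string text, int maxChars = 1200)
    {
        var chunks  = new List<string>();
        var current = new StringBuilder();

        // Paragraphs are separated by blank lines; normalise Windows line endings first.
        var paragraphs = text.Replace("\r\n", "\n").Split(
            "\n\n",
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        foreach (var paragraph in paragraphs)
        {
            // +2 accounts for the blank line that joins paragraphs inside a chunk.
            if (current.Length > 0 && current.Length + paragraph.Length + 2 > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }

            if (current.Length > 0) current.Append("\n\n");
            current.Append(paragraph);
        }

        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }
}
```

Keeping paragraphs whole means a chunk rarely ends mid-sentence, which tends to help retrieval quality; a production chunker would also split oversized paragraphs and usually add overlap between neighbouring chunks.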

Phase 3: AI Chat with Token Metering

// Token budget middleware: blocks requests when the tenant exceeds its monthly budget.
// Only the non-streaming path is metered here; CompleteStreamingAsync needs an
// equivalent override before streamed chat responses count against the budget.
public class TokenBudgetMiddleware(
    IChatClient inner,
    ITenantContext tenant,
    KnowledgeDbContext db)
    : DelegatingChatClient(inner)
{

    public override async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken ct = default)
    {
        var tenantRecord = await db.Tenants.FindAsync([tenant.TenantId], ct);

        if (tenantRecord!.TokensUsedThisMonth >= tenantRecord.MonthlyTokenBudget)
            throw new TokenBudgetExceededException(
                "Monthly AI token budget exceeded. Upgrade your plan or wait for the next billing cycle.");

        var response = await base.CompleteAsync(messages, options, ct);

        // Deduct tokens used
        var tokensUsed = (response.Usage?.InputTokenCount ?? 0) +
                         (response.Usage?.OutputTokenCount ?? 0);

        tenantRecord.TokensUsedThisMonth += tokensUsed;
        await db.SaveChangesAsync(ct);

        return response;
    }
}
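
The middleware makes a binary decision (under budget or over), while the production checklist later asks for an alert at 80% of the monthly budget. One way to express both from the same two numbers is a small pure function. The names BudgetStatus and TokenBudget are hypothetical, and this sketch does not address concurrent requests updating the counter at the same time.

```csharp
using System;

public enum BudgetStatus { Ok, Warning, Exceeded }

// Hypothetical helper; not part of the original middleware.
public static class TokenBudget
{
    // warningPercent = 80 mirrors the "80% of monthly budget" alert in the checklist.
    public static BudgetStatus Evaluate(int tokensUsed, int monthlyBudget, int warningPercent = 80)
    {
        // Same comparison the middleware uses to reject a request.
        if (monthlyBudget <= 0 || tokensUsed >= monthlyBudget)
            return BudgetStatus.Exceeded;

        // Integer arithmetic in long avoids both rounding error and int overflow.
        return tokensUsed * 100L >= monthlyBudget * (long)warningPercent
            ? BudgetStatus.Warning
            : BudgetStatus.Ok;
    }
}
```

With the Starter defaults from the Tenant model, Evaluate(400_000, 500_000) returns Warning and Evaluate(500_000, 500_000) returns Exceeded.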

// Chat handler with semantic search and token metering
public class ChatQueryHandler(
    KnowledgeDbContext db,
    IEmbeddingGenerator<string, Embedding<float>> embedder,
    IChatClient chatClient,
    ConversationStore conversations,
    ITenantContext tenant)
{
    public async IAsyncEnumerable<string> ChatAsync(
        string sessionId,
        string userMessage,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken ct = default)
    {
        // Retrieve relevant chunks from THIS tenant's documents only
        // (global query filter ensures tenant isolation)
        var queryEmbedding = await embedder.GenerateAsync([userMessage], cancellationToken: ct);
        var queryVector    = new Vector(queryEmbedding[0].Vector.ToArray());

        var chunks = await db.DocumentChunks
            .Include(c => c.Document)
            .Where(c => c.Embedding != null)
            .OrderBy(c => c.Embedding!.CosineDistance(queryVector))
            .Take(6)
            .Where(c => 1.0 - c.Embedding!.CosineDistance(queryVector) > 0.65)
            .Select(c => new { c.Text, c.Document.Title })
            .ToListAsync(ct);

        var history  = await conversations.GetHistoryAsync(sessionId, ct);
        var context  = chunks.Any()
            ? string.Join("\n\n---\n\n", chunks.Select(c => $"[{c.Title}]\n{c.Text}"))
            : null;

        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, """
                You are a knowledge assistant for this team.
                Answer ONLY from the provided documents.
                If no relevant context is provided, say you don't have that information.
                Always cite the document name in your answer.
                """),
        };

        if (context is not null)
            messages.Add(new(ChatRole.System, $"DOCUMENTS:\n{context}"));

        messages.AddRange(history);
        messages.Add(new(ChatRole.User, userMessage));

        var accumulated = new System.Text.StringBuilder();

        await foreach (var update in chatClient.CompleteStreamingAsync(messages, cancellationToken: ct))
        {
            if (update.Text is { Length: > 0 } token)
            {
                accumulated.Append(token);
                yield return token;
            }
        }

        await conversations.AppendAsync(sessionId, "user",      userMessage,             ct);
        await conversations.AppendAsync(sessionId, "assistant", accumulated.ToString(),   ct);
    }
}
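
The retrieval query filters on 1.0 - CosineDistance(...) > 0.65. To make that gate concrete, here is the same arithmetic in plain C#, outside the database. VectorMath is a hypothetical helper for illustration only; in the real query pgvector computes the distance.

```csharp
using System;

// Hypothetical helper showing the math behind the similarity gate.
public static class VectorMath
{
    // Cosine distance as pgvector defines it: 1 - cosine similarity.
    public static double CosineDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot   += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Keep a chunk only if its similarity to the query exceeds the threshold.
    public static bool PassesGate(float[] query, float[] chunk, double threshold = 0.65)
        => 1.0 - CosineDistance(query, chunk) > threshold;
}
```

Two vectors 45 degrees apart have a similarity of about 0.707 and pass the gate; orthogonal vectors have a similarity of 0 and are dropped. The 0.65 threshold comes from the article and is worth tuning against your own evaluation set.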

Phase 4: MCP Server for Claude Desktop

// Companies can connect Claude Desktop to their KnowledgeOS workspace
[McpServerToolType]
public class KnowledgeMcpTools(
    KnowledgeDbContext db,
    IEmbeddingGenerator<string, Embedding<float>> embedder,
    ITenantContext tenant)
{
    [McpServerTool(Name = "search_knowledge_base")]
    [Description("Search this team's knowledge base for information")]
    public async Task<string> Search(
        [Description("Natural language search query")] string query,
        CancellationToken ct = default)
    {
        var embedding = await embedder.GenerateAsync([query], cancellationToken: ct);
        var vector    = new Vector(embedding[0].Vector.ToArray());

        var chunks = await db.DocumentChunks
            .Include(c => c.Document)
            .Where(c => c.Embedding != null)
            .OrderBy(c => c.Embedding!.CosineDistance(vector))
            .Take(5)
            .Select(c => new { c.Text, c.Document.Title })
            .ToListAsync(ct);

        if (!chunks.Any())
            return "No relevant documents found.";

        return string.Join("\n\n---\n\n",
            chunks.Select(c => $"**{c.Title}**\n{c.Text}"));
    }

    [McpServerTool(Name = "list_documents")]
    [Description("List all documents in the knowledge base")]
    public async Task<string> ListDocuments(CancellationToken ct = default)
    {
        var docs = await db.Documents
            .OrderByDescending(d => d.LastIngested)
            .Select(d => $"- {d.Title} (updated {d.LastIngested:MMM d, yyyy})")
            .ToListAsync(ct);

        return docs.Any()
            ? string.Join("\n", docs)
            : "No documents in the knowledge base yet.";
    }
}

// Register MCP with API key auth
app.MapMcp("/mcp").RequireAuthorization();

Phase 5: CI/CD Pipeline

# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: knowledgeos_test
          POSTGRES_PASSWORD: test
        ports: ["5432:5432"]
      redis:
        image: redis:7-alpine
        ports: ["6379:6379"]

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '9.x'

      - name: Restore
        run: dotnet restore

      - name: Build
        run: dotnet build --no-restore

      - name: Test
        env:
          ConnectionStrings__Postgres: "Host=localhost;Database=knowledgeos_test;Username=postgres;Password=test"
          ConnectionStrings__Redis: "localhost:6379"
          OpenAI__ApiKey: ${{ secrets.OPENAI_API_KEY }}
        run: |
          dotnet test --no-build \
            --collect:"XPlat Code Coverage" \
            --results-directory coverage

      - name: Coverage gate
        run: |
          dotnet tool install -g dotnet-reportgenerator-globaltool
          reportgenerator -reports:"coverage/**/coverage.cobertura.xml" -targetdir:report -reporttypes:TextSummary
          COVERAGE=$(grep -m1 "Line coverage" report/Summary.txt | grep -oP '\d+(\.\d+)?')
          if (( $(echo "$COVERAGE < 70" | bc -l) )); then
            echo "Coverage $COVERAGE% is below 70% threshold"
            exit 1
          fi

  deploy-staging:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Build and push image
        run: |
          az acr build \
            --registry knowledgeos \
            --image api:${{ github.sha }} \
            --file src/KnowledgeOS.Api/Dockerfile .

      - name: Deploy to staging
        run: |
          az containerapp update \
            --name knowledgeos-api-staging \
            --resource-group knowledgeos-rg \
            --image knowledgeos.azurecr.io/api:${{ github.sha }}

      - name: Health check
        run: |
          sleep 30
          curl -f https://staging.knowledgeos.app/health || exit 1

Phase 6: Production Checklist

Security:
  [ ] API key hashed (SHA-256) before storage — never store raw keys
  [ ] Global query filters on all tenant-scoped tables
  [ ] RLS enabled on PostgreSQL as defence-in-depth
  [ ] Rate limiting on /api/chat (concurrency limiter, not fixed-window)
  [ ] Input validation on file upload (size, extension, content-type)
  [ ] Secrets in Azure Key Vault, not environment variables

Multi-tenancy:
  [ ] Every data table has tenant_id with NOT NULL constraint
  [ ] EF Core global query filter on every tenant-scoped entity
  [ ] RLS policy created for every tenant-scoped table
  [ ] Background jobs scoped per-tenant (no cross-tenant queries)
  [ ] Per-tenant token budget enforced before every AI call

AI quality:
  [ ] Faithfulness evaluation in CI (weekly scheduled run)
  [ ] Retrieval threshold gate (return "I don't know" below 0.65 similarity)
  [ ] System prompt has explicit constraints ("Answer ONLY from context")
  [ ] Document staleness warning for documents over 6 months old
  [ ] Per-tenant eval dashboard showing faithfulness score over time

Observability:
  [ ] Per-tenant cost metrics (tokens in/out, USD cost)
  [ ] Alert when any tenant exceeds 80% of monthly budget
  [ ] Per-request latency tracking (P50, P95, P99)
  [ ] Distributed tracing (Application Insights / OpenTelemetry)
  [ ] Slow query alerting (log_min_duration_statement = 500ms)
  [ ] Weekly cost report email to platform team

Reliability:
  [ ] Polly v8 retry + circuit breaker on all OpenAI API calls
  [ ] Fallback model configured (gpt-4o-mini if gpt-4o fails)
  [ ] Database connection pool health check
  [ ] Ingestion worker restarts automatically on failure
  [ ] Outbox pattern for any AI-triggered side effects (emails, webhooks)

Operations:
  [ ] Terraform/Bicep for all Azure resources
  [ ] Rolling deployments (no downtime) via Container Apps traffic splitting
  [ ] Database migrations run automatically on deploy
  [ ] Structured logging (JSON) shipped to Azure Monitor
  [ ] On-call runbook for common failure modes

Architecture Patterns Used

From this curriculum — all applied in one system:

Pattern                   Where used
─────────────────────────────────────────────────────────
Clean Architecture         Core / Application / Infrastructure / API
CQRS with MediatR          All commands and queries
Global Query Filters       Multi-tenant data isolation
Row-Level Security         PostgreSQL defence-in-depth
HybridCache                Tenant and plan lookups, hot reads
Domain Events + Outbox     Post-ingestion webhooks, billing events
Polly v8 Resilience        OpenAI API calls
pgvector + HNSW            Semantic document search
DelegatingChatClient       Token budget middleware
Model routing              gpt-4o-mini for classification, gpt-4o for chat
Faithfulness evaluation    CI quality gate
MCP server                 Claude Desktop integration
Azure Container Apps       Hosting, autoscaling, KEDA
GitHub Actions             CI/CD with coverage gate, staging deploy
Strangler Fig              (if migrating from legacy)

What's Next

You've built everything. The logical next steps for a real production system:

1. Billing integration — Stripe metered billing per token used
2. Team management — invite users, assign roles within a tenant
3. Webhook events — notify external systems when documents are processed
4. Analytics — per-document query count, most-asked questions
5. Fine-tuning — use eval data to improve extraction models for your domain
6. GraphRAG — knowledge graph for multi-hop document relationships
7. Audit log — immutable record of every AI interaction for compliance
