Capstone: Build an AI-Powered Multi-Tenant SaaS
The SystemForge capstone project: build a complete AI-powered multi-tenant SaaS from scratch. Covers architecture decisions, all major patterns, and the full production checklist.
This is the SystemForge capstone. You've studied each pattern in isolation. Now you apply all of them together in a single coherent system: a multi-tenant B2B SaaS platform with AI features, built with the production practices that distinguish senior engineers from junior ones.
What you're building: KnowledgeOS, a B2B knowledge management SaaS where teams can upload documents, ask questions, and get AI-powered answers grounded in their own content.
Tech stack: ASP.NET Core 9, EF Core 9, PostgreSQL 16 + pgvector, Redis, Azure Container Apps, GitHub Actions
System Overview
KnowledgeOS features:
- Multi-tenant: each company is isolated, with their own documents
- Document ingestion: PDF, DOCX, Markdown → chunked → embedded → stored
- AI chat: questions answered from the company's documents only
- Usage metering: token budget per tenant, per-plan limits
- Admin portal: tenant management, usage dashboard, billing hooks
- MCP server: companies can connect Claude Desktop to their knowledge base

Architecture:

┌────────────────────────────────────────────────────────┐
│                  Azure Container Apps                  │
│                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │     API      │  │  Ingestion   │  │  MCP Server  │  │
│  │  (ASP.NET)   │  │    Worker    │  │    (SSE)     │  │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  │
│         │                 │                 │          │
│  ┌──────▼─────────────────▼─────────────────▼───────┐  │
│  │          PostgreSQL (pgvector) + Redis           │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘

Phase 1: Multi-Tenant Foundation
Tenant Model
// src/KnowledgeOS.Core/Entities/Tenant.cs
public class Tenant
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string Slug { get; set; } = ""; // URL-safe identifier
    public TenantPlan Plan { get; set; } = TenantPlan.Starter;
    public bool IsActive { get; set; } = true;
    public DateTime CreatedAt { get; set; } = DateTime.UtcNow;

    // Usage limits (per plan)
    public int MonthlyTokenBudget { get; set; } = 500_000; // 500K tokens/month
    public int MaxDocuments { get; set; } = 50;
    public int MaxStorageMb { get; set; } = 500;

    // Usage tracking (reset monthly)
    public int TokensUsedThisMonth { get; set; }
    public int DocumentCount { get; set; }
}

public enum TenantPlan { Starter, Professional, Enterprise }

// Global query filter: all tenant-scoped entities are filtered by tenant automatically
public class KnowledgeDbContext(
    DbContextOptions<KnowledgeDbContext> opts,
    ITenantContext tenant)
    : DbContext(opts)
{
    public DbSet<Tenant> Tenants => Set<Tenant>();
    public DbSet<Document> Documents => Set<Document>();
    public DbSet<DocumentChunk> DocumentChunks => Set<DocumentChunk>();
    public DbSet<Conversation> Conversations => Set<Conversation>();
    public DbSet<ApiKey> ApiKeys => Set<ApiKey>();

    protected override void OnModelCreating(ModelBuilder model)
    {
        model.HasPostgresExtension("vector");

        // Tenant isolation: applied to all tenant-scoped entities
        model.Entity<Document>().HasQueryFilter(
            d => d.TenantId == tenant.TenantId);
        model.Entity<DocumentChunk>().HasQueryFilter(
            c => c.TenantId == tenant.TenantId);
        model.Entity<Conversation>().HasQueryFilter(
            c => c.TenantId == tenant.TenantId);

        model.Entity<DocumentChunk>(e =>
        {
            e.Property(c => c.Embedding).HasColumnType("vector(1536)");
            e.HasIndex(c => c.Embedding)
                .HasMethod("hnsw")
                .HasOperators("vector_cosine_ops");
        });
    }
}

API Key Authentication
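The handler in this section never compares raw keys: it hashes the incoming `X-API-Key` header with SHA-256 and looks up the hash. Issuing a key therefore comes down to generating a random value, showing it to the tenant once, and persisting only its hash. A minimal sketch of that step follows. The helper names `GenerateApiKey` and `HashApiKey` and the `kos_` prefix are assumptions for illustration, not part of the KnowledgeOS code.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: hashes a raw key the same way the auth handler does
// (SHA-256 over the UTF-8 bytes, rendered as uppercase hex).
string HashApiKey(string rawKey) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(rawKey)));

// Hypothetical helper: creates a new raw key from 32 bytes of OS-provided randomness.
// The "kos_" prefix is an illustrative convention that makes leaked keys easy to spot.
string GenerateApiKey() =>
    "kos_" + Convert.ToHexString(RandomNumberGenerator.GetBytes(32));

var rawKey = GenerateApiKey();
var keyHash = HashApiKey(rawKey);

Console.WriteLine($"Show once to the tenant:     {rawKey}");
Console.WriteLine($"Persist as ApiKey.KeyHash:   {keyHash}");
```

Because the key itself is high-entropy random data, a single fast hash is a reasonable choice here; slow password hashes are meant for low-entropy secrets.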
// Tenants authenticate via API keys, not user sessions
public class ApiKeyAuthHandler(
    IOptionsMonitor<AuthenticationSchemeOptions> options,
    ILoggerFactory loggerFactory,
    UrlEncoder encoder,
    KnowledgeDbContext db)
    // The base handler has no parameterless constructor, so these must be passed through
    : AuthenticationHandler<AuthenticationSchemeOptions>(options, loggerFactory, encoder)
{
    protected override async Task<AuthenticateResult> HandleAuthenticateAsync()
    {
        if (!Request.Headers.TryGetValue("X-API-Key", out var keyHeader))
            return AuthenticateResult.NoResult();

        var key = keyHeader.ToString();
        var keyHash = Convert.ToHexString(
            System.Security.Cryptography.SHA256.HashData(
                System.Text.Encoding.UTF8.GetBytes(key)));

        // Look up the key (ignore tenant filters: this runs before the tenant is resolved)
        var apiKey = await db.ApiKeys.IgnoreQueryFilters()
            .Include(k => k.Tenant)
            .FirstOrDefaultAsync(k => k.KeyHash == keyHash && k.IsActive);

        if (apiKey is null)
        {
            // Logger is provided by the AuthenticationHandler base class
            Logger.LogWarning("Invalid API key attempted");
            return AuthenticateResult.Fail("Invalid API key");
        }

        if (!apiKey.Tenant.IsActive)
            return AuthenticateResult.Fail("Tenant account is suspended");

        var claims = new[]
        {
            new Claim("tenant_id", apiKey.TenantId.ToString()),
            new Claim("key_id", apiKey.Id.ToString()),
            new Claim(ClaimTypes.Role, apiKey.Role),
        };
        var identity = new ClaimsIdentity(claims, Scheme.Name);
        var principal = new ClaimsPrincipal(identity);
        var ticket = new AuthenticationTicket(principal, Scheme.Name);
        return AuthenticateResult.Success(ticket);
    }
}

Phase 2: Document Ingestion Pipeline
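The ingestion handler in this phase calls `chunker.Chunk(rawText)`, but `SemanticChunker` itself is not shown. As a rough stand-in, the sketch below packs whole paragraphs into chunks up to a character limit. This is plain paragraph packing, not semantic chunking, and the name `ChunkByParagraph` and the 1,200-character default are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for SemanticChunker.Chunk: packs paragraphs into chunks of
// at most maxChars characters. A single paragraph longer than maxChars becomes its
// own oversized chunk rather than being split mid-sentence.
List<string> ChunkByParagraph(string text, int maxChars = 1200)
{
    var chunks = new List<string>();
    var current = new StringBuilder();

    // Paragraphs are separated by a blank line (Windows or Unix line endings).
    var paragraphs = text.Split(
        new[] { "\r\n\r\n", "\n\n" },
        StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

    foreach (var paragraph in paragraphs)
    {
        // Flush the current chunk if adding this paragraph would exceed the limit.
        if (current.Length > 0 && current.Length + 2 + paragraph.Length > maxChars)
        {
            chunks.Add(current.ToString());
            current.Clear();
        }

        if (current.Length > 0) current.Append("\n\n");
        current.Append(paragraph);
    }

    if (current.Length > 0) chunks.Add(current.ToString());
    return chunks;
}

var sampleChunks = ChunkByParagraph("Onboarding guide.\n\nStep one.\n\nStep two.", maxChars: 30);
foreach (var chunk in sampleChunks)
    Console.WriteLine($"[{chunk.Length} chars] {chunk.Replace("\n\n", " / ")}");
```

Keeping paragraphs intact tends to give the embedding model coherent units of text; a production chunker would usually also cap oversized paragraphs and add overlap between neighbouring chunks.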
// src/KnowledgeOS.Application/Ingestion/IngestDocumentCommandHandler.cs
public class IngestDocumentCommandHandler(
    KnowledgeDbContext db,
    DocumentParser parser,
    SemanticChunker chunker,
    IEmbeddingGenerator<string, Embedding<float>> embedder,
    ITenantContext tenant,
    ILogger<IngestDocumentCommandHandler> logger)
    : IRequestHandler<IngestDocumentCommand, IngestResult>
{
    public async Task<IngestResult> Handle(
        IngestDocumentCommand cmd, CancellationToken ct)
    {
        // Enforce plan limits
        var tenantRecord = await db.Tenants.FindAsync([tenant.TenantId], ct)
            ?? throw new InvalidOperationException($"Tenant {tenant.TenantId} not found");

        if (tenantRecord.DocumentCount >= tenantRecord.MaxDocuments)
            return IngestResult.Rejected(
                $"Document limit reached ({tenantRecord.MaxDocuments} max on your plan)");

        // Rough storage estimate: ~10 MB per existing document plus the new file.
        // Tracking actual bytes stored per tenant would be more accurate.
        var fileSizeMb = new FileInfo(cmd.FilePath).Length / (1024.0 * 1024.0);
        if (tenantRecord.DocumentCount * 10 + fileSizeMb > tenantRecord.MaxStorageMb)
            return IngestResult.Rejected("Storage limit reached");

        // Extract text and hash it for change detection
        var rawText = parser.ExtractText(cmd.FilePath);
        var contentHash = ComputeHash(rawText);

        // Skip if document unchanged
        var existing = await db.Documents.IgnoreQueryFilters()
            .FirstOrDefaultAsync(d => d.TenantId == tenant.TenantId
                && d.SourcePath == cmd.FilePath, ct);
        if (existing?.ContentHash == contentHash)
            return IngestResult.Skipped("Document unchanged");

        // Remove old chunks if re-ingesting
        if (existing is not null)
        {
            db.DocumentChunks.RemoveRange(
                db.DocumentChunks.Where(c => c.DocumentId == existing.Id));
            await db.SaveChangesAsync(ct);
        }

        var doc = existing ?? new Document
        {
            TenantId = tenant.TenantId,
            CreatedAt = DateTime.UtcNow,
        };
        doc.Title = cmd.Title;
        doc.SourcePath = cmd.FilePath;
        doc.ContentHash = contentHash;
        doc.LastIngested = DateTime.UtcNow;
        if (existing is null) db.Documents.Add(doc);
        await db.SaveChangesAsync(ct);

        // Chunk the text, then generate embeddings in batches of 50
        var texts = chunker.Chunk(rawText);
        var chunks = 0;
        for (int i = 0; i < texts.Count; i += 50)
        {
            var batch = texts.Skip(i).Take(50).ToList();
            var embeddings = await embedder.GenerateAsync(
                batch.Select(t => $"Title: {cmd.Title}\n\n{t}").ToList(),
                cancellationToken: ct);

            db.DocumentChunks.AddRange(batch.Select((text, idx) => new DocumentChunk
            {
                TenantId = tenant.TenantId,
                DocumentId = doc.Id,
                ChunkIndex = i + idx,
                Text = text,
                Embedding = new Vector(embeddings[idx].Vector.ToArray()),
            }));
            await db.SaveChangesAsync(ct);
            chunks += batch.Count;
        }

        // Update the tenant's document count (new documents only, not re-ingestions)
        if (existing is null)
        {
            tenantRecord.DocumentCount++;
            await db.SaveChangesAsync(ct);
        }

        logger.LogInformation(
            "Tenant {TenantId}: ingested '{Title}' → {Chunks} chunks",
            tenant.TenantId, cmd.Title, chunks);
        return IngestResult.Succeeded(doc.Id, chunks);
    }

    private static string ComputeHash(string text)
        => Convert.ToHexString(
            System.Security.Cryptography.SHA256.HashData(
                System.Text.Encoding.UTF8.GetBytes(text)));
}

Phase 3: AI Chat with Token Metering
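The `Tenant` entity tracks `TokensUsedThisMonth` as "reset monthly", but none of the code in this capstone performs that reset. One way to decide when to zero the counter is to compare the calendar month of the last reset against the current time. The sketch below assumes a hypothetical per-tenant `lastResetUtc` timestamp (not a field on the `Tenant` class as shown) and UTC calendar months rather than billing-cycle anniversaries.

```csharp
using System;

// Hypothetical helper: true when nowUtc falls in a later UTC calendar month than
// the last reset, meaning TokensUsedThisMonth should be set back to zero.
bool NeedsMonthlyReset(DateTime lastResetUtc, DateTime nowUtc) =>
    nowUtc.Year > lastResetUtc.Year ||
    (nowUtc.Year == lastResetUtc.Year && nowUtc.Month > lastResetUtc.Month);

// Hypothetical helper: tokens left in the budget, never negative.
int RemainingTokens(int monthlyBudget, int usedThisMonth) =>
    Math.Max(0, monthlyBudget - usedThisMonth);

var lastReset = new DateTime(2025, 1, 31, 23, 59, 0, DateTimeKind.Utc);
var now = new DateTime(2025, 2, 1, 0, 1, 0, DateTimeKind.Utc);

Console.WriteLine($"Reset needed: {NeedsMonthlyReset(lastReset, now)}");
Console.WriteLine($"Remaining:    {RemainingTokens(500_000, 420_000)}");
```

A check like this could run in a scheduled job or lazily at the start of each metered request; either way the reset and the timestamp update should be saved together.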
// Token budget middleware: blocks requests when the tenant exceeds its monthly budget
public class TokenBudgetMiddleware(
    IChatClient inner,
    ITenantContext tenant,
    KnowledgeDbContext db)
    : DelegatingChatClient(inner)
{
    // Note: only the non-streaming path is metered here. Streaming calls
    // (CompleteStreamingAsync) pass straight through unless overridden the same way.
    public override async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken ct = default)
    {
        var tenantRecord = await db.Tenants.FindAsync([tenant.TenantId], ct)
            ?? throw new InvalidOperationException($"Tenant {tenant.TenantId} not found");

        if (tenantRecord.TokensUsedThisMonth >= tenantRecord.MonthlyTokenBudget)
            throw new TokenBudgetExceededException(
                "Monthly AI token budget exceeded. Upgrade your plan or wait for the next billing cycle.");

        var response = await base.CompleteAsync(messages, options, ct);

        // Deduct tokens used (the cast keeps this valid if the usage counts are wider than int)
        var tokensUsed = (int)((response.Usage?.InputTokenCount ?? 0) +
                               (response.Usage?.OutputTokenCount ?? 0));
        tenantRecord.TokensUsedThisMonth += tokensUsed;
        await db.SaveChangesAsync(ct);

        return response;
    }
}

// Chat handler with semantic search and token metering
public class ChatQueryHandler(
    KnowledgeDbContext db,
    IEmbeddingGenerator<string, Embedding<float>> embedder,
    IChatClient chatClient,
    ConversationStore conversations,
    ITenantContext tenant)
{
    public async IAsyncEnumerable<string> ChatAsync(
        string sessionId,
        string userMessage,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken ct = default)
    {
        // Retrieve relevant chunks from THIS tenant's documents only
        // (global query filter ensures tenant isolation)
        var queryEmbedding = await embedder.GenerateAsync([userMessage], cancellationToken: ct);
        var queryVector = new Vector(queryEmbedding[0].Vector.ToArray());

        // Apply the similarity threshold before ordering and taking the top 6
        var chunks = await db.DocumentChunks
            .Where(c => c.Embedding != null
                && 1.0 - c.Embedding!.CosineDistance(queryVector) > 0.65)
            .OrderBy(c => c.Embedding!.CosineDistance(queryVector))
            .Take(6)
            .Select(c => new { c.Text, c.Document.Title })
            .ToListAsync(ct);

        var history = await conversations.GetHistoryAsync(sessionId, ct);

        var context = chunks.Count > 0
            ? string.Join("\n\n---\n\n", chunks.Select(c => $"[{c.Title}]\n{c.Text}"))
            : null;

        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, """
                You are a knowledge assistant for this team.
                Answer ONLY from the provided documents.
                If no relevant context is provided, say you don't have that information.
                Always cite the document name in your answer.
                """),
        };
        if (context is not null)
            messages.Add(new(ChatRole.System, $"DOCUMENTS:\n{context}"));
        messages.AddRange(history);
        messages.Add(new(ChatRole.User, userMessage));

        // Stream tokens to the caller while accumulating the full answer for history
        var accumulated = new System.Text.StringBuilder();
        await foreach (var update in chatClient.CompleteStreamingAsync(messages, cancellationToken: ct))
        {
            if (update.Text is { Length: > 0 } token)
            {
                accumulated.Append(token);
                yield return token;
            }
        }

        await conversations.AppendAsync(sessionId, "user", userMessage, ct);
        await conversations.AppendAsync(sessionId, "assistant", accumulated.ToString(), ct);
    }
}

Phase 4: MCP Server for Claude Desktop
// Companies can connect Claude Desktop to their KnowledgeOS workspace
[McpServerToolType]
public class KnowledgeMcpTools(
    KnowledgeDbContext db,
    IEmbeddingGenerator<string, Embedding<float>> embedder,
    ITenantContext tenant)
{
    [McpServerTool(Name = "search_knowledge_base")]
    [Description("Search this team's knowledge base for information")]
    public async Task<string> Search(
        [Description("Natural language search query")] string query,
        CancellationToken ct = default)
    {
        var embedding = await embedder.GenerateAsync([query], cancellationToken: ct);
        var vector = new Vector(embedding[0].Vector.ToArray());

        var chunks = await db.DocumentChunks
            .Include(c => c.Document)
            .Where(c => c.Embedding != null)
            .OrderBy(c => c.Embedding!.CosineDistance(vector))
            .Take(5)
            .Select(c => new { c.Text, c.Document.Title })
            .ToListAsync(ct);

        if (!chunks.Any())
            return "No relevant documents found.";

        return string.Join("\n\n---\n\n",
            chunks.Select(c => $"**{c.Title}**\n{c.Text}"));
    }

    [McpServerTool(Name = "list_documents")]
    [Description("List all documents in the knowledge base")]
    public async Task<string> ListDocuments(CancellationToken ct = default)
    {
        var docs = await db.Documents
            .OrderByDescending(d => d.LastIngested)
            .Select(d => $"- {d.Title} (updated {d.LastIngested:MMM d, yyyy})")
            .ToListAsync(ct);

        return docs.Any()
            ? string.Join("\n", docs)
            : "No documents in the knowledge base yet.";
    }
}

// Register MCP with API key auth
app.MapMcp("/mcp").RequireAuthorization();

Phase 5: CI/CD Pipeline
# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: knowledgeos_test
          POSTGRES_PASSWORD: test
        ports: ["5432:5432"]
        # Wait until Postgres accepts connections before the steps run
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        ports: ["6379:6379"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '9.x'
      - name: Restore
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        env:
          ConnectionStrings__Postgres: "Host=localhost;Database=knowledgeos_test;Username=postgres;Password=test"
          ConnectionStrings__Redis: "localhost:6379"
          OpenAI__ApiKey: ${{ secrets.OPENAI_API_KEY }}
        run: |
          dotnet test --no-build \
            --collect:"XPlat Code Coverage" \
            --results-directory coverage
      - name: Coverage gate
        run: |
          dotnet tool install -g dotnet-reportgenerator-globaltool
          # Quote the glob so reportgenerator expands it, not the shell
          reportgenerator "-reports:coverage/**/*.xml" -targetdir:report -reporttypes:TextSummary
          # Accept both "85.3" and "100" style values
          COVERAGE=$(grep "Line coverage" report/Summary.txt | grep -oP '\d+(\.\d+)?' | head -n 1)
          if (( $(echo "$COVERAGE < 70" | bc -l) )); then
            echo "Coverage $COVERAGE% is below 70% threshold"
            exit 1
          fi

  deploy-staging:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Build and push image
        run: |
          az acr build \
            --registry knowledgeos \
            --image api:${{ github.sha }} \
            --file src/KnowledgeOS.Api/Dockerfile .
      - name: Deploy to staging
        run: |
          az containerapp update \
            --name knowledgeos-api-staging \
            --resource-group knowledgeos-rg \
            --image knowledgeos.azurecr.io/api:${{ github.sha }}
      - name: Health check
        run: |
          sleep 30
          curl -f https://staging.knowledgeos.app/health || exit 1

Phase 6: Production Checklist
Security:
[ ] API key hashed (SHA-256) before storage; never store raw keys
[ ] Global query filters on all tenant-scoped tables
[ ] RLS enabled on PostgreSQL as defence-in-depth
[ ] Rate limiting on /api/chat (concurrency limiter, not fixed-window)
[ ] Input validation on file upload (size, extension, content-type)
[ ] Secrets in Azure Key Vault, not environment variables
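
The upload-validation item above (size, extension, content-type) can be sketched as a single check that runs before a file reaches the ingestion worker. The allowed types follow the formats KnowledgeOS ingests (PDF, DOCX, Markdown). The 25 MB cap, the MIME type list, and the name `IsValidUpload` are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Extensions the ingestion pipeline accepts, with the MIME types a client
// would normally send for each. The exact set is an assumption.
var allowedTypes = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
{
    [".pdf"] = new[] { "application/pdf" },
    [".docx"] = new[] { "application/vnd.openxmlformats-officedocument.wordprocessingml.document" },
    [".md"] = new[] { "text/markdown", "text/plain" },
};
const long MaxUploadBytes = 25 * 1024 * 1024; // illustrative 25 MB cap

// Returns true when the upload passes all three checks; otherwise sets a reason.
bool IsValidUpload(string fileName, string contentType, long sizeBytes, out string reason)
{
    reason = "";
    if (sizeBytes <= 0 || sizeBytes > MaxUploadBytes)
    {
        reason = "File is empty or exceeds the size limit";
        return false;
    }

    var extension = Path.GetExtension(fileName);
    if (!allowedTypes.TryGetValue(extension, out var mimeTypes))
    {
        reason = $"Extension '{extension}' is not supported";
        return false;
    }

    if (Array.IndexOf(mimeTypes, contentType.Trim().ToLowerInvariant()) < 0)
    {
        reason = "Content-Type does not match the file extension";
        return false;
    }

    return true;
}

Console.WriteLine(IsValidUpload("handbook.pdf", "application/pdf", 2_000_000, out var why)
    ? "accepted"
    : $"rejected: {why}");
```

Both the extension and the Content-Type header are client-controlled, so a check like this filters honest mistakes; inspecting the file's leading bytes in the parser is a sensible second layer.
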
Multi-tenancy:
[ ] Every data table has tenant_id with NOT NULL constraint
[ ] EF Core global query filter on every tenant-scoped entity
[ ] RLS policy created for every tenant-scoped table
[ ] Background jobs scoped per-tenant (no cross-tenant queries)
[ ] Per-tenant token budget enforced before every AI call
AI quality:
[ ] Faithfulness evaluation in CI (weekly scheduled run)
[ ] Retrieval threshold gate (return "I don't know" below 0.65 similarity)
[ ] System prompt has explicit constraints ("Answer ONLY from context")
[ ] Document staleness warning for documents over 6 months old
[ ] Per-tenant eval dashboard showing faithfulness score over time
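
The retrieval threshold gate above works on cosine similarity, which is 1 minus the cosine distance that pgvector returns. The in-memory sketch below shows the same arithmetic outside the database: score each chunk, drop anything at or below 0.65, and keep the best few. The names `CosineSimilarity` and `SelectContext` are assumptions; in KnowledgeOS the equivalent filter runs inside the EF Core query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two equal-length vectors.
// pgvector's cosine distance equals 1 minus this value.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length");

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    if (normA == 0 || normB == 0) return 0; // zero vector: treat as no similarity
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Keeps the topK chunks whose similarity clears the threshold, best first.
// An empty result is the signal to answer "I don't have that information".
List<string> SelectContext(
    float[] query,
    IEnumerable<(string Text, float[] Embedding)> chunks,
    double minSimilarity = 0.65,
    int topK = 6) =>
    chunks
        .Select(c => (c.Text, Score: CosineSimilarity(query, c.Embedding)))
        .Where(c => c.Score > minSimilarity)
        .OrderByDescending(c => c.Score)
        .Take(topK)
        .Select(c => c.Text)
        .ToList();

var context = SelectContext(
    new[] { 1f, 0f },
    new List<(string Text, float[] Embedding)>
    {
        ("Refund policy", new[] { 0.9f, 0.1f }),
        ("Office map", new[] { 0f, 1f }),
    });
Console.WriteLine(context.Count > 0 ? string.Join(", ", context) : "I don't have that information.");
```
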
Observability:
[ ] Per-tenant cost metrics (tokens in/out, USD cost)
[ ] Alert when any tenant exceeds 80% of monthly budget
[ ] Per-request latency tracking (P50, P95, P99)
[ ] Distributed tracing (Application Insights / OpenTelemetry)
[ ] Slow query alerting (log_min_duration_statement = 500ms)
[ ] Weekly cost report email to platform team
Reliability:
[ ] Polly v8 retry + circuit breaker on all OpenAI API calls
[ ] Fallback model configured (gpt-4o-mini if gpt-4o fails)
[ ] Database connection pool health check
[ ] Ingestion worker restarts automatically on failure
[ ] Outbox pattern for any AI-triggered side effects (emails, webhooks)
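
The retry and fallback-model items above describe a control flow that is easy to see without any library: try the primary model a few times with growing delays, then make one call to the cheaper model. The sketch below is hand-rolled to show that flow and is not Polly's API. The `callModel` delegate, the attempt count, and the delays are assumptions, and a circuit breaker is deliberately left out.

```csharp
using System;
using System.Threading.Tasks;

// Tries the primary model up to maxAttempts times with exponential backoff,
// then makes a single call to the fallback model.
async Task<string> CompleteWithFallbackAsync(
    Func<string, Task<string>> callModel, // hypothetical: model name -> completion text
    string primaryModel = "gpt-4o",
    string fallbackModel = "gpt-4o-mini",
    int maxAttempts = 3,
    int baseDelayMs = 200)
{
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            return await callModel(primaryModel);
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Back off before the next attempt: baseDelayMs, then double each time.
            await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
        }
        catch (Exception)
        {
            break; // primary attempts exhausted; fall through to the fallback
        }
    }

    return await callModel(fallbackModel);
}

// Usage with a fake model call that always fails on the primary model.
var answer = await CompleteWithFallbackAsync(
    model => model == "gpt-4o"
        ? Task.FromException<string>(new TimeoutException("primary unavailable"))
        : Task.FromResult($"answer from {model}"),
    baseDelayMs: 10);
Console.WriteLine(answer); // prints "answer from gpt-4o-mini"
```

This sketch retries on every exception for brevity. A production policy would retry only transient failures such as timeouts and rate-limit responses, and would stop calling a model that keeps failing.
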
Operations:
[ ] Terraform/Bicep for all Azure resources
[ ] Rolling deployments (no downtime) via Container Apps traffic splitting
[ ] Database migrations run automatically on deploy
[ ] Structured logging (JSON) shipped to Azure Monitor
[ ] On-call runbook for common failure modes

Architecture Patterns Used
From this curriculum, all applied in one system:

Pattern                   Where used
------------------------  ------------------------------------------------
Clean Architecture        Core / Application / Infrastructure / API
CQRS with MediatR         All commands and queries
Global Query Filters      Multi-tenant data isolation
Row-Level Security        PostgreSQL defence-in-depth
HybridCache               Product catalogue, hot reads
Domain Events + Outbox    Post-ingestion webhooks, billing events
Polly v8 Resilience       OpenAI API calls
pgvector + HNSW           Semantic document search
DelegatingChatClient      Token budget middleware
Model routing             gpt-4o-mini for classification, gpt-4o for chat
Faithfulness evaluation   CI quality gate
MCP server                Claude Desktop integration
Azure Container Apps      Hosting, autoscaling, KEDA
GitHub Actions            CI/CD with coverage gate, staging deploy
Strangler Fig             (if migrating from legacy)

What's Next
You've built everything. The logical next steps for a real production system:
1. Billing integration: Stripe metered billing per token used
2. Team management: invite users, assign roles within a tenant
3. Webhook events: notify external systems when documents are processed
4. Analytics: per-document query count, most-asked questions
5. Fine-tuning: use eval data to improve extraction models for your domain
6. GraphRAG: knowledge graph for multi-hop document relationships
7. Audit log: immutable record of every AI interaction for compliance