Learnixo
Back to blog
System Designadvanced

Case Study: 500 Tenants, One PostgreSQL Database

How a SaaS platform scaled from 20 to 500 tenants on a shared PostgreSQL database: row-level security, per-tenant connection pooling, query cost isolation, and the point where they finally added schema separation.

Asma Hafeez KhanMay 25, 20268 min read
.NETC#PostgreSQLmulti-tenantperformanceSaaSRLSpostmortem
Share:𝕏

Case Study: 500 Tenants, One PostgreSQL Database

System: B2B project management SaaS — 500 business tenants, 85,000 users
Stack: ASP.NET Core 9, EF Core 9, PostgreSQL 16, Azure Database for PostgreSQL
Architecture: Shared database, shared schema (all tenants in the same tables)
Journey duration: 3 years (20 tenants → 500 tenants)
Three inflection points: 50 tenants, 200 tenants, 450 tenants


Starting Architecture (20 Tenants)

All tenants share one database, one schema.
Every table has a TenantId column.
Application code filters by TenantId on every query.

Schema:
  projects (id, tenant_id, name, status, ...)
  tasks    (id, tenant_id, project_id, title, ...)
  users    (id, tenant_id, email, role, ...)
  comments (id, tenant_id, task_id, body, ...)

Row count at 20 tenants:
  projects:  ~4,000 rows
  tasks:     ~40,000 rows
  users:     ~2,000 rows

This worked fine. Zero problems.

Inflection Point 1: 50 Tenants — The Data Leak Bug

At 50 tenants, a developer forgot a .Where(x => x.TenantId == tenantId) on a LINQ query. For 3 hours, one enterprise customer could see another customer's projects.

C#
// The bug — easy to write, hard to notice
public async Task<List<Project>> GetProjectsAsync(int tenantId)
{
    return await db.Projects
        // .Where(p => p.TenantId == tenantId) ← developer forgot this
        .Where(p => p.Status == "Active")
        .ToListAsync();
}

Fix 1: Global Query Filters

C#
// DbContext — apply TenantId filter to every query, automatically
public class AppDbContext(
    DbContextOptions<AppDbContext> opts,
    ITenantContext tenant)
    : DbContext(opts)
{
    protected override void OnModelCreating(ModelBuilder model)
    {
        // Applied to every query on these entities — impossible to forget
        model.Entity<Project>().HasQueryFilter(p => p.TenantId == tenant.TenantId);
        model.Entity<Task>().HasQueryFilter(t => t.TenantId    == tenant.TenantId);
        model.Entity<User>().HasQueryFilter(u => u.TenantId    == tenant.TenantId);
        model.Entity<Comment>().HasQueryFilter(c => c.TenantId == tenant.TenantId);
    }
}

// Register tenant context per-request
public class TenantContext(IHttpContextAccessor http) : ITenantContext
{
    public int TenantId => int.Parse(
        http.HttpContext!.User.FindFirst("tenant_id")!.Value);
}
C#
// Program.cs
builder.Services.AddHttpContextAccessor();
builder.Services.AddScoped<ITenantContext, TenantContext>();
builder.Services.AddDbContext<AppDbContext>();
After this change:
  - Every EF Core query automatically includes WHERE tenant_id = ?
  - Developers cannot forget the filter — it is not optional
  - To deliberately bypass (for admin queries): use .IgnoreQueryFilters()
    which is a conspicuous call, easy to spot in code review

Fix 2: PostgreSQL Row-Level Security (defence-in-depth)

SQL
-- Even if the application sends a query without a tenant filter,
-- PostgreSQL enforces the policy at the database level

ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks    ENABLE ROW LEVEL SECURITY;
ALTER TABLE users    ENABLE ROW LEVEL SECURITY;
ALTER TABLE comments ENABLE ROW LEVEL SECURITY;

-- Policy: only rows where tenant_id matches the session setting
CREATE POLICY tenant_isolation ON projects
    USING (tenant_id = current_setting('app.tenant_id')::int);

CREATE POLICY tenant_isolation ON tasks
    USING (tenant_id = current_setting('app.tenant_id')::int);
C#
// Set the session variable on every request
public class TenantRlsInterceptor(ITenantContext tenant)
    : DbCommandInterceptor
{
    public override async ValueTask<DbDataReader> ReaderExecutingAsync(
        DbCommand command,
        CommandEventData eventData,
        InterceptionResult<DbDataReader> result,
        CancellationToken ct = default)
    {
        // Set the PostgreSQL session variable before every command
        await using var setCmd = command.Connection!.CreateCommand();
        setCmd.Transaction = command.Transaction;
        setCmd.CommandText  = $"SET LOCAL app.tenant_id = {tenant.TenantId}";
        await setCmd.ExecuteNonQueryAsync(ct);

        return await base.ReaderExecutingAsync(command, eventData, result, ct);
    }
}

Inflection Point 2: 200 Tenants — The Noisy Neighbour

At 200 tenants, the database started showing P99 latency spikes. Investigation with pg_stat_activity revealed a single enterprise tenant (Tenant #47) running a nightly export job that scanned 2M rows — blocking connections for everyone.

Monitor: pg_stat_statements
SELECT
    calls,
    mean_exec_time,
    total_exec_time,
    (regexp_matches(query, 'tenant_id = (\d+)'))[1] AS tenant_id,
    left(query, 100) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Result:
  Tenant 47 export query: 45 seconds mean exec time, 1,200 calls/day
  Cost: 54,000 seconds of database time per day
  All other tenants combined: ~8,000 seconds

Fix: Per-Tenant Statement Timeout + Connection Isolation

SQL
-- Apply a statement timeout per tenant (stored in tenants table)
-- Default: 30 seconds. Tenant 47 (enterprise): 300 seconds.
UPDATE tenants SET statement_timeout_ms = 300000 WHERE id = 47;
UPDATE tenants SET statement_timeout_ms = 30000  WHERE statement_timeout_ms IS NULL;
C#
// Apply the tenant's timeout on each connection
public class TenantConnectionInterceptor(ITenantContext tenant, TenantSettingsCache settings)
    : DbConnectionInterceptor
{
    public override async Task ConnectionOpenedAsync(
        DbConnection connection,
        ConnectionEndEventData eventData,
        CancellationToken ct = default)
    {
        var cfg = await settings.GetAsync(tenant.TenantId, ct);

        await using var cmd = connection.CreateCommand();
        cmd.CommandText = $"SET statement_timeout = '{cfg.StatementTimeoutMs}';";
        await cmd.ExecuteNonQueryAsync(ct);
    }
}

Fix: Separate Connection Pool for Background Jobs

C#
// Tenant's export jobs go through a separate pool with lower priority
// The main API pool is never affected by long-running exports

// appsettings.json
{
  "ConnectionStrings": {
    "Api":        "Host=db;Database=saas;Username=api_user;Maximum Pool Size=50",
    "Background": "Host=db;Database=saas;Username=bg_user;Maximum Pool Size=10"
  }
}

// bg_user PostgreSQL role has lower priority and a default statement_timeout of 120s
// CREATE ROLE bg_user WITH LOGIN PASSWORD '...' NOSUPERUSER;
// ALTER ROLE bg_user SET statement_timeout = '120s';

Fix: Index Optimisation for Tenant-Specific Queries

SQL
-- Before: index on tenant_id alone  low cardinality, poor selectivity
CREATE INDEX idx_tasks_tenant ON tasks (tenant_id);

-- After: composite index  tenant_id first, then the common filter columns
-- Covers: WHERE tenant_id = 47 AND status = 'Active' AND created_at > '2026-01-01'
CREATE INDEX idx_tasks_tenant_status_created
    ON tasks (tenant_id, status, created_at DESC)
    WHERE deleted_at IS NULL;   -- partial index excludes soft-deleted rows

-- Result: Tenant 47 export dropped from 45s to 2.1s

Inflection Point 3: 450 Tenants — Schema Separation for Enterprise

At 450 tenants, two enterprise clients (each with 5,000+ users) signed contracts requiring:

  1. Data isolation guarantees (auditable proof that their data is separate)
  2. Ability to run their own database backups
  3. Custom retention policies

Shared-schema RLS was not enough for their compliance teams.

Decision: Hybrid Model — Shared Schema for SMB, Separate Schema for Enterprise

Architecture after:
  Shared schema:     tenants 1-448 (SMB customers, default)
  Schema per tenant: tenants 449, 450 (enterprise customers)

Routing logic:
  - Check tenant configuration at startup
  - If enterprise: connect to tenant-specific schema
  - If SMB: connect to shared schema with RLS
C#
// Factory that returns the right DbContext based on tenant tier
public class TenantDbContextFactory(
    IConfiguration config,
    TenantSettingsCache settings,
    IServiceProvider sp)
{
    public async Task<AppDbContext> CreateAsync(int tenantId, CancellationToken ct)
    {
        var tenant = await settings.GetAsync(tenantId, ct);

        if (tenant.Tier == TenantTier.Enterprise)
        {
            // Enterprise: dedicated connection string per tenant
            var connStr = config[$"ConnectionStrings:Tenant_{tenantId}"];
            var opts = new DbContextOptionsBuilder<AppDbContext>()
                .UseNpgsql(connStr)
                .Options;
            return new AppDbContext(opts, sp.GetRequiredService<ITenantContext>());
        }

        // SMB: shared connection string (RLS enforces isolation)
        return sp.GetRequiredService<AppDbContext>();
    }
}

Monitoring Setup That Prevented Further Incidents

C#
// Background job: runs every minute, emits per-tenant metrics
public class TenantDatabaseMonitor(AppDbContext db, IMetrics metrics)
    : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // Query pg_stat_statements for top-10 tenants by db time
            var tenantCosts = await db.Database
                .SqlQueryRaw<TenantDbCost>("""
                    SELECT
                        (regexp_matches(query, 'tenant_id = (\d+)'))[1]::int AS TenantId,
                        SUM(total_exec_time) AS TotalExecMs,
                        SUM(calls)           AS TotalCalls
                    FROM pg_stat_statements
                    WHERE query LIKE '%tenant_id%'
                    GROUP BY 1
                    ORDER BY 2 DESC
                    LIMIT 10
                    """)
                .ToListAsync(ct);

            foreach (var t in tenantCosts)
            {
                metrics.RecordGauge("db.exec_time_ms", t.TotalExecMs,
                    new Tag("tenant_id", t.TenantId.ToString()));
            }

            await System.Threading.Tasks.Task.Delay(TimeSpan.FromMinutes(1), ct);
        }
    }
}
Alerts configured:
  - Any single tenant consuming >20% of total database time → investigate
  - P99 latency above 2000ms → page on-call
  - Connection pool >80% utilized → add read replica
  - Tenant with zero queries for 30 days → churn risk signal

Results and Architecture Summary

Tenant count:    20 → 500 (25x growth)
Database CPU:    Stays under 40% peak
P99 latency:     Under 800ms (was spiking to 12,000ms at 200 tenants)
Data leak bugs:  Zero since global query filters + RLS (3 years)

What worked:
  1. Global query filters — eliminate entire class of data leak bugs
  2. RLS — defence-in-depth; the database is the last line of defence
  3. Per-tenant statement timeout — noisy neighbours get a shorter leash
  4. Composite indexes (tenant_id, ...) — queries are narrow by design
  5. Hybrid architecture — shared schema for SMB, dedicated for enterprise

What didn't work:
  - Shared pool for API + background jobs — background jobs steal connections
  - Index on tenant_id alone — low cardinality means full table scans
  - Trusting application-level filtering alone — one bug exposes all tenants

When to Move Off a Shared Database

Stay on shared database while:
  - All tenants fit in one database (under 500GB with good indexing)
  - No compliance requirement for physical data separation
  - P99 latency is acceptable with per-tenant timeouts and indexing

Move to per-tenant databases when:
  - An enterprise client requires auditable physical isolation
  - A single tenant's data volume dominates the database (>30% of rows)
  - You need per-tenant backup/restore SLAs
  - Different tenants need different database versions or extensions

Hybrid is the pragmatic answer:
  - SMB tenants: shared schema, low operational cost
  - Enterprise tenants: dedicated schema, premium tier pricing
  - The routing layer adds one config lookup per request — negligible cost

Enjoyed this article?

Explore the System Design learning path for more.

Found this helpful?

Share:𝕏

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.