Learnixo
Back to blog
AI Systemsintermediate

LINQ GroupBy and Aggregates — Summarizing Data Efficiently

GroupBy, Count, Sum, Average, Min, Max in LINQ and EF Core: how they translate to SQL GROUP BY, when to group in memory vs SQL, and the aggregation patterns used in clinical reporting.

Asma Hafeez KhanMay 16, 20265 min read
LINQGroupByAggregatesC#.NETEF Core
Share:𝕏

GroupBy in LINQ

GroupBy partitions a sequence into groups where each element in a group shares a key. The result is a sequence of IGrouping<TKey, TElement>.

C#
// In-memory GroupBy
var patients = new List<Patient> { /* ... */ };

var byDepartment = patients
    .GroupBy(p => p.Department)
    .Select(g => new
    {
        Department = g.Key,
        Count      = g.Count(),
        Patients   = g.ToList()
    })
    .ToList();

GroupBy with EF Core → SQL GROUP BY

C#
// EF Core translates GroupBy + aggregate to SQL GROUP BY
var prescriptionsByDrug = await db.Prescriptions
    .GroupBy(rx => rx.DrugName)
    .Select(g => new DrugSummaryDto(
        g.Key,
        g.Count(),
        g.Sum(rx => rx.Dosage.Amount)))
    .OrderByDescending(d => d.Count)
    .ToListAsync();

// SQL:
// SELECT DrugName, COUNT(*), SUM(DosageAmount)
// FROM Prescriptions
// GROUP BY DrugName
// ORDER BY COUNT(*) DESC

Count, Sum, Min, Max, Average

C#
// Scalar aggregates — execute immediately
int activePatients  = await db.Patients.CountAsync(p => p.IsActive);
int totalOrders     = await db.DrugOrders.CountAsync();

decimal totalDosage = await db.Prescriptions
    .Where(rx => rx.DrugName == "Warfarin" && rx.IsActive)
    .SumAsync(rx => rx.Dosage.Amount);

decimal avgINR      = await db.INRReadings
    .Where(r => r.PatientId == patientId)
    .AverageAsync(r => r.Value);

decimal minINR      = await db.INRReadings
    .Where(r => r.PatientId == patientId)
    .MinAsync(r => r.Value);

decimal maxINR      = await db.INRReadings
    .Where(r => r.PatientId == patientId)
    .MaxAsync(r => r.Value);

Having Clause — Filtering Groups

C#
// Equivalent to SQL HAVING — filter after grouping
var highVolumePharmacists = await db.Prescriptions
    .GroupBy(rx => rx.PrescriberId)
    .Where(g => g.Count() > 50)   // HAVING COUNT(*) > 50
    .Select(g => new
    {
        PrescriberId     = g.Key,
        PrescriptionCount = g.Count()
    })
    .ToListAsync();

Multi-Key GroupBy

C#
// Group by multiple fields
var byDepartmentAndMonth = await db.Prescriptions
    .GroupBy(rx => new
    {
        rx.DepartmentId,
        Month = rx.IssuedAt.Month,
        Year  = rx.IssuedAt.Year
    })
    .Select(g => new
    {
        g.Key.DepartmentId,
        g.Key.Month,
        g.Key.Year,
        Count = g.Count(),
        Total = g.Sum(rx => rx.Dosage.Amount)
    })
    .OrderBy(x => x.Year).ThenBy(x => x.Month)
    .ToListAsync();

// SQL: GROUP BY DepartmentId, MONTH(IssuedAt), YEAR(IssuedAt)

Lookup — GroupBy for In-Memory Lookup Tables

C#
// ToLookup creates a pre-grouped in-memory structure
// Use when you already have the data in memory
var prescriptions = await db.Prescriptions.ToListAsync();

ILookup<Guid, Prescription> byPatient =
    prescriptions.ToLookup(rx => rx.PatientId);

// Access by key — returns IEnumerable<T>, not null (empty if not found)
var patientPrescriptions = byPatient[patientId];

Difference from GroupBy:

  • GroupBy is deferred and works in SQL
  • ToLookup is immediate and creates a dictionary-like structure in memory

Aggregate — Custom Aggregation

C#
// Aggregate with seed and accumulator
var summary = prescriptions.Aggregate(
    seed:  new PrescriptionSummary(),
    func: (acc, rx) =>
    {
        acc.Total += rx.Dosage.Amount;
        acc.Count++;
        if (rx.Dosage.Amount > acc.MaxDose)
            acc.MaxDose = rx.Dosage.Amount;
        return acc;
    });

Dashboard Reporting Pattern

C#
// Single query for a clinical dashboard
public async Task<DashboardDto> GetDashboardAsync(CancellationToken ct)
{
    var today = DateOnly.FromDateTime(DateTime.UtcNow);

    var [activePatients, todayOrders, overdueOrders, avgWaitMin] =
        await Task.WhenAll(
            db.Patients.CountAsync(p => p.IsActive, ct),
            db.DrugOrders.CountAsync(o => DateOnly.FromDateTime(o.RequestedAt) == today, ct),
            db.DrugOrders.CountAsync(o => o.IsOverdue && !o.IsCompleted, ct),
            db.DrugOrders
                .Where(o => o.CompletedAt.HasValue)
                .AverageAsync(o =>
                    EF.Functions.DateDiffMinute(o.RequestedAt, o.CompletedAt!.Value), ct)
        );

    return new DashboardDto(activePatients, todayOrders, overdueOrders, (int)avgWaitMin);
}

Production issue I've seen: A clinical dashboard ran 12 separate database queries — one for each metric. At peak (8 AM ward rounds), each user loading the dashboard caused 12 DB hits simultaneously. With 50 concurrent users, that was 600 queries in seconds. Combining into Task.WhenAll with targeted aggregation queries dropped it to 4 parallel queries.


GroupBy vs ToLookup vs Dictionary

GroupBy (IQueryable):
  Best for: filtering, ordering, aggregating against DB
  Translates to SQL GROUP BY
  Example: "count prescriptions per drug"

GroupBy (IEnumerable in-memory):
  Best for: partitioning already-loaded data
  Example: split a list by some property

ToLookup:
  Best for: in-memory lookup by key (like GroupBy but pre-built)
  Immediately evaluated
  Empty group returns empty IEnumerable (no null)
  Example: "given loaded prescriptions, quickly find all for a patient"

ToDictionary:
  Best for: one-to-one lookup (one value per key)
  Throws on duplicate keys — use when key is unique
  Example: "find a patient by MRN in a pre-loaded list"

Red Flag / Green Answer

Red Flag: "We load all prescriptions with ToList() then use LINQ GroupBy in memory to calculate department totals."

If there are 500,000 prescriptions, you load 500,000 rows into memory to count them. The equivalent GROUP BY query returns a summary with 20 rows. Memory usage: 500,000 objects vs 20 DTOs.

Green Answer:

Use EF Core's GroupBy with .Select(g => new { Key = g.Key, Count = g.Count() }) before calling .ToListAsync(). The aggregation runs in SQL — only the summary result crosses the network.


Key Takeaway

LINQ's aggregate operators (Count, Sum, Average, GroupBy) translate directly to SQL when used with EF Core's IQueryable. Always aggregate in SQL, not in memory — loading rows to count them is an antipattern. Use ToLookup when you have in-memory data that you will access by key repeatedly. Use ToDictionary for one-to-one key lookups.

Enjoyed this article?

Explore the AI Systems learning path for more.

Found this helpful?

Share:𝕏

Leave a comment

Have a question, correction, or just found this helpful? Leave a note below.