LINQ GroupBy and Aggregates — Summarizing Data Efficiently
GroupBy, Count, Sum, Average, Min, Max in LINQ and EF Core: how they translate to SQL GROUP BY, when to group in memory vs SQL, and the aggregation patterns used in clinical reporting.
GroupBy in LINQ
GroupBy partitions a sequence into groups where each element in a group shares a key. The result is a sequence of IGrouping<TKey, TElement>.
// In-memory GroupBy
var patients = new List<Patient> { /* ... */ };
var byDepartment = patients
.GroupBy(p => p.Department)
.Select(g => new
{
Department = g.Key,
Count = g.Count(),
Patients = g.ToList()
})
.ToList();GroupBy with EF Core → SQL GROUP BY
// EF Core translates GroupBy + aggregate to SQL GROUP BY
var prescriptionsByDrug = await db.Prescriptions
.GroupBy(rx => rx.DrugName)
.Select(g => new DrugSummaryDto(
g.Key,
g.Count(),
g.Sum(rx => rx.Dosage.Amount)))
.OrderByDescending(d => d.Count)
.ToListAsync();
// SQL:
// SELECT DrugName, COUNT(*), SUM(DosageAmount)
// FROM Prescriptions
// GROUP BY DrugName
// ORDER BY COUNT(*) DESCCount, Sum, Min, Max, Average
// Scalar aggregates — execute immediately
int activePatients = await db.Patients.CountAsync(p => p.IsActive);
int totalOrders = await db.DrugOrders.CountAsync();
decimal totalDosage = await db.Prescriptions
.Where(rx => rx.DrugName == "Warfarin" && rx.IsActive)
.SumAsync(rx => rx.Dosage.Amount);
decimal avgINR = await db.INRReadings
.Where(r => r.PatientId == patientId)
.AverageAsync(r => r.Value);
decimal minINR = await db.INRReadings
.Where(r => r.PatientId == patientId)
.MinAsync(r => r.Value);
decimal maxINR = await db.INRReadings
.Where(r => r.PatientId == patientId)
.MaxAsync(r => r.Value);Having Clause — Filtering Groups
// Equivalent to SQL HAVING — filter after grouping
var highVolumePharmacists = await db.Prescriptions
.GroupBy(rx => rx.PrescriberId)
.Where(g => g.Count() > 50) // HAVING COUNT(*) > 50
.Select(g => new
{
PrescriberId = g.Key,
PrescriptionCount = g.Count()
})
.ToListAsync();Multi-Key GroupBy
// Group by multiple fields
var byDepartmentAndMonth = await db.Prescriptions
.GroupBy(rx => new
{
rx.DepartmentId,
Month = rx.IssuedAt.Month,
Year = rx.IssuedAt.Year
})
.Select(g => new
{
g.Key.DepartmentId,
g.Key.Month,
g.Key.Year,
Count = g.Count(),
Total = g.Sum(rx => rx.Dosage.Amount)
})
.OrderBy(x => x.Year).ThenBy(x => x.Month)
.ToListAsync();
// SQL: GROUP BY DepartmentId, MONTH(IssuedAt), YEAR(IssuedAt)Lookup — GroupBy for In-Memory Lookup Tables
// ToLookup creates a pre-grouped in-memory structure
// Use when you already have the data in memory
var prescriptions = await db.Prescriptions.ToListAsync();
ILookup<Guid, Prescription> byPatient =
prescriptions.ToLookup(rx => rx.PatientId);
// Access by key — returns IEnumerable<T>, not null (empty if not found)
var patientPrescriptions = byPatient[patientId];Difference from GroupBy:
GroupByis deferred and works in SQLToLookupis immediate and creates a dictionary-like structure in memory
Aggregate — Custom Aggregation
// Aggregate with seed and accumulator
var summary = prescriptions.Aggregate(
seed: new PrescriptionSummary(),
func: (acc, rx) =>
{
acc.Total += rx.Dosage.Amount;
acc.Count++;
if (rx.Dosage.Amount > acc.MaxDose)
acc.MaxDose = rx.Dosage.Amount;
return acc;
});Dashboard Reporting Pattern
// Single query for a clinical dashboard
public async Task<DashboardDto> GetDashboardAsync(CancellationToken ct)
{
var today = DateOnly.FromDateTime(DateTime.UtcNow);
var [activePatients, todayOrders, overdueOrders, avgWaitMin] =
await Task.WhenAll(
db.Patients.CountAsync(p => p.IsActive, ct),
db.DrugOrders.CountAsync(o => DateOnly.FromDateTime(o.RequestedAt) == today, ct),
db.DrugOrders.CountAsync(o => o.IsOverdue && !o.IsCompleted, ct),
db.DrugOrders
.Where(o => o.CompletedAt.HasValue)
.AverageAsync(o =>
EF.Functions.DateDiffMinute(o.RequestedAt, o.CompletedAt!.Value), ct)
);
return new DashboardDto(activePatients, todayOrders, overdueOrders, (int)avgWaitMin);
}Production issue I've seen: A clinical dashboard ran 12 separate database queries — one for each metric. At peak (8 AM ward rounds), each user loading the dashboard caused 12 DB hits simultaneously. With 50 concurrent users, that was 600 queries in seconds. Combining into
Task.WhenAllwith targeted aggregation queries dropped it to 4 parallel queries.
GroupBy vs ToLookup vs Dictionary
GroupBy (IQueryable):
Best for: filtering, ordering, aggregating against DB
Translates to SQL GROUP BY
Example: "count prescriptions per drug"
GroupBy (IEnumerable in-memory):
Best for: partitioning already-loaded data
Example: split a list by some property
ToLookup:
Best for: in-memory lookup by key (like GroupBy but pre-built)
Immediately evaluated
Empty group returns empty IEnumerable (no null)
Example: "given loaded prescriptions, quickly find all for a patient"
ToDictionary:
Best for: one-to-one lookup (one value per key)
Throws on duplicate keys — use when key is unique
Example: "find a patient by MRN in a pre-loaded list"Red Flag / Green Answer
Red Flag: "We load all prescriptions with ToList() then use LINQ GroupBy in memory to calculate department totals."
If there are 500,000 prescriptions, you load 500,000 rows into memory to count them. The equivalent
GROUP BYquery returns a summary with 20 rows. Memory usage: 500,000 objects vs 20 DTOs.
Green Answer:
Use EF Core's
GroupBywith.Select(g => new { Key = g.Key, Count = g.Count() })before calling.ToListAsync(). The aggregation runs in SQL — only the summary result crosses the network.
Key Takeaway
LINQ's aggregate operators (
Count,Sum,Average,GroupBy) translate directly to SQL when used with EF Core'sIQueryable. Always aggregate in SQL, not in memory — loading rows to count them is an antipattern. UseToLookupwhen you have in-memory data that you will access by key repeatedly. UseToDictionaryfor one-to-one key lookups.
Found this helpful?
Leave a comment
Have a question, correction, or just found this helpful? Leave a note below.