Platform Engineering: FinOps and Kubernetes Cost Optimization — Karpenter, Kubecost, VPA, and Spot Instances
Deep guide to Kubernetes cost engineering — Karpenter node provisioning with spot instances, Kubecost team-level chargeback, Vertical Pod Autoscaler for resource rightsizing, waste detection automation, and building a FinOps culture in platform engineering.
Why FinOps Belongs in the Platform
Without intentional cost engineering, Kubernetes clusters become expensive fast. The default behavior of most teams: over-request resources (to be safe), never clean up dev workloads, and have no idea what anything costs.
Platform engineers are in a unique position to fix this structurally, because they control:
- How nodes are provisioned (Karpenter)
- What resource requests and limits teams set (VPA recommendations + Kyverno enforcement)
- How costs are attributed (Kubecost labels enforced by policy)
- What developers see (Backstage cost widget)
Cost optimization without platform ownership is a one-time cleanup. Cost optimization with platform ownership is a self-sustaining flywheel.
The Kubernetes Cost Model
Before optimizing, understand what drives K8s costs:
```
Total cluster cost = Node costs + Managed control plane costs + Storage + Network egress

Node cost breakdown:
├── Compute cost: priced per CPU/hour and RAM/hour
├── Spot discount: 60-80% cheaper, can be reclaimed by cloud provider
└── Reserved vs on-demand: 30-50% savings for stable baseline workloads

What your workloads actually pay for:
├── Requested CPU and memory (even if unused: you pay for reserved capacity)
├── NOT actual usage: if you request 4 CPU and use 0.2, you paid for 4
└── Node overhead: ~10-15% of each node is consumed by the OS and K8s agents
```
The rightsizing opportunity: many Kubernetes workloads use only 20-40% of their requested resources. This is usually the biggest cost reduction lever, and it comes down to setting correct resource requests.
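To make the "you pay for requests, not usage" point concrete, here is a minimal Python sketch. The unit prices, the memory figures, and the function names are hypothetical placeholders chosen for illustration, not real cloud rates.

```python
# Hypothetical unit prices in USD; substitute your provider's real rates.
CPU_PRICE_PER_CORE_HOUR = 0.04
RAM_PRICE_PER_GIB_HOUR = 0.005
HOURS_PER_MONTH = 730


def monthly_cost(cpu_cores: float, ram_gib: float) -> float:
    """Monthly cost of holding the given CPU and memory for a full month."""
    hourly = cpu_cores * CPU_PRICE_PER_CORE_HOUR + ram_gib * RAM_PRICE_PER_GIB_HOUR
    return hourly * HOURS_PER_MONTH


def request_waste(requested: tuple, used: tuple) -> dict:
    """Compare what a workload reserves with what it actually consumes.

    Both arguments are (cpu_cores, ram_gib) pairs.
    """
    paid = monthly_cost(*requested)
    needed = monthly_cost(*used)
    return {
        "paid": round(paid, 2),
        "needed": round(needed, 2),
        "wasted": round(paid - needed, 2),
        "utilization": round(needed / paid, 3),
    }


if __name__ == "__main__":
    # 4 CPU requested, 0.2 CPU used (as in the text); memory values are assumed.
    print(request_waste(requested=(4, 8), used=(0.2, 2)))
```

The bill tracks the `paid` figure regardless of how small `needed` is, which is why correcting requests is the first lever to pull.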
Karpenter: Right-Sizing Nodes
The Cluster Autoscaler can only scale within pre-defined node groups. If your groups contain c5.2xlarge (8 CPU, 16GB), that's what every scale-up gets — whether your workload needs 1 CPU or 8 CPU.
Karpenter looks at the resource requests of unschedulable Pods and provisions exactly the right instance type:
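- Pod needs 1.5 CPU + 3GB RAM → Karpenter picks t3.medium or c6i.large (cheapest that fits)
- Pod needs 16 CPU + 32GB RAM → Karpenter picks a larger instance such as c6i.8xlarge, because part of a c6i.4xlarge's 16 vCPU / 32 GiB is reserved for the OS and kubelet
- Pod needs GPU → Karpenter picks g5.xlarge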
No node groups. No pre-defined pools. Just constraints.
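The "cheapest that fits" idea can be sketched in a few lines of Python. This is not Karpenter's actual scheduling algorithm, which also weighs availability zones, spot availability, and bin-packing of many Pods at once; the catalog, prices, and overhead fraction below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    memory_gib: float
    hourly_usd: float  # illustrative prices, not live quotes


# Small hypothetical catalog; a real provisioner evaluates hundreds of types.
CATALOG = [
    InstanceType("c6i.large", 2, 4, 0.085),
    InstanceType("c6i.xlarge", 4, 8, 0.17),
    InstanceType("c6i.2xlarge", 8, 16, 0.34),
    InstanceType("c6i.4xlarge", 16, 32, 0.68),
    InstanceType("c6i.8xlarge", 32, 64, 1.36),
]

# Assumed share of each node reserved for the OS and Kubernetes agents.
NODE_OVERHEAD = 0.10


def cheapest_fit(cpu_request: float, memory_gib_request: float) -> Optional[InstanceType]:
    """Return the cheapest catalog entry whose usable capacity covers the request."""
    usable = 1.0 - NODE_OVERHEAD
    candidates = [
        it for it in CATALOG
        if it.vcpu * usable >= cpu_request and it.memory_gib * usable >= memory_gib_request
    ]
    # default=None covers the case where nothing in the catalog is large enough.
    return min(candidates, key=lambda it: it.hourly_usd, default=None)


if __name__ == "__main__":
    choice = cheapest_fit(1.5, 3)
    print(choice.name if choice else "no instance type fits")
```

Subtracting overhead before comparing matters: a request that exactly equals an instance's advertised size will not fit on it.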
NodePool configuration
```yaml
# NodePool: defines what Karpenter can provision
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        node-type: general
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
      requirements:
        # Allow both architectures (arm64 only takes effect once Graviton
        # families such as c7g or m7g are added to the family list below)
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        # Prefer spot, fall back to on-demand
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Broad instance family selection: let Karpenter choose
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c6i", "c6a", "c7i", "m6i", "m6a", "r6i"]
        # Exclude the smallest sizes, where per-node overhead dominates
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"]
  limits:
    cpu: 1000 # max 1000 CPU across all Karpenter nodes
    memory: 4000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized # key for cost savings
    # consolidateAfter is only accepted together with WhenEmpty in v1beta1, so it
    # is omitted here. In karpenter.sh/v1 the policy is WhenEmptyOrUnderutilized
    # and consolidateAfter (for example 30s) can be set alongside it.
```
EC2NodeClass
```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"
  instanceProfile: KarpenterNodeInstanceProfile
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        encrypted: true
```
Spot instance handling
The critical consideration: spot instances can be reclaimed by AWS with two minutes' notice.
```yaml
# Make workloads spot-tolerant (fragment: only the relevant fields are shown)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      topologySpreadConstraints:
        # Spread across availability zones so a spot reclamation in one AZ
        # does not take out every pod
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-service
      # Graceful shutdown: handle SIGTERM quickly
      terminationGracePeriodSeconds: 30
      containers:
        - name: my-service
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"] # gives LB time to drain
```
Spot fallback strategy in NodePool:
```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
```
With both capacity types allowed, Karpenter tries spot first and automatically falls back to on-demand when spot capacity is unavailable.
Consolidation: The Hidden Cost Saver
Karpenter consolidation continuously looks for opportunities to move workloads to fewer, cheaper nodes and terminate the rest:
```
Before consolidation:
  Node A (c6i.2xlarge, $0.34/h): 30% CPU used, 3 pods
  Node B (c6i.2xlarge, $0.34/h): 20% CPU used, 2 pods
  Node C (c6i.xlarge,  $0.17/h): 15% CPU used, 1 pod

After consolidation:
  Node A (c6i.2xlarge, $0.34/h): ~58% CPU used, 6 pods
  Nodes B and C: terminated → save $0.51/h ≈ $370/month
```
The three nodes use about 4.6 vCPU in total (2.4 + 1.6 + 0.6), which fits on a single 8-vCPU c6i.2xlarge but not on a 4-vCPU c6i.xlarge. With consolidationPolicy: WhenUnderutilized, this happens automatically. Savings depend on how fragmented the cluster was to begin with; the results summarized at the end of this guide put consolidation at 10-20%.
Kubecost: Team-Level Cost Visibility
Kubecost (built on the open-source OpenCost project) allocates cluster costs to namespaces, workloads, and teams using Kubernetes labels.
Installation
```shell
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="" \
  --set prometheus.server.persistentVolume.size=32Gi
```
Cost attribution via labels (enforced by Kyverno)
For Kubecost to allocate costs correctly, every resource needs labels. The platform enforces this:
```yaml
# Kyverno policy: require cost-attribution labels
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-labels
      match:
        any:
          - resources:
              kinds: [Deployment, StatefulSet, DaemonSet, Job, CronJob]
      validate:
        message: "Resources must have 'team', 'env', and 'cost-center' labels."
        pattern:
          metadata:
            labels:
              team: "?*"
              env: "?*"
              cost-center: "?*"
```
Every workload of these kinds without the labels is rejected at admission. Note that this pattern checks the workload's own metadata. Kubecost's label aggregation is typically based on Pod labels, so the same labels also need to be present on the Pod template (spec.template.metadata.labels).
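The same rule can be checked earlier, for example in CI, so developers get feedback before admission time. The following Python sketch mirrors the policy's "?*" pattern on an already-parsed manifest; the function name is hypothetical, and it complements the Kyverno policy rather than replacing it.

```python
REQUIRED_LABELS = ("team", "env", "cost-center")


def missing_cost_labels(manifest: dict) -> list:
    """Return the required labels that are absent or empty on a manifest."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    missing = []
    for key in REQUIRED_LABELS:
        value = labels.get(key)
        # Mirrors the "?*" pattern: the label must exist and hold at least one character.
        if not isinstance(value, str) or len(value) == 0:
            missing.append(key)
    return missing


if __name__ == "__main__":
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": "order-service", "labels": {"team": "orders", "env": "prod"}},
    }
    print(missing_cost_labels(deployment))
```

A CI step could fail the build whenever the returned list is non-empty and print the missing keys as the error message.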
Kubecost allocation API
```shell
# Get per-team costs for the last 30 days
curl "http://kubecost.internal/model/allocation?window=30d&aggregate=label:team&accumulate=true"
# Response (simplified):
{
  "data": {
    "team-payments": {
      "cpuCost": 245.32,
      "ramCost": 87.14,
      "pvCost": 34.20,
      "networkCost": 12.50,
      "totalCost": 379.16
    },
    "team-orders": {
      "totalCost": 521.44
    }
  }
}
```
Backstage plugin integration
Wire Kubecost into Backstage so each team sees their costs:
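```tsx
// backstage-kubecost-plugin/src/components/CostWidget.tsx
import React from 'react';
import { Entity } from '@backstage/catalog-model';
import { InfoCard } from '@backstage/core-components';
import { Typography } from '@material-ui/core';
// Plugin-local modules; the import paths are assumed for illustration.
import { useKubecostAllocation } from '../hooks/useKubecostAllocation';
import { CostBreakdownChart } from './CostBreakdownChart';

export const CostWidget = ({ entity }: { entity: Entity }) => {
  const teamLabel = entity.metadata.annotations?.['backstage.io/team'];
  const { data } = useKubecostAllocation({ team: teamLabel, window: '30d' });
  return (
    <InfoCard title="Cloud Cost (30 days)">
      <Typography variant="h4">${data?.totalCost.toFixed(2)}</Typography>
      <CostBreakdownChart data={data} />
      <Typography variant="body2" color="textSecondary">
        CPU: ${data?.cpuCost.toFixed(2)} | Memory: ${data?.ramCost.toFixed(2)}
      </Typography>
    </InfoCard>
  );
};
```
When teams see their own cloud spend in the developer portal, behavior changes. Teams that were requesting 4 CPU and using 0.3 CPU start right-sizing voluntarily once they see the dollar amount.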
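The allocation data behind the widget can also be ranked to show which teams account for most of the spend. This Python sketch assumes the simplified response shape shown in the allocation API example; the live API may wrap the data differently depending on the Kubecost version, and the function names are hypothetical.

```python
# Sample in the simplified shape from the allocation API example.
SAMPLE = {
    "data": {
        "team-payments": {"cpuCost": 245.32, "ramCost": 87.14, "pvCost": 34.20,
                          "networkCost": 12.50, "totalCost": 379.16},
        "team-orders": {"totalCost": 521.44},
    }
}


def rank_teams(allocation: dict) -> list:
    """Return (team, totalCost) pairs sorted from most to least expensive."""
    teams = allocation.get("data", {})
    pairs = [(name, float(entry.get("totalCost", 0.0))) for name, entry in teams.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def share_of_spend(allocation: dict) -> dict:
    """Each team's fraction of the total allocated cost."""
    ranked = rank_teams(allocation)
    total = sum(cost for _, cost in ranked)
    return {name: (cost / total if total else 0.0) for name, cost in ranked}


if __name__ == "__main__":
    for team, share in share_of_spend(SAMPLE).items():
        print(f"{team}: {share:.1%}")
```

Showing a share next to the absolute figure gives teams a sense of scale that a dollar amount alone does not.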
Cost alerting
```yaml
# Alert when a team's weekly cost increases >20%
# Verify that this metric name exists in your Kubecost version; otherwise
# substitute the per-team cost series your Prometheus actually exposes.
groups:
  - name: finops
    rules:
      - alert: TeamCostSpike
        expr: |
          (
            kubecost_cluster_costs_total{label_team!=""}
            - kubecost_cluster_costs_total{label_team!=""} offset 7d
          ) / kubecost_cluster_costs_total{label_team!=""} offset 7d > 0.20
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Team {{ $labels.label_team }} costs up >20% vs last week"
          # $value is the relative increase (a ratio), not a dollar amount
          description: "Cost is {{ $value | humanizePercentage }} above last week's baseline"
```
Vertical Pod Autoscaler (VPA): Resource Rightsizing
VPA analyzes historical resource usage and recommends correct CPU/memory requests. It can also apply recommendations automatically.
Installation
```shell
git clone https://github.com/kubernetes/autoscaler
./autoscaler/vertical-pod-autoscaler/hack/vpa-up.sh
```
VPA in recommendation mode (safest start)
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: order-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Off" # recommendation only, don't touch the pods
  resourcePolicy:
    containerPolicies:
      - containerName: order-service
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 4Gi
```
Check recommendations:
```shell
kubectl describe vpa order-service-vpa
# Status:
#   Recommendation:
#     Container Recommendations:
#       Container Name: order-service
#       Target:
#         Cpu: 250m ← suggested request
#         Memory: 512Mi
#       Lower Bound:
#         Cpu: 100m
#         Memory: 256Mi
#       Upper Bound:
#         Cpu: 1000m
#         Memory: 1Gi
```
The team requested 2 CPU. VPA says 250m is enough. That's an 8x oversizing: each Pod reserves 1.75 CPU it never uses.
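Comparing a current request with a VPA target is simple arithmetic once the Kubernetes CPU quantities are parsed. A minimal Python sketch follows; it handles only plain cores and the millicore suffix, which are the common forms, and the function names are hypothetical.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "250m" or "2" into cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def oversizing(requested: str, vpa_target: str) -> dict:
    """How far a CPU request exceeds the VPA target."""
    req = parse_cpu(requested)
    target = parse_cpu(vpa_target)
    return {
        "factor": req / target,        # how many times larger the request is
        "excess_cores": req - target,  # cores reserved beyond the target
    }


if __name__ == "__main__":
    # Request and target taken from the order-service example.
    print(oversizing("2", "250m"))
```

Multiplying `excess_cores` by the replica count and a per-core price turns the recommendation into a dollar figure that is easier to act on.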
VPA Auto mode (apply recommendations automatically)
```yaml
updatePolicy:
  updateMode: "Auto" # apply recommendations, evict and restart pods as needed
```
Warning: VPA Auto mode evicts Pods to apply recommendations, which causes brief service disruption. Run at least two replicas and define a PodDisruptionBudget to prevent a complete outage.
Combining VPA and HPA
VPA adjusts resource requests per pod. HPA adjusts replica count. They conflict when both manage CPU.
Best practice: Use VPA for memory rightsizing (HPA doesn't scale on memory well). Use HPA for CPU-based scaling. Exclude CPU from VPA when HPA is active:
```yaml
resourcePolicy:
  containerPolicies:
    - containerName: order-service
      controlledResources:
        - memory # VPA manages memory only
      # CPU managed by HPA
```
Waste Detection: Finding Hidden Costs
Idle workloads
Workloads running with no traffic — old dev deployments, forgotten test services:
```shell
# List running deployments (replicas > 0) as candidates for the idle check
kubectl get deployments -A -o json | jq -r '
  .items[] |
  select(.spec.replicas > 0) |
  [.metadata.namespace, .metadata.name, .spec.replicas] |
  @csv
'
# Cross-reference with Prometheus: http_requests_total by service
```
Automate: a daily job queries Prometheus for services with zero HTTP traffic in the last 30 days, creates a Jira ticket for the owning team, and scales the workload to 0 if the team has not responded within 7 days.
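The decision logic of that daily job is small enough to write as a pure function, which keeps it testable separately from the Prometheus and Jira calls. The names below are hypothetical, and the thresholds are the 30-day and 7-day values from the text.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    OPEN_TICKET = "open_ticket"
    WAIT = "wait_for_owner"
    SCALE_TO_ZERO = "scale_to_zero"


IDLE_THRESHOLD_DAYS = 30  # zero HTTP traffic for this long counts as idle
GRACE_PERIOD_DAYS = 7     # time the owning team has to respond to the ticket


def next_action(idle_days: int, ticket_age_days: Optional[int],
                owner_responded: bool = False) -> Action:
    """Decide the next step for one service.

    ticket_age_days is None when no ticket has been opened yet.
    """
    if idle_days < IDLE_THRESHOLD_DAYS or owner_responded:
        return Action.NONE
    if ticket_age_days is None:
        return Action.OPEN_TICKET
    if ticket_age_days < GRACE_PERIOD_DAYS:
        return Action.WAIT
    return Action.SCALE_TO_ZERO


if __name__ == "__main__":
    print(next_action(idle_days=45, ticket_age_days=None))
    print(next_action(idle_days=45, ticket_age_days=8))
```

Treating any owner response as a reason to stop keeps the automation from scaling down a service that is idle on purpose, such as a standby.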
Oversized PVCs
Persistent volume claims that are less than 20% utilized:
```
# kubectl has no built-in command for volume usage; the kubelet exposes it as
# metrics, which Prometheus (or Kubecost) can query.
# Prometheus query for underutilized PVCs:
(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) < 0.2
```
Orphaned resources
Resources with no owner (PVCs from deleted StatefulSets, old ConfigMaps, stale Secrets):
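```shell
# 1. PVCs referenced by at least one Pod, as namespace/name
kubectl get pods -A -o json | jq -r '
  .items[] |
  .metadata.namespace as $ns |
  .spec.volumes[]? |
  select(.persistentVolumeClaim != null) |
  "\($ns)/\(.persistentVolumeClaim.claimName)"
' | sort -u > /tmp/pvcs-in-use.txt
# 2. All bound PVCs, as namespace/name
kubectl get pvc -A -o json | jq -r '
  .items[] |
  select(.status.phase == "Bound") |
  "\(.metadata.namespace)/\(.metadata.name)"
' | sort -u > /tmp/pvcs-bound.txt
# 3. Bound PVCs that no Pod mounts
comm -23 /tmp/pvcs-bound.txt /tmp/pvcs-in-use.txt
```
PVCs created from StatefulSet volumeClaimTemplates normally carry no ownerReferences, so filtering on ownerReferences cannot identify orphans; comparing against the volumes in Pod specs can.
The FinOps Flywheel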
FinOps on a platform isn't a one-time project — it's a flywheel:
```
1. Enforce labels (Kyverno) → cost attribution possible
        ↓
2. Show teams their costs (Kubecost + Backstage) → awareness
        ↓
3. Teams see they're over-provisioned → motivation to rightsize
        ↓
4. VPA recommendations available → easy to fix
        ↓
5. Karpenter consolidates the savings → costs drop
        ↓
6. Weekly cost report to team leads → accountability
        ↓
7. Quarterly FinOps review → celebrate savings, identify next wave
        ↓
   back to 3
```
Typical results after 6 months of systematic FinOps:
- 30-45% reduction in cluster compute costs
- 20-30% reduction from spot instance adoption
- 15-25% reduction from rightsizing
- 10-20% reduction from consolidation
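Combined: organizations that apply all of these tools commonly report saving 40-60% on Kubernetes infrastructure. The individual percentages compound rather than add, since each optimization acts on what remains after the others.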
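A quick way to sanity-check a combined figure is to multiply the remaining fractions. The Python sketch below assumes the spot, rightsizing, and consolidation reductions are independent and apply one after another to the same compute bill, which is a simplification of how they interact in practice.

```python
from functools import reduce


def combined_reduction(reductions) -> float:
    """Combine sequential fractional reductions: 1 - product of (1 - r)."""
    remaining = reduce(lambda acc, r: acc * (1.0 - r), reductions, 1.0)
    return 1.0 - remaining


# Lower and upper bounds for spot, rightsizing, and consolidation from the list above.
LOW = [0.20, 0.15, 0.10]
HIGH = [0.30, 0.25, 0.20]

if __name__ == "__main__":
    print(f"{combined_reduction(LOW):.1%} to {combined_reduction(HIGH):.1%}")
```

Under these assumptions the three levers together land at roughly 39% to 58%, which is in line with the combined range quoted above.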
Quick Reference: FinOps Tooling
| Tool | Function | Install |
|------|----------|---------|
| Karpenter | Node provisioning + consolidation | Helm chart |
| Kubecost | Cost allocation and chargeback | Helm chart |
| OpenCost | Open-source cost allocation (Kubecost OSS) | Helm chart |
| VPA | Resource request recommendations | Shell script |
| Goldilocks | VPA recommendations in Backstage/UI | Helm chart |
| kube-resource-report | HTML cost report per namespace | Docker |
| Popeye | Cluster sanitizer: find waste and config issues | CLI / CronJob |
Recommended stack: Karpenter + Kubecost + VPA (recommendation mode) + Kyverno label enforcement. Add Goldilocks for a developer-friendly VPA UI.