AKS in Production — Azure Kubernetes Service Architecture Guide
Design and operate production AKS clusters — node pools, networking (CNI vs kubenet), workload identity, scaling (HPA, KEDA, Cluster Autoscaler), blue-green deployments, monitoring, and cost control.
AKS (Azure Kubernetes Service) is Azure's managed Kubernetes offering. Azure handles the control plane (API server, etcd, scheduler) — you manage the worker nodes, networking, and workload configuration. This guide covers the architecture decisions that determine whether an AKS deployment handles production load gracefully or becomes an operational burden.
Cluster Architecture Overview
┌──────────────────────────────────────────────────────────────┐
│ AKS Cluster │
│ │
│ Control Plane (Azure-managed, free) │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ API Server │ │ etcd │ │ Scheduler │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ │
│ System Node Pool User Node Pools │
│ ┌──────────────────┐ ┌────────────────┐ ┌───────────┐ │
│ │ D4s_v5 × 3 │ │ D8s_v5 × 5 │ │ GPU Pool │ │
│ │ (system pods: │ │ (order-service │ │ (ML work) │ │
│ │ coredns, csi, │ │ api-gw, etc.) │ │ │ │
│ │ metrics-server) │ └────────────────┘ └───────────┘ │
│ └──────────────────┘ │
└──────────────────────────────────────────────────────────────┘
System vs User Node Pools
Always separate system workloads from application workloads:
// System node pool — runs critical cluster components
resource aksCluster 'Microsoft.ContainerService/managedClusters@2023-07-01' = {
properties: {
agentPoolProfiles: [
{
name: 'system'
mode: 'System' // AKS won't schedule user pods here by default
vmSize: 'Standard_D4s_v5'
count: 3 // min 3 for HA
availabilityZones: ['1', '2', '3'] // zone-redundant
osType: 'Linux'
osDiskType: 'Ephemeral' // faster, included in VM cost
nodeTaints: ['CriticalAddonsOnly=true:NoSchedule'] // prevent app pods
}
{
name: 'appgeneral'
mode: 'User'
vmSize: 'Standard_D8s_v5'
count: 3
minCount: 2
maxCount: 20
enableAutoScaling: true
availabilityZones: ['1', '2', '3']
}
]
}
}
Why separate pools:
- System components cannot be evicted by resource pressure from app pods
- Different node sizes per workload type (CPU-optimised, memory-optimised, GPU)
- Independent scaling without disturbing system stability
- Spot node pools for batch workloads (60-90% cost reduction)
Networking: Azure CNI vs Kubenet
Kubenet (default — do not use for production)
Pod IP: 10.244.x.x (not directly routable in VNet)
Traffic: Pod → NAT → Node IP → destination
Problems:
- Pods not directly routable in VNet (breaks Private Endpoints on some paths)
- Complex UDR maintenance for pod routing
- Limited to 400 nodes per cluster
Azure CNI (use this)
Each pod gets a VNet IP directly:
Pod IP: 10.1.1.x (real VNet IP, directly routable)
Traffic: Pod → destination (no NAT)
Benefits:
- Pods can be targeted directly from VNet resources
- Private Endpoints work seamlessly
- Network Policies (Calico) work at pod level
- No UDR management
networkProfile: {
networkPlugin: 'azure' // Azure CNI
networkPolicy: 'calico' // Calico network policies
serviceCidr: '172.16.0.0/16' // must not overlap VNet
dnsServiceIP: '172.16.0.10'
}
Azure CNI Overlay (newer): pods get IPs from a separate, private overlay CIDR while nodes keep their VNet IPs, solving the IP exhaustion problem of classic CNI. Use this for large clusters.
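The IP exhaustion trade-off is easy to quantify. A minimal sketch, assuming the classic-CNI default of 30 pods per node (a configurable setting, not a fixed limit):

```python
def subnet_ips_required(nodes: int, max_pods_per_node: int) -> int:
    """Classic Azure CNI pre-allocates a VNet IP for every potential pod
    on each node, plus one IP for the node's own NIC."""
    return nodes * (max_pods_per_node + 1)

# 100 nodes at 30 pods per node: 3100 VNet IPs consumed up front
classic = subnet_ips_required(100, 30)
# With CNI Overlay, only the 100 node NICs draw from the VNet subnet;
# pod IPs come from the separate overlay CIDR.
overlay = 100
print(classic, overlay)  # 3100 100
```

A /20 subnet (4094 usable IPs) that comfortably fits 100 classic-CNI nodes is exhausted long before the overlay variant would notice.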
Network Policy: Zero-Trust Between Pods
By default, all pods can talk to all other pods. Calico Network Policies implement pod-level micro-segmentation:
# Allow order-service to call inventory-service only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: inventory-ingress
namespace: production
spec:
podSelector:
matchLabels:
app: inventory-service
policyTypes: [Ingress]
ingress:
- from:
- podSelector:
matchLabels:
app: order-service
ports:
- port: 8080
---
# Default deny all ingress in namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny
namespace: production
spec:
podSelector: {}
policyTypes: [Ingress, Egress]
Start with default-deny per namespace and explicitly allow required paths. This is zero-trust at the pod level.
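One path almost every namespace needs explicitly once default-deny includes Egress: DNS. A sketch of a companion policy (name and namespace are illustrative) that allows all pods in the namespace to reach kube-dns in kube-system:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}          # all pods in the namespace
  policyTypes: [Egress]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
```

Without this, the default-deny policy silently breaks service discovery, which surfaces as confusing connection timeouts rather than clean denials.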
Workload Identity: No Secrets in Pods
AKS Workload Identity replaces the older pod-identity addon. It federates Kubernetes service accounts with Entra ID — pods authenticate as an Entra ID managed identity without credentials.
Pod → OIDC token from AKS OIDC issuer
→ Entra ID validates token
→ Returns Azure access token
→ Pod calls Key Vault / Service Bus / Storage
→ No credentials anywhere
// Enable OIDC issuer and Workload Identity on cluster
properties: {
oidcIssuerProfile: { enabled: true }
securityProfile: {
workloadIdentity: { enabled: true }
}
}
# Create Managed Identity
az identity create -n order-service-identity -g rg-prod
# Federate it with the Kubernetes service account
az identity federated-credential create \
--identity-name order-service-identity \
--name order-service-k8s \
--issuer "$(az aks show -n aks-prod -g rg-prod --query oidcIssuerProfile.issuerUrl -o tsv)" \
--subject "system:serviceaccount:production:order-service-sa"
# Kubernetes pod spec — annotated service account
apiVersion: v1
kind: ServiceAccount
metadata:
name: order-service-sa
namespace: production
annotations:
azure.workload.identity/client-id: "<managed-identity-client-id>"
---
apiVersion: apps/v1
kind: Deployment
spec:
template:
metadata:
labels:
azure.workload.identity/use: "true" # inject token
spec:
serviceAccountName: order-service-sa
containers:
- name: order-service
# DefaultAzureCredential picks up the federated token automatically
# No env vars, no secrets, no mounted credentials
Scaling Architecture
HPA (Horizontal Pod Autoscaler)
Scale pods based on CPU/memory:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: order-service-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: order-service
minReplicas: 3
maxReplicas: 50
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60 # scale out when avg CPU > 60%
- type: Resource
resource:
name: memory
target:
type: AverageValue
averageValue: 512Mi
KEDA (Event-Driven Autoscaling)
KEDA scales based on external event sources — Service Bus queue depth, Kafka lag, HTTP request rate, etc. Scale to zero when idle.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-processor-scaler
spec:
scaleTargetRef:
name: order-processor
minReplicaCount: 0 # scale to zero
maxReplicaCount: 30
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: azure-servicebus
metadata:
namespace: mycompany-servicebus
queueName: order-processing
messageCount: "5" # 1 pod per 5 messages
authenticationRef:
name: keda-servicebus-auth
KEDA scale-to-zero — 0 pods when queue is empty, N pods proportional to queue depth. This is the optimal pattern for async processing: zero cost at idle, maximum throughput at peak.
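The "N pods proportional to queue depth" behaviour can be sketched as a simple calculation. This is an approximation of the queue-trigger logic (KEDA delegates the actual scaling above zero to an HPA it manages, so the real behaviour is smoothed over the polling interval):

```python
import math

def keda_replicas(queue_length: int, message_count: int,
                  min_replicas: int, max_replicas: int) -> int:
    """Approximate replica target for a queue-based ScaledObject:
    one replica per `messageCount` messages, clamped to the configured
    bounds; zero replicas when the queue is empty (scale to zero)."""
    if queue_length == 0:
        return min_replicas  # 0 here means scale to zero
    desired = math.ceil(queue_length / message_count)
    return max(min_replicas, min(desired, max_replicas))

# Using the ScaledObject above: messageCount 5, min 0, max 30
print(keda_replicas(0, 5, 0, 30))    # 0  — idle, no pods
print(keda_replicas(23, 5, 0, 30))   # 5  — ceil(23 / 5)
print(keda_replicas(400, 5, 0, 30))  # 30 — capped at maxReplicaCount
```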
Cluster Autoscaler
Adds and removes nodes automatically based on pod scheduling pressure:
agentPoolProfiles: [{
enableAutoScaling: true
minCount: 2
maxCount: 20
// Cluster Autoscaler adds a node when pods are Pending (unschedulable)
// Removes a node when utilisation < 50% for 10 minutes
}]
Scale-out flow:
- New pods cannot be scheduled (resource pressure or zone balance)
- Cluster Autoscaler detects pending pods
- New node is provisioned (typically 2–4 minutes for AKS)
- Pods are scheduled on new node
Scale-in flow:
- Node utilisation below threshold for scale-down-delay (default 10 min)
- Pods safely drained and rescheduled
- Node removed
For workloads needing faster scale-out, use node pool burst to virtual nodes (ACI) — pods overflow to Azure Container Instances in seconds.
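A rough first-order estimate of how many nodes a scale-out event will add, under assumed figures (the real autoscaler bin-packs pod by pod and respects zones, taints, and PDBs, so treat this as a sizing sketch only):

```python
import math

def nodes_needed(pending_pod_cpu_m: list[int], node_allocatable_m: int) -> int:
    """First-order estimate: total pending CPU requests divided by a
    node's allocatable CPU (after kubelet/system reservations)."""
    total = sum(pending_pod_cpu_m)
    return math.ceil(total / node_allocatable_m)

# 12 pending pods requesting 500m each, on nodes with an assumed
# ~7600m allocatable (8 vCPU minus system reservations)
print(nodes_needed([500] * 12, 7600))  # 1
print(nodes_needed([500] * 40, 7600))  # 3
```

Combined with the 2–4 minute provisioning time above, this gives a back-of-envelope figure for how long a burst of pending pods will wait.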
Blue-Green and Canary Deployments
Blue-Green via Deployment Slots (NGINX Ingress)
# Two deployments — only one receives traffic at a time
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: order-service-ingress
annotations:
nginx.ingress.kubernetes.io/canary: "false" # blue = active
spec:
rules:
- host: api.myapp.com
http:
paths:
- path: /api/orders
backend:
service:
name: order-service-blue # switch to green after validation
port: { number: 80 }
Canary via NGINX Weight
# Green deployment receives 10% of traffic
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: order-service-canary
annotations:
nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-weight: "10" # 10% to canary
spec:
rules:
- host: api.myapp.com
http:
paths:
- path: /api/orders
backend:
service:
name: order-service-green
port: { number: 80 }
Monitor error rate and P99 latency on canary traffic. Ramp weight to 50% → 100% if healthy. Roll back by removing the canary ingress.
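The monitor-and-ramp step is often automated as a promotion gate. A hypothetical sketch (the 1.5× and 1.2× tolerances are illustrative thresholds, not part of NGINX or the manifests above):

```python
def canary_verdict(canary_error_rate: float, baseline_error_rate: float,
                   canary_p99_ms: float, baseline_p99_ms: float) -> str:
    """Decide whether to raise canary-weight or remove the canary ingress:
    roll back if green is meaningfully worse than blue on either error
    rate or tail latency."""
    if canary_error_rate > baseline_error_rate * 1.5:
        return "rollback"
    if canary_p99_ms > baseline_p99_ms * 1.2:
        return "rollback"
    return "promote"

print(canary_verdict(0.002, 0.002, 310, 300))  # promote
print(canary_verdict(0.010, 0.002, 310, 300))  # rollback — error spike
print(canary_verdict(0.002, 0.002, 400, 300))  # rollback — latency regression
```

In practice such a gate runs after each weight increase, with a soak period long enough to collect statistically meaningful canary traffic.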
AKS Monitoring
Container Insights
addonProfiles: {
omsagent: {
enabled: true
config: {
logAnalyticsWorkspaceResourceID: logAnalyticsWorkspace.id
}
}
}
Container Insights collects node metrics, pod metrics, container logs, and health data into Log Analytics.
// KQL: Pods with high restart count (crash-looping)
KubePodInventory
| where TimeGenerated > ago(1h)
| where Namespace == 'production'
| where PodRestartCount > 5
| project Name, ContainerName, PodRestartCount, PodStatus
| order by PodRestartCount desc
// KQL: Node CPU pressure
Perf
| where TimeGenerated > ago(30m)
| where ObjectName == "K8SNode" and CounterName == "cpuUsageNanoCores"
| summarize AvgCPU = avg(CounterValue) by Computer
| order by AvgCPU desc
Pod Disruption Budgets: Preventing Scale-Down Outages
Without a PodDisruptionBudget (PDB), Cluster Autoscaler can drain a node and take down all replicas of a deployment simultaneously:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: order-service-pdb
spec:
minAvailable: 2 # always keep at least 2 pods running during disruption
selector:
matchLabels:
app: order-service
Always define PDBs for production deployments; they protect against node drains, cluster upgrades, and Cluster Autoscaler scale-in.
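The arithmetic behind a minAvailable budget is worth internalising, because it determines whether a drain proceeds or blocks. A sketch of how the eviction API treats the PDB above:

```python
def allowed_disruptions(current_healthy: int, min_available: int) -> int:
    """How many voluntary evictions (node drain, upgrade, scale-in)
    a minAvailable PDB permits right now."""
    return max(0, current_healthy - min_available)

# order-service-pdb: minAvailable: 2
print(allowed_disruptions(3, 2))  # 1 — one pod may be evicted at a time
print(allowed_disruptions(2, 2))  # 0 — drains block until a replica recovers
```

This is also why minAvailable equal to the replica count is a foot-gun: allowed disruptions are permanently zero and node drains hang.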
Cost Control in AKS
Spot Node Pools for Batch Workloads
Spot VMs offer 60–90% cost reduction at the cost of potential eviction (2-minute warning):
{
name: 'spot'
mode: 'User'
vmSize: 'Standard_D8s_v5'
scaleSetPriority: 'Spot'
spotMaxPrice: -1 // -1 = up to on-demand price
nodeTaints: ['kubernetes.azure.com/scalesetpriority=spot:NoSchedule']
nodeLabels: { 'kubernetes.azure.com/scalesetpriority': 'spot' }
}
# Schedule batch jobs on spot pool only
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
operator: "Equal"
value: "spot"
effect: "NoSchedule"
nodeSelector:
kubernetes.azure.com/scalesetpriority: spot
Resource Requests and Limits
Kubernetes schedules based on requests and throttles at limits. Getting these right prevents both over-provisioning and throttling:
resources:
requests:
cpu: "250m" # what scheduler reserves on the node
memory: "256Mi"
limits:
cpu: "1000m" # max allowed (throttled if exceeded)
memory: "512Mi" # OOMKilled if exceeded
Common mistake: setting limits much higher than requests. The node shows low utilisation, Cluster Autoscaler doesn't scale out, pods get scheduled, then all burst simultaneously and throttle.
Rule: start with requests = 70% of observed p95 usage. Set limits = 2× requests. Tune with actual metrics.
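The rule above, expressed as a helper (a starting point only; the observed p95 figures come from your own metrics):

```python
def suggest_resources(p95_cpu_m: float, p95_mem_mi: float) -> dict:
    """Rule of thumb: requests = 70% of observed p95 usage,
    limits = 2x requests. Tune with actual metrics afterwards."""
    req_cpu = round(0.70 * p95_cpu_m)
    req_mem = round(0.70 * p95_mem_mi)
    return {
        "requests": {"cpu": f"{req_cpu}m", "memory": f"{req_mem}Mi"},
        "limits":   {"cpu": f"{2 * req_cpu}m", "memory": f"{2 * req_mem}Mi"},
    }

# Observed p95: 350m CPU, 400Mi memory
print(suggest_resources(350, 400))
# {'requests': {'cpu': '245m', 'memory': '280Mi'},
#  'limits': {'cpu': '490m', 'memory': '560Mi'}}
```

Note that for memory specifically, many teams set limit equal to request instead (memory is not compressible, so a burst ends in OOMKill rather than throttling); treat the 2× figure as the article's general rule.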
Production Checklist
- [ ] System and user node pools separated (system taint applied)
- [ ] Azure CNI (not kubenet) with Calico network policy
- [ ] Workload Identity enabled (no secrets in pods)
- [ ] Zone-redundant node pools (AZs: 1, 2, 3)
- [ ] HPA configured for all stateless services
- [ ] KEDA for event-driven/async processors
- [ ] Cluster Autoscaler with min 2 nodes (never scale to 0 user nodes)
- [ ] PodDisruptionBudgets on all production deployments
- [ ] Resource requests and limits set on every container
- [ ] Container Insights enabled → Log Analytics
- [ ] Spot node pool for batch/non-critical workloads
- [ ] Private cluster (API server not public) for regulated workloads
- [ ] Azure Policy for Kubernetes (Gatekeeper) for policy enforcement
Related: Azure Hub-Spoke Networking — VNet design, Private Endpoints
Related: Azure Well-Architected Framework — reliability, scaling patterns
Related: Azure Cloud Integration — KEDA with Service Bus