Kubernetes Deep Dive — Production Workloads, Networking & Security
Production Kubernetes guide — core architecture, workload resources (Deployments, StatefulSets, Jobs), networking (Services, Ingress, NetworkPolicy), RBAC, HPA/VPA/KEDA autoscaling, resource management, Helm, secrets management, and production readiness patterns.
Kubernetes is the operating system for distributed systems — it schedules containerised workloads across a cluster, manages their lifecycle, handles failures, and provides primitives for networking, storage, configuration, and secrets. Understanding Kubernetes deeply means understanding its control loop: every component watches state and reconciles it toward the desired state you declare.
Architecture: Control Plane + Data Plane
┌─────────────────────────────────────────────────────────────────┐
│ Control Plane │
│ │
│ kube-apiserver ──── etcd (cluster state, consistent store) │
│ │ │
│ kube-scheduler (assigns Pods to Nodes) │
│ kube-controller-manager (ReplicaSet, Node, Job controllers) │
│ cloud-controller-manager (LoadBalancer, PV provisioning) │
└──────────────────────────────────────────┬──────────────────────┘
│ API
┌──────────────────────────────────────────▼──────────────────────┐
│ Data Plane (Nodes) │
│ │
│ Node 1 Node 2 Node 3 │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ kubelet │ │ kubelet │ (pod lifecycle) │
│ │ kube-proxy │ │ kube-proxy │ (iptables/IPVS) │
│ │ container runtime│ │ container runtime│ (containerd) │
│ │ Pod Pod Pod │ │ Pod Pod │ │
│ └─────────────────┘ └─────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Key insight: every Kubernetes resource is persisted in etcd behind the API server (serialised as protobuf by default). Controllers watch the API server for changes and reconcile the actual state toward the desired state you declare. This reconciliation loop is the foundation of Kubernetes' self-healing behaviour.
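You can watch this loop in action: delete a pod owned by a Deployment and the ReplicaSet controller replaces it within seconds. A quick sketch, assuming the api-service Deployment defined in the next section:
# Terminal 1: watch pods as the controller reconciles
kubectl get pods -l app=api-service -n production -w
# Terminal 2: delete one pod; a replacement is scheduled almost immediately,
# restoring the declared replicas: 3
kubectl delete pod <pod-name> -n production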
Core Workload Resources
Deployment — Stateless Applications
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-service
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: api-service
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # allow 1 extra pod during update
maxUnavailable: 0 # never go below replica count (zero-downtime)
template:
metadata:
labels:
app: api-service
version: "1.5.2"
spec:
containers:
- name: api
image: myregistry.azurecr.io/api-service:1.5.2
ports:
- containerPort: 8080
resources:
requests: # scheduler uses this for placement
cpu: "250m"
memory: "256Mi"
limits: # hard ceiling — OOMKilled or throttled if exceeded
cpu: "500m"
memory: "512Mi"
readinessProbe: # gates traffic — pod excluded from Service until ready
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
livenessProbe: # restarts pod if unhealthy
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
failureThreshold: 3
env:
- name: ASPNETCORE_ENVIRONMENT
value: "Production"
- name: ConnectionStrings__DefaultConnection
valueFrom:
secretKeyRef:
name: db-credentials
key: connection-string
      terminationGracePeriodSeconds: 30  # SIGTERM, wait, then SIGKILL
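Because maxUnavailable is 0, a rollout only proceeds as new pods pass their readiness probe. Rollouts can be driven and inspected with kubectl; illustrative commands (the 1.5.3 tag is hypothetical):
# Trigger a rolling update, then watch it progress pod by pod
kubectl set image deployment/api-service api=myregistry.azurecr.io/api-service:1.5.3 -n production
kubectl rollout status deployment/api-service -n production
# Revert to the previous revision if the new version misbehaves
kubectl rollout undo deployment/api-service -n production
kubectl rollout history deployment/api-service -n production
StatefulSet — Stateful Applications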
StatefulSets give each Pod a stable network identity (pod-0, pod-1) and stable persistent storage:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: "postgres-headless" # requires a headless Service
replicas: 3
selector:
matchLabels:
app: postgres
  template:
    metadata:
      labels:
        app: postgres  # must match spec.selector, or the API server rejects the StatefulSet
    spec:
containers:
- name: postgres
image: postgres:16
ports:
- containerPort: 5432
env:
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates: # each Pod gets its own PVC
- metadata:
name: postgres-data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "premium-ssd"
resources:
requests:
storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
name: postgres-headless
spec:
clusterIP: None # headless: DNS returns individual pod IPs
selector:
app: postgres
ports:
  - port: 5432
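The headless Service gives every replica a stable DNS name, which is how replication topologies (e.g. pointing standbys at the ordinal-0 pod) are typically wired up. A sketch of the naming scheme, assuming the production namespace:
# <pod-name>.<headless-service>.<namespace>.svc.cluster.local
postgres-0.postgres-headless.production.svc.cluster.local:5432  # ordinal 0, often the primary
postgres-1.postgres-headless.production.svc.cluster.local:5432
postgres-2.postgres-headless.production.svc.cluster.local:5432
Jobs and CronJobs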
# One-off database migration job
apiVersion: batch/v1
kind: Job
metadata:
name: db-migration-v2
spec:
backoffLimit: 3 # retry up to 3 times on failure
  ttlSecondsAfterFinished: 600  # auto-delete the Job 10 minutes after it finishes
template:
spec:
restartPolicy: OnFailure
containers:
- name: migrator
image: myregistry.azurecr.io/migrator:v2
command: ["dotnet", "ef", "database", "update"]
---
# Scheduled cleanup job
apiVersion: batch/v1
kind: CronJob
metadata:
name: cleanup-old-sessions
spec:
schedule: "0 2 * * *" # 2 AM daily (UTC)
  concurrencyPolicy: Forbid  # skip a scheduled run if the previous one is still running
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
spec:
restartPolicy: OnFailure
containers:
- name: cleanup
image: myregistry.azurecr.io/cleanup:latest
command: ["python", "cleanup.py", "--days=30"]Networking: Services
# ClusterIP: internal only (default)
apiVersion: v1
kind: Service
metadata:
name: api-service
spec:
selector:
app: api-service
ports:
- port: 80
targetPort: 8080
type: ClusterIP
# NodePort: exposed on every node at a static port (dev/testing)
# type: NodePort + nodePort: 30080
# LoadBalancer: cloud-provisioned external LB
apiVersion: v1
kind: Service
metadata:
name: api-service-lb
annotations:
service.beta.kubernetes.io/azure-load-balancer-internal: "true" # AKS: internal LB
spec:
selector:
app: api-service
ports:
- port: 443
targetPort: 8080
  type: LoadBalancer
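A Service only routes to pods that pass their readiness probe, and in-cluster DNS follows <service>.<namespace>.svc.cluster.local. To check what is actually backing a Service:
# List the pod IPs currently serving traffic for the Service
kubectl get endpointslices -l kubernetes.io/service-name=api-service -n production
# In-cluster DNS name: api-service.production.svc.cluster.local → ClusterIP → ready pods
Ingress — HTTP Routing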
Ingress routes external HTTP/S traffic to Services based on host/path rules. Requires an Ingress Controller (NGINX, Traefik, etc.):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: api-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/ssl-redirect: "true"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.example.com
secretName: api-tls-cert # cert-manager creates this
rules:
- host: api.example.com
http:
paths:
- path: /v1
pathType: Prefix
backend:
service:
name: api-v1
port:
number: 80
- path: /v2
pathType: Prefix
backend:
service:
name: api-v2
port:
              number: 80
NetworkPolicy — Zero-Trust Pod Networking
By default, all pods can communicate with all other pods. NetworkPolicy restricts traffic:
# Default deny all ingress for namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-ingress
namespace: production
spec:
podSelector: {} # applies to all pods in namespace
policyTypes: [Ingress]
---
# Allow only specific traffic to api-service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-service-ingress
namespace: production
spec:
podSelector:
matchLabels:
app: api-service
policyTypes: [Ingress]
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: ingress-nginx # ingress controller namespace
- podSelector:
matchLabels:
app: frontend # allow frontend pods in same namespace
ports:
- protocol: TCP
      port: 8080
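Note that NetworkPolicy is only enforced if the cluster's CNI supports it (Calico, Cilium, or Azure CNI with network policy enabled; not every CNI does). A quick sketch for verifying the default-deny with a throwaway pod:
# From an unlabelled pod, this should time out: it matches no allow rule
kubectl run netpol-test --rm -it --image=busybox:1.36 -n production -- \
  wget -qO- --timeout=5 http://api-service/health/ready
# Repeat from a pod labelled app=frontend and the request should succeed
RBAC — Role-Based Access Control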
# ServiceAccount for an application
apiVersion: v1
kind: ServiceAccount
metadata:
name: api-service-account
namespace: production
annotations:
# AKS Workload Identity: bind to Azure Managed Identity
azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"
---
# Role: what permissions are allowed in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: secret-reader
namespace: production
rules:
- apiGroups: [""]
resources: ["secrets"]
resourceNames: ["db-credentials", "redis-password"]
verbs: ["get"]
---
# RoleBinding: bind role to service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: api-secret-reader
namespace: production
subjects:
- kind: ServiceAccount
name: api-service-account
namespace: production
roleRef:
kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: secret-reader
Principle of least privilege: ServiceAccounts should only have the permissions they need. Use Role/RoleBinding for namespace-scoped access, and ClusterRole/ClusterRoleBinding only when cluster-wide access is genuinely required.
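RBAC rules are easy to get subtly wrong, and kubectl can test them from the ServiceAccount's point of view:
# Impersonate the ServiceAccount and check its effective permissions
kubectl auth can-i get secrets/db-credentials -n production \
  --as=system:serviceaccount:production:api-service-account  # yes
kubectl auth can-i delete secrets -n production \
  --as=system:serviceaccount:production:api-service-account  # no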
Autoscaling
Horizontal Pod Autoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-service
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # scale out when avg CPU > 70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # wait 5 min before scaling down
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Pods
value: 4 # add max 4 pods per minute
        periodSeconds: 60
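HPA relies on metrics-server (or another Metrics API provider) for CPU and memory readings. To see current versus target utilisation and recent scaling decisions, assuming the production namespace used throughout:
kubectl get hpa api-hpa -n production
kubectl describe hpa api-hpa -n production  # shows metrics, conditions, and scaling events
KEDA — Scale on Custom Metrics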
KEDA (Kubernetes Event-Driven Autoscaling) scales on queue depth, Kafka lag, HTTP request rate, and 60+ other sources:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-processor-scaler
spec:
scaleTargetRef:
name: order-processor
minReplicaCount: 0 # scale to zero when idle
maxReplicaCount: 50
triggers:
- type: azure-servicebus
metadata:
connectionFromEnv: SERVICEBUS_CONNECTION
queueName: order-events
messageCount: "5" # 1 replica per 5 messages in queue
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: http_requests_in_flight
      query: sum(http_requests_in_flight)  # required: the PromQL to evaluate (metric name assumed)
      threshold: "100"  # 1 replica per 100 in-flight requests
Vertical Pod Autoscaler (VPA)
VPA adjusts CPU and memory requests/limits automatically. Run in Off mode first to get recommendations without auto-applying:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api-service
updatePolicy:
    updateMode: "Off"  # Off = recommendations only, Auto = auto-apply
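Once the recommender has observed real usage for a while, read its suggestions before deciding whether to let VPA apply them:
# Recommendations appear under status.recommendation as data accumulates
kubectl describe vpa api-vpa
# Look for Target (suggested requests) and Lower/Upper Bound per container
Resource Management and QoS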
Kubernetes uses requests and limits to classify pods into QoS tiers for eviction priority:
| QoS tier | When | Eviction priority |
|----------|------|-------------------|
| BestEffort | No requests or limits set on any container | Evicted first |
| Burstable | Some requests/limits set, but not meeting Guaranteed | Evicted second |
| Guaranteed | Requests == limits for every container | Evicted last |

For production pods, set requests == limits (Guaranteed QoS) on critical services, or at minimum set requests so the scheduler places pods on nodes with sufficient capacity.
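The assigned tier is visible on every running pod:
# Kubernetes derives the QoS class from the pod's resource spec
kubectl get pod <pod-name> -n production -o jsonpath='{.status.qosClass}'
# → Guaranteed | Burstable | BestEffort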
# LimitRange: default limits if not specified by pods
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: production
spec:
limits:
- type: Container
default:
cpu: "500m"
memory: "512Mi"
defaultRequest:
cpu: "100m"
memory: "128Mi"
max:
cpu: "4"
memory: "8Gi"
---
# ResourceQuota: cap total namespace resource usage
apiVersion: v1
kind: ResourceQuota
metadata:
name: production-quota
spec:
hard:
requests.cpu: "50"
requests.memory: "100Gi"
limits.cpu: "100"
limits.memory: "200Gi"
pods: "200"
    persistentvolumeclaims: "50"
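A quota rejects new pods once the namespace hits its caps, so watch consumption before it bites (assuming the quota is applied to the production namespace):
# Used vs hard limits for every quota'd resource
kubectl describe resourcequota production-quota -n production
Secrets Management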
Kubernetes Secrets are base64-encoded (not encrypted) by default. For production, use external secret managers:
# External Secrets Operator: sync from Azure Key Vault
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: azure-keyvault
namespace: production
spec:
provider:
azurekv:
tenantId: "your-tenant-id"
vaultUrl: "https://myvault.vault.azure.net"
authType: WorkloadIdentity # uses Pod's WorkloadIdentity
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-credentials
spec:
refreshInterval: 1h
secretStoreRef:
name: azure-keyvault
kind: SecretStore
target:
name: db-credentials # creates this K8s Secret
data:
- secretKey: connection-string # K8s Secret key
remoteRef:
      key: prod-db-connection-string  # Key Vault secret name
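Once applied, the operator keeps the Kubernetes Secret in sync at the refresh interval. Check sync status on the ExternalSecret resource (assuming the production namespace):
# Check the Ready condition to confirm the Secret synced
kubectl get externalsecret db-credentials -n production
kubectl get secret db-credentials -n production  # the synced Kubernetes Secret
Helm — Package Management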
Helm packages Kubernetes manifests as reusable charts with values-based templating:
# Add a chart repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
# Install with custom values
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--values values-prod.yaml
# Upgrade (rolling update of the chart); --atomic rolls back automatically on failure
helm upgrade nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --values values-prod.yaml \
  --atomic \
  --timeout 5m
# View release history
helm history nginx-ingress -n ingress-nginx
# Roll back to previous release
helm rollback nginx-ingress 3 -n ingress-nginx
Creating a Helm Chart
helm create my-api  # generates chart scaffold
# my-api/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "my-api.fullname" . }}
labels:
{{- include "my-api.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "my-api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "my-api.selectorLabels" . | nindent 8 }}
    spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
resources:
            {{- toYaml .Values.resources | nindent 12 }}
# my-api/values.yaml
replicaCount: 2
image:
repository: myregistry.azurecr.io/my-api
tag: "1.0.0"
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 500m
    memory: 512Mi
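Render and sanity-check a chart locally before anything touches the cluster:
helm lint my-api                                # static checks on chart structure
helm template my-api ./my-api                   # render manifests locally with values.yaml
helm install my-api ./my-api --dry-run --debug  # server-side validation without installing
Production Readiness Checklist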
Pod Disruption Budgets
# Ensure at least 2 replicas available during voluntary disruptions (node drains, upgrades)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: api-pdb
spec:
minAvailable: 2 # or: maxUnavailable: 1
selector:
matchLabels:
      app: api-service
Priority Classes
# High-priority class for critical workloads — protected from preemption
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: critical-workload
value: 1000000
globalDefault: false
description: "Critical production workloads"
---
# Use in pod spec:
#   priorityClassName: critical-workload
Production Readiness Summary
| Concern | Implementation |
|---------|---------------|
| Health checks | readinessProbe + livenessProbe + optional startupProbe |
| Graceful shutdown | terminationGracePeriodSeconds ≥ request timeout + drain period |
| Resource limits | Always set requests and limits for every container |
| Disruption budget | PodDisruptionBudget for every production Deployment |
| Zero-trust networking | Default-deny NetworkPolicy + explicit allow rules |
| Secrets | External Secrets Operator from Key Vault/Secrets Manager |
| RBAC | Dedicated ServiceAccount per workload, least-privilege roles |
| Autoscaling | HPA for CPU/memory, KEDA for queue-driven, Cluster Autoscaler for nodes |
| Multi-AZ | topologySpreadConstraints across zones |
| Image security | Pin image digests, scan with Trivy in CI, private registry only |
Topology Spread Constraints — Multi-AZ Distribution
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api-service
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
        app: api-service
Observability
# Prometheus scrape annotations — auto-discovered by Prometheus Operator
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
# kubectl top — requires metrics-server
kubectl top nodes
kubectl top pods -n production --sort-by=memory
# Describe a pod for troubleshooting (events, conditions, probe failures)
kubectl describe pod <pod-name> -n production
# Follow logs across all replicas
kubectl logs -l app=api-service -n production -f --tail=100
# Execute into a running container
kubectl exec -it <pod-name> -n production -- /bin/sh
Related: Azure AKS Production Guide — AKS-specific configuration
Related: Azure Cloud Architecture — Well-Architected Framework
Related: Event-Driven Architecture — Kafka and async patterns