Skip to main content

Scaling

Horizontal Pod Autoscaler (HPA)

Scale based on CPU/memory or custom metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-server-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: AverageValue
averageValue: 200Mi
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # wait 5m before scaling down
policies:
- type: Percent
value: 20
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15

KEDA (event-driven autoscaling)

Scale on external signals: queue depth, HTTP request rate, Prometheus metrics.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: worker-scaler
namespace: production
spec:
scaleTargetRef:
name: worker-deployment
minReplicaCount: 0 # scale to zero when idle
maxReplicaCount: 50
cooldownPeriod: 120
triggers:
- type: rabbitmq
metadata:
queueName: tasks
queueLength: "10" # one replica per 10 messages
authenticationRef:
name: rabbitmq-trigger-auth

Cluster Autoscaler (AWS)

# Annotate the node group
eksctl scale nodegroup \
--cluster my-cluster \
--name workers \
--nodes-min 2 \
--nodes-max 20

Key Cluster Autoscaler flags:

FlagValueNotes
--scale-down-delay-after-add5mWait after scale-up before evaluating scale-down
--scale-down-unneeded-time10mHow long a node must be unneeded before removal
--skip-nodes-with-system-podstrueProtect nodes with kube-system pods
--balance-similar-node-groupstrueKeep AZ balance

Vertical Pod Autoscaler (VPA)

Use in Off mode to get recommendations without auto-applying in production:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-server-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
updatePolicy:
updateMode: Off # recommendations only; set to Auto with caution
kubectl get vpa api-server-vpa -o jsonpath='{.status.recommendation}' | jq .