How to Scale Applications in Kubernetes (Manual vs. Automatic)
We’ve focused a lot on building the cluster and making sure the networking is solid. Today, we are finally diving into one of the main reasons people use Kubernetes in the first place: Scalability.
When your application suddenly goes viral, or you get a massive spike in weekend traffic, a single instance of your app is going to crash. Kubernetes offers ways to scale your application out (adding more instances) to handle the load.
Today, I learned about manual scalability and was introduced to the concept of the Horizontal Pod Autoscaler (HPA).
Setting the Stage: Building a Demo Cluster
Before we could test scaling, I needed a cluster and an app. Here is a quick recap of the commands I used to spin up a quick 3-node AKS cluster and connect to it using the Azure CLI:
# 1. Log into Azure
az login
# 2. Check existing clusters
az aks list -o table
# 3. Connect to the cluster (merges credentials into ~/.kube/config)
az aks get-credentials -g aks-demo -n aks-demo
# 4. Verify connection by checking the nodes
kubectl get nodes
Output showed my three aks-agentpool nodes in a Ready state.
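If you work with more than one cluster, it's worth confirming which context kubectl is pointing at before you start scaling things. A quick sketch (these only need a working kubeconfig, nothing cluster-specific):

```shell
# Show the context kubectl will use for all following commands
kubectl config current-context

# List every context merged into ~/.kube/config (the * marks the active one)
kubectl config get-contexts
```

After `az aks get-credentials`, the active context should be the aks-demo cluster.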
Deploying a Sample Application
Next, I deployed a sample PHP-Apache application. This application is specifically designed to consume CPU so we can test scaling later.
Here is the YAML manifest (deploy-svc.yaml) I used. Notice the resources block—this is critical for autoscaling later, as it tells Kubernetes how much CPU the container is guaranteed (requests) and the maximum it is allowed to consume (limits).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1 # Starting with just one pod
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
I applied this file to my cluster:
kubectl apply -f .\deploy-svc.yaml
When I ran kubectl get pods, I saw my single php-apache pod starting up.
Method 1: Manual Scalability
Let's imagine our app is getting hit hard, and one instance isn't enough. We need two instances immediately. I learned there are two ways to do this manually.
Option A: Declarative Way (Updating YAML)
The best practice in Kubernetes is to keep your code as the source of truth. I can simply open the deploy-svc.yaml file, change replicas: 1 to replicas: 2, and re-apply the file:
kubectl apply -f .\deploy-svc.yaml
Kubernetes will see the desired state has changed to 2, and it will spin up a second pod.
Option B: Imperative Way (Command Line)
If I am in a hurry or troubleshooting, I can bypass the YAML file and tell the Kubernetes API directly to scale the deployment using the command line:
kubectl scale --replicas=2 deployment/php-apache
When I checked my pods again:
kubectl get pods
I saw my original pod Running, and a brand new one in the ContainerCreating status!
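To confirm that the Deployment itself now wants two replicas (rather than just observing that two pods happen to exist), you can query its spec directly. A quick sketch using jsonpath:

```shell
# Ask the Deployment how many replicas it currently wants
kubectl get deployment php-apache -o jsonpath='{.spec.replicas}'

# Or view the fuller READY / UP-TO-DATE / AVAILABLE summary
kubectl get deployment php-apache
```

Note that imperative changes like this drift from the YAML file—the next `kubectl apply` of deploy-svc.yaml with `replicas: 1` would scale it back down.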
Monitoring the Load
To see why we might need to scale, we can check the actual CPU and memory utilization of our pods using the top command (note: this requires the Kubernetes Metrics Server to be installed, which AKS has by default).
kubectl top pods
Output:
NAME                        CPU(cores)   MEMORY(bytes)
php-apache-865gdgd6g-4re3   255m         12Mi
php-apache-865gdgd6g-hnhd   377m         13Mi
Problem with Manual Scaling
Manual scaling is fine for testing or if you have highly predictable traffic (e.g., "I know we need 10 pods for the Black Friday sale starting at midnight").
But in the real world, traffic is unpredictable. If a blog post goes viral at 3 AM, you aren't going to be awake to run a kubectl scale command. If traffic drops to zero on a Sunday, you don't want to be paying Azure for 10 pods doing nothing.
Solution: Horizontal Pod Autoscaler (HPA)
To solve this, we use automation.
The Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that constantly watches the metrics of your pods (like the CPU utilization we saw with kubectl top).
You give the HPA a set of rules, such as:
- Minimum Pods: 1
- Maximum Pods: 10
- Target CPU utilization: 50%
If the average CPU utilization across your pods rises above 50% (measured against the CPU each pod requested—200m in our manifest), the HPA automatically updates the Deployment's replica count to scale up. When the load drops back down, it scales the replicas down again.
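As a sketch of what those rules look like in practice (assuming the php-apache Deployment from earlier is running), the imperative way to create an HPA is kubectl autoscale:

```shell
# Create an HPA targeting 50% average CPU utilization,
# scaling between 1 and 10 replicas
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

# Watch the HPA's view of current vs. target utilization as load changes
kubectl get hpa php-apache --watch
```

Because `--cpu-percent` is relative to the pod's CPU request (200m here), a 50% target means the HPA tries to keep average usage around 100m per pod.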

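If you prefer to keep the autoscaling rules in source control alongside the rest of the manifests—the same declarative habit we used for the Deployment—the equivalent HPA can also be written as YAML. A sketch using the autoscaling/v2 API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Apply it with `kubectl apply -f` like any other manifest, and the HPA object itself becomes part of your source of truth.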