How to Scale Applications in Kubernetes (Manual vs. Automatic)
We’ve focused a lot on building the cluster and making sure the networking is solid. Today, we are finally diving into one of the main reasons people use Kubernetes in the first place: Scalability.
When your application suddenly goes viral, or you get a massive spike in weekend traffic, a single instance of your app is going to crash. Kubernetes offers ways to scale your application out (adding more instances) to handle the load.
Today, I learned about manual scalability and was introduced to the concept of the Horizontal Pod Autoscaler (HPA).
Setting the Stage: Building a Demo Cluster
Before we could test scaling, I needed a cluster and an app. Here is a quick recap of the commands I used to spin up a quick 3-node AKS cluster and connect to it using the Azure CLI:
# 1. Log into Azure
az login
# 2. Check existing clusters
az aks list -o table
# 3. Connect to the cluster (merges credentials into ~/.kube/config)
az aks get-credentials -g aks-demo -n aks-demo
# 4. Verify connection by checking the nodes
kubectl get nodes
Output showed my three aks-agentpool nodes in a Ready state.
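If you work with more than one cluster, it's worth confirming which context kubectl is pointing at before you start scaling things. A quick sketch (these only need a working kubeconfig, nothing cluster-specific):

```shell
# Show the context kubectl will use for all following commands
kubectl config current-context

# List every context merged into ~/.kube/config (the * marks the active one)
kubectl config get-contexts
```

After `az aks get-credentials`, the active context should be the aks-demo cluster.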
Deploying a Sample Application
Next, I deployed a sample PHP-Apache application. This application is specifically designed to consume CPU so we can test scaling later.
Here is the YAML manifest (deploy-svc.yaml) I used. Notice the resources block—this is critical for autoscaling later, as it tells Kubernetes how much CPU the container is guaranteed (requests) and the maximum it is allowed to consume (limits).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1 # Starting with just one pod
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
I applied this file to my cluster:
kubectl apply -f .\deploy-svc.yaml
When I ran kubectl get pods, I saw my single php-apache pod starting up.
Method 1: Manual Scalability
Let's imagine our app is getting hit hard, and one instance isn't enough. We need two instances immediately. I learned there are two ways to do this manually.
Option A: Declarative Way (Updating YAML)
The best practice in Kubernetes is to keep your code as the source of truth. I can simply open the deploy-svc.yaml file, change replicas: 1 to replicas: 2, and re-apply the file:
kubectl apply -f .\deploy-svc.yaml
Kubernetes will see the desired state has changed to 2, and it will spin up a second pod.
Option B: Imperative Way (Command Line)
If I am in a hurry or troubleshooting, I can bypass the YAML file and tell the Kubernetes API directly to scale the deployment using the command line:
kubectl scale --replicas=2 deployment/php-apache
When I checked my pods again:
kubectl get pods
I saw my original pod Running, and a brand new one in the ContainerCreating status!
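To confirm that the Deployment itself now wants two replicas (rather than just observing that two pods happen to exist), you can query its spec directly. A quick sketch using jsonpath:

```shell
# Ask the Deployment how many replicas it currently wants
kubectl get deployment php-apache -o jsonpath='{.spec.replicas}'

# Or view the fuller READY / UP-TO-DATE / AVAILABLE summary
kubectl get deployment php-apache
```

Note that imperative changes like this drift from the YAML file—the next `kubectl apply` of deploy-svc.yaml with `replicas: 1` would scale it back down.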
Monitoring the Load
To see why we might need to scale, we can check the actual CPU and memory utilization of our pods using the top command (note: this requires the Kubernetes Metrics Server to be installed, which AKS has by default).
kubectl top pods
Output:
NAME                        CPU(cores)   MEMORY(bytes)
php-apache-865gdgd6g-4re3   255m         12Mi
php-apache-865gdgd6g-hnhd   377m         13Mi
Problem with Manual Scaling
Manual scaling is fine for testing or if you have highly predictable traffic (e.g., "I know we need 10 pods for the Black Friday sale starting at midnight").
But in the real world, traffic is unpredictable. If a blog post goes viral at 3 AM, you aren't going to be awake to run a kubectl scale command. If traffic drops to zero on a Sunday, you don't want to be paying Azure for 10 pods doing nothing.
Solution: Horizontal Pod Autoscaler (HPA)
To solve this, we use automation.
The Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that constantly watches the metrics of your pods (like the CPU utilization we saw with kubectl top).
You give the HPA a set of rules, such as:
- Minimum Pods: 1
- Maximum Pods: 10
- Target CPU utilization: 50%
If the average CPU utilization across your pods rises above 50% (measured against the CPU each pod requested—200m in our manifest), the HPA automatically updates the Deployment's replica count to scale up. When the load drops back down, it scales the replicas down again.
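As a sketch of what those rules look like in practice (assuming the php-apache Deployment from earlier is running), the imperative way to create an HPA is kubectl autoscale:

```shell
# Create an HPA targeting 50% average CPU utilization,
# scaling between 1 and 10 replicas
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

# Watch the HPA's view of current vs. target utilization as load changes
kubectl get hpa php-apache --watch
```

Because `--cpu-percent` is relative to the pod's CPU request (200m here), a 50% target means the HPA tries to keep average usage around 100m per pod.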

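If you prefer to keep the autoscaling rules in source control alongside the rest of the manifests—the same declarative habit we used for the Deployment—the equivalent HPA can also be written as YAML. A sketch using the autoscaling/v2 API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Apply it with `kubectl apply -f` like any other manifest, and the HPA object itself becomes part of your source of truth.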