IP Exhaustion Crisis in AKS Networking (Kubenet vs. Azure CNI)

Day 2: Solving the IP Exhaustion Crisis in AKS Networking


Welcome back to my Azure Kubernetes Service (AKS) learning journey! Today, I tackled one of the most confusing (and critical) topics in Kubernetes: Networking.

Specifically, I dove into the battle between the two main networking models—Kubenet and Azure CNI—and discovered a "best of both worlds" feature called Azure CNI Overlay.

If you've ever worried about running out of IP addresses in your virtual network, this post is for you.


The Great IP Math Problem

To understand why networking matters, I looked at a practical exercise. Imagine we have a specific Subnet with a CIDR of /24.

A /24 subnet has 256 IP addresses. However, Azure always reserves 5 IP addresses in every subnet for management (the first four addresses plus the last one), leaving us with 251 usable IP addresses.

Let's see how many resources we can deploy in this subnet using the two different models.

1. The Kubenet Approach (The Saver)

In Kubenet:

  • Nodes get IP addresses from the Azure Subnet.
  • Pods get IP addresses from a separate, internal "Pod CIDR" (not the subnet).

The Math:

Since only nodes need subnet IPs, we can deploy up to 251 Nodes.

If we multiply that by the default maximum pods per node (110), we can run:

251 nodes × 110 pods = 27,610 pods


Kubenet is incredibly efficient with IP space.

2. The Azure CNI Approach (The Spender)

In the traditional Azure CNI (Container Networking Interface):

  • Nodes get IP addresses from the Azure Subnet.
  • Pods ALSO get IP addresses from the Azure Subnet.

The Math:

Every single pod eats up a real IP from your VNet. With the default limit of 30 pods per node, each node consumes 31 subnet IPs (1 for the node itself plus 30 reserved for its pods):

  • We can only deploy about 8 Nodes.
  • Total pods = 8 nodes × 30 pods = 240 Pods.

We hit an IP exhaustion wall very quickly. We went from potentially 27,000 pods (Kubenet) to just 240 (Azure CNI) in the same subnet size!
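The math in both scenarios can be sketched in a few lines of shell arithmetic (the 110 and 30 pods-per-node values are the defaults mentioned above):

```shell
#!/bin/sh
# Usable IPs in a /24 subnet after Azure's 5 reserved addresses
usable=$((256 - 5))                      # 251

# Kubenet: only nodes draw IPs from the subnet
kubenet_nodes=$usable                    # 251 nodes
kubenet_pods=$((kubenet_nodes * 110))    # default 110 pods per node

# Azure CNI: each node consumes 1 node IP + 30 pod IPs from the subnet
cni_nodes=$((usable / 31))               # 8 nodes
cni_pods=$((cni_nodes * 30))             # 240 pods

echo "Kubenet:   $kubenet_nodes nodes, $kubenet_pods pods"
echo "Azure CNI: $cni_nodes nodes, $cni_pods pods"
```

Running this prints 27,610 pods for Kubenet versus 240 for traditional Azure CNI in the very same /24.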



So, Which One Should I Choose?

Based on the math above, here is the decision matrix I learned:

Use Kubenet when:

  • You have limited IP address space.
  • Most communication happens inside the cluster.
  • You don't need advanced features like Virtual Nodes or Azure Network Policies.

Use Azure CNI when:

  • You have plenty of IP addresses available.
  • Pods need to speak directly to resources outside the cluster (like VMs, on-prem networks, or Azure SQL endpoints) without NAT.
  • You need advanced features like Virtual Nodes or Azure Network Policies.
  • You want to avoid managing User Defined Routes (UDRs) manually.


Solution: Azure CNI Overlay

What if we want the performance and features of Azure CNI, but the IP efficiency of Kubenet?

Enter Azure CNI Overlay.

This was the highlight of my learning today. It's a mode that combines the best of both worlds.

How it works


  1. Nodes take IPs from your Subnet (just like before).
  2. Pods take IPs from a private Overlay Network (a separate CIDR you define during creation).
  3. Each node gets assigned a large slice (a /24) of that overlay network for its pods.
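The slicing in step 3 is easy to sanity-check: a /16 overlay CIDR contains 2^(24−16) = 256 separate /24 blocks, so (ignoring any platform reservations) an overlay like 192.168.0.0/16 can hand a /24 slice to up to 256 nodes:

```shell
#!/bin/sh
# Each node receives a /24 slice of the overlay pod CIDR.
overlay_prefix=16                                   # e.g. 192.168.0.0/16
slice_prefix=24                                     # per-node slice size
slices=$((1 << (slice_prefix - overlay_prefix)))    # 2^8 = 256

echo "A /$overlay_prefix overlay provides $slices node slices of /$slice_prefix"
```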

Why is this huge?


  • No IP Exhaustion: Pods don't consume your VNet IPs.
  • No Route Tables: Unlike Kubenet, you don't need to mess around with User Defined Routes (UDRs) or Route Tables.
  • Performance: It promises great connectivity performance without requiring extra hops or encapsulation methods usually needed to tunnel traffic.

Note: Because the overlay network is isolated, external endpoints cannot connect directly to a specific Pod IP. But for most use cases, this is fine.



Kubenet vs. Azure CNI Overlay: Ultimate Showdown

Since both Kubenet and CNI Overlay save IP addresses, how do they compare against each other? It turns out Azure CNI Overlay is vastly superior for modern workloads. Here is what I discovered:

  • Max Scale: Kubenet limits you to 400 nodes per cluster because it relies on Azure Route Tables (which have a limit of 400 routes). CNI Overlay doesn't use these route tables, allowing you to scale up to 1,000 nodes per cluster!

  • Performance: Kubenet introduces minor latency because traffic has to make an extra hop through those route tables. CNI Overlay provides the best performance for pod-to-pod communication because that extra hop is removed.

  • Network Policies: Kubenet only supports Calico network policies. CNI Overlay supports Azure Network Policies, Calico, and Cilium.

  • OS Support: Kubenet is Linux-only. CNI Overlay supports both Linux and Windows Server 2022.

  • Complexity: Kubenet is complex because you have to correctly configure and manage Azure Route Tables on the cluster subnet. CNI Overlay is delightfully simple—no additional configuration is required for pod networking.

⚠️ Overlay Limitations to Keep in Mind

While it is a powerful feature, CNI Overlay isn't perfect yet. I learned about two key limitations:

  1. You cannot use the Application Gateway Ingress Controller (AGIC). Standard Nginx ingress controllers work fine, but AGIC is not supported.
  2. It does not support Windows Server 2019 node pools (only 2022).



Hands-On: Creating an Overlay Cluster

I decided to try this out using the Azure CLI. Here is how I spun up a cluster using the Overlay mode.

1. Command

We need three specific flags:

  1. --network-plugin azure (Use CNI)
  2. --network-plugin-mode overlay (Turn on Overlay)
  3. --pod-cidr (Define the private range for pods)
Bash
# Create Resource Group
az group create -n rg-aks-cni-overlay -l westeurope

# Create AKS Cluster with Overlay
az aks create -n aks-cni-overlay -g rg-aks-cni-overlay \
    --network-plugin azure \
    --network-plugin-mode overlay \
    --pod-cidr 192.168.0.0/16
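Before poking at the nodes, the cluster's network profile can be inspected directly to confirm Overlay mode took effect. This is a sketch using the same resource names as the creation command above; the exact output shape may vary by CLI version:

```shell
# Confirm the plugin, mode, and pod CIDR on the new cluster
az aks show -n aks-cni-overlay -g rg-aks-cni-overlay \
    --query "networkProfile.{plugin:networkPlugin, mode:networkPluginMode, podCidr:podCidr}" \
    -o table
```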

2. Exploring the Result

Once the cluster was ready, I poked around to see how the IPs were assigned.

Checking Nodes:

kubectl get nodes -o wide

Result: The nodes had IPs from the Azure Subnet range.

Checking Pods:

kubectl get pods -A -o wide

I noticed two patterns here:

  1. Host Network Pods: System pods like kube-proxy or ip-masq-agent used the Node's IP address.
  2. Overlay Pods: Pods like coredns or my nginx application pods used IPs from the 192.168.0.0/16 range I defined earlier.

When I deployed 10 sample Nginx pods, they all received 192.168.x.x addresses, confirming that my VNet IPs were safe and unconsumed!
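If you want to reproduce that experiment, something like the following should work (the deployment name here is just illustrative):

```shell
# Deploy 10 nginx replicas and check which range their IPs come from
kubectl create deployment nginx-overlay-test --image=nginx --replicas=10

# The -o wide output shows the pod IPs: they should fall inside
# the 192.168.0.0/16 overlay range, not the VNet subnet
kubectl get pods -l app=nginx-overlay-test -o wide
```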



What Exactly is a CNI?

Before today, I sort of thought "CNI" was just an Azure term. But it turns out CNI (Container Network Interface) is actually an open standard in the Kubernetes world.

Because it's an open standard, any provider or community can create their own implementation (plugin) to dictate how pods get IP addresses and talk to each other. Over the years, the community has built a ton of these plugins, including:

  • Calico (famous for network policies)
  • Flannel (known for being incredibly simple)
  • Weave Net (often just called Weave)
  • Canal (a mix of Flannel and Calico)

"Bring Your Own CNI" (BYO CNI) Feature

By default, when you deploy an AKS cluster, Azure automatically installs and configures Kubenet or Azure CNI for you.

But what if your organization already uses Flannel on-premises and wants the exact same setup in Azure? That's where BYO CNI comes in. When you use this feature, AKS provisions the Kubernetes cluster but skips installing a networking plugin. The cluster will sit there in a "Not Ready" state until you manually deploy the CNI of your choice.
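Creating a BYO CNI cluster is a matter of telling AKS to skip the network plugin entirely. A minimal sketch (resource names are illustrative); remember the nodes will report NotReady until you install a CNI yourself:

```shell
# Create a resource group and an AKS cluster with NO CNI installed
az group create -n rg-aks-byocni -l westeurope

az aks create -n aks-byocni -g rg-aks-byocni \
    --network-plugin none
```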


Enter Cilium: Star of BYO CNI

While you can technically install various plugins, the BYO CNI feature in AKS is most heavily associated with deploying Cilium.

Cilium is an incredibly powerful, modern CNI that has been taking the Kubernetes world by storm. When deploying Cilium on AKS, you have two options:

  • Open-Source Version: Free and maintained by the community.
  • Enterprise Version: Comes with official enterprise support, which is usually a requirement for big companies running critical production workloads.


Secret Sauce: eBPF

The main reason people go out of their way to install Cilium is a technology called eBPF (Extended Berkeley Packet Filter).

Without getting too deep into the weeds, eBPF allows Cilium to run networking, security, and observability rules directly inside the Linux kernel. This means traffic doesn't have to bounce around through complex routing tables or proxy servers (like kube-proxy). It is lightning fast, highly secure, and provides deep visibility into what your pods are doing.


Key Takeaways

  • CNI is an open standard: Anyone can build a network plugin for Kubernetes (Flannel, Weave, Canal, etc.).
  • BYO CNI: You can tell AKS not to install a default network, allowing you to bring your own.
  • Cilium is a powerhouse: It uses advanced kernel technology (eBPF) to provide blazing-fast networking and security, and you can run it in AKS using the BYO CNI method.
  • Kubenet is great for saving IPs but requires Route Tables.
  • Traditional Azure CNI is powerful but eats IPs for breakfast.
  • Azure CNI Overlay solves the exhaustion problem by keeping Pod IPs internal, while removing the complexity of Route Tables.

It feels like Overlay is going to be the default standard for many clusters going forward.

