GKE and the dreaded IP_SPACE_EXHAUSTED error: Understanding the culprit

If you leverage Google Kubernetes Engine (GKE) within your Google Cloud environment, you’ve likely encountered the confidence-shattering “IP_SPACE_EXHAUSTED” error. 

It’s a common scenario: you’re convinced your IP address planning is flawless and your subnet design is future-proof, and then your GKE cluster suddenly hits a scaling roadblock. You start to question your subnetting skills. How can a /24 subnet with a capacity of 252 nodes be exhausted with a mere 64 nodes in your cluster? The answer lies in the nuanced way GKE allocates IP addresses, which often extends beyond simple node count.

In fact, there are three key factors influencing node capacity in GKE. Learn about them, and you’ll go a long way toward avoiding the dreaded IP_SPACE_EXHAUSTED error.

  1. Cluster primary subnet: Provides IP addresses to both your cluster’s nodes and its internal load balancers. In theory, the subnet’s size determines your cluster’s maximum scalability; in practice, there’s more to the story.

  2. Pod IPv4 range: This secondary (alias) IP range, configured on the cluster’s subnet, provides IP addresses for the pods within your cluster.

  3. Maximum pods per node: This defines the maximum number of pods that GKE can schedule on a single node. While set at the cluster level, it can be overridden at the node-pool level.

GKE’s IP allocation strategy

GKE reserves IP addresses for pods in a clever way. It looks at the “Maximum pods per node” setting and assigns each node the smallest subnet that can hold at least twice that many IP addresses. By having at least twice as many available IP addresses as the maximum number of pods that can be created on a node, Kubernetes reduces IP address reuse as pods are added to and removed from a node. So, if the maximum is set to the default of 110 for GKE Standard clusters, GKE finds the smallest subnet mask that can accommodate 220 (2 × 110) IP addresses, which is /24. It then carves out /24 slices from the pod IPv4 range and assigns one to each node.
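If you want to check the math for other settings, the per-node mask is a two-line calculation. Here’s a minimal Python sketch (plain arithmetic, not a GKE API):

```python
import math

def per_node_pod_mask(max_pods_per_node: int) -> int:
    """Smallest subnet mask whose range holds at least 2x max_pods_per_node IPs."""
    needed_ips = 2 * max_pods_per_node            # GKE doubles the max pods per node
    host_bits = math.ceil(math.log2(needed_ips))  # host bits needed for that many IPs
    return 32 - host_bits

print(per_node_pod_mask(110))  # 24 -> a /24 per node (220 IPs need 8 host bits)
print(per_node_pod_mask(60))   # 25 -> a /25 per node (120 IPs need 7 host bits)
```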

The ‘aha!’ moment

The key takeaway is that your cluster’s scalability is limited by the number of /24 slices your pod IPv4 range can provide, not just the number of IP addresses in your primary subnet. Each node consumes a slice, and once those slices are gone, you hit the “IP_SPACE_EXHAUSTED” error, even if your primary subnet has plenty of addresses left.

Illustrative example

Imagine you created a GKE cluster with these settings:

  • Cluster Primary Subnet: 10.128.0.0/22

  • Pod IPv4 Range: 10.0.0.0/18

  • Maximum pods per node: 110

You confidently proclaimed that your cluster could scale to 1020 nodes. But when it reached 64 nodes, the “IP_SPACE_EXHAUSTED” error struck. Why?

The problem lies in the pod IPv4 range. With a maximum of 110 pods per node, GKE reserves a /24 subnet for each node (2 x 110 = 220 IPs, requiring a /24). A /18 subnet can only be divided into 64 /24 subnets. So, you ran out of pod IP addresses at 64 nodes, even though your primary subnet had ample space.

There are a couple of ways to figure out how many nodes can fit within your pod IPv4 range:

Subnet bit difference: Calculate the difference between the two subnet masks. Subtracting the pod IPv4 range’s mask length from the per-node mask length (24 - 18 = 6) gives you the number of ‘subnet bits’, and 2⁶ = 64 is the number of slices, or nodes, that fit within the pod IPv4 range.

Total pod capacity: Another way to calculate the maximum number of nodes is by considering the total capacity of the pod IPv4 range. A /18 subnet has 32 - 18 = 14 host bits, so it can accommodate 2¹⁴ = 16,384 IP addresses. Since each node, with its /24 subnet, requires 256 addresses, divide the total pod capacity by the addresses per node to find the maximum number of nodes: 16,384 / 256 = 64.
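Both methods boil down to a few lines of arithmetic. Here’s a minimal Python sketch using the numbers from the example above:

```python
pod_range_mask = 18    # pod IPv4 range: 10.0.0.0/18
per_node_mask = 24     # /24 reserved per node when max pods per node is 110

# Method 1: subnet bit difference
max_nodes_by_bits = 2 ** (per_node_mask - pod_range_mask)   # 2^6 = 64

# Method 2: total pod capacity divided by addresses per node
total_pod_ips = 2 ** (32 - pod_range_mask)                  # 16,384
ips_per_node = 2 ** (32 - per_node_mask)                    # 256
max_nodes_by_capacity = total_pod_ips // ips_per_node       # 64

print(max_nodes_by_bits, max_nodes_by_capacity)             # 64 64
```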

Identifying the issue

Let me introduce you to a handy Google Cloud tool called Network Analyzer. It can do a lot, including spotting IP exhaustion issues and giving you a quick overview of your pod IP subnet capacity and how close you are to hitting that limit. If your cluster setup involves a service project for the cluster and a separate host project for the VPC network, you’ll find the relevant Network Analyzer insights within the host project.

The image below shows a Network Analyzer insight for a GKE cluster where a node pool has a /23 pod subnet and allows up to 110 maximum pods per node (effectively using a /24 subnet per node). Attempting to scale this node pool to two nodes triggers a medium priority warning, indicating that you’ve reached the maximum number of nodes supported by the allocated pod IP range.

[Image: Network Analyzer insight warning that the node pool has reached the maximum number of nodes supported by its pod IPv4 range]

Fixing the issue

If you’ve hit the limit of the cluster’s primary subnet, you can simply expand it to accommodate more nodes. However, if the bottleneck lies in the pod IPv4 range, you have a few options:

  • Create a new cluster with a larger pod address range: This is easier said than done and my least favorite solution, but in some cases it is necessary.

  • Add additional pod IPv4 address ranges: You can remediate the issue by adding an additional pod IPv4 range to the cluster. Think of it as bringing a new cake to the party: more slices mean you can feed more people (or, in this case, provision more nodes). The cluster’s total node capacity then becomes the combined capacity of the original and the additional pod IPv4 ranges.

  • Create a node pool with a different maximum pods per node: This setting is immutable on the cluster and on existing node pools. However, you can create a new node pool with a different (typically lower) maximum pods per node value, which reserves a smaller slice of the pod range per node and lets you fit more nodes into the remaining space.

GKE Autopilot clusters

Autopilot clusters aren’t immune to pod IP address exhaustion if they weren’t planned carefully. Just like with Standard clusters, you can add additional pod IPv4 subnets to provide more addresses. GKE then uses these additional ranges for pods on nodes created in future node pools.

The less obvious challenge with Autopilot clusters is how to trigger the creation of a new node pool to utilize those additional pod IP ranges. In Autopilot mode, you cannot directly create new node pools. You can force GKE to create a new node pool by deploying a workload using workload separation. This new pool will then tap into the additional pod IPv4 range.
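As a rough illustration of what such a workload looks like, the sketch below generates a Deployment manifest that combines a nodeSelector with a matching toleration, which is the pairing that requests workload separation in Autopilot. The label key and value (group: new-pod-range) and the container image are placeholders of my choosing, not values mandated by GKE:

```python
# Minimal sketch of a Deployment that requests workload separation in Autopilot.
# The "group: new-pod-range" key/value is an arbitrary placeholder label; GKE
# provisions new nodes labeled and tainted to match, creating a new node pool
# that can draw on the additional pod IPv4 range.
import yaml  # pip install pyyaml

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "separated-workload"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "separated-workload"}},
        "template": {
            "metadata": {"labels": {"app": "separated-workload"}},
            "spec": {
                # The nodeSelector and the matching toleration together
                # trigger workload separation.
                "nodeSelector": {"group": "new-pod-range"},
                "tolerations": [{
                    "key": "group",
                    "operator": "Equal",
                    "value": "new-pod-range",
                    "effect": "NoSchedule",
                }],
                "containers": [{"name": "app", "image": "nginx"}],
            },
        },
    },
}

# Print the manifest; pipe the output into `kubectl apply -f -`.
print(yaml.dump(deployment, sort_keys=False))
```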

Multiple node pools and varying maximum pods per node

The final piece of the puzzle involves scenarios where multiple node pools share the same pod IPv4 range but have different “maximum pods per node” values. Calculating the maximum number of nodes in such a setup is a bit more complicated.

When multiple node pools share the same pod range, the calculation depends on the number of nodes in each pool. Let’s illustrate this with an example.

Illustrative example

You have a Standard GKE cluster with the following:

  • Cluster Primary Subnet: 10.128.0.0/22

  • Pod IPv4 Range: 10.0.0.0/23

  • Maximum pods per node: 110

The default node pool has these settings:

  • Name: default-pool

  • Pod IPv4 Range: 10.0.0.0/23

  • Maximum pods per node: 110

You then add a second node pool, pool-1, with a lower maximum pods per node:

  • Name: pool-1

  • Pod IPv4 Range: 10.0.0.0/23

  • Maximum pods per node: 60

Based on what we’ve learned, GKE will reserve a /24 subnet per node in default-pool (2 × 110 = 220 IPs) and a /25 subnet per node in pool-1 (2 × 60 = 120 IPs). Since the /23 pod IPv4 range (512 addresses) is shared between them, here are the possible combinations:

  • Maximum of two nodes in default-pool and zero in pool-1 (Total = 2)

  • Maximum of one node in default-pool and two in pool-1 (Total = 3)

  • Maximum of zero nodes in default-pool and four in pool-1 (Total = 4)

As you can see, determining the maximum number of nodes in this cluster isn’t as straightforward as when node pools have separate pod ranges.
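If you want to reproduce those combinations, here is a small Python sketch of the shared-range arithmetic (assuming the slices pack cleanly, which they do for power-of-two sizes):

```python
pod_range_ips = 2 ** (32 - 23)      # shared /23 pod range -> 512 addresses
default_pool_ips = 2 ** (32 - 24)   # /24 per node (max 110 pods) -> 256
pool_1_ips = 2 ** (32 - 25)         # /25 per node (max 60 pods)  -> 128

# For each possible node count in default-pool, see how many pool-1 nodes
# still fit in whatever is left of the shared range.
for default_nodes in range(pod_range_ips // default_pool_ips + 1):
    remaining_ips = pod_range_ips - default_nodes * default_pool_ips
    pool_1_nodes = remaining_ips // pool_1_ips
    print(f"default-pool: {default_nodes}, pool-1: {pool_1_nodes}, "
          f"total: {default_nodes + pool_1_nodes}")

# default-pool: 0, pool-1: 4, total: 4
# default-pool: 1, pool-1: 2, total: 3
# default-pool: 2, pool-1: 0, total: 2
```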

Conclusion

Understanding the nuances of GKE’s IP allocation is crucial for avoiding the frustrating “IP_SPACE_EXHAUSTED” error. Plan your subnets and pod ranges carefully, considering the maximum pods per node and the potential for future scaling. With proper planning, you can ensure your GKE clusters have the IP address space they need to grow and thrive. Also, be sure to check out this blog post to understand how you can leverage the class E IPv4 address space to mitigate IPv4 exhaustion issues in GKE.