Kubernetes In-Place Pod Vertical Scaling
Kubernetes continues to evolve, offering features that improve efficiency and adaptability for developers and operators. Among them is the ability to resize the CPU and memory resources assigned to containers, introduced as an alpha feature in Kubernetes 1.27. It allows the CPU and memory of running pods to be adjusted without restarting them, minimizing downtime and optimizing resource usage. This blog post explores how the feature works, its practical applications, limitations, and cloud provider support. Understanding this functionality is vital for effectively managing containerized workloads and maintaining system reliability.
What Is In-Place Pod Vertical Scaling?
Traditionally, modifying the resource allocation for a Kubernetes pod required a restart, potentially disrupting applications and causing downtime. In-place scaling changes this by enabling real-time CPU and memory adjustments while the pod continues running. This is particularly useful for workloads with a very low tolerance for pod evictions.
What’s behind the feature gate?
The new resizePolicy element in the container spec lets you specify how a pod reacts to a patch command that changes its resource requests, enabling resource changes without rescheduling the pod.
The result of a resize attempt is communicated in the pod's status in a field called resize (for more information on the new fields, check out the Kubernetes API documentation).
Additionally, this feature introduces a per-resource restartPolicy within resizePolicy, allowing fine-grained control over resizing behavior: the developer chooses whether a CPU change or a memory change should restart the container.
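For example, a container can allow live CPU resizes while requiring a restart for memory changes. A minimal sketch (the container name and image are illustrative):
containers:
- name: web
  image: nginx
  resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired
  - resourceName: memory
    restartPolicy: RestartContainer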
Key Features
- Dynamic Scaling: Modify CPU and memory allocations while pods run.
- No Restarts: Avoid downtime caused by pod restarts.
- Granular Control: Enable precise resource tuning for better efficiency.
How It Works
The InPlacePodVerticalScaling feature integrates into Kubernetes to provide a more dynamic approach to resource allocation. Here's a detailed breakdown of how it operates:

- Feature Gate Activation: The InPlacePodVerticalScaling feature gate must be enabled in your cluster configuration. This allows the kubelet on each node to detect and process resource updates dynamically.
- Dynamic Resource Updates via the Kube API: With the feature enabled, the kubelet applies resource changes directly to running pods without requiring restarts. Supported container runtimes (e.g., containerd v1.6.9 or later) ensure these updates are applied efficiently. If constraints such as insufficient free CPU or memory on the node prevent the change, the resize is not forced: it is recorded in the pod's status (for example as Deferred or Infeasible) and the pod keeps running with its current allocation.
- Pod Spec Adjustments: The resizePolicy field dictates how CPU and memory adjustments are handled. For instance, you can set NotRequired for live updates without restarts, or RestartContainer to force a container restart when a specific resource is modified.
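The progress of a resize attempt is visible in the pod status. An abridged, illustrative excerpt for a resize the node cannot currently accommodate (field values assume the 1.27 alpha API):
status:
  resize: Deferred        # other values: Proposed, InProgress, Infeasible
  containerStatuses:
  - name: nginx
    allocatedResources:   # what the kubelet has actually granted
      cpu: 250m
      memory: 64Mi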
Limitations and Considerations
While In-Place Pod Vertical Scaling offers significant benefits, it has limitations:
1. Cloud Provider Support
- AWS: Not supported by Amazon Elastic Kubernetes Service (EKS) as there is no way to activate the needed feature gate.
- GCP: Google Kubernetes Engine (GKE) supports this feature as an alpha capability, starting with Kubernetes version 1.27. It must be enabled during cluster creation and requires disabling auto-repair and auto-upgrade. See the GKE alpha clusters documentation.
- Azure: Not supported by Azure Kubernetes Service (AKS). Refer to the AKS feature gate discussion.
2. Runtime Compatibility
| Container Runtime | Compatible Version | Release Notes |
|---|---|---|
| containerd | v1.6.9 and above | containerd v1.6.9 Release Notes |
| CRI-O | v1.24.2 and above | CRI-O v1.24.2 Release Notes |
| Podman | v4.0.0 and above | Podman v4.0.0 Release Notes |
3. Policy Constraints
Several Kubernetes policies and mechanisms govern resource scaling. These include:
Resource Quotas
- Resource quotas limit the total CPU and memory usage for a namespace. If an in-place resize operation would exceed these limits, the scaling request will fail. For example:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: example-namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "32Gi"
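To see the quota in action: a resize that would push the namespace above its limits is rejected by the API server. A sketch (the pod name is illustrative, and the exact error wording may differ by version):
kubectl patch pod app-pod -n example-namespace \
  -p '{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"12"}}}]}}'
# Error from server (Forbidden): exceeded quota: compute-quota,
# requested: requests.cpu=12, used: requests.cpu=..., limited: requests.cpu=10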
Limit Ranges
- Limit ranges enforce minimum and maximum resource constraints for individual pods or containers within a namespace. If a scaling operation exceeds these bounds, the resource adjustment is denied. Example configuration:
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
  namespace: example-namespace
spec:
  limits:
  - type: Container
    max:
      cpu: "2"
      memory: "4Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
Admission Controllers
- Admission controllers, such as Pod Security Admission or custom webhook controllers, can deny scaling operations if they conflict with security or operational policies. For example, a controller may restrict pods from exceeding certain CPU limits.
Application Suitability
- Not all applications can dynamically consume additional resources or adapt to reduced allocations. Examples include:
- Thread-Pool-Bound Applications: Servers like Gunicorn or Unicorn rely on a predefined worker count, so extra CPU goes unused until the workers are reconfigured.
- Memory-Bound Applications: Applications like Java with a fixed Xmx heap size cannot use memory added beyond that limit (see the sketch after this list).
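As an illustration, a JVM with a fixed heap cannot benefit from a memory resize, while percentage-based sizing adapts on restart. A sketch using standard JVM flags:
# Fixed heap: resizing the container's memory limit has no effect on the JVM
java -Xmx2g -jar app.jar
# Percentage-based heap: the JVM derives its heap from the container's memory
# limit at startup, so a resize paired with restartPolicy: RestartContainer takes effect
java -XX:MaxRAMPercentage=75.0 -jar app.jar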
HPA
- If the Horizontal Pod Autoscaler (HPA) scales on the same resource being patched, in-place resizing can cause erratic horizontal scaling behavior. For example, when HPA scaling is based on average CPU utilization:
- Resizing a pod from 1 core to 2 cores halves its reported utilization, which can trigger a scale-down and hurt the application's bottom-line performance.
- Resizing a pod from 2 cores to 1 doubles its reported utilization, which can trigger a scale-up, wasting resources or creating downstream pressure from the additional, unexpected pods. A worked example follows.
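A quick worked example: a pod consuming 800m CPU against a 1-core request reports 80% utilization, so an HPA targeting 60% holds or adds replicas. Resize the request to 2 cores and the same 800m now reports 800m / 2000m = 40% utilization, prompting a scale-down even though actual load has not changed. (Numbers are illustrative.)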
Use Cases
- AI/ML Workloads: Dynamically allocate resources during training and inference phases.
- Traffic Spikes: Combine Horizontal Pod Autoscaler (HPA) with In-Place Pod Vertical Scaling for efficient surge handling.
- Cost Optimization: Reduce waste by allocating the right amount of resources to each pod in real-time.
- Pod StartUp Time: Some applications require significantly higher CPU and memory resources during startup compared to their runtime needs. Google’s example, Startup CPU Boost, demonstrates how dynamic resource scaling can address such scenarios effectively.
Step-by-Step Guide
1. Enable the Feature Gate
Add the following kubeadm configuration to enable the InPlacePodVerticalScaling feature gate on the control-plane components:
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
controllerManager:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
scheduler:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
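The kubelet on each node must also run with the gate enabled; with kubeadm, this can be set via the KubeletConfiguration (a minimal sketch):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  InPlacePodVerticalScaling: true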
For GKE, create a cluster with alpha features enabled:
gcloud container clusters create poc \
  --enable-kubernetes-alpha \
  --no-enable-autorepair \
  --no-enable-autoupgrade
2. Deploy a Pod
Define a deployment with initial CPU and memory requests and limits:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
          requests:
            memory: "64Mi"
            cpu: "250m"
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
Once deployed, you can check cpu.weight, cpu.max, memory.max, and memory.min from within the container to see the initial values the container starts with (these files are exposed on nodes running cgroup v2):
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.weight
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.max
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.min
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.max
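As a sanity check, these values map directly to the pod spec. With the resources above (and cgroup v2), you would expect roughly the following; the exact cpu.weight value depends on the kubelet's shares-to-weight conversion:
/sys/fs/cgroup/cpu.max     -> "50000 100000" (500m limit: quota/period in microseconds)
/sys/fs/cgroup/memory.max  -> 134217728 (128Mi limit, in bytes)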
3. Update Resources
Adjust resource allocations for a running pod dynamically:
kubectl patch pod $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -p '{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"750m"}}}]}}'
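Note that on newer Kubernetes versions (1.32 and later, where the feature reached beta), resizes must go through the dedicated resize subresource. If the plain patch above is rejected, the equivalent call is:
kubectl patch pod $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') --subresource resize -p '{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"750m"}}}]}}'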
4. Verify Changes
Confirm updated resource settings:
kubectl describe pod -l app=app
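You can also read the resize field from the pod status directly; while a resize is pending it reports values such as Proposed, InProgress, Deferred, or Infeasible, and it is cleared once the resize completes:
kubectl get pod $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -o jsonpath='{.status.resize}'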
Additionally, you can connect to the container and see the change in cpu.weight, cpu.max, memory.max, and memory.min from within the container.
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.weight
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.max
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.min
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.max
Conclusion
In-Place Pod Vertical Scaling is a powerful tool for managing dynamic workloads in Kubernetes, reducing downtime, and optimizing resource usage. While its adoption depends on cloud provider support and application compatibility, this feature offers significant efficiency and cost-saving benefits. As Kubernetes evolves, such features will become essential for effective container orchestration.
While Google's Kube Startup CPU Boost example covers one specific scenario, ScaleOps provides an all-in-one resource management solution that addresses the full range of Kubernetes resource management needs.