Kubernetes In-Place Pod Vertical Scaling
Kubernetes continues to evolve, offering features that improve efficiency and adaptability for developers and operators. Among them is the ability to resize the CPU and memory resources assigned to containers, introduced as an alpha feature in Kubernetes 1.27. It allows the CPU and memory of running pods to be adjusted without restarting them, minimizing downtime and optimizing resource usage. This blog post explores how the feature works, its practical applications, limitations, and cloud provider support. Understanding this functionality is vital for effectively managing containerized workloads and maintaining system reliability.
What Is In-Place Pod Vertical Scaling?
Traditionally, modifying the resource allocation for a Kubernetes pod required a restart, potentially disrupting applications and causing downtime. In-place scaling changes this by enabling real-time CPU and memory adjustments while the pod continues running. This is particularly useful for workloads with a very low tolerance for pod evictions.
What’s behind the feature gate?
The new resizePolicy element in the container spec lets you specify how a pod reacts to a patch command that changes its resource requests, enabling resource changes without rescheduling the pod.
The result of a resize attempt is communicated in the pod's status in a field called resize (for more information on the new fields, check out the Kubernetes API documentation).
Additionally, this feature introduces a per-resource restartPolicy within resizePolicy, allowing fine-grained control over resizing behavior: the developer chooses whether a CPU change or a memory change should restart the container.
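For example, a container can allow live CPU resizes while requiring a restart for memory changes. A minimal sketch (the container name and image are illustrative):
containers:
- name: web
  image: nginx
  resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired
  - resourceName: memory
    restartPolicy: RestartContainer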
Key Features
- Dynamic Scaling: Modify CPU and memory allocations while pods run.
- No Restarts: Avoid downtime caused by pod restarts.
- Granular Control: Enable precise resource tuning for better efficiency.
How It Works
The InPlacePodVerticalScaling feature integrates into Kubernetes to provide a more dynamic approach to resource allocation. Here's a detailed breakdown of how it operates:

- Feature Gate Activation: The InPlacePodVerticalScaling feature gate must be enabled in your cluster configuration. This allows the kubelet on each node to detect and process resource updates dynamically.
- Dynamic Resource Updates via the Kube API: With the feature enabled, the kubelet applies resource changes directly to running pods without requiring restarts. Supported container runtimes (e.g., containerd v1.6.9 or later) ensure these updates are applied efficiently. If constraints such as insufficient free CPU or memory on the node prevent the change, the resize is not forced: it is recorded in the pod's status (for example as Deferred or Infeasible) and the pod keeps running with its current allocation.
- Pod Spec Adjustments: The resizePolicy field dictates how CPU and memory adjustments are handled. For instance, you can set NotRequired for live updates without restarts, or RestartContainer to force a container restart when a specific resource is modified.
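The progress of a resize attempt is visible in the pod status. An abridged, illustrative excerpt for a resize the node cannot currently accommodate (field values assume the 1.27 alpha API):
status:
  resize: Deferred        # other values: Proposed, InProgress, Infeasible
  containerStatuses:
  - name: nginx
    allocatedResources:   # what the kubelet has actually granted
      cpu: 250m
      memory: 64Mi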
Limitations and Considerations
While In-Place Pod Vertical Scaling offers significant benefits, it has limitations:
1. Cloud Provider Support
- AWS: Not supported by Amazon Elastic Kubernetes Service (EKS) as there is no way to activate the needed feature gate.
- GCP: Google Kubernetes Engine (GKE) supports this feature as an alpha capability, starting with Kubernetes version 1.27. It must be enabled during cluster creation and requires disabling auto-repair and auto-upgrade. See the GKE alpha clusters documentation.
- Azure: Not supported by Azure Kubernetes Service (AKS). Refer to the AKS feature gate discussion.
2. Runtime Compatibility
| Container Runtime | Compatible Version | Release Notes |
|---|---|---|
| containerd | v1.6.9 and above | containerd v1.6.9 Release Notes |
| CRI-O | v1.24.2 and above | CRI-O v1.24.2 Release Notes |
| Podman | v4.0.0 and above | Podman v4.0.0 Release Notes |
3. Policy Constraints
Several Kubernetes policies and mechanisms govern resource scaling. These include:
Resource Quotas
- Resource quotas limit the total CPU and memory usage for a namespace. If an in-place resize operation would exceed these limits, the scaling request will fail. For example:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: example-namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "32Gi"
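To see the quota in action: a resize that would push the namespace above its limits is rejected by the API server. A sketch (the pod name is illustrative, and the exact error wording may differ by version):
kubectl patch pod app-pod -n example-namespace \
  -p '{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"12"}}}]}}'
# Error from server (Forbidden): exceeded quota: compute-quota,
# requested: requests.cpu=12, used: requests.cpu=..., limited: requests.cpu=10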
Limit Ranges
- Limit ranges enforce minimum and maximum resource constraints for individual pods or containers within a namespace. If a scaling operation exceeds these bounds, the resource adjustment is denied. Example configuration:
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
  namespace: example-namespace
spec:
  limits:
  - type: Container
    max:
      cpu: "2"
      memory: "4Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
Admission Controllers
- Admission controllers, such as Pod Security Admission or custom webhook controllers, can deny scaling operations if they conflict with security or operational policies. For example, a controller may restrict pods from exceeding certain CPU limits.
Application Suitability
- Not all applications can dynamically consume additional resources or adapt to reduced allocations. Examples include:
- Thread-Pool-Bound Applications: Servers like Gunicorn or Unicorn rely on a predefined worker count, so extra CPU goes unused until the workers are reconfigured.
- Memory-Bound Applications: Applications like Java with a fixed Xmx heap size cannot use memory added beyond that limit (see the sketch after this list).
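As an illustration, a JVM with a fixed heap cannot benefit from a memory resize, while percentage-based sizing adapts on restart. A sketch using standard JVM flags:
# Fixed heap: resizing the container's memory limit has no effect on the JVM
java -Xmx2g -jar app.jar
# Percentage-based heap: the JVM derives its heap from the container's memory
# limit at startup, so a resize paired with restartPolicy: RestartContainer takes effect
java -XX:MaxRAMPercentage=75.0 -jar app.jar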
HPA
- If the Horizontal Pod Autoscaler (HPA) scales on the same resource being patched, in-place resizing can cause erratic horizontal scaling behavior. For example, when HPA scaling is based on average CPU utilization:
- Resizing a pod from 1 core to 2 cores halves its reported utilization, which can trigger a scale-down and hurt the application's bottom-line performance.
- Resizing a pod from 2 cores to 1 doubles its reported utilization, which can trigger a scale-up, wasting resources or creating downstream pressure from the additional, unexpected pods. A worked example follows.
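A quick worked example: a pod consuming 800m CPU against a 1-core request reports 80% utilization, so an HPA targeting 60% holds or adds replicas. Resize the request to 2 cores and the same 800m now reports 800m / 2000m = 40% utilization, prompting a scale-down even though actual load has not changed. (Numbers are illustrative.)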
Use Cases
- AI/ML Workloads: Dynamically allocate resources during training and inference phases.
- Traffic Spikes: Combine Horizontal Pod Autoscaler (HPA) with In-Place Pod Vertical Scaling for efficient surge handling.
- Cost Optimization: Reduce waste by allocating the right amount of resources to each pod in real-time.
- Pod StartUp Time: Some applications require significantly higher CPU and memory resources during startup compared to their runtime needs. Google’s example, Startup CPU Boost, demonstrates how dynamic resource scaling can address such scenarios effectively.
Step-by-Step Guide
1. Enable the Feature Gate
Add the following kubeadm configuration to enable the InPlacePodVerticalScaling feature gate on the control-plane components:
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
controllerManager:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
scheduler:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
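The kubelet on each node must also run with the gate enabled; with kubeadm, this can be set via the KubeletConfiguration (a minimal sketch):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  InPlacePodVerticalScaling: true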
For GKE, create a cluster with alpha features enabled:
gcloud container clusters create poc \
  --enable-kubernetes-alpha \
  --no-enable-autorepair \
  --no-enable-autoupgrade
2. Deploy a Pod
Define a deployment with initial CPU and memory requests and limits:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
          requests:
            memory: "64Mi"
            cpu: "250m"
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
Once deployed, you can check cpu.weight, cpu.max, memory.max, and memory.min from within the container to see the initial values the container starts with (these files are exposed on nodes running cgroup v2):
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.weight
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.max
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.min
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.max
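As a sanity check, these values map directly to the pod spec. With the resources above (and cgroup v2), you would expect roughly the following; the exact cpu.weight value depends on the kubelet's shares-to-weight conversion:
/sys/fs/cgroup/cpu.max     -> "50000 100000" (500m limit: quota/period in microseconds)
/sys/fs/cgroup/memory.max  -> 134217728 (128Mi limit, in bytes)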
3. Update Resources
Adjust resource allocations for a running pod dynamically:
kubectl patch pod $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -p '{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"750m"}}}]}}'
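Note that on newer Kubernetes versions (1.32 and later, where the feature reached beta), resizes must go through the dedicated resize subresource. If the plain patch above is rejected, the equivalent call is:
kubectl patch pod $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') --subresource resize -p '{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"750m"}}}]}}'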
4. Verify Changes
Confirm updated resource settings:
kubectl describe pod -l app=app
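You can also read the resize field from the pod status directly; while a resize is pending it reports values such as Proposed, InProgress, Deferred, or Infeasible, and it is cleared once the resize completes:
kubectl get pod $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -o jsonpath='{.status.resize}'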
Additionally, you can connect to the container and see the change in cpu.weight, cpu.max, memory.max, and memory.min from within the container.
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.weight
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.max
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.min
kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.max
Conclusion
In-Place Pod Vertical Scaling is a powerful tool for managing dynamic workloads in Kubernetes, reducing downtime, and optimizing resource usage. While its adoption depends on cloud provider support and application compatibility, this feature offers significant efficiency and cost-saving benefits. As Kubernetes evolves, such features will become essential for effective container orchestration.
While Google's Kube Startup CPU Boost example covers one specific scenario, ScaleOps provides an all-in-one resource management solution that addresses the full range of Kubernetes resource management needs.