Kubernetes In-Place Pod Vertical Scaling

Vladislav Shub, 16 January 2025

Kubernetes continues to evolve, offering features that enhance efficiency and adaptability for developers and operators. Among these is the ability to resize the CPU and memory resources assigned to containers, introduced as an alpha feature in Kubernetes 1.27. This feature allows you to adjust the CPU and memory of running pods without restarting them, helping to minimize downtime and optimize resource usage. This blog post explores how the feature works, its practical applications, limitations, and cloud provider support. Understanding this functionality is vital for effectively managing containerized workloads and maintaining system reliability.

What Is In-Place Pod Vertical Scaling?

Traditionally, modifying the resource allocation for a Kubernetes pod required a restart, potentially disrupting applications and causing downtime. In-place scaling changes this by enabling real-time CPU and memory adjustments while the pod continues running. This is particularly useful for workloads with a very low tolerance for pod evictions.

What’s behind the feature gate?

The new resizePolicy spec element allows you to specify how a pod reacts to a patch that changes its resource requests, making it possible to change them without rescheduling the pod.

The result of the change attempt is communicated as part of the pod's status in a field called resize (for more information on the new fields, check out the Kubernetes API documentation).

Additionally, each resizePolicy entry carries a restartPolicy field for the container, allowing fine-grained control over resizing behavior: the developer can choose whether a CPU change or a memory change should restart the container.
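
For illustration, here is a minimal sketch of these per-resource policies on a container spec, mixing both behaviors (the container name and image are placeholders; the field names follow the Kubernetes API):

spec:
  containers:
  - name: app
    image: nginx
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # apply CPU changes in place, without a restart
    - resourceName: memory
      restartPolicy: RestartContainer # restart the container when memory is resized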

Key Features

  • Dynamic Scaling: Modify CPU and memory allocations while pods run.
  • No Restarts: Avoid downtime caused by pod restarts.
  • Granular Control: Enable precise resource tuning for better efficiency.

How It Works

The InPlacePodVerticalScaling feature integrates seamlessly into Kubernetes to provide a more dynamic approach to resource allocation. Here’s a detailed breakdown of how it operates:

[Diagram: Kubernetes in-place resource adjustment flow. The user patches the pod via kube-api; the kubelet drives containerd to update the running pod on its node.]
  1. Feature Gate Activation: Activating the InPlacePodVerticalScaling feature gate in your cluster configuration is required to enable this functionality. This allows the kubelet on each node to detect and process resource updates dynamically.
  2. Dynamic Resource Updates via Kube API: With the feature enabled, the kubelet directly applies resource changes to running pods without requiring restarts. Supported container runtimes (e.g., containerd v1.6.9 or later) ensure these updates are applied efficiently. If constraints like insufficient free memory or CPU prevent the change, it is not applied; instead, the pod's status.resize field reports the outcome (e.g., Deferred or Infeasible), which you can inspect as shown after this list.
  3. Pod Spec Adjustments: The resizePolicy field dictates how CPU and memory adjustments are handled. For instance, you can set NotRequired for live updates without restarts or RestartContainer to force a restart when a specific resource is modified.
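
After a resize attempt, you can inspect the outcome the kubelet reports in the pod status (a quick check, assuming a pod labeled app=app as in the deployment example later in this post):

kubectl get pods -l app=app -o jsonpath='{.items[*].status.resize}'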

Limitations and Considerations

While In-Place Pod Vertical Scaling offers significant benefits, it has limitations:

1. Cloud Provider Support

  • AWS: Not supported by Amazon Elastic Kubernetes Service (EKS) as there is no way to activate the needed feature gate.
  • GCP: Google Kubernetes Engine (GKE) supports this feature as an alpha capability, starting with Kubernetes version 1.27. It must be enabled during cluster creation and requires disabling auto-repair and auto-upgrade. See the GKE alpha clusters documentation.
  • Azure: Not supported by Azure Kubernetes Service (AKS). Refer to the AKS feature gate discussion.

2. Runtime Compatibility

Container Runtime | Compatible Version | Release Notes
containerd        | v1.6.9 and above   | containerd v1.6.9 Release Notes
CRI-O             | v1.24.2 and above  | CRI-O v1.24.2 Release Notes
Podman            | v4.0.0 and above   | Podman v4.0.0 Release Notes

3. Policy Constraints

Several Kubernetes policies and mechanisms govern resource scaling. These include:

Resource Quotas

  • Resource quotas limit the total CPU and memory usage for a namespace. If an InPlacePodVerticalScaling operation exceeds these limits, the scaling request will fail. For example:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: example-namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "32Gi"
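
To see how much of this quota is currently consumed before attempting a resize, you can describe it (assuming the namespace and quota above):

kubectl describe resourcequota compute-quota -n example-namespace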

Limit Ranges

  • Limit ranges enforce minimum and maximum resource constraints for individual pods or containers within a namespace. If a scaling operation exceeds these bounds, the resource adjustment is denied. Example configuration:
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
  namespace: example-namespace
spec:
  limits:
  - type: Container
    max:
      cpu: "2"
      memory: "4Gi"
    min:
      cpu: "100m"
      memory: "128Mi"

Admission Controllers

  • Admission controllers, such as Pod Security Admission or custom webhook controllers, can deny scaling operations if they conflict with security or operational policies. For example, a controller may restrict pods from exceeding certain CPU limits.

Application Suitability

  • Not all applications can dynamically consume additional resources or adjust to reduced allocations. Examples include:
    • Thread-Pool-Bound Applications: Servers like Gunicorn or Unicorn rely on predefined worker counts and will not use extra CPU without being reconfigured.
    • Memory-Bound Applications: Applications like Java with a fixed -Xmx heap size cannot take advantage of additional memory.

Horizontal Pod Autoscaler (HPA)

  • If the HPA scales on the resource being patched, in-place resizing can cause erratic horizontal scaling behavior, because utilization is calculated as usage divided by the request. For example, with HPA scaling on average CPU utilization:
  1. Resizing a pod from 1 core to 2 cores halves its reported utilization (e.g., 900m of usage drops from 90% to 45%), which can trigger a scale-down and hurt the bottom-line performance of the application.
  2. Resizing a pod from 2 cores to 1 doubles its reported utilization, which can trigger a scale-up, wasting resources or creating downstream pressure from the additional, unexpected pods.

Use Cases

  • AI/ML Workloads: Dynamically allocate resources during training and inference phases.
  • Traffic Spikes: Combine Horizontal Pod Autoscaler (HPA) with In-Place Pod Vertical Scaling for efficient surge handling.
  • Cost Optimization: Reduce waste by allocating the right amount of resources to each pod in real-time.
  • Pod Startup Time: Some applications require significantly higher CPU and memory resources during startup than at runtime. Google's Kube Startup CPU Boost example demonstrates how dynamic resource scaling can address such scenarios effectively.

Step-by-Step Guide

1. Enable the Feature Gate

Add the following configuration to enable the InPlacePodVerticalScaling feature:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
controllerManager:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
scheduler:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
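
Because the kubelet is the component that actually applies in-place resizes, it typically needs the gate as well. With kubeadm, this can be appended as a KubeletConfiguration document in the same file (a sketch, assuming the kubelet.config.k8s.io/v1beta1 API):

---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Assumption: the same alpha gate is enabled on every node's kubelet
  InPlacePodVerticalScaling: true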

For GKE, create a cluster with alpha features enabled:

gcloud container clusters create poc \
    --enable-kubernetes-alpha \
    --no-enable-autorepair \
    --no-enable-autoupgrade

2. Deploy a Pod

Define a deployment with initial CPU and memory requests and limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
          requests:
            memory: "64Mi"
            cpu: "250m"
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired

Once deployed, you can read cpu.weight, cpu.max, memory.min, and memory.max from the container's cgroup v2 filesystem to see the initial values the container starts with.

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.weight

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.max

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.min

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.max 
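
As a rough reference for the limits defined above (assuming cgroup v2 and default kubelet settings; exact values may vary by node):

# cpu.max:    50000 100000   -> the 500m CPU limit, expressed as quota over a 100ms period
# memory.max: 134217728      -> the 128Mi memory limit in bytes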

3. Update Resources

Adjust resource allocations for a running pod dynamically. Because a request may not exceed its limit, raise the limit together with the request:

kubectl patch pod $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -p '{"spec":{"containers":[{"name":"nginx","resources":{"requests":{"cpu":"750m"},"limits":{"cpu":"750m"}}}]}}'

4. Verify Changes

Confirm updated resource settings:

kubectl describe pod -l app=app
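
You can also read the resources the kubelet has actually allocated to the container from the pod status (a quick check; the allocatedResources field is part of the alpha API):

kubectl get pods -l app=app -o jsonpath='{.items[*].status.containerStatuses[*].allocatedResources}'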

Additionally, you can connect to the container and observe the changes to cpu.weight, cpu.max, memory.min, and memory.max from within it.

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.weight

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/cpu.max

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.min

kubectl exec -it $(kubectl get pods -l app=app -o jsonpath='{.items[*].metadata.name}') -- cat /sys/fs/cgroup/memory.max
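
With the patch applied above, cpu.max should now reflect the new 750m limit (again assuming cgroup v2; a sketch of the expected output):

# cpu.max: 75000 100000   -> the 750m CPU limit over a 100ms period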

Conclusion

In-Place Pod Vertical Scaling is a powerful tool for managing dynamic workloads in Kubernetes, reducing downtime, and optimizing resource usage. While its adoption depends on cloud provider support and application compatibility, this feature offers significant efficiency and cost-saving benefits. As Kubernetes evolves, such features will become essential for effective container orchestration.


While Google’s Kube Startup CPU Boost example addresses a specific use case, ScaleOps provides an all-in-one resource management solution covering the full range of Kubernetes resource management scenarios.
