

Kubernetes VPA: Pros and Cons & Best Practices

Rob Croteau 4 November 2024 10 min read

The Kubernetes Vertical Pod Autoscaler (VPA) is a critical component for managing resource allocation in dynamic containerized environments. This guide explores the benefits, limitations, and best practices of Kubernetes VPA, while offering practical insights for advanced Kubernetes users.

What is Kubernetes VPA?

Kubernetes Vertical Pod Autoscaler (VPA) is a tool designed to optimize resource allocation by dynamically adjusting CPU and memory requests and limits for running pods based on their observed usage. Unlike the Horizontal Pod Autoscaler, which adds or removes pods to handle the load, VPA focuses on resizing the resources of individual pods.

This is particularly useful for workloads that don’t benefit from scaling the number of pods, such as stateful applications or workloads with unpredictable resource needs, ensuring more efficient resource utilization without manual adjustments.

Kubernetes VPA Resource Configuration Types

In Kubernetes autoscaling, you normally configure resources for containers using two key parameters:

  • Requests: The guaranteed amount of CPU and memory resources for a container.
  • Limits: The maximum CPU and memory a container is allowed to consume.

Kubernetes VPA dynamically updates these values over time based on actual resource usage.
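As a minimal illustration, requests and limits are declared per container in the pod spec (the pod and container names here are arbitrary):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod        # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.14.2
    resources:
      requests:            # guaranteed minimum, used by the scheduler
        cpu: "250m"
        memory: "64Mi"
      limits:              # hard ceiling enforced at runtime
        cpu: "500m"
        memory: "128Mi"
```

These are the values VPA observes and, depending on its update mode, rewrites over time.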

Kubernetes VPA vs HPA

The Kubernetes Vertical Pod Autoscaler and Horizontal Pod Autoscaler serve different use cases, so the right choice depends on your workload. Here is a comparison of the two:

| Feature | VPA (Vertical Pod Autoscaler) | HPA (Horizontal Pod Autoscaler) |
| --- | --- | --- |
| Scaling type | Adjusts CPU/memory resources per pod | Adjusts the number of pods |
| Best for | Stateful applications, resource spikes | Stateless applications, predictable scaling |
| Metrics used | CPU, memory | CPU, memory, custom metrics (e.g., traffic) |
| Scaling speed | Slower, as pods are recreated | Faster, adds/removes pods quickly |
| Disruptive | Yes, pod restarts required | No, pods scale without interruption |
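For contrast with the VPA manifest shown later in this guide, here is a minimal HPA manifest (the target deployment name is illustrative) that scales pod count on CPU utilization rather than resizing individual pods:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa                  # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # add replicas when average CPU exceeds 70%
```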

Kubernetes VPA Components

Kubernetes VPA consists of three main components that work together to monitor, recommend, and update resource configurations:

1. Recommender

The Recommender continuously monitors the usage of resources and calculates the optimal CPU and memory requests for the containers. The recommendations are based on the actual historical resource consumption by the pods.

2. Updater

The Updater checks all running pods against the VPA's recommendations. When it finds a mismatch between a pod's current resource allocation and the recommendation, it evicts the pod so that it can be recreated with the updated resource requests.

3. Admission Controller

The Admission Controller intercepts pod creation requests and rewrites resource requests and limits according to VPA recommendations, so new pods start with an appropriate resource configuration from the outset.
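Once VPA is installed (see the configuration section below), you can confirm all three components are running. The exact pod names and labels depend on how VPA was installed, so a simple filter is the safest check:

```shell
# List the VPA component pods; names vary by install method
kubectl get pods -n kube-system | grep vpa
# Typically shows vpa-recommender, vpa-updater, and
# vpa-admission-controller pods in a Running state
```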

How Does Kubernetes VPA Work?

Kubernetes VPA analyzes pod resource usage over time. It assesses utilization levels, such as CPU and memory usage, for each pod, and either surfaces the Recommender's recommendations or applies them directly, depending on the configured update mode.

When the VPA Updater is enabled, it resizes pods by evicting them and re-creating them with fresh resource settings based on the Recommender's recommendations. This helps prevent under-allocation, which can cause performance issues, and over-allocation, which wastes resources.

The workflow involves three key steps:

  1. VPA collects historical usage data.
  2. VPA calculates the recommended resource settings based on observed usage patterns.
  3. VPA adjusts the running pods (by recreating them) if needed, applying new resource limits and requests.

Kubernetes VPA Metrics

Kubernetes VPA relies on key metrics to adjust resource allocations dynamically. The accuracy and timeliness of these metrics significantly impact VPA’s effectiveness.

CPU and Memory Metrics

VPA primarily focuses on CPU and memory usage to determine optimal resource configurations for pods. By analyzing these metrics over time, VPA suggests adjustments to minimize underutilization or resource bottlenecks.

Metric Collection Frequency

The frequency of metric collection impacts how VPA detects resource usage trends. A higher collection frequency allows for more precise adjustments but can increase overhead. In most environments, a default collection interval of one minute is sufficient.

Analyzing Resource Spikes

VPA helps manage pods that experience periodic resource spikes by adjusting requests and limits based on historical data. However, VPA’s reaction time may not be sufficient for very short-lived spikes. Therefore, it’s important to analyze resource spikes carefully when configuring VPA to avoid frequent pod evictions.

Historical Data Utilization

The Recommender leverages historical data to make resource recommendations. It’s crucial to have enough historical data to make accurate predictions. This means VPA performs better over time, as more data is collected and analyzed.

How to Configure Kubernetes VPA: Example

This section provides a practical guide to configuring Kubernetes VPA, along with code examples.

Step 1: Install VPA

The first step in using VPA is to install it in your Kubernetes cluster.

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

This installs the VPA components (Updater, Admission Controller, and Recommender) in your cluster.

Step 2: Create a Deployment

Create a sample deployment for which VPA will manage resource allocations. Below is an example YAML configuration for an Nginx deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

Step 3: Create a VPA

After creating the deployment, create a Vertical Pod Autoscaler resource for it.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       nginx-deployment
  updatePolicy:
    updateMode: "Auto"

In this example, the updateMode is set to Auto, meaning the VPA will automatically update resource requests and limits for the pods in the deployment.
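The VPA spec also supports a resourcePolicy section for bounding its recommendations per container, which is useful for keeping adjustments within budget or node capacity. The bounds below are illustrative values, not recommendations:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       nginx-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"          # apply to all containers in the pod
      minAllowed:                 # illustrative lower bound
        cpu: "100m"
        memory: "64Mi"
      maxAllowed:                 # illustrative upper bound
        cpu: "1"
        memory: "512Mi"
      controlledResources: ["cpu", "memory"]
```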

Step 4: Testing

Once the VPA is in place, you can monitor the deployment to see how the resource requests and limits are updated. You can use the following command to observe the VPA recommendations:

kubectl get vpa nginx-vpa --output yaml

You can also test resource consumption by applying load to your Nginx deployment and monitoring how VPA adjusts the resources accordingly.
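One simple way to generate load (a hypothetical setup that assumes the deployment is exposed via a Service named nginx-deployment) is a throwaway busybox loop, after which you can watch the recommendations evolve:

```shell
# Run a throwaway pod that continuously requests the Nginx service
kubectl run load-generator --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://nginx-deployment; done"

# Watch the VPA's lower bound, target, and upper bound change over time
kubectl describe vpa nginx-vpa
```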

Benefits of Kubernetes VPA

Kubernetes VPA offers several key benefits, as summarized in the table below:

| Benefit | Description |
| --- | --- |
| Right-sizing | Ensures pods have the appropriate CPU/memory allocation. |
| Reduced maintenance | Automatically adjusts resource settings, reducing manual tuning. |
| Resource optimization | Prevents resource wastage by adjusting underutilized pods. |
| Improved performance | Ensures high resource availability during high-load situations. |
| Cost efficiency | Minimizes over-provisioning, reducing infrastructure costs. |
| Dynamic scaling | Adjusts resource allocation based on real-time needs. |
| Increased uptime | By optimizing resources, it reduces the chance of pod failures. |

Kubernetes VPA Limitations

While Kubernetes VPA offers numerous benefits, it also has limitations that can affect its suitability for certain workloads:

1. Not Policy Driven

VPA makes resource adjustments based solely on observed CPU and memory usage, without considering organizational policies. This can lead to conflicts, especially when resource limits are predefined for cost control or compliance reasons. For example, VPA may increase resource allocations beyond budget limits, potentially causing unexpected costs.

2. No Fast Reaction Support

VPA isn’t designed for fast real-time scaling. It adjusts resources based on historical usage and often requires pod restarts, which aren’t ideal for workloads experiencing sudden, short-lived traffic spikes. For applications requiring immediate scaling (e.g., web servers under heavy traffic), VPA’s reaction time may be too slow. To handle rapid scaling needs, VPA should be combined with Horizontal Pod Autoscaler (HPA), which adjusts pod counts dynamically.

3. Limited Auto Healing

VPA does not provide auto-healing. It does not monitor pod health or respond to node failures. If a pod crashes because of resource starvation, VPA only adjusts resources in its next cycle, which can prolong service disruption.

4. High Adoption Effort

Adopting VPA is a significant effort for teams less familiar with Kubernetes resource management. It requires proper configuration, monitoring, and fine-tuning to avoid downtime. Because VPA applies new settings through pod restarts, it can cause issues for critical or stateful applications. Additionally, the learning curve of CPU/memory optimization can pose challenges for new users, so it is wise to test VPA in non-critical environments first.

5. Conflict with HPA on Same Metrics

Using Kubernetes VPA and HPA together can lead to conflicts when both scale on the same metrics, such as CPU or memory. VPA scales pods individually by adjusting resource allocation, whereas HPA changes the number of pods. For example, if VPA increases a pod's CPU allocation while HPA still sees high CPU usage and adds more pods, the result is over-scaling.

To avoid this, use different metrics for each autoscaler (for example, CPU/memory for VPA and traffic or other custom metrics for HPA), or test their combined usage very carefully.
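As a sketch of this separation, assuming a custom metrics adapter exposes a per-pod requests-per-second metric named http_requests_per_second (the metric name and thresholds are illustrative), HPA can scale on traffic while VPA keeps ownership of CPU and memory:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-traffic            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # assumes a custom metrics adapter
      target:
        type: AverageValue
        averageValue: "100"              # scale out above 100 req/s per pod
```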

Kubernetes VPA Best Practices

To maximize the benefits of Kubernetes VPA, it’s important to follow key best practices that ensure optimal performance and resource management:

1. Enable VPA for Critical Workloads

VPA is highly effective for workloads that experience variable resource demands, such as stateful applications or backend systems. Applying VPA to critical workloads ensures optimal resource allocation without manual intervention, preventing resource bottlenecks or wastage.

2. Set Appropriate Resource Requests and Limits

While VPA adjusts resource requests dynamically, setting reasonable initial values for CPU and memory requests and limits is crucial. Properly configured starting points ensure that VPA’s recommendations are aligned with workload requirements and prevent potential performance issues during pod restarts.

3. Monitor Resource Utilization

Even with VPA enabled, continuous monitoring of resource usage is vital. Tools like Prometheus and Grafana provide insights into how VPA is performing. Monitoring helps ensure VPA is making appropriate adjustments, avoids potential bottlenecks, and tracks resource optimization efforts over time.

4. Test the VPA

Before deploying VPA in production, it’s important to test its behavior in a staging environment. Testing under different workload scenarios ensures that VPA adjusts resources appropriately without causing excessive pod restarts or performance degradation.

5. Use in Conjunction with Other Autoscalers

VPA works best when used alongside the Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler. This combination allows both vertical (resource scaling within pods) and horizontal scaling (adjusting the number of pods) to work in harmony, ensuring that both individual pod resources and overall cluster resources are optimized.

6. Consider Application Requirements

Applications with different resource needs require different VPA configurations. For example, stateful or CPU-intensive applications benefit from careful VPA tuning to avoid disruptions caused by frequent pod restarts, while memory-intensive apps can leverage VPA’s dynamic adjustments to prevent out-of-memory errors.

7. Customize VPA Update Policies

Fine-tuning the update policy (Off, Initial, Recreate, or Auto) ensures that VPA updates resources in a way that matches your workload’s sensitivity to pod restarts. For critical or stateful applications, the “Initial” or “Off” modes reduce disruptions while still allowing for resource optimization.

8. Leverage Recommendation Modes

VPA’s recommendation-only mode allows teams to review resource suggestions without enforcing them automatically. This is useful for testing and ensuring that VPA recommendations align with application requirements before they are applied in production environments.
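Recommendation-only operation corresponds to updateMode: "Off". The VPA still computes targets, which can then be read from its status (the VPA name here is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa-dryrun        # illustrative name
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       nginx-deployment
  updatePolicy:
    updateMode: "Off"           # compute recommendations but never evict pods
```

The computed targets can then be inspected with, for example, `kubectl get vpa nginx-vpa-dryrun -o jsonpath='{.status.recommendation}'` before deciding to enable automatic updates.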

9. Fine-Tune VPA for Stateful Applications

Stateful applications are sensitive to frequent pod evictions. Using the “Initial” update mode and configuring higher initial resource requests for stateful workloads helps maintain stability while optimizing resources.

Conclusion

Kubernetes VPA provides a strong solution for automatically managing the allocation of resources in containerized environments. It improves resource efficiency, reduces maintenance overhead, and can even lead to lower infrastructure costs when properly configured. However, it is equally essential to understand its limitations and, most importantly, how to apply best practices to maximize its effectiveness.
