

Kubernetes Resource Requests and Limits 101


Yodar Shafrir 4 April 2023 7 min read

When it comes to cloud-native services and applications, Kubernetes is the de facto standard container orchestrator.

It offers built-in deployment, customization, scaling, and resiliency capabilities, is largely use-case agnostic, and shortens time to market for applications, big data, AI, HPC, and more.
However, managing and orchestrating Kubernetes resources efficiently at scale has become one of the biggest challenges for DevOps teams.

When provisioning compute resources in Kubernetes, it’s important to have a solid understanding of how resources are allocated and how much of them each workload actually needs. Some processes require more CPU or memory than others, and some are critical and should never be starved. Knowing this, it’s crucial to properly configure your containers and Pods to get the best performance out of your workloads.

Understanding Requests and Limits in Kubernetes

In Kubernetes, Requests specify the minimum resources required for a container to function. They ensure that the scheduler reserves the necessary resources on a node when placing Pods. Limits, on the other hand, cap the maximum amount of resources a container may consume; exceeding a limit results in throttling (CPU) or termination (memory).
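
Both are set per container in the Pod spec. Here is a minimal sketch (the Pod and container names, and the values, are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app                # hypothetical example name
spec:
  containers:
    - name: web
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"          # guaranteed minimum: a quarter of a core
          memory: "256Mi"      # guaranteed minimum memory
        limits:
          cpu: "500m"          # throttled when usage exceeds half a core
          memory: "512Mi"      # terminated (OOMKilled) above this amount
```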

Kubernetes Requests: Ensuring Essential Resources

Requests during Scheduling

Kubernetes requests play a critical role during the scheduling process. When assigning a Pod to a Node, the scheduler ensures that the node has enough unreserved capacity to satisfy the resource requests of all the containers in the Pod. This way, the Pod is guaranteed the minimum resources it needs to function properly.
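
For example (a hypothetical sketch), a Pod that requests more CPU than any node in the cluster can reserve will never pass scheduling and stays Pending:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oversized-pod          # hypothetical example name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "64"            # 64 full cores; if no node can reserve
                               # this much CPU, the Pod stays Pending
```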

Requests during Runtime

The resource request is guaranteed as the minimum reserved for the containers in a Pod. In addition, during runtime CPU requests translate to cpu-shares and act as weights: the shares assigned to each container determine the proportion of CPU time it receives when the node’s CPU is contended. Containers with higher CPU requests receive a proportionally larger share of CPU time than containers with lower requests.
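
As a sketch of how the weights work (names are hypothetical; the runtime derives cpu-shares as roughly request-in-millicores × 1024 / 1000), two containers contending for CPU split it in proportion to their requests:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shares-demo            # hypothetical example name
spec:
  containers:
    - name: high-priority
      image: nginx:1.25
      resources:
        requests:
          cpu: "500m"          # ~512 cpu-shares
    - name: low-priority
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"          # ~256 cpu-shares
  # under CPU contention, high-priority receives roughly twice
  # the CPU time of low-priority (a 2:1 weight ratio)
```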

Bad Practices in Kubernetes Requests:

  • Not setting a request at all: Failing to set a CPU and memory request means the container has no guaranteed minimum amount of resources and may suffer from resource starvation.
  • Over-provisioning: Setting a request significantly higher than the required amount leads to wasted capacity, higher costs, and less efficient utilization of available resources.
  • Under-provisioning: Setting a request too low may result in poor latency, out-of-memory crashes, and noisy-neighbor issues when several containers share the same node.
  • Not considering other pods on the same node: When setting requests, it is important to take into account the other pods running on the same node to avoid resource starvation.
  • Setting requests as a “one-time show”: Cloud-native workloads are naturally dynamic, so requests that are never updated from their initial values can increase costs or degrade workload performance.
    It’s very important to continuously analyze container resource consumption and adjust requests accordingly, as illustrated in the sketch below.
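
As a hedged illustration (the numbers are hypothetical), consider a container whose observed steady-state usage is around 300m CPU and 400Mi memory; a sensible request adds modest headroom over that observation, rather than a guess far above or below it:

```yaml
resources:
  requests:
    cpu: "350m"        # observed ~300m usage plus ~15% headroom
    memory: "500Mi"    # observed ~400Mi usage plus headroom
  # over-provisioned alternative  (cpu: "2", memory: "4Gi") wastes capacity
  # under-provisioned alternative (cpu: "50m", memory: "128Mi") risks
  # throttling, OOM kills, and noisy-neighbor pressure
```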

Managing Kubernetes Limits Effectively

Limits set the maximum resource utilization for a container. They ensure that a container cannot consume more resources than intended, which would otherwise cause performance degradation or resource starvation for other containers running on the same node.

However, setting limits too low can starve the container itself, leading to poor performance or even pod termination.

Understanding Memory and CPU Limits:

Memory limits and CPU limits in Kubernetes function differently. When a container exceeds its memory limit, it is terminated (OOMKilled) to prevent system instability. When a container reaches its CPU limit, it is throttled: its CPU time is capped, so processing slows down to handle the peak rather than the container being killed.
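
A minimal sketch of the difference, modeled on the stress-test pattern from the Kubernetes documentation (names and values are hypothetical): a container that allocates more memory than its limit is OOMKilled, while one that merely burns CPU against its limit just runs slower:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo               # hypothetical example name
spec:
  restartPolicy: Never
  containers:
    - name: memory-hog
      image: polinux/stress    # common stress-testing image
      command: ["stress"]
      # tries to allocate 300M while limited to 200Mi:
      # the container is OOMKilled (exit code 137)
      args: ["--vm", "1", "--vm-bytes", "300M", "--vm-hang", "1"]
      resources:
        limits:
          memory: "200Mi"      # exceeded -> terminated
          cpu: "500m"          # exceeded -> throttled, never killed
```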

How to Determine Kubernetes Resource Limits

Determining resource limits in Kubernetes involves understanding your application’s resource needs and performance characteristics. Monitoring tools like Prometheus coupled with Kubernetes-native metrics can help assess CPU and memory utilization. Analyze historical usage patterns and anticipated future growth to set realistic limits.
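
One possible Kubernetes-native approach (a sketch assuming the Vertical Pod Autoscaler is installed in the cluster; the target Deployment name is hypothetical) is a VPA object in recommendation-only mode, which reports suggested values based on observed usage without changing any pods:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app              # hypothetical target Deployment
  updatePolicy:
    updateMode: "Off"          # recommendation-only: no pods are modified
```

Inspecting the object afterwards (for example with kubectl describe vpa) shows the recommended CPU and memory values, which can inform the requests and limits you set.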

Kubernetes Memory Limit: Impact When Reached

When a container in Kubernetes reaches its memory limit, it faces termination to prevent system instability caused by memory overconsumption. This ensures that other containers on the node aren’t starved of memory resources. Properly setting memory limits prevents out-of-memory errors and keeps the cluster stable.

Kubernetes CPU Limit: Optimizing Processor Usage

Setting CPU limits in Kubernetes involves understanding how much processing power an application requires. CPU limits ensure fair distribution of CPU resources among containers running on a node. However, setting limits too low might throttle the application’s performance, while excessively high limits can impact node performance.

Why Setting CPU Limits May Not Always Be Ideal:

Setting CPU limits can cause CPU throttling and prevents a pod from taking advantage of excess CPU resources when they are available. When several pods run on the same node, each is already guaranteed its CPU request. Adding limits on top of that can stop pods from using the node’s spare capacity when they need it, which may lead to poor performance even while resources sit idle on the node.
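
A hedged sketch of this pattern (names are hypothetical): set a CPU request and a memory limit, but omit the CPU limit so the container can burst into the node’s idle CPU:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-app          # hypothetical example name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"          # fair share still guaranteed via cpu-shares
          memory: "256Mi"
        limits:
          memory: "512Mi"      # memory stays capped to protect the node
          # no cpu limit: the container may use idle CPU when available
```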

Pitfalls in Setting Kubernetes Limits:

  • Setting limits too low causes resource starvation and even termination of the container.
  • Setting limits too high allows a single container to consume excessive resources, potentially degrading performance for other containers running on the same node.
  • Ignoring the resources available on the node where the pod is scheduled, leaving idle capacity unutilized.
  • Ignoring the application’s resource usage patterns when configuring limits.
  • Not updating limits when changing the workload: As workloads change, it is important to update the limits accordingly.

FAQ:

What are Kubernetes resource limits and requests?

  • In Kubernetes, resource limits define the maximum consumption, while requests ensure the minimum required resources for containers.

How to determine Kubernetes resource limits?

  • Use monitoring tools and analyze historical usage patterns to understand CPU and memory needs, setting realistic limits accordingly.

What happens when a Kubernetes memory limit is reached?

  • When a container hits its memory limit, Kubernetes terminates it to prevent node instability due to excessive memory usage.

Why are Kubernetes CPU limits essential?

  • CPU limits ensure fair resource distribution among containers, optimizing node performance.

How do Kubernetes requests vs limits differ?

  • Requests ensure minimum resources needed for container functionality, while limits define maximum resource consumption to avoid overutilization.

Why is setting accurate Kubernetes resource limits crucial?

  • Accurate limits keep both the workload and the node healthy: limits that are too low starve or terminate containers, while limits that are too high let a single container degrade its neighbors and drive up compute costs.

Summary

In today’s dynamic and ever-changing environment, finding the optimal resource requests and limits for each workload in a Kubernetes cluster can be a daunting task.
With constantly changing loads on the cluster and multiple workloads with different owners, keeping these values right requires a continuous, repeatable process of monitoring and manually reconfiguring resource requests and limits. This is complex, because every workload has different needs and requires a unique configuration. Done poorly, it leads to poor performance, underutilized infrastructure, and significantly increased compute costs.

Dynamic Requests and Limits Automation with ScaleOps

At ScaleOps, we understand the challenges of managing resources in a dynamic and ever-changing environment. The ScaleOps platform solves this problem by continuously and automatically right-sizing pod resources during runtime, with no disruptions or downtime. It eliminates the need for engineering teams to repeatedly and manually configure workload resources, freeing up valuable time and resources.
The bottom line: with ScaleOps you can reduce your compute costs by up to 80%, effortlessly, while ensuring workload SLAs at all times.

With our free trial, a simple Helm installation takes about two minutes and gives you full visibility into your potential savings and current workload utilization.
