Kubernetes Resource Requests and Limits 101


When it comes to cloud-native services and applications, Kubernetes is the de facto standard for container orchestration.

Kubernetes offers built-in deployment, customization, scaling, and resiliency capabilities. It is use-case agnostic and shortens time to market for applications, big data, AI, HPC, and more.
However, managing and orchestrating Kubernetes resources efficiently at scale has become one of the biggest challenges for DevOps teams.

When provisioning compute resources in Kubernetes, it’s important to understand how much of each resource a workload needs and how Kubernetes allocates it. Some processes require more CPU or memory than others, and some are critical and should never be starved. Knowing this, it’s crucial to properly configure your containers and Pods to get the best performance out of your workloads.

Kubernetes Requests

‘Requests’ represent the minimum guaranteed amount of a resource (CPU/Mem) that is reserved for a container.

Requests during Scheduling

Kubernetes requests play a critical role during the scheduling process. When assigning a Pod to a Node, the scheduler only considers Nodes whose unreserved capacity can satisfy the resource requests of all containers in the Pod. This way, the Pod is guaranteed to have at least the minimum resources it asked for.
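The fit check described above can be sketched in a few lines of Python. This is a simplified illustration, not the scheduler's actual code: the `Resources` type and `pod_fits` function are hypothetical names, and the real scheduler also accounts for factors this sketch ignores (init containers, Pod overhead, affinity rules, and more). The point it shows is that the decision is based on requests, not on what containers are actually using.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resources:
    """Hypothetical container for CPU (millicores) and memory (MiB)."""
    cpu_millicores: int
    memory_mib: int


def pod_fits(node_allocatable: Resources,
             already_requested: Resources,
             pod_container_requests: List[Resources]) -> bool:
    """Return True if the Pod's summed container requests fit into the
    part of the node that other Pods have not already requested."""
    pod_cpu = sum(c.cpu_millicores for c in pod_container_requests)
    pod_mem = sum(c.memory_mib for c in pod_container_requests)
    free_cpu = node_allocatable.cpu_millicores - already_requested.cpu_millicores
    free_mem = node_allocatable.memory_mib - already_requested.memory_mib
    return pod_cpu <= free_cpu and pod_mem <= free_mem


node = Resources(cpu_millicores=4000, memory_mib=8192)
reserved = Resources(cpu_millicores=3000, memory_mib=4096)

small_pod = [Resources(500, 1024), Resources(250, 512)]   # 750m CPU in total
large_pod = [Resources(1200, 512)]                        # more CPU than is left

print(pod_fits(node, reserved, small_pod))
print(pod_fits(node, reserved, large_pod))
```

In this example the node has 1000m of unrequested CPU, so the first Pod fits and the second does not, regardless of how busy the node actually is.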

Requests during Runtime

The requested resources remain reserved for the containers in a Pod for as long as the Pod runs. In addition, at runtime CPU requests translate to cpu-shares and act as weights. When containers compete for CPU, the shares assigned to each container determine the proportion of CPU time it can consume: containers with higher CPU requests receive a larger proportion of CPU time, while containers with lower CPU requests receive a smaller one.
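To make the weighting concrete, here is a minimal sketch of how a CPU request can be turned into cpu-shares and how contended CPU time is then split. The conversion (1000 millicores map to 1024 shares, with a floor of 2) reflects the cgroup v1 mapping as commonly documented for the kubelet; treat it as an assumption, since nodes using cgroup v2 express the same idea as a CPU weight instead. The function names are illustrative only.

```python
from typing import Dict

SHARES_PER_CPU = 1024   # shares assigned to one full CPU (1000 millicores)
MIN_SHARES = 2          # assumed lower bound for any container


def millicores_to_shares(millicores: int) -> int:
    """Convert a CPU request in millicores to cpu-shares (assumed cgroup v1 mapping)."""
    shares = millicores * SHARES_PER_CPU // 1000
    return max(shares, MIN_SHARES)


def contended_cpu_split(requests_millicores: Dict[str, int],
                        node_millicores: int) -> Dict[str, float]:
    """Split the node's CPU between containers that ALL want more CPU than
    is available. Each container gets CPU time in proportion to its shares."""
    shares = {name: millicores_to_shares(m) for name, m in requests_millicores.items()}
    total = sum(shares.values())
    return {name: node_millicores * s / total for name, s in shares.items()}


# Two busy containers on a 2-CPU node: "b" requested three times as much as "a".
split = contended_cpu_split({"a": 250, "b": 750}, node_millicores=2000)
print(split)
```

Here container "b" ends up with three times the CPU time of container "a", matching the 1:3 ratio of their requests. When the node is not contended, shares do not restrict anyone.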

Bad Practices in Kubernetes Requests:

  1. Not setting a request at all: Failing to set a CPU/Mem request means that the container won’t have a guaranteed minimum amount of the resource and may suffer from resource starvation.
  2. Over-provisioning: Setting a request that’s significantly higher than the required amount leads to higher costs, as well as less efficient utilization of available resources.
  3. Under-provisioning: Setting a request that’s too low may result in poor latency, out-of-memory crashes, and noisy-neighbor problems when several containers share the same node.
  4. Not considering other Pods on the same node: When setting requests, it is important to take into account the other Pods running on the same node to avoid resource starvation.
  5. Treating requests as a “one-time show”: Cloud-native workloads are naturally dynamic, so requests that are never updated from their initial values may increase costs or negatively affect workload performance.
    It’s very important to continuously analyze container resource consumption and adjust the requests accordingly.
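As an illustration of the last point, the sketch below derives a request from observed usage samples. The policy used here (a nearest-rank percentile plus a fixed headroom) is an assumption chosen for the example, not a method prescribed by this article or by Kubernetes, and the function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile of the samples (pct between 1 and 100)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_request(usage_samples: Sequence[int],
                      pct: int = 95,
                      headroom_pct: int = 15) -> int:
    """Suggest a request: the chosen usage percentile plus some headroom.
    Units follow the samples (for example millicores or MiB)."""
    base = percentile(usage_samples, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)


# Hypothetical CPU usage samples in millicores, including one short spike.
cpu_usage = [100, 120, 110, 130, 400, 125, 115, 105, 135, 140]

print(recommend_request(cpu_usage, pct=90, headroom_pct=10))
```

Re-running a calculation like this on fresh samples is one simple way to keep requests in line with how the workload actually behaves, instead of leaving the initial value in place.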

Kubernetes Limits

‘Limits’ represent the maximum amount of a resource (CPU/Mem) that a container is allowed to consume. Unlike requests, limits are not reserved for the container.

Limits are important for ensuring that containers do not consume more resources than required, which could otherwise cause performance degradation or resource starvation for other containers running on the same node.
However, it is important to note that setting ‘Limits’ too low can also result in resource starvation for the container itself, and may lead to poor performance or even Pod termination.
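For reference, requests and limits are declared together per container under its `resources` field. The sketch below mirrors that structure as a Python dictionary and checks that no limit is set below its request, which Kubernetes rejects. The parsing helpers are hypothetical and deliberately minimal: they handle only the common `m` suffix for CPU and the `Ki`, `Mi`, and `Gi` suffixes for memory, not the full quantity syntax.

```python
from typing import Dict, List

# Mirrors the shape of a container's `resources` field (values are examples).
container_resources = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}

_MEMORY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_to_millicores(quantity: str) -> int:
    """Parse '250m' or '0.5' style CPU quantities into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Parse '256Mi' style memory quantities into bytes (subset of suffixes only)."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def limits_below_requests(resources: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the resource names whose limit is lower than the request."""
    parsers = {"cpu": cpu_to_millicores, "memory": memory_to_bytes}
    problems = []
    for name, parse in parsers.items():
        request = resources.get("requests", {}).get(name)
        limit = resources.get("limits", {}).get(name)
        if request is not None and limit is not None and parse(limit) < parse(request):
            problems.append(name)
    return problems


print(limits_below_requests(container_resources))
```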

Memory limits and CPU limits in Kubernetes function differently.
When a container exceeds its memory limit, it is terminated (OOM-killed). When a container reaches its CPU limit, it is throttled instead: it receives no more CPU time until the next scheduling period begins, so it keeps running but works through a peak more slowly.
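The effect of throttling on latency can be estimated with a small model. The sketch below assumes the usual Linux CFS mechanism, where a CPU limit becomes a quota of CPU time per 100 ms period, and it assumes a single-threaded task that starts at the beginning of a period. The function names are illustrative, and real behavior depends on thread count and kernel details.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def cpu_limit_to_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the container may use in each period."""
    return limit_millicores * period_us // 1000


def wall_clock_us(cpu_work_us: int, limit_millicores: int,
                  period_us: int = CFS_PERIOD_US) -> int:
    """Estimate how long a single-threaded burst of CPU work takes under a limit.
    The task runs until its quota is used up, then waits for the next period."""
    if cpu_work_us <= 0:
        return 0
    # A single thread cannot use more than one full period of CPU per period.
    quota = min(cpu_limit_to_quota_us(limit_millicores, period_us), period_us)
    full_periods, remainder = divmod(cpu_work_us, quota)
    if remainder == 0:
        # The work finishes exactly when the last quota slice is used up.
        return (full_periods - 1) * period_us + quota
    return full_periods * period_us + remainder


# 250 ms of CPU work with a 500m limit versus a limit of a full CPU.
print(wall_clock_us(250_000, limit_millicores=500))
print(wall_clock_us(250_000, limit_millicores=1000))
```

Under these assumptions, 250 ms of work takes about 450 ms of wall-clock time with a 500m limit, because the task sits idle for half of every period even if the node has CPU to spare.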

Why Setting CPU Limits May Not Always Be Ideal:

Setting CPU limits may cause CPU throttling and limit the Pod’s ability to take advantage of excess CPU when it is available. When several Pods run on the same node, each of them is already guaranteed its CPU request. A limit can therefore prevent a Pod from using idle CPU when it needs it, which may lead to poor performance even though the node has spare capacity.
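A toy model makes the trade-off visible. It assumes that a container can always use its request plus whatever CPU is currently idle on the node, and that a limit, when set, caps that amount. The function and parameter names are hypothetical, and the model ignores how idle CPU is shared between several bursting containers.

```python
from typing import Optional


def usable_cpu(demand: int, request: int, node_idle: int,
               limit: Optional[int] = None) -> int:
    """CPU (millicores) a container can use in this simplified model.
    Without a limit it may burst into idle node capacity; with a limit it is capped."""
    available = request + node_idle      # guaranteed part plus idle capacity
    granted = min(demand, available)
    if limit is not None:
        granted = min(granted, limit)
    return granted


# A container that needs 1800m during a peak on a node with 2000m idle.
print(usable_cpu(demand=1800, request=500, node_idle=2000, limit=1000))
print(usable_cpu(demand=1800, request=500, node_idle=2000))
```

With the limit in place the container is held to 1000m while CPU sits idle on the node. Without it, the container can serve its full 1800m peak, and other containers still keep their guaranteed requests.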

Bad Practices in Kubernetes Limits:

  1. Setting limits too low: This causes resource starvation and even termination of the container.
  2. Setting limits too high: This allows unnecessary resource consumption and potential performance degradation for other containers running on the same node.
  3. Ignoring node capacity: Not taking into account the resources available on the node where the Pod is scheduled leaves idle and available resources unused.
  4. Ignoring usage patterns: Not taking the application’s resource usage patterns into consideration during configuration.
  5. Not updating limits when the workload changes: As workloads change, it is important to update the limits accordingly.


In today’s dynamic and ever-changing environment, finding the optimal resource requests and limits for each workload in a Kubernetes cluster can be a daunting task.
With constantly changing loads on the cluster and multiple workloads with different owners, this requires a continuous and repeatable process of monitoring and manually configuring resource requests and limits. The process is complex, as every workload has different needs and requires its own configuration. If not done properly, it can lead to poor performance, underutilized infrastructure, and significantly increased compute costs.

Dynamic Requests and Limits Automation with ScaleOps

At ScaleOps, we understand the challenges of managing resources in a dynamic and ever-changing environment. The ScaleOps platform addresses this problem by continuously and automatically right-sizing Pod resources at runtime, with no disruptions or downtime. Our platform eliminates the need for the engineering team to repeatedly and manually configure workload resources, freeing up valuable time.
The bottom line: with ScaleOps, you can reduce your compute costs by up to 80%, effortlessly, while ensuring workload SLA at all times.

With our free trial and a simple Helm installation that takes 2 minutes, you will receive full visibility into your potential savings and current workload utilization.

Yodar Shafrir

CEO & Co-founder

With over 7 years of product and development experience with cloud-native technology, Yodar founded ScaleOps to relieve engineering teams from the challenges of managing and optimizing Cloud-Native workloads.
