Last updated: May 2026
Key Takeaways
- Pods stuck in ImagePullBackOff or CrashLoopBackOff reserve their full CPU and memory requests while doing zero work. The scheduler still counts them against node capacity.
- Over-provisioned sidecars and init containers inflate the pod’s effective request well beyond what the steady-state app needs, forcing the Cluster Autoscaler to add nodes you do not need.
- NotReady nodes accrue cloud charges with no usable capacity. They are the most expensive blind spot in most clusters.
- GPU workloads waste capacity differently than CPU workloads. A pod that reserves a full A100 but only runs inference on 12 percent of the SMs is the AI-era equivalent of an over-requested sidecar.
- The fix is rarely a single config change. It is continuous, automated rightsizing tied to real workload behavior, not static defaults from a Helm chart.
What is Kubernetes resource waste?
Kubernetes resource waste is any CPU, memory, GPU, or node capacity that is reserved by the scheduler but not converted into useful work. The scheduler places pods based on resources.requests, not actual usage, so an over-requested pod blocks capacity from healthier neighbors. Multiply that across thousands of workloads and your cluster looks busy in the dashboard while half the bill goes to nothing.
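For example, a pod that declares the requests below holds that capacity on a node whether or not the container ever uses it. This is a minimal sketch; the pod name, image, and numbers are placeholders, not recommendations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: over-requested-app
spec:
  containers:
    - name: app
      image: registry.internal.example.com/team/my-app:1.4.2
      resources:
        requests:
          cpu: "1"      # the scheduler blocks a full vCPU on some node for this pod
          memory: 2Gi   # even if kubectl top pod shows 50m and 200Mi of real usage
```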
There are two broad categories:
- Reserved-but-idle waste. Failing pods, NotReady nodes, oversized requests, and warm replicas that never see traffic.
- Allocated-but-underutilized waste. Sidecars sized for peak traffic, init containers that anchor pod scheduling at vCPU and GiB scales the steady-state app never needs, and GPUs claimed whole when the workload uses a fraction.
Both kinds reduce allocatable capacity, trigger unnecessary autoscaling, and inflate cloud spend. Both are fixable.
Pods stuck in ImagePullBackOff
A pod in ImagePullBackOff reserves its full CPU and memory request while running nothing. The scheduler placed it. The kubelet cannot pull the image. The capacity stays held until someone notices.
Common root causes:
- Wrong tag or registry path
- imagePullSecrets missing or out of date
- Network egress rules blocking the registry endpoint
- Docker Hub or ECR rate limits hit during a deploy storm
Fix it: Audit pod events with kubectl get events --sort-by=.metadata.creationTimestamp on a schedule, not just when something breaks. Set imagePullPolicy: IfNotPresent for tagged images. Mirror critical public images to a private registry to dodge rate limits. For long-stuck pods, do not tolerate them in production: alert on any pod in ImagePullBackOff for more than five minutes.
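A minimal sketch of the pull-related settings above; the pod name, registry host, and the regcred secret are placeholders for whatever your environment uses:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  imagePullSecrets:
    - name: regcred                   # keep this secret current; expired credentials are a common cause
  containers:
    - name: app
      # mirror critical public images to a private registry to dodge rate limits
      image: registry.internal.example.com/team/my-app:1.4.2
      imagePullPolicy: IfNotPresent   # tagged image: reuse the cached layers instead of re-pulling on every restart
```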
Pods stuck in CrashLoopBackOff
CrashLoopBackOff is worse than ImagePullBackOff because the kubelet keeps trying. Each restart consumes scheduling cycles and locks the requested resources between attempts.
Typical failure modes:
- Missing environment variables or ConfigMap keys
- Container exits non-zero on startup
- Liveness probe fires before the app has booted
- A dependency, like a database or queue, is not yet available
Fix it: Tune initialDelaySeconds on liveness probes so the app actually has time to start. Use startupProbe for slow-booting services so liveness only kicks in after the app is up. Ship structured logs to a central system so you do not have to kubectl logs --previous to diagnose. Set a hard restart budget at the cluster level: any pod with more than 10 restarts in 30 minutes should page the on-call.
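A minimal sketch of the probe tuning described above, assuming an HTTP health endpoint on port 8080; the names, image, and timing values are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-booting-service
spec:
  containers:
    - name: app
      image: registry.internal.example.com/team/slow-booting-service:2.0.1
      ports:
        - containerPort: 8080
      # startupProbe gates the liveness probe until the app finishes booting:
      # up to 30 x 5s = 150 seconds before Kubernetes gives up on startup
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 5
      # livenessProbe only begins once the startup probe has succeeded,
      # so it no longer needs a long initialDelaySeconds
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```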
Over-provisioned sidecars
Service meshes, log shippers, security agents, and OpenTelemetry collectors all run as sidecars. Most of them ship with default requests that assume a busy production workload. When you inject them into a low-traffic service, they over-claim CPU and memory the workload will never touch.
The math gets ugly fast. If Envoy requests 250m CPU and you have 4,000 pods in the mesh, that is 1,000 vCPU reserved across the cluster. If real Envoy usage is 40m, you are paying for 840 vCPU of nothing.
Fix it:
- Profile actual sidecar usage with kubectl top pod over a representative window, ideally a full week.
- Right-size sidecar requests to the p95 of real usage, not the mesh’s default Helm values.
- For non-critical sidecars, run them burstable: low requests, higher limits (see the sketch after this list).
- Automate this. Static rightsizing decays. The Vertical Pod Autoscaler can recommend, but it cannot apply changes safely without restart logic. Continuous rightsizing platforms manage requests in real time without disrupting traffic.
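A minimal sketch of a burstable sidecar next to its app container; the images are placeholders, and the numbers assume observed p95 usage around the 40m figure from the earlier example rather than the chart defaults:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
    - name: app
      image: registry.internal.example.com/team/my-app:1.4.2
      resources:
        requests:
          cpu: 250m
          memory: 512Mi
    - name: log-shipper               # non-critical sidecar, run burstable
      image: registry.internal.example.com/platform/log-shipper:3.2.0
      resources:
        requests:                     # sized to observed p95, not the chart default
          cpu: 40m
          memory: 64Mi
        limits:                       # headroom for bursts without inflating the reservation
          cpu: 200m
          memory: 256Mi
```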
Over-provisioned init containers
Init containers are the most counterintuitive source of waste. The scheduler does not place a pod based on the steady-state container’s request alone. It places the pod based on the effective request: the larger of the sum of all app container requests and the largest request of any single init container.
Concretely: if your init container requests 2 vCPU and 4 GiB to do a one-time database migration, but your app container only needs 250m and 512 MiB, the scheduler still demands a node with 2 vCPU and 4 GiB free. The init container runs once. The reservation lasts the pod’s lifetime in the scheduler’s accounting model.
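A minimal sketch of that exact scenario; the names and images are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-migration
spec:
  initContainers:
    - name: db-migrate
      image: registry.internal.example.com/team/db-migrate:1.0.0
      resources:
        requests:
          cpu: "2"      # the largest init request dominates the pod's effective request
          memory: 4Gi   # so the scheduler needs a node with 2 vCPU and 4 GiB free
  containers:
    - name: api
      image: registry.internal.example.com/team/api:5.1.0
      resources:
        requests:
          cpu: 250m     # the steady-state app only ever needs this
          memory: 512Mi
```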
The downstream effects:
- Smaller nodes have unusable holes that no other pod fits into.
- The Cluster Autoscaler adds capacity to satisfy the inflated request.
- Karpenter picks larger instance types than the workload needs.
Fix it: Audit every init container’s requests. If the init job runs in 10 seconds and only needs 100m and 128 MiB, request that. Move heavy one-time setup work out of init containers entirely, into a Job or a build-time step.
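If the one-time work genuinely needs heavy resources, a minimal sketch of running it as a Job instead, so the large reservation exists only while the Job runs; the names and image are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-5-1-0
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migrate
          image: registry.internal.example.com/team/db-migrate:1.0.0
          resources:
            requests:
              cpu: "2"      # the heavy request now lives only for the Job's runtime,
              memory: 4Gi   # not for the lifetime of every app pod
```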
NotReady and uninitialized nodes
A NotReady node is a virtual machine that is running, billable, and useless. The control plane will not schedule pods onto it. The cloud provider will not stop charging.
Common causes:
- Kubelet failed to register, often a network or certificate issue
- CNI plugin crashed during startup
- Disk pressure or PID pressure on the node
- Karpenter or Cluster Autoscaler provisioned a node that never finished bootstrapping
Fix it: Treat NotReady for more than 10 minutes as a hard incident, not a transient condition. Auto-terminate stuck nodes and let the autoscaler replace them. Track the metric: NotReady node-hours per day multiplied by the node’s hourly cost is your daily waste rate. For most clusters running on EKS, AKS, or GKE, this single fix recovers two to five percent of monthly compute spend.
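A minimal alerting sketch, assuming kube-state-metrics and the Prometheus Operator’s PrometheusRule CRD are installed; the threshold mirrors the 10-minute rule above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-not-ready
  namespace: monitoring
spec:
  groups:
    - name: node-health
      rules:
        - alert: NodeNotReady
          # the Ready condition is not "true", i.e. the node is NotReady or Unknown
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} has been NotReady for 10 minutes and is still billable."
```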
Idle and over-allocated GPUs
This category did not exist in the 2024 version of this article. It is now the largest source of waste in any cluster running AI inference or training.
GPUs are scheduled whole by default. A pod that requests nvidia.com/gpu: 1 gets the entire device, even if it only uses 8 GiB of a 40 GiB A100. Inference workloads with bursty traffic can sit at single-digit utilization for hours and still hold the full GPU.
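A minimal sketch of that default whole-device request; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  containers:
    - name: server
      image: registry.internal.example.com/ml/llm-inference:0.9.3
      resources:
        limits:
          nvidia.com/gpu: 1   # the entire GPU is reserved, however small the model
```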
The waste compounds in three ways:
- Reserved-but-idle. A pod warm-loaded with a model but not serving requests still holds the GPU.
- Fractional underuse. A small model on a large GPU never touches most of the SMs or memory.
- Replica over-provisioning. Teams keep extra replicas warm to avoid cold-start latency, doubling or tripling GPU spend.
Fix it:
- Use GPU sharing strategies: NVIDIA MPS for compute-bound inference, MIG for hard isolation, time-slicing for development workloads (a time-slicing sketch follows this list).
- Right-size GPU memory and compute requests against real usage, not allocation defaults.
- For inference with variable traffic, scale replicas based on request volume, not CPU. KEDA handles this well with HTTP or queue triggers.
- Treat GPU rightsizing as a continuous operation, not a one-time tuning exercise. Model behavior changes with traffic patterns.
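A minimal sketch of the time-slicing option, assuming the NVIDIA device plugin is deployed with its config file pointed at this ConfigMap; the replica count is illustrative, and time-slicing provides no memory isolation between the pods sharing a device:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU is advertised as 4 schedulable slices
```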
Reducing Kubernetes resource waste: next steps
The categories above account for the majority of hidden waste in production clusters. The pattern is consistent: defaults are wrong, static fixes decay, and the tooling to find waste is not the same as the tooling to fix it.
ScaleOps automates continuous pod and GPU rightsizing across CPU, memory, and replicas in real time, based on actual workload behavior. It manages requests without restarts, integrates with HPA, KEDA, and Karpenter, and surfaces waste across every cluster in a single view.
See how ScaleOps works on a live cluster or book a demo with our team.
Frequently Asked Questions
What happens to cluster resources when pods are stuck in ImagePullBackOff?
The pods reserve their full CPU and memory requests in the scheduler’s accounting, even though they run no workloads. That capacity is unavailable to healthy pods until the image pull issue is fixed or the pod is deleted.
How do oversized sidecar containers waste Kubernetes resources?
Sidecars with inflated requests claim more CPU and memory than they actually use. This reduces the number of pods the scheduler can place per node and triggers unnecessary scale-up events from the Cluster Autoscaler or Karpenter.
Do NotReady nodes still cost money?
Yes. Cloud providers charge for the underlying VM regardless of whether the kubelet has registered with the control plane. A NotReady node provides zero schedulable capacity and full hourly cost.
Why are init containers a hidden source of resource waste?
The Kubernetes scheduler places pods based on the maximum request across every container in the spec, including init containers. An init container that requests far more than the steady-state app forces the scheduler to find a node sized for the init job, not the workload.
What metrics reveal hidden Kubernetes resource waste?
The gap between requests and actual usage from metrics-server, the count of pods in non-running states over time, NotReady node-hours, and GPU utilization percentile distributions. None of these show up in a default Grafana dashboard.