Everyone loves Karpenter. It’s fast, flexible, and efficient at provisioning the right nodes on AWS. For teams running dynamic workloads, it’s a major upgrade from legacy autoscalers. But speed isn’t everything—especially in production environments.
Real-world workloads don’t just need faster node provisioning. They need smarter scaling decisions that account for how pods are behaving, what constraints are in place, and how different applications perform under pressure.
And we can’t forget: Karpenter is AWS-native. That’s fine for single-cloud setups. But as more teams move to multi-cloud and hybrid environments, Karpenter alone can’t keep up.
This guide unpacks what it really takes to run Karpenter in production. We’ll share best practices, highlight common pitfalls, and explore what else is needed for an autoscaling strategy that delivers, no matter your environment.
What Karpenter Does Well
Karpenter was built to solve a specific problem in AWS: fast, efficient infrastructure provisioning. When your cluster needs more compute, Karpenter adds the right nodes quickly. It picks instance types based on what’s available and what your pods need, without you having to define every rule ahead of time.
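As an illustration, here's a minimal sketch of the kind of NodePool configuration that drives this behavior: broad requirements instead of per-instance-type rules. The schema varies across Karpenter versions (this assumes the v1 API), and the `default` EC2NodeClass it references is a placeholder you'd define separately.

```yaml
# Minimal NodePool sketch (Karpenter v1 API; field names differ in older versions).
# Karpenter picks instance types that satisfy these requirements and the resource
# requests of pending pods, with no need to enumerate instance types by hand.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # placeholder EC2NodeClass, defined separately
  limits:
    cpu: "1000"                # cap on total CPU this pool may provision
```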
But that’s where its scope ends. It doesn’t tune pod requests or understand how workloads behave. It assumes the resource requests it receives are accurate and acts accordingly.
If your pods ask for more than they really need, Karpenter will scale up anyway. That means adding unnecessary nodes, increasing costs, and creating resource waste. Not because Karpenter failed, but because it’s working with flawed inputs.
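To make that concrete, here's a hypothetical Deployment snippet (the service name and numbers are made up) where requests are far higher than what the containers actually use. Karpenter provisions capacity from the requested values, so this workload reserves several times more compute than it needs.

```yaml
# Hypothetical example: requests copied from an old config and never revisited.
# Karpenter sizes nodes for the *requested* 2 CPU / 4Gi per replica, even if
# the containers typically use closer to 200m CPU / 400Mi.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout              # hypothetical service
spec:
  replicas: 6
  selector:
    matchLabels: { app: checkout }
  template:
    metadata:
      labels: { app: checkout }
    spec:
      containers:
        - name: app
          image: example.com/checkout:latest
          resources:
            requests:
              cpu: "2"         # actual usage is closer to 200m
              memory: 4Gi      # actual usage is closer to 400Mi
```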
So while Karpenter solves part of the problem, it doesn't handle the full picture. To scale smartly in production, you need more than reactive node provisioning. You need scaling that understands your application's context.
Why Static Recommendations Fall Short
Production environments don’t stand still. Traffic goes up and down. Teams deploy new code. Features get turned on and off. What worked an hour ago probably doesn’t work now.
That’s why scaling based on static rules or delayed recommendations doesn’t cut it. You can’t wait for alerts, dig through dashboards, check with the app team, and then make a change. By the time that happens, the situation has already changed again, probably a few times.
Manual scaling is inherently flawed. It’s always playing catch-up. It’s reactive. And it slows everyone down.
Karpenter helps by reacting quickly when pods can't be scheduled. But it still relies on the resource requests your workloads report. If those requests don't reflect how the workloads actually behave, Karpenter can't make the right decision, no matter how fast it is.
Real-Time, Pod-Level Automation: The Missing Piece
Karpenter does a great job managing nodes. It picks the right instance types, spins them up quickly, and gets pods scheduled fast. But that only works if pods ask for the right amount of resources, and most of the time, they don't.
In many clusters, pods over-request CPU and memory. Sometimes developers guess, copy old configs, or simply forget to update specs as services evolve.
When pod requests are too high, Karpenter sees those inflated numbers and scales unnecessarily, driving up costs and wasting compute.
You can’t fix this by just watching dashboards or making occasional tweaks. Workloads are constantly evolving and static requests will always lag behind.
That’s why pod-level automation matters. You need something that watches how each workload behaves over time and adjusts its resource requests automatically. If a pod is only using 300m of CPU but asking for 1000m, the system should update that. If usage spikes at certain hours, the system should catch it and respond.
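One off-the-shelf example of this pattern is the Kubernetes Vertical Pod Autoscaler, sketched below against the hypothetical `checkout` Deployment from earlier. It's not the only way to do it (and it has known caveats around pod restarts and HPA interplay), but it shows the shape of the idea: observe usage, then rewrite requests automatically.

```yaml
# Sketch: the Kubernetes Vertical Pod Autoscaler (one open-source option for
# request tuning) watching the hypothetical "checkout" Deployment.
# With updateMode: Auto it rewrites requests from observed usage, so a pod
# asking for 1000m but using ~300m gets its request lowered over time.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout
  updatePolicy:
    updateMode: "Auto"         # apply recommendations, not just report them
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m            # guardrails so requests never drop too low
          memory: 128Mi
```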
When resource requests are more accurate, Karpenter doesn’t need to overprovision. It can pack workloads more tightly on fewer nodes. That saves money and keeps clusters running lean.
Karpenter can’t make good decisions with bad data. Pod-level automation gives it the right inputs, so it can do its job properly.
Respect Workload Constraints
In production, autoscaling can’t break things. Some apps can move around freely. Others can’t.
If a service needs to stay on the same node, or can’t be disrupted during a rollout, you need to respect that. Kubernetes gives you tools for this—like PodDisruptionBudgets, affinity rules, and safe-to-evict settings.
Karpenter respects these when it drains nodes. But if you’re adding pod-level automation, it also needs to follow the same rules. Otherwise, you risk downtime or unpredictable behavior.
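For example, a PodDisruptionBudget and a do-not-disrupt annotation are the kinds of guardrails both Karpenter and any pod-level automation should check before evicting or resizing a pod. The exact annotation name depends on your Karpenter version; recent releases use `karpenter.sh/do-not-disrupt`.

```yaml
# Example guardrails that both Karpenter and any pod-level automation
# should honor before evicting or resizing pods.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb
spec:
  minAvailable: 4              # never take the service below 4 ready replicas
  selector:
    matchLabels:
      app: checkout
---
# Pod-level opt-out from voluntary disruption; the annotation name depends on
# your Karpenter version (karpenter.sh/do-not-disrupt in recent releases).
apiVersion: v1
kind: Pod
metadata:
  name: stateful-worker        # hypothetical workload that must not be moved
  annotations:
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
    - name: worker
      image: example.com/worker:latest
```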
Different apps have different needs. Some scale up fast. Others should stay stable. A good system adjusts based on how each workload behaves.
One-size-fits-all doesn't work. Your autoscaler needs to know the difference and act accordingly in real time.
Keep Developers in the Loop
Developers know their apps best. But they don’t want to manage infrastructure. And they shouldn’t have to.
If you expect them to review scaling recommendations or tweak pod specs manually, progress slows and frustration builds. But that doesn’t mean they should be in the dark.
The right balance: automate the tuning, but provide visibility. Show developers what’s happening. Give them dashboards and impact reports. Keep them informed without making them the bottleneck.
This keeps DevOps in control, developers aligned, and delivery smooth.
Plan for Multi-Cloud and Multi-Autoscaler Realities
Most teams don’t run everything in one cloud anymore. You might have Karpenter on AWS, Cluster Autoscaler on GCP, and something custom on-prem. That’s normal. But it also makes things more complicated.
Karpenter is AWS-native. It won’t work in other environments. That means if you’re using more than one cloud—or plan to—you need a scaling strategy that works across them.
Each environment has its own rules, limitations, and tools. But the goal stays the same: scale efficiently, safely, and automatically. Your scaling logic should adapt to each setup without being rewritten from scratch.
That’s where a platform-aware, cloud-agnostic approach helps. Instead of tying your logic to a specific tool or cloud, build an automation layer that can:
- Plug into any environment (cloud, hybrid, on-prem, air-gap)
- Respect existing policies (PDBs, affinity, resource limits)
- Optimize workloads consistently, no matter where they run
Also consider where your scaling decisions run. Some setups use SaaS control planes, but for teams in regulated industries or air-gapped environments, that’s not an option. In those cases, self-hosted, in-cluster logic is the only path.
The more environments you operate in, the more important it becomes to make your autoscaling portable, policy-aware, and flexible.
Understanding Karpenter Limitations
Let’s summarize: Karpenter is a powerful tool, but it has limits. Knowing where those limits are can help you avoid costly mistakes.
Here are the most common ones:
- It only runs on AWS. You can’t use it in GCP, Azure, or on-prem environments.
- It doesn’t tune pod resources. It assumes your CPU and memory requests are correct, even if they’re way off.
- It doesn’t know how your apps behave. It can’t tell if a service is bursty, latency-sensitive, or under pressure.
- It can’t adjust in real time at the workload level. Once the pod is scheduled, it’s out of scope.
- It has no view into long-term patterns. It’s great for fast reaction, but not for planning ahead.
None of these are dealbreakers. They just mean Karpenter needs help. Use Karpenter for what it’s good at. But don’t rely on it alone.
Karpenter Is Just the Beginning
Karpenter is great at what it was built for: fast, efficient node provisioning on AWS. But running Kubernetes in production—especially across teams, workloads, and clouds—takes more than that.
Real-world autoscaling means handling pod-level efficiency, respecting constraints, adapting in real time, and working across environments. That’s outside Karpenter’s scope.
If you want to scale safely and cost-effectively, Karpenter is a strong start. But to get the full picture, you need automation that understands your workloads, not just your nodes.
And this is where ScaleOps comes in: we automate real-time resource optimization at the most granular level (pod) across any environment, so you can scale with precision, not just speed.