kubectl rollout: 7 Best Practices for Production (2025)

Nic Vermandé

The kubectl rollout command suite is a fundamental part of managing applications in Kubernetes. But using it effectively in production requires more than just knowing the commands; it requires a disciplined strategy. The same commands that save you during an emergency can just as easily take down your clusters if used carelessly.

In this guide, we’ll walk through seven best practices that transform kubectl rollout from a set of manual commands into part of a reliable, production-ready deployment strategy.

1. Validate with Observability, Not status

The most dangerous command in this suite is kubectl rollout status. When it returns deployment "my-app" successfully rolled out, it only means one thing: Kubernetes managed to start the expected number of pods and they passed their readiness probes.

That doesn’t guarantee your application is actually healthy. A “successful” rollout can easily coincide with a spike in 500 errors, high latency, or critical connection failures.

In other words, Kubernetes is telling you the pods are alive, not that your users are happy.

Best practice

Treat kubectl rollout status as the first signal, not the last. Your true source of validation is your observability platform. The moment a rollout begins, your eyes should be on your dashboards watching the four golden signals: latency, traffic, errors, and saturation. A rollout is only successful when these key business metrics remain stable or improve.
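
In practice, this means rollout status is only a gate in your deploy script, and the real pass/fail check comes from your metrics. Here’s a minimal sketch, assuming a Prometheus endpoint and a hypothetical http_requests_total metric; substitute your own observability stack:

# Step 1: Gate on Kubernetes' own signal first
kubectl rollout status deployment/my-app --timeout=300s || exit 1

# Step 2: Verify the golden signals (hypothetical metric and endpoint)
ERROR_RATE=$(curl -s http://prometheus:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(http_requests_total{app="my-app",status=~"5.."}[5m])) / sum(rate(http_requests_total{app="my-app"}[5m]))' \
  | jq -r '.data.result[0].value[1] // "0"')

# Step 3: Fail the deploy if the error rate exceeds 1% (tune to your SLOs)
if awk -v r="$ERROR_RATE" 'BEGIN { exit !(r > 0.01) }'; then
  echo "Error rate $ERROR_RATE above threshold, rolling back"
  kubectl rollout undo deployment/my-app
fi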

2. Use Annotations for Meaningful history

By default, kubectl rollout history isn’t helpful. It shows nothing more than revision numbers:

REVISION  CHANGE-CAUSE
1         <none>
2         <none>

In the middle of a production incident, this output is useless. The deprecated --record flag is gone, so the professional standard today is to use annotations that explain why a change was made. Without that context, your rollout history is just noise.

Best practice

Immediately after any manual change, use kubectl annotate to record the reason. This creates a crucial audit trail for incident response and debugging.

# Step 1: Update the deployment
kubectl set image deployment/api-server api-server=my-repo/api:v1.9.2

# Step 2: Annotate the change with a clear, concise reason
kubectl annotate deployment/api-server kubernetes.io/change-cause="Rolled back to v1.9.2 due to high latency in v1.9.3"
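
With the kubernetes.io/change-cause annotation set, kubectl rollout history shows the reason alongside each revision, something like:

REVISION  CHANGE-CAUSE
1         <none>
2         Rolled back to v1.9.2 due to high latency in v1.9.3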

3. Use pause for Manual Canary Analysis

Relying on a full rolling update for a high-risk change is a gamble. A much safer pattern is to use kubectl rollout pause as a deliberate validation gate, so you can perform a manual canary analysis.

Best practice

For sensitive changes, immediately pause a rollout after it begins. This creates a small number of canary pods running the new version.

# Start the rollout and immediately pause it
kubectl set image deployment/webapp webapp=my-image:v2.1
kubectl rollout pause deployment/webapp

Now you can analyze the canaries’ performance in your monitoring tools with minimal user impact. If they are stable, kubectl rollout resume deployment/webapp continues the deployment. If not, you can roll back before any significant damage is done.
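
Concretely, both outcomes of that decision are a single command away. A quick sketch, reusing the webapp deployment above (the app=webapp label is an assumption):

# Inspect which pods are running the canary image
kubectl get pods -l app=webapp \
  -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image

# Canaries look healthy in your dashboards: finish the rollout
kubectl rollout resume deployment/webapp

# Canaries are degrading: revert the template, then resume so the
# controller can act (a paused Deployment ignores changes until resumed)
kubectl rollout undo deployment/webapp
kubectl rollout resume deployment/webapp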

4. Reserve undo for Emergencies, Not Strategy

The kubectl rollout undo command is an incredibly effective emergency brake. It is not, however, part of a healthy deployment strategy. If you find yourself frequently reaching for undo after a full rollout, it’s a sign that your pre-deployment testing and validation processes are failing.

Best practice

Treat rollbacks as a last resort. The goal is not to get faster at rolling back. The goal is to build a process where you don’t have to. Using the manual canary analysis pattern from the previous step is a key part of this, as it allows you to catch issues and roll back before a full deployment.
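
When you do pull the emergency brake, it has two forms worth knowing. A quick sketch, using the api-server deployment from earlier:

# Roll back to the immediately previous revision
kubectl rollout undo deployment/api-server

# Or pin a specific known-good revision from the history
kubectl rollout history deployment/api-server
kubectl rollout undo deployment/api-server --to-revision=2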

5. Graduate to GitOps for Declarative Deployments

Manually running kubectl set image or kubectl apply from a terminal is not a scalable, auditable, or safe way to manage production deployments. It introduces the risk of human error and makes it difficult to track who changed what, and when.

Best practice

Adopt a GitOps workflow using tools like Argo CD or Flux. In this model, your Git repository is the single source of truth for your cluster’s desired state. All changes are made via pull requests. This is the baseline for modern, production-grade Kubernetes operations.
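
For illustration, this is roughly what the model looks like with Argo CD: an Application resource points at a Git path, and the controller keeps the cluster in sync with it. A minimal sketch with a hypothetical repository URL and paths:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-server
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/k8s-manifests.git   # hypothetical repo
    targetRevision: main
    path: apps/api-server
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # remove resources that were deleted from Git
      selfHeal: true  # revert manual drift back to the Git state

With selfHeal enabled, even a manual kubectl set image gets reverted to whatever Git declares, which is exactly the discipline this practice is about.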

6. Adopt Automated Progressive Delivery

Manual canary analysis is a good step up from a standard rolling update, but it’s still slow and depends on a human staring at a dashboard and making the call. The industry standard for safe, high-velocity deployments is automated progressive delivery.

Best practice

Use a dedicated controller like Argo Rollouts or Flagger. These tools automate the entire canary process. You define KPIs like error rate and latency in a manifest, and the controller automatically analyzes the canary’s performance against them. If the metrics degrade, it triggers an automatic rollback without any human intervention. This enables teams to deploy faster and with significantly less risk.
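
To make that concrete, here’s a minimal sketch of an Argo Rollouts canary strategy; the success-rate AnalysisTemplate is a hypothetical one you’d define to encode your error-rate and latency KPIs:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: webapp
spec:
  replicas: 10
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: my-image:v2.1
  strategy:
    canary:
      steps:
      - setWeight: 10        # shift 10% of traffic to the canary
      - analysis:
          templates:
          - templateName: success-rate   # hypothetical KPI check; degrading metrics trigger auto-rollback
      - setWeight: 50
      - pause: {duration: 5m}            # soak before full promotion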

7. Implement Proactive Replica Optimization

Even the most advanced automated canary rollout shares a dangerous assumption: that the pods being deployed are correctly sized. A sophisticated strategy for deploying a CPU-throttled or memory-starved pod is still a strategy for deploying a broken application. This creates the core challenge of Replica Optimization: running the right number of rightsized pods.

This exposes a fundamental conflict in Kubernetes autoscaling:

  • VPA (Vertical Pod Autoscaler) controls Pod resource requests/limits.
  • HPA (Horizontal Pod Autoscaler) controls the number of Pods.

While it is technically possible to run both in auto mode, their behaviors create a destructive conflict: VPA needs to restart pods to apply new resource requests, which directly fights HPA’s goal of maintaining a stable replica count to meet demand.
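
To see why, note that HPA’s utilization target is defined as a percentage of the very CPU request that VPA rewrites. A minimal HPA sketch (names hypothetical) makes the coupling visible:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of each pod's CPU *request*

Every time VPA changes that request (restarting pods to do so), the utilization figure HPA sees jumps, and the two controllers end up chasing each other.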

This fundamental conflict represents one of Kubernetes’ most persistent operational challenges. Platforms like ScaleOps address this by layering intelligence on top of Kubernetes’ native tools to achieve two critical outcomes:

  1. Enabling Rightsizing with HPA: Instead of competing with HPA, ScaleOps works alongside it. It continuously analyzes real-time pod metrics, including CPU throttling and memory usage patterns, to determine optimal resource requests and limits. This solves the core “VPA vs. HPA” problem, allowing you to benefit from perfect vertical sizing while HPA dynamically manages horizontal scaling.
  2. Making HPA Proactive, Not Reactive: Standard HPA is always late; it scales up only after your pods are already under heavy load. ScaleOps makes HPA predictive. By analyzing your application’s historical performance data, it identifies its unique traffic patterns, like the 9 AM login rush or the end-of-day batch jobs. Based on these learned patterns, ScaleOps proactively manages HPA’s configuration before the traffic hits, ensuring capacity is ready the moment it’s needed.

Wrapping up

So, where does kubectl rollout fit in a modern production environment in 2025? It’s an indispensable professional tool, not a historical artifact.

While automated progressive delivery is the goal for routine, high-velocity deployments, kubectl rollout commands remain essential for emergency interventions, critical debugging, and specialized scenarios where direct, surgical control is paramount.

Mastering these manual commands is the foundation of operational excellence. The journey continues by building on that foundation with declarative GitOps for auditability, automated analysis for safety, and finally, intelligent replica optimization to ensure every deployment is not just safe, but also performant and cost-efficient. Mature teams don’t discard their manual tools; they know precisely when to use them.

Interested in learning more about ScaleOps?

Book a demo with an expert or explore the full platform.
