About Booksy
Booksy is a leading online booking and business management platform designed for service-based industries, including beauty, wellness, fitness, and healthcare. The platform offers a seamless scheduling experience for both businesses and their clients, featuring appointment booking, reminders, payment processing, and marketing tools. With a focus on streamlining operations and enhancing customer engagement, Booksy empowers businesses to grow and manage their services with ease. Trusted by millions of users worldwide, Booksy is committed to delivering innovative solutions that drive efficiency and improve the client experience.
Key Results
Significant Reduction in CPU as a Key Driver of Cost Savings
By leveraging ScaleOps to optimize its Kubernetes-based infrastructure, Booksy has reduced its CPU requests by ~80% over the past year. ScaleOps’ ongoing automation eliminates over-provisioning while ensuring system performance and availability during peak loads. These changes not only decreased cloud costs significantly but also enhanced the reliability and scalability of Booksy’s operations. This holistic approach delivered immediate cost benefits and improved workload performance while supporting the growth of Booksy’s global user base.
Enhanced Efficiency Through Automation Frees Engineering Resources
The automation implemented by ScaleOps removed the need for manual workload tuning, significantly reducing the time and effort required by engineering teams. This shift allowed Booksy to reallocate valuable resources to other high-priority projects, boosting overall productivity and enabling the team to focus on innovation and growth rather than constant operational adjustments. The result was not only more efficient use of engineering talent but also a smoother, more scalable Kubernetes environment.
Optimizing Resource Allocation with Auto-Detected Policies
Booksy leveraged the built-in auto-detected policies to implement ScaleOps quickly and easily. The out-of-the-box feature automatically identified and applied the most suitable configuration for each workload, honoring Booksy’s diverse use cases and optimizing each one individually. This ensured efficient CPU and memory allocation, preventing issues such as CPU throttling and out-of-memory (OOM) errors. By dynamically adapting to each workload’s unique requirements, ScaleOps enabled Booksy to maintain consistent performance and reliability while reducing operational overhead and achieving high service availability.
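For readers curious about what usage-based rightsizing means in practice, the sketch below is a deliberately simplified illustration, not ScaleOps’ actual policy logic: it derives a CPU request from a high percentile of observed usage (to avoid throttling) and a memory request from the observed peak plus a buffer (to avoid OOM kills). The headroom factors and sample data are assumptions.

```python
from statistics import quantiles

def recommend_requests(cpu_usage_millicores, mem_usage_mib,
                       cpu_headroom=1.15, mem_headroom=1.25):
    """Illustrative rightsizing heuristic (not ScaleOps' actual policy).

    cpu_usage_millicores / mem_usage_mib: recent per-pod usage samples.
    Returns a recommended (cpu_request_millicores, mem_request_mib) pair.
    """
    # CPU is compressible: target a high percentile plus headroom so the
    # container rarely hits its request and gets throttled.
    cpu_p95 = quantiles(cpu_usage_millicores, n=20)[18]  # ~95th percentile
    cpu_request = int(cpu_p95 * cpu_headroom)

    # Memory is incompressible: exceeding the limit means an OOM kill,
    # so size the request above the observed peak plus a safety buffer.
    mem_request = int(max(mem_usage_mib) * mem_headroom)

    return cpu_request, mem_request

# Hypothetical service that usually needs ~200m CPU and ~600Mi memory.
cpu_samples = [180, 210, 190, 250, 205, 230, 260, 240, 215, 200,
               195, 220, 245, 235, 210, 225, 255, 205, 190, 230]
mem_samples = [540, 560, 610, 580, 600, 590, 620, 570, 605, 615]
print(recommend_requests(cpu_samples, mem_samples))
```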
The Challenge
Automating the Management of Over 3,000 Workloads
Booksy faced a critical challenge in managing over 3,000 workloads distributed across development, staging, and production environments. Manual intervention was no longer sustainable, especially for production workloads that demanded high availability and reliability. The team used to check for savings manually every few weeks or even months, resulting in missed opportunities for significant cost reduction. The complexity of scaling resources and maintaining consistent performance across such a vast ecosystem required a solution that could automate resource allocation and streamline infrastructure management.
Ensuring Consistent Performance in Dynamic Environments
Booksy encountered recurring CPU throttling and OOM issues across its environments, which risked impacting application performance and availability. While the team worked diligently to avoid any disruption to customers, adhering to strict internal SLAs meant quickly identifying and mitigating performance issues. Relying on manual intervention to monitor and resolve these issues was time-consuming and inefficient, especially as workloads scaled. Booksy needed a solution that could proactively optimize resource requests, eliminate performance degradation, and ensure reliable application performance at scale.
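Before any automation acts, symptoms like these are typically visible in standard Kubernetes metrics. The snippet below is a hedged illustration of how such checks are commonly written against a Prometheus endpoint scraping cAdvisor and kube-state-metrics; the endpoint URL and thresholds are placeholders, and this is not a description of ScaleOps internals.

```python
import requests  # third-party HTTP client

PROM_URL = "http://prometheus.example.internal:9090"  # placeholder endpoint

def prom_query(expr):
    """Run an instant query against the Prometheus HTTP API."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": expr}, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Fraction of CPU periods in which each container was throttled (cAdvisor metrics).
THROTTLE_EXPR = (
    "rate(container_cpu_cfs_throttled_periods_total[5m]) "
    "/ rate(container_cpu_cfs_periods_total[5m])"
)

# Containers whose last termination was an OOM kill (kube-state-metrics).
OOM_EXPR = 'kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1'

for series in prom_query(THROTTLE_EXPR):
    if float(series["value"][1]) > 0.25:  # arbitrary 25% throttling threshold
        print("High CPU throttling:", series["metric"].get("pod"))

for series in prom_query(OOM_EXPR):
    print("Recent OOM kill:", series["metric"].get("pod"))
```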
Manually Flagging Critical Events at Scale
In addition to its resource automation needs, Booksy struggled to track crucial system events such as CPU and memory utilization percentages, workload disruptions, and pod health issues. Without detailed insight into these metrics, the team found it challenging to manually detect performance issues, inefficiencies, and potential risks before they impacted service availability. This led to increased costs and inconsistent application performance.
The Solution
Eliminating Manual Work with Automation
To address the complexity of managing thousands of workloads across multiple environments, Booksy implemented ScaleOps to fully automate and manage resource allocation at scale. By introducing automation to their environments, Booksy eliminated the need for manual intervention in managing workloads, which had previously required regular checks for resource optimization. The ScaleOps platform dynamically adjusts resources based on real-time demand, ensuring efficient performance with significantly less human oversight. The shift to automated scaling not only minimized manual effort but also allowed the team to focus on more strategic tasks. With resource management streamlined and automated, Booksy could trust the system to optimize for both cost and performance.
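ScaleOps applies these adjustments automatically inside the cluster; purely as an illustration of what an automated request update amounts to, the sketch below patches a Deployment's container resources with the official Kubernetes Python client. The deployment name, namespace, and resource values are hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run in-cluster
apps = client.AppsV1Api()

# Hypothetical target: a container of a Deployment named "booking-api".
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "name": "booking-api",
                    "resources": {
                        "requests": {"cpu": "250m", "memory": "512Mi"},
                        "limits": {"memory": "768Mi"},
                    },
                }]
            }
        }
    }
}

# Strategic-merge patch; the Deployment controller rolls pods to the new sizing.
apps.patch_namespaced_deployment(name="booking-api", namespace="default", body=patch)
```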
Dynamic Optimization with Improved Performance and Reliability
To address the risk of CPU throttling and OOM issues, Booksy implemented ScaleOps, which automated resource requests and provided real-time performance monitoring and troubleshooting capabilities at the pod, cluster, and multi-cluster levels. This proactive approach ensured that workloads dynamically scaled their requests on demand, eliminating manual intervention and preventing resource contention. With automated scaling and optimization, Booksy was able to meet its internal SLAs, minimizing downtime and preventing customer-facing service degradation. The new system also gave the team better visibility into resource usage and system performance, keeping developers in the loop while automating resource requests on their behalf.
Troubleshooting Infrastructure Anomalies
To overcome the challenge of manually detecting critical events, ScaleOps automatically flags occurrences such as CPU and memory utilization spikes, workload disruptions, and noisy neighbors. This allows Booksy to go under the hood and troubleshoot resources that are impacting availability at the workload, cluster, or multi-cluster level. Additionally, ScaleOps surfaces valuable metrics, such as identifying expensive or wasteful workloads, empowering the team to make data-driven decisions that improve cost efficiency and ensure consistent performance across its infrastructure.
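As a rough illustration of what flagging "expensive or wasteful workloads" involves (not ScaleOps' own metrics pipeline), the sketch below compares what each workload requests with what it actually uses and prices the gap; the unit costs and sample workloads are assumptions.

```python
from dataclasses import dataclass

# Assumed on-demand prices; real figures depend on the cloud provider and node type.
COST_PER_CPU_CORE_HOUR = 0.04
COST_PER_GIB_HOUR = 0.005

@dataclass
class Workload:
    name: str
    cpu_request_cores: float
    cpu_used_cores: float      # e.g. average usage over the last week
    mem_request_gib: float
    mem_used_gib: float

def monthly_waste(w: Workload, hours=730):
    """Estimated monthly cost of requested-but-unused CPU and memory."""
    idle_cpu = max(w.cpu_request_cores - w.cpu_used_cores, 0)
    idle_mem = max(w.mem_request_gib - w.mem_used_gib, 0)
    return hours * (idle_cpu * COST_PER_CPU_CORE_HOUR + idle_mem * COST_PER_GIB_HOUR)

workloads = [
    Workload("checkout", cpu_request_cores=4.0, cpu_used_cores=0.6,
             mem_request_gib=8.0, mem_used_gib=2.5),
    Workload("notifications", cpu_request_cores=1.0, cpu_used_cores=0.8,
             mem_request_gib=2.0, mem_used_gib=1.7),
]

# Rank the most wasteful workloads first, as a troubleshooting starting point.
for w in sorted(workloads, key=monthly_waste, reverse=True):
    print(f"{w.name}: ~${monthly_waste(w):.0f}/month in idle requests")
```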
The Impact
Cutting Allocated CPU in Half
ScaleOps’ best-in-market product reduced Booksy’s allocated CPU by ~50% across dev, staging, and production clusters. The platform automatically rightsizes workloads in real time, ensuring efficient allocation of CPU and memory. This not only reduces costs but also enhances performance and reliability, allowing Booksy to focus on scaling without worrying about excessive infrastructure expenses.
Reduced Infrastructure Management Time
Booksy has eliminated the manual effort required to rightsize its infrastructure through the automated optimization capabilities provided by ScaleOps. The platform continuously adjusts resource requests based on real-time demand, allowing Booksy’s teams to focus on higher-priority tasks and strategic initiatives rather than spending time on constant workload tuning. This has streamlined operations and ensured better resource efficiency, contributing to improved performance across environments.
Keeping Developers in the Loop
By adopting ScaleOps, Booksy gained a comprehensive solution that not only automates resources at the pod level but also keeps developers in the loop, providing them with visibility into critical events that can affect performance, reliability, and cost.
Summary
Booksy partnered with ScaleOps to optimize its Kubernetes-based infrastructure, reducing costs by 50%. Using ScaleOps’ automated pod rightsizing capabilities, the team significantly cut cloud costs while improving performance and scalability, even during peak demand. Onboarding was seamless, with a simple Helm chart and an easy installation process that streamlined deployment and upgrades. ScaleOps’ out-of-the-box, auto-detected policies gave Booksy a fast, hands-free experience while ensuring optimal resource allocation and high service availability. This holistic approach has not only delivered significant savings but also increased operational efficiency.
About AccessFintech
AccessFintech, a pioneering force in financial technology, enhances operational efficiency and transparency for financial institutions through its collaborative data management platform. With a mission to create a more interconnected and informed financial ecosystem, AccessFintech’s innovative solutions streamline workflows, mitigate risk, and promote industry-wide collaboration. Trusted by leading financial institutions and utilized across global markets, AccessFintech stands out for its dedication to quality and technological advancement. Managing millions of data points daily, the company partners with top-tier organizations and is recognized as a trailblazer in transforming financial operations and data management.
Key Results
Significant Cost Savings: By using ScaleOps to automate its Production environments, AccessFintech reduced resource allocation requests by more than 60%, driving substantial cost savings.
Optimized Resource Allocation: AccessFintech benefited from frictionless and hands-free resource allocation processes, ensuring optimal cost savings without sacrificing performance.
Enhanced Efficiency and Reliability: ScaleOps’ ongoing and automatic Pod rightsizing ensured that AccessFintech maintained high reliability throughout a variety of events and peaks across all production clusters.
Increased Developer Productivity: The automation of resource management enabled AccessFintech’s engineers to focus on innovative projects, significantly boosting productivity.
The Challenge
Manual Management of Pod Requests: The manual sizing of pod resource requests posed significant challenges, leading to inefficiencies and inconsistencies in resource allocation.
High Reliability for Production Environments: Ensuring the reliability of production environments was critical. Any resource mismanagement risked service disruptions, potentially damaging the company’s reputation and customer trust.
Ongoing Optimization Efforts: AccessFintech’s production environments are dynamic and constantly changing, so engineering teams had to devote ongoing attention to manually sizing resource requests.
Managing Resources Across Multiple Clusters: Overseeing resources across multiple clusters added complexity, necessitating significant manual intervention and monitoring.
Managing Kubernetes resources at AccessFintech involves several challenges, particularly when it comes to precise resource allocation. Overprovisioning leads to wasted resources, while underprovisioning can cause performance issues or outages. The complexity is further heightened by the need to manage resources across multiple clusters, each requiring detailed oversight and manual intervention.
In production environments, maintaining high reliability is essential. Any missteps in resource management can lead to significant service disruptions, undermining customer trust and the company’s reputation. The manual management of pod resource requests introduced inefficiencies and inconsistencies, making it difficult to achieve optimal resource utilization.
Effectively addressing these challenges demands a proactive approach: continuous optimization, precise balancing of resource allocation, and strategic oversight of resources across all clusters.
The Solution
Full Automation: ScaleOps automates the optimization of AccessFintech’s containerized workloads, dynamically adjusting CPU and memory requests in real time according to demand. This ensures optimal performance even during peak usage periods.
Optimization of Critical Workloads: Using advanced automation, ScaleOps effectively optimized AccessFintech’s critical production workloads that require high availability, such as Kafka. This ensures precise resource allocation, maintaining reliability and performance without causing any disruption. AccessFintech trusts ScaleOps to maintain a robust operational environment.
Seamless Integration with Auto-Detecting Policies: ScaleOps’ auto-detecting policies provided AccessFintech with tailored, out-of-the-box solutions, simplifying resource management without extensive manual configuration. Trusted automation based on real results made the tool accessible and efficient for AccessFintech’s engineering teams.
Hands-Free Operations: The automated processes implemented by ScaleOps free AccessFintech’s engineering teams from the ongoing task of resource management, allowing them to focus on more strategic initiatives. This hands-free approach is particularly critical in AccessFintech’s rapidly growing and dynamic environment.
AccessFintech’s implementation of ScaleOps addresses the core challenges of manual management, ongoing optimization, and balancing resource allocation. By automating these processes, AccessFintech achieves a robust, hands-free management system that ensures high reliability and efficiency in their production environments, ultimately leading to substantial cost savings and improved operational performance.
The Impact
Implementing ScaleOps at AccessFintech has had a profound impact on resource management, cost efficiency, and operational reliability. Here’s how:
Cost Savings
Optimized Resource Utilization: The automation and optimization capabilities of ScaleOps have led to significant cost savings for AccessFintech. By accurately aligning resource allocation with workload needs and reducing the need for manual oversight, AccessFintech has achieved a more cost-effective operation.
Enhanced Performance Across Environments
Consistent High Performance: ScaleOps ensures enhanced performance across all environments, including production. With real-time optimizations and precise resource management, AccessFintech can maintain high reliability and efficiency, meeting the stringent demands of financial technology services.
Request Optimization
Significant Reduction in CPU Allocation Requests: ScaleOps has optimized resource requests, leading to over a 60% reduction in CPU allocation requests. This substantial decrease not only ensures that resources are used efficiently but also minimizes the risk of over-provisioning, contributing to overall cost savings.
Automation of Manual Tasks
Enhanced Focus for Developers and DevOps Teams: By automating manual tasks related to resource management, ScaleOps has allowed AccessFintech’s developers and DevOps teams to concentrate on their core responsibilities. This shift has resulted in increased productivity and more time dedicated to strategic projects and innovation.
Ease of Use
Built-in Auto-Detecting Policies: AccessFintech benefits from ScaleOps’ built-in auto-detecting policies, which are tailored to fit the specific needs of each workload. This ease of use simplifies resource management, making it accessible and efficient for the engineering teams without requiring extensive manual configuration.
Overall, ScaleOps has transformed how AccessFintech manages its Kubernetes environment. The automation and optimization capabilities provided by ScaleOps have driven significant improvements in efficiency, cost savings, and service reliability, empowering AccessFintech to achieve its mission of enhancing operational efficiency and transparency for financial institutions.
About Orca Security
Orca Security is a leading SaaS company in the cybersecurity industry, renowned for its advanced security solutions that help organizations protect their cloud environments. With a large team of engineers, Orca Security leverages Amazon Elastic Kubernetes Service (EKS) to run its workloads. The company’s commitment to innovation and efficiency drives its continuous efforts to optimize resource management and ensure the smooth operation of its platform.
The Challenge
Orca Security encountered several significant challenges in managing the resources of their Kubernetes workloads:
Highly Dynamic Production Clusters: The company operates in a highly dynamic environment where production clusters experience frequent and massive scale-ups and scale-downs. This variability demands a resource management solution that can adapt quickly and efficiently to changing conditions.
Non-Interruptible Workloads: Many of Orca Security’s workloads are critical and cannot be interrupted. These workloads require careful handling to ensure they remain stable and uninterrupted during scaling operations.
Manual Pod Rightsizing: The process of adjusting pod sizes to match their actual resource needs was labor-intensive and manual. This task required significant time and effort from the platform team, diverting their attention from other strategic initiatives.
Managing Resources Across Multiple Clusters: With multiple clusters to oversee, resource management became increasingly complex, requiring considerable manual intervention and monitoring.
Engineer Communication and Visibility: Keeping the engineering teams informed about resource usage and workload behavior was essential for efficient operations. However, achieving this visibility and maintaining clear communication was challenging.
The Solution
Orca Security selected ScaleOps as its automatic resource management platform for Kubernetes workloads to overcome these challenges. Installation was quick and time to value was short, thanks to several key features and capabilities of the ScaleOps platform:
Out-of-the-Box Functionality: The solution came with robust out-of-the-box functionality, requiring minimal configuration and customization. This meant that Orca Security could quickly implement and benefit from ScaleOps without extensive setup time.
Zero-Touch Ongoing Pod Rightsizing: ScaleOps provided a fully automated solution for pod rightsizing. By continuously monitoring and adjusting pod sizes based on real-time usage data, ScaleOps ensured optimal resource allocation without requiring manual intervention from the platform team.
Un-Evictable Bin-Packing Capabilities: ScaleOps’ built-in un-evictable bin-packing capabilities allowed Orca Security to efficiently rightsize critical workloads that could not be interrupted. This feature ensured these workloads were placed and scaled appropriately without risking eviction (a generic way of marking such workloads in Kubernetes is sketched after this list).
Multi-Cluster View: ScaleOps provided a unified view of all clusters, simplifying the management of resources across multiple environments. This centralized approach enabled better monitoring and more efficient resource distribution.
Enhanced Visibility into Workload Behavior: ScaleOps offered comprehensive insights into workload behavior and resource usage. This visibility enabled the engineering teams to monitor performance, identify potential issues, and make informed decisions about resource allocation.
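How ScaleOps implements un-evictable bin-packing is internal to the product, but in plain Kubernetes terms a workload is usually signaled as "do not interrupt" with a zero-disruption PodDisruptionBudget and the cluster autoscaler's safe-to-evict annotation. The sketch below shows those generic signals via the official Kubernetes Python client; the workload names and namespace are hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()

# A PodDisruptionBudget that forbids voluntary evictions of the matching pods.
pdb = client.V1PodDisruptionBudget(
    metadata=client.V1ObjectMeta(name="scanner-pdb", namespace="default"),
    spec=client.V1PodDisruptionBudgetSpec(
        max_unavailable=0,
        selector=client.V1LabelSelector(match_labels={"app": "scanner"}),
    ),
)
client.PolicyV1Api().create_namespaced_pod_disruption_budget(namespace="default", body=pdb)

# Annotation telling the cluster autoscaler not to evict these pods
# when it consolidates (bin-packs) nodes.
patch = {
    "spec": {
        "template": {
            "metadata": {
                "annotations": {"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}
            }
        }
    }
}
client.AppsV1Api().patch_namespaced_deployment(name="scanner", namespace="default", body=patch)
```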
The Impact
The adoption of ScaleOps brought about substantial improvements in Orca Security’s resource management and overall operational efficiency:
50% Reduction in CPU Allocation Requests: By rightsizing pods automatically, ScaleOps reduced the CPU allocation requests by half. This optimization translated into thousands of CPU cores being saved, significantly reducing resource wastage.
Cost Savings Across 3,000+ Workloads: The efficient resource management led to notable cost reductions across more than 3,000 workloads. These savings allowed Orca Security to reallocate the budget towards other critical areas of the business.
Enhanced Performance of Thousands of Workloads: The automated and ongoing optimization of resources resulted in improved performance for thousands of workloads. This improvement ensured that resources were used efficiently and workloads ran smoothly, contributing to overall system stability and reliability.
Streamlined Platform Team Operations: By automating the tedious and manual process of pod rightsizing, ScaleOps freed up the platform team’s time, allowing them to focus on higher-value tasks and strategic initiatives.
Improved Engineer Awareness and Engagement: With better visibility into workload behavior, the engineering teams were more informed and engaged. This transparency facilitated better decision-making and proactive management of resources.
In summary, the implementation of ScaleOps transformed Orca Security’s approach to resource management. The company achieved significant cost savings, improved workload performance, and enhanced operational efficiency, positioning itself for continued growth and success in the cybersecurity industry.
About Dazz
Dazz delivers unified security remediation for fast-moving security and development teams. The platform plugs into the tools that find code flaws and infrastructure vulnerabilities, cuts through the noise, prioritizes the issues that matter most, and delivers fixes to owners, all in a developer-friendly workflow. As a result, Dazz customers are able to massively streamline their remediation processes and reduce exposure in hours instead of weeks. Dazz is becoming the standard for leading Application Security Posture Management (ASPM), Continuous Threat and Exposure Management (CTEM), and DevSecOps practices.
Key Results
50% reduction in required CPU and memory
By implementing the ScaleOps platform, Dazz nearly doubled the number of running workloads and pods while maintaining the same operational costs. Utilizing automatic pod rightsizing across all their Kubernetes clusters, Dazz significantly optimized resource allocation. ScaleOps continuously tracked CPU and Memory usage of pods and automatically adjusted resource requests to meet real-time demand. This dynamic resource management eliminated waste, improved cluster performance, and enhanced availability. As a result, Dazz could scale its services effectively to meet growing business demands without incurring additional infrastructure expenses.
Seamless resource management across all environments
ScaleOps automated resource management across Dazz’s Dev, Staging, and Production environments. By deploying ScaleOps throughout their development pipeline, Dazz ensured consistent performance optimization at every stage. The platform’s ability to automatically adjust resource requests based on real-time usage meant that each environment operated efficiently without manual intervention, whether for development, testing, or live deployment. This automation enhanced cluster performance and availability, allowing Dazz to focus on innovation rather than routine maintenance.
Rapid & smooth onboarding onto all clusters
Using ScaleOps’ self-hosted architecture and frictionless installation process, Dazz onboarded ScaleOps onto all of its Kubernetes clusters quickly and with minimal effort. The self-hosted model allowed Dazz to deploy the platform within its own secure infrastructure, aligning with its compliance and security requirements. The straightforward installation required minimal configuration and no significant downtime, enabling Dazz to roll out automatic pod rightsizing across all clusters rapidly. This quick onboarding delivered immediate benefits from resource optimization without disrupting existing operations.
The Challenge
Manual Management of Pod Requests
The manual approach to pod resource requests at Dazz led to inefficiencies and inconsistencies in resource allocation. Estimating the precise CPU and memory requirements for each workload was time-consuming, often resulting in over-provisioned resources and wasted costs.
High Reliability for Production Environments
For Dazz, maintaining high reliability in production environments was crucial. Any fluctuation in resource availability risked interruptions in service, potentially affecting customer trust and tarnishing the brand’s reputation. Dazz needed a solution that could deliver efficient resource allocation while ensuring stability and reliability.
Managing Resources Across Multiple Environments
Dazz operates across Production, Staging, and Development environments, each with unique characteristics and demands. For example, while Production required consistent uptime and reliability, the Development and Staging environments required flexibility for testing and adjustments. Manually managing these differences was complex and time-consuming.
Hundreds of Different Workloads
Dazz’s infrastructure supports a variety of workloads, each with its own CPU and memory requirements. Variability in developer-defined resource requests often led to inefficiencies, with resources either over- or under-provisioned. Managing these inconsistencies became increasingly challenging, especially as Dazz scaled its operations.
The Solution
Automation across environments
ScaleOps automated Dazz’s containerized workloads across various cloud-native technologies, including their hybrid setup. This automation ensured optimal resource requests, adjusting in real-time to workload demands and optimizing both cost and performance across their on-premises and cloud clusters.
Optimization of Critical Workloads
Using advanced automation, ScaleOps effectively optimized Dazz’s critical production workloads, which required high availability. The platform ensured precise resource allocation, maintaining reliability and performance without causing any disruption. This robust operational environment bolstered Dazz’s confidence in consistently meeting customer expectations.
Out-of-the-Box Scaling Policies
ScaleOps provided predefined and auto-assigned scaling policies, making it easy for Dazz to optimize different workloads with varying characteristics and scaling goals. This resulted in a zero-touch experience that maximized cost savings and performance. The flexibility of these policies allowed Dazz to tailor resource management to specific needs without extensive manual configuration.
The Impact
Cost Savings with Increased Workload Capacity
By automatically right-sizing resources, ScaleOps reduced Dazz’s required resources by 50%, effectively doubling Dazz’s production workload capacity without increasing costs. This optimized resource usage translated into substantial cost savings and a scalable foundation for future growth.
Automated Resource Management Across All Environments
Within days, Dazz was able to onboard ScaleOps across all clusters. This seamless, hands-free setup allowed Dazz’s teams to quickly transform their approach to resource management, automating each cluster’s rightsizing while maintaining the flexibility to monitor and adjust as needed.
Freedom for Engineering Teams
By eliminating the need for manual rightsizing, ScaleOps freed Dazz’s engineers to focus on delivering valuable features and improvements. With ScaleOps automating resource adjustments, Dazz’s engineering teams could allocate more time to development work, improving productivity and innovation.
Summary
Dazz’s experience with ScaleOps showcases how automated resource management can drive both operational efficiency and cost savings across Kubernetes environments. ScaleOps’ powerful optimization features have transformed Dazz’s resource allocation, allowing the company to manage larger workloads without increased costs. By adopting ScaleOps, Dazz achieved consistent, optimized resource utilization, strengthened reliability, and empowered engineers to focus on high-value tasks. This case highlights ScaleOps’ potential to help organizations achieve a scalable, efficient Kubernetes infrastructure designed to meet dynamic business needs.
About Outbrain
Outbrain, a leading content discovery platform, empowers publishers and marketers to reach their audiences with personalized, engaging content. Founded with the mission to make the internet a better place, Outbrain’s innovative technology delivers relevant recommendations that drive audience engagement and revenue growth. Trusted by top-tier publishers and utilized by millions of users worldwide, Outbrain stands out for its commitment to quality and performance. Serving over 10 billion recommendations daily, the company partners with premier brands and is recognized as a pioneer in the content marketing industry.
Key Results
Cost Savings
ScaleOps played a crucial role in Outbrain’s broader cost-efficiency efforts, reducing costs across a wide range of workloads.
Efficient Resource Allocation
With ScaleOps, Outbrain optimizes resource allocation to ensure cost savings without compromising performance.
Increased Efficiency and Reliability
Accurate pod rightsizing keeps Outbrain reliable and SLO-compliant even at peak demand.
Boosted Developer Productivity
Hands-free rightsizing with ScaleOps lets the engineering teams at Outbrain focus on innovation instead of resource management.
The Challenge
Ongoing Optimization Effort
Running Outbrain’s production workloads effectively on AKS requires ongoing optimization to prevent charges for wasted resources and to achieve immediate cost savings through efficient resource management.
Inefficient Resource Management
When there is an application problem, developers usually add resources, whether or not this actually resolves the problem. Rarely, if ever, do developers revisit healthy deployments to check whether the requested resources can be reduced.
Seasonality
Outbrain’s workloads are seasonal by nature, impacting production traffic and resource utilization on an ongoing basis.
Diverse Workloads
Outbrain runs various types of workloads with different characteristics, including Java-based workloads, which are always a challenge in a containerized environment. Managing CPU and memory requests at Outbrain presents several challenges, particularly when developers specify resource requests inaccurately. This inefficiency in resource allocation leads to wasted resources and resource contention. The issue is exacerbated by the seasonality of resource utilization and by Outbrain’s traffic patterns, where its services experience fluctuating demand.
During peak times, inadequate resource requests can cause performance issues or outages, while overestimating needs drives up costs unnecessarily. The constant challenge lies in getting engineers to act and revise their resource requests to match actual usage patterns, ensuring efficiency and reliability. Additionally, Outbrain’s diverse services, each with distinct Service Level Objectives (SLOs), add another layer of complexity to resource management. Java-based workloads further complicate matters with their unpredictable memory usage and potential lack of container-aware tuning. Effective resource management at Outbrain requires continuous education and collaboration with developers, proactive monitoring, and strategic adjustments to maintain consistent service quality across diverse workloads.
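Java workloads are tricky in containers because the JVM heap must be sized against the container's memory limit rather than the node's memory: a heap set too close to the limit leaves no room for metaspace, thread stacks, and other native memory, and ends in OOM kills. As a hedged illustration of the kind of JVM-aware sizing referenced later in this case study (not ScaleOps' actual logic), the sketch below derives an -Xmx value from an assumed container memory limit.

```python
def jvm_heap_flag(container_limit_mib: int, heap_fraction: float = 0.7) -> str:
    """Derive an -Xmx flag from a container memory limit.

    heap_fraction is an assumption: the remaining ~30% is left for
    metaspace, thread stacks, code cache, and other off-heap memory.
    """
    heap_mib = int(container_limit_mib * heap_fraction)
    return f"-Xmx{heap_mib}m"

# Example: a container with a 2Gi memory limit gets roughly a 1433m heap.
print(jvm_heap_flag(2048))

# Modern JVMs can express this natively as a percentage instead of a fixed -Xmx:
#   java -XX:MaxRAMPercentage=70.0 ...
# which keeps the heap proportional if the container limit is later rightsized.
```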
The Solution
Full Automation
ScaleOps is used to fully automate CPU and memory requests in all of Outbrain’s AKS environments, including the Production environment.
ScaleOps Policy Efficiency
A default Outbrain ScaleOps policy is automatically applied across all workloads.
The Java Use Case
ScaleOps supports Outbrain’s Java use case with JVM-sensitive recommendations and automation (e.g., Xmx heap sizing).
Out-Of-The-Box Solution
All new workloads have a default ScaleOps policy automatically attached and are automated programmatically.
Business Efficiency
Engineers cannot change the policy or switch off the automation; they must justify their requirements in terms of impact, revenue, and cost. More than 90% of Outbrain’s workloads are automated.
To address these challenges, Outbrain implemented ScaleOps on their AKS production clusters to provide a robust solution for ongoing optimization and automation of resource requests, offering engineering teams a hands-free rightsizing experience.
After a short period, Outbrain was able to easily apply the best-suited ScaleOps policy, which was implemented as the default for all new workloads. This provided a zero-touch onboarding process for existing and new workloads and ensured each deployment was optimally provisioned according to its unique requirements. This included adjusting in real time to seasonality and changing traffic patterns.
By leveraging ScaleOps, Outbrain achieved accurate resource allocation for its running pods in its production environment, reducing the burden on engineers and maintaining efficiency and reliability across its diverse services, even amidst fluctuating resource demands and stringent Service Level Objectives (SLOs).
The Impact
Implementing ScaleOps at Outbrain has had a significant positive impact on resource management and overall service efficiency. Here’s how:
Efficient Resource Allocation & Cost Savings
Optimized Resource Requests: ScaleOps automatically rightsized pods based on workload characteristics and dynamic usage patterns, ensuring that CPU and memory requests were accurately aligned with actual consumption. This minimized over-provisioning, leading to cost savings.
Efficient Utilization: Seasonality of resource utilization is better managed, with ScaleOps adjusting resources dynamically to match demand fluctuations, further optimizing cost efficiency.
Increased Efficiency and Reliability
Consistent Performance: With accurate pod rightsizing, Outbrain can maintain consistent service quality even during peak demand periods. This reliability is crucial for meeting the diverse SLOs of different services.
Developer Productivity and Focus
Hands-Free Rightsizing: Engineers no longer need to manually adjust resource requests, freeing them to focus on developing new features and improving existing services. ScaleOps handles the complexities of resource management, reducing the cognitive load on developers.
Culture and Mindset
Building Trust Among Engineers: Engineers have observed that their workloads remain healthy even after the resource management responsibilities have been shifted to the Cloud Platform team and ScaleOps. This has built trust in the platform, as they see firsthand the effectiveness of automated resource allocation.
Impact-Oriented Thinking: The traditional approach involved engineers focusing on applicative errors and predefined alerts for their deployments, which were often treated as absolute truths. However, in the Azure environment, engineers are being educated to think in terms of business impact. For instance, when errors or higher latencies occur, the conversation now revolves around their impact on revenue.
Cost-Effective Scaling: Engineers are now more disciplined in their approach to scaling resources. They consider the financial implications of scaling decisions, understanding that scaling up too late might cost a certain amount of revenue, while scaling up too early can increase costs by far more than the revenue gained. This cost awareness ensures that scaling decisions are made judiciously, balancing performance and cost.
Overall, ScaleOps helped transform how Outbrain manages its Kubernetes environment, driving significant improvements in efficiency, cost savings, and service reliability.