Outbrain/ScaleOps – Case Study
Outbrain
850 employees
Advertisement Technology
AKS
Tel Aviv & New York
About Outbrain
Outbrain, a leading content discovery platform, empowers publishers and marketers to reach their audiences with personalized, engaging content. Founded with the mission to make the internet a better place, Outbrain’s innovative technology delivers relevant recommendations that drive audience engagement and revenue growth. Trusted by top-tier publishers and utilized by millions of users worldwide, Outbrain stands out for its commitment to quality and performance. Serving over 10 billion recommendations daily, the company partners with premier brands and is recognized as a pioneer in the content marketing industry.
Key Results
Cost Savings
ScaleOps played a crucial role in Outbrain’s broader cost-efficiency efforts, reducing costs across a wide range of workloads.
Efficient Resource Allocation
With ScaleOps, Outbrain optimizes resource allocation to deliver cost savings without compromising performance.
Increased Efficiency and Reliability
Accurate pod rightsizing keeps Outbrain reliable and SLO-compliant even at peak demand.
Boost Developer Productivity
Hands-free rightsizing with ScaleOps lets the engineering teams at Outbrain focus on innovation instead of resource management.
The Challenge
Ongoing Optimization Effort
Running Outbrain’s production workloads effectively on AKS requires ongoing optimization to avoid paying for wasted resources and to realize cost savings through efficient resource management.
Inefficient Resource Management
When an application problem arises, developers usually add resources, whether or not that will resolve the issue. Rarely, if ever, do they revisit healthy deployments to check whether the requested resources can be reduced.
Seasonality
Outbrain’s workloads are seasonal by nature, so production traffic and resource utilization fluctuate on an ongoing basis.
Diverse Workloads
Outbrain runs various types of workloads with different characteristics, including Java-based workloads, which are always a challenge in a containerized environment.
Managing CPU and memory requests at Outbrain presents several challenges, particularly when developers specify resource requests inaccurately. This inefficiency leads to wasted resources and resource contention. The issue is exacerbated by the seasonality of resource utilization and by Outbrain’s traffic patterns, which subject its services to fluctuating demand.
During peak times, inadequate resource requests can cause performance issues or outages, while overestimated requests drive up costs unnecessarily. The constant challenge lies in getting engineers to act and revise their resource requests to match actual usage patterns, ensuring efficiency and reliability.
Additionally, Outbrain’s diverse services, each with distinct Service Level Objectives (SLOs), add another layer of complexity to resource management.
Java-based workloads further complicate matters with their unpredictable memory usage and JVM settings that are often not tuned for containers. Effective resource management at Outbrain requires continuous education and collaboration with developers, proactive monitoring, and strategic adjustments to maintain consistent service quality across diverse workloads.
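To make the challenge concrete, here is a minimal sketch of the kind of hand-set Kubernetes resource requests that tend to drift from real usage, including a Java container whose heap is pinned with a fixed -Xmx. The deployment name, image, and values are hypothetical placeholders, not taken from Outbrain’s actual manifests.

```yaml
# Illustrative only – the name, image, and values are hypothetical, not Outbrain's real manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommendations-api            # hypothetical Java-based service
spec:
  selector:
    matchLabels:
      app: recommendations-api
  template:
    metadata:
      labels:
        app: recommendations-api
    spec:
      containers:
        - name: app
          image: example.azurecr.io/recommendations-api:1.0   # placeholder image
          resources:
            requests:
              cpu: "2"                 # bumped during an incident, never revisited
              memory: 8Gi              # sized for peak season, mostly idle off-peak
            limits:
              memory: 8Gi
          env:
            - name: JAVA_TOOL_OPTIONS
              # A fixed -Xmx does not follow the container's memory request,
              # so changing the request alone can starve or waste the heap.
              value: "-Xmx6g"
```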
The Solution
Full Automation
ScaleOps fully automates CPU and memory requests across all of Outbrain’s AKS environments, including production.
ScaleOps Policy Efficiency
An automatic default ScaleOps policy, tailored to Outbrain, is applied across all workloads.
The Java Use Case
ScaleOps supports Outbrain’s use case with JVM-sensitive recommendations and automation (-Xmx).
Out-Of-The-Box Solution
All new workloads have a default ScaleOps policy automatically attached and are automated programmatically.
Business Efficiency
Engineers cannot change the policy or switch off the automation; they must justify their requirements in terms of impact, revenue, and cost. More than 90% of Outbrain’s workloads are automated.
To address these challenges, Outbrain implemented ScaleOps on its AKS production clusters to provide a robust solution for ongoing optimization and automation of resource requests, offering engineering teams a hands-free rightsizing experience.
After a short period, Outbrain was able to identify the best-suited ScaleOps policy and set it as the default for all new workloads. This provided a zero-touch onboarding process for existing and new workloads and ensured each deployment was optimally provisioned according to its unique requirements, adjusting in real time to seasonality and changing traffic patterns.
By leveraging ScaleOps, Outbrain achieved accurate resource allocation for its running pods in its production environment, reducing the burden on engineers and maintaining efficiency and reliability across its diverse services, even amidst fluctuating resource demands and stringent Service Level Objectives (SLOs).
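As a rough illustration of the end state, the sketch below shows a workload whose requests are kept in line with observed usage and whose JVM heap scales with the container’s memory limit. The annotation key, policy name, and numbers are hypothetical placeholders chosen for illustration; they are not ScaleOps’ actual API or Outbrain’s real configuration.

```yaml
# Hypothetical sketch – annotation key, policy name, and values are placeholders,
# not ScaleOps' real API or Outbrain's real configuration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommendations-api
  annotations:
    scaleops.example/policy: "outbrain-default"   # stand-in for the default policy attachment
spec:
  selector:
    matchLabels:
      app: recommendations-api
  template:
    metadata:
      labels:
        app: recommendations-api
    spec:
      containers:
        - name: app
          image: example.azurecr.io/recommendations-api:1.0   # placeholder image
          resources:
            requests:
              cpu: 750m          # continuously tuned to observed usage plus headroom
              memory: 3Gi
            limits:
              memory: 3Gi
          env:
            - name: JAVA_TOOL_OPTIONS
              # Expressing the heap as a fraction of available memory keeps it
              # in step with the memory request whenever the request is rightsized.
              value: "-XX:MaxRAMPercentage=75.0"
```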
The Impact
Implementing ScaleOps has had a significant positive impact on resource management and overall service efficiency at Outbrain. Here’s how:
Efficient Resource Allocation & Cost Savings
- Optimized Resource Requests: ScaleOps automatically rightsizes pods based on workload characteristics and dynamic usage, keeping CPU and memory requests aligned with actual consumption. This minimizes over-provisioning, leading to cost savings.
- Efficient Utilization: Seasonality of resource utilization is better managed, with ScaleOps adjusting resources dynamically to match demand fluctuations, further optimizing cost efficiency.
Increased Efficiency and Reliability
- Consistent Performance: With accurate pod rightsizing, Outbrain can maintain consistent service quality even during peak demand periods. This reliability is crucial for meeting the diverse SLOs of different services.
Developer Productivity and Focus
- Hands-Free Rightsizing: Engineers no longer need to manually adjust resource requests, freeing them to focus on developing new features and improving existing services. ScaleOps handles the complexities of resource management, reducing the cognitive load on developers.
Culture and Mindset
- Building Trust Among Engineers: Engineers have observed that their workloads remain healthy even after the resource management responsibilities have been shifted to the Cloud Platform team and ScaleOps. This has built trust in the platform, as they see firsthand the effectiveness of automated resource allocation.
- Impact-Oriented Thinking: The traditional approach involved engineers focusing on applicative errors and predefined alerts for their deployments. These alerts were often treated as absolute truths. However, in the Azure environment, engineers are being educated to think in terms of business impact. For instance, when errors or higher latencies occur, the conversation now revolves around their impact on revenue.
- Cost-Effective Scaling: Engineers are now more disciplined in their approach to scaling resources. They consider the financial implications of scaling decisions, understanding that scaling up too late might cost a certain amount of revenue, while scaling up too early could increase costs far beyond the revenue gained. This cost awareness ensures that scaling decisions are made judiciously, balancing performance and cost.
Overall, ScaleOps transforms how Outbrain manages its Kubernetes environment, driving significant improvements in efficiency, cost savings, and service reliability.