AI Infra Engineer
About ScaleOps
ScaleOps is the leader in real-time automated cloud resource management, helping enterprises transform how they manage cloud-native application infrastructure. Our platform dynamically allocates application resources, delivering up to 80% cloud cost savings while boosting performance and eliminating manual intervention. With $80M+ in backing and over 50 global enterprise customers (including Wiz, SentinelOne, Orca Security, and Playtika), ScaleOps is scaling fast – and now, we are expanding into AI infrastructure.
We are looking for an AI Infra Engineer to design, build, and optimize the infrastructure that powers large-scale AI/ML workloads in the cloud. This role is highly technical and hands-on, ideal for engineers with deep expertise in distributed systems, GPUs, and ML infra who want to shape the future of AI performance and cost efficiency.
What You’ll Do
- Design and implement infrastructure for training and serving large-scale AI/ML models.
- Build systems for GPU/accelerator scheduling, orchestration, and optimization in Kubernetes-based environments.
- Develop solutions for efficient resource management, workload scheduling, and cost optimization for AI workloads.
- Collaborate with backend and product teams to integrate AI infrastructure into ScaleOps’s core platform.
- Evaluate and adopt cutting-edge technologies in AI infra (distributed training, inference acceleration, multi-cloud GPU scaling).
- Ensure reliability, scalability, and performance of AI/ML pipelines in production.
What You’ll Bring
- 5+ years of hands-on software development experience, with a strong focus on infrastructure, distributed systems, or performance engineering.
- Proven hands-on experience in GPU orchestration and distributed training at scale.
- Strong expertise with Kubernetes, Docker, and multi-cloud environments.
- Experience with ML frameworks (PyTorch, TensorFlow, JAX) and large-scale training/inference pipelines.
- Solid programming skills (Python, Go, Bash).
- Previous experience at a company whose core product is in the AI domain.
- Familiarity with cloud cost optimization for compute- and GPU-intensive workloads – an advantage.
- Strong ownership, problem-solving ability, and passion for building infra that powers AI.
- Full professional fluency in English (Hebrew – an advantage).