AI Infra Engineer

Israel · Full-time · Senior

About The Position

About ScaleOps

ScaleOps is the leader in real-time automated cloud resource management, helping enterprises revolutionize the way they manage cloud-native application infrastructures. Our platform dynamically allocates application resources, delivering up to 80% in cloud cost savings while boosting performance and eliminating manual intervention. With $80M+ in backing and over 50 global enterprise customers (including Wiz, SentinelOne, Orca Security, and Playtika), ScaleOps is scaling fast – and now, we are expanding into AI Infrastructure.

We are looking for an AI Infra Engineer to design, build, and optimize the infrastructure that powers large-scale AI/ML workloads in the cloud. This role is highly technical and hands-on, ideal for engineers with deep expertise in distributed systems, GPUs, and ML infra who want to shape the future of AI performance and cost efficiency.


What You’ll Do

  • Design and implement infrastructure for training and serving large-scale AI/ML models.
  • Build systems for GPU/accelerator scheduling, orchestration, and optimization in Kubernetes-based environments.
  • Develop solutions for efficient resource management, workload scheduling, and cost optimization for AI workloads.
  • Collaborate with backend and product teams to integrate AI infrastructure into ScaleOps’s core platform.
  • Evaluate and adopt cutting-edge technologies in AI infra (distributed training, inference acceleration, multi-cloud GPU scaling).
  • Ensure reliability, scalability, and performance of AI/ML pipelines in production.

Requirements

What You’ll Bring

  • 5+ years of hands-on software development experience, with a strong focus on infrastructure, distributed systems, or performance engineering.
  • Proven hands-on experience in GPU orchestration and distributed training at scale.
  • Strong expertise with Kubernetes, Docker, and multi-cloud environments.
  • Experience with ML frameworks (PyTorch, TensorFlow, JAX) and large-scale training/inference pipelines.
  • Solid programming skills (Python, Go, Bash).
  • Previous experience at a company whose core product is in the AI domain.
  • Familiarity with cloud cost optimization for compute- and GPU-intensive workloads – an advantage.
  • Strong ownership, problem-solving ability, and passion for building infra that powers AI.
  • Full professional fluency in English (Hebrew – advantage).

