Senior Software Engineer, ML Dev Enablement (AI Agents)

5-7 Years
SGD 9,000 - 16,000 per month
  • Posted 2 days ago

Job Description

About the Role

We are looking for a Senior Software Engineer to join our ML Infrastructure: Dev Enablement team. Our mission is to build a frictionless development environment that empowers our researchers and engineers to innovate on deep learning models for autonomous driving.

We manage a high-scale Cloud Development Environment (CDE) platform that provides standardized, high-performance workspaces for ML development. In this role, your impact will be two-fold:

  1. Platform Ownership: You will act as a key owner of our CDE platform, ensuring its scalability, reliability, and seamless integration into the ML workflow.
  2. Agentic Evolution: You will lead our shift toward Agentic ML Workflows. You won't just be building static tools; you'll be architecting AI Agents that act as force multipliers, helping engineers automate debugging, optimize resource usage, and accelerate the journey from code to a trained model.

What You'll Be Doing

  • Scale & Evolve the Dev Platform: Lead engineering efforts to support and enhance our existing CDE platform, ensuring it meets the rigorous demands of large-scale ML experimentation.
  • Architect AI Agents: Design and implement LLM-powered agents capable of navigating the ML lifecycle, from automated code suggestions and log analysis to autonomous debugging of distributed training jobs.
  • Infrastructure Integration: Bridge the gap between AI agents and our core infra, ensuring agents can safely and effectively interact with Kubernetes, Ray, and AWS resources.
  • Collaborative Automation: Partner with ML Engineers to identify productivity killers and build agentic solutions (e.g., an agent that suggests fixes for common PyTorch distributed training errors).
  • Champion Engineering Excellence: Bring software engineering rigor to the wild west of LLM development, including building evaluation frameworks for agent performance, reliability, and security.
  • Mentor & Lead: Act as a subject matter expert on Agentic AI within the infrastructure team, guiding junior engineers and influencing our long-term technical roadmap.

What We're Looking For

  • Experience: 5+ years of professional software engineering experience, with a focus on backend systems, distributed systems, or infrastructure.
  • Agentic AI Proficiency: Hands-on experience building applications with LLM frameworks (e.g., LangChain, LangGraph, or LlamaIndex). You understand how to turn a prompt into a reliable, tool-calling agent.
  • Technical Stack: Expert-level Python or Go. Deep experience with Kubernetes and Cloud is required.
  • Cloud Infrastructure: Proven experience with AWS (or a similar cloud provider).
  • Communication: Ability to translate complex infrastructure challenges into clear technical designs and collaborate across diverse engineering and research teams.

Bonus Points

  • ML Ecosystem: Experience with ML orchestration and training frameworks like Ray or PyTorch.
  • Remote Dev Expertise: Familiarity with Coder or other Cloud Development Environments (CDEs) at scale.
  • GPU Compute: Experience managing or working with high-performance compute resources (GPUs).

Job ID: 144116633
