About the job
We're looking for a Reinforcement Learning (RL) Engineer to develop and deploy learning-based control policies for our robots, including integration with Vision-Language-Action (VLA) stacks. You will own the training loop from simulation and logged data through evaluation on hardware, working closely with simulation, perception, and robotics teams.
This is not a research-only role. You will ship policies that must work under real operational constraints: latency, safety, embodiment differences, and continuous improvement from field data.
What you'll do
- Design, implement, and maintain RL training pipelines for robotic manipulation, navigation, and whole-body control tasks
- Develop and tune policies in simulation and on real hardware, with clear benchmarks for success, robustness, and regression detection
- Integrate RL stacks with VLA and broader autonomy systems: action spaces, planners, low-level controllers, and deployment interfaces
- Design reward functions, curricula, and domain randomization strategies that improve sim-to-real transfer
- Own dataset and experience pipelines (sim rollouts, teleoperation logs, filtered trajectories) for offline RL, imitation, and hybrid training
- Implement evaluation harnesses in sim and on physical robots; analyze failure modes and drive iterative improvements
- Collaborate with simulation engineers on environments, assets, and synthetic data needed for scalable training
- Work with software and embedded teams on inference deployment, monitoring, and safe rollout of new policy versions
- Document experiments, model checkpoints, and deployment procedures so the team can reproduce and extend your work
What we're looking for
- Degree in Robotics, Computer Science, Electrical Engineering, Machine Learning, or a related field (or equivalent industry experience)
- Strong track record in reinforcement learning for control, robotics, or embodied AI (published work or shipped systems)
- Proficiency in Python and deep learning frameworks (PyTorch preferred; JAX or similar acceptable)
- Experience training policies in physics simulators (e.g. Isaac Lab / Isaac Sim, MuJoCo, Brax, or Gazebo-based stacks)
- Solid understanding of MDP formulation, policy gradients, actor-critic methods, and practical RL engineering (stability, hyperparameters, logging)
- Familiarity with robot kinematics, dynamics, and common control interfaces (position, velocity, torque; whole-body vs arm-only)
- Comfort debugging end-to-end: from training curves and sim artifacts to real-robot execution and safety limits
- High agency, disciplined experimentation, and the ability to work across ML and robotics disciplines
Nice to have
- Experience with Vision-Language-Action models, behavior cloning, or offline RL from multimodal robot datasets
- Exposure to cross-embodiment training, sim-to-real, or fleet-scale policy deployment
- Familiarity with ROS / ROS 2, MoveIt, or motion planning integration for learned policies
- Experience with teleoperation data, LeRobot-style pipelines, or large-scale log ingestion for learning
- Knowledge of model compression, ONNX export, edge inference, or real-time policy serving on robot compute
- Background in manipulation, mobile manipulation, dexterous hands, or contact-rich tasks
Who you are
- You want learning systems that survive contact with the real world, not just leaderboard scores in sim
- You are rigorous about evaluation, reproducibility, and knowing when a policy is ready to ship
- You are ambitious, collaborative, and comfortable owning the full loop from idea to deployed behavior
- You care about how RL and VLAs compound into long-term product and fleet advantage
- You want to help define how intelligent machines improve through data, simulation, and deployment
What we are looking to build
Production-oriented RL and VLA-adjacent training stacks: policies and integration layers that bridge high-level reasoning with reliable low-level control across our robot embodiments, with validated sim benchmarks and documented paths to safe real-world rollout.