Job Summary:
We are seeking a Machine Learning Platform Engineer (MLOps) to bridge the gap between data science and production systems within our ML & AI platform team. The role spans the full ML lifecycle, from development to deployment, with a focus on building robust data pipelines, managing model lifecycles, and enabling scalable AI integrations. You will also drive agentic workflows to support autonomous, AI-powered solutions in collaboration with cross-functional teams.
Job Responsibilities:
- Design, develop and deploy machine learning solutions and services
- Implement end-to-end machine learning pipelines from data ingestion to training and model serving
- Operationalize LLMs, embeddings, and multi-agent systems in real-world applications
- Manage the machine learning and model lifecycle (experimentation, registry, deployment)
- Oversee the model promotion lifecycle, coordinating validation gates and approval workflows to safely deploy new model versions from staging to production
- Containerize applications using Docker and orchestrate them via Kubernetes
- Build and maintain CI/CD pipelines for ML models and LLM applications
- Collaborate with data scientists to refactor research code into production-ready Python code
- Monitor model performance and data drift in production
- Assess and integrate AI solutions, ensuring optimal performance and reliability
- Design and implement production-grade RAG systems
- Collaborate with infrastructure teams, data engineers, data scientists, and other stakeholders to integrate machine learning solutions into existing systems and processes
- Participate in code reviews, testing, and debugging to ensure the quality and reliability of machine learning solutions
Job Requirements:
- Bachelor's or Master's degree in Data Science, Computer Science, Mathematics, Statistics, or a related field
- Advanced proficiency in Python programming with a focus on writing clean, testable and efficient code
- DevOps & Containers: Proficient with Docker for containerization and working knowledge of Kubernetes (k8s) for orchestration
- Practical understanding of GPU architecture and cloud compute instances to optimize resource allocation for training and inference workloads
- MLOps tools: Hands-on experience with MLflow (or similar tools such as Weights & Biases) for experiment tracking and model registry
- Proven experience working with Large Language Models (LLMs)
- Good understanding of AI agents & agentic workflows, LLM orchestration frameworks and reasoning patterns
- Experience with data preprocessing, feature engineering, and model selection and evaluation techniques
- Hands-on experience with CI/CD pipelines (GitLab, Jenkins)
- Knowledge of statistical and mathematical concepts relevant to machine learning, such as probability, linear algebra, and optimization
- Relevant work experience in machine learning, data science or a related field