Job Responsibilities
- Design, develop and deploy machine learning solutions and services
- Implement end-to-end machine learning pipelines from data ingestion to training and model serving
- Operationalize LLMs, embeddings, and multi-agent systems in real-world applications
- Manage the machine learning and model lifecycle (experimentation, registry, deployment)
- Oversee the model promotion lifecycle, coordinating validation gates and approval workflows to safely deploy new model versions from staging to production
- Containerize applications using Docker and orchestrate them via Kubernetes
- Build and maintain CI/CD pipelines for ML models and LLM applications
- Collaborate with data scientists to refactor research code into production-ready Python code
- Monitor model performance and data drift in production
- Assess and integrate AI solutions, ensuring optimal performance and reliability
- Design and implement production-grade RAG systems
- Collaborate with infrastructure teams, data engineers, data scientists, and other stakeholders to integrate machine learning solutions into existing systems and processes
- Participate in code reviews, testing, and debugging to ensure the quality and reliability of machine learning solutions
SKILLS REQUIREMENTS OF THE POSITION
Competencies
- Strong problem-solving and analytical skills, with the ability to think critically and creatively about complex challenges
- Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams and stakeholders at all levels of the organization
- Ability to manage a personal workload effectively: prioritize tasks, manage timelines, and deliver high-quality results on schedule
- Continuous learning mindset, with a passion for staying up to date with the latest advancements in machine learning and artificial intelligence
- Attention to detail and commitment to producing high-quality, reliable, and maintainable code
Education and skills requirements
- Bachelor's or Master's degree in Data Science, Computer Science, Mathematics, Statistics, or a related field
- Advanced proficiency in Python programming with a focus on writing clean, testable and efficient code
- DevOps & Containers: Proficient with Docker for containerization and working knowledge of Kubernetes (k8s) for orchestration
- Practical understanding of GPU architecture and cloud compute instances to optimize resource allocation for training and inference workloads
- MLOps tools: Hands-on experience with MLflow (or similar tools such as Weights & Biases) for experiment tracking and model registry
- Proven experience working with Large Language Models (LLMs)
- Good understanding of AI agents & agentic workflows, LLM orchestration frameworks and reasoning patterns
- Experience with data preprocessing, feature engineering, and model selection and evaluation techniques
- Hands-on experience with CI/CD pipelines (GitLab, Jenkins)
- Knowledge of statistical and mathematical concepts relevant to machine learning, such as probability, linear algebra, and optimization
- Excellent problem-solving and debugging skills, with the ability to identify and resolve issues quickly and effectively
- Relevant work experience in machine learning, data science or a related field