
We're looking for an AI Engineer in Distributed LLM Training & Infrastructure to work on large-scale model training infrastructure across distributed GPU environments. This role focuses on improving how LLMs are trained at scale, optimising performance, cost, and efficiency across multi-node systems.
The Role
What You'll Work On
Ideal Background
Why This Role
If you're interested, feel free to apply or reach out directly for a confidential discussion.
Only shortlisted candidates will be contacted.
Job ID: 147286059
Skills:
JAX, PyTorch, DeepSpeed, model parallelism, NCCL communication patterns, benchmarking, multi-node GPU systems, FSDP, tensor/pipeline parallelism