Job Summary
We are looking for a skilled AI Engineer with 3+ years of experience to drive the implementation of AI solutions. In this role, you will be responsible for the end-to-end lifecycle of LLM-based applications, from configuring high-performance inference engines like vLLM to architecting advanced Agentic AI workflows. You will bridge the gap between raw model capabilities and project-specific business logic using Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) patterns.
Key Responsibilities
- Configure and optimize vLLM and other inference frameworks to ensure low-latency, high-throughput model serving.
- Design and implement RAG pipelines using vector databases and CAG strategies to minimize redundant computation.
- Deploy and tune vLLM clusters to provide high-throughput, low-latency API endpoints for various open-source LLMs.
- Design and maintain Apache Airflow DAGs and RAGFlow pipelines to automate the end-to-end AI lifecycle, including data ingestion, automated evaluation, and prompt versioning.
- Develop and version-control sophisticated system prompts, employing techniques like Chain-of-Thought (CoT) to improve reasoning.
- Implement CAG strategies to optimize KV cache reuse and reduce compute costs for long-context project tasks.
- Author and refine system prompts using agentic techniques to ensure consistent performance across different LLM backends.
Requirements
- Bachelor's degree in Information Technology, Computer Science, Finance, or a related field.
- Minimum of 3 years of hands-on experience with LLMs, including expertise with vLLM and model quantization (AWQ/GPTQ).
- Strong proficiency in Apache Airflow for scheduling complex data and AI pipelines.
- Experience with RAGFlow (or similar deep-document RAG frameworks) and vector databases.
- Experience building multi-agent systems that use tools and external APIs to complete multi-step tasks.
- Advanced proficiency in Python, Docker, and Kubernetes.
- Experience with AI observability tools to track latency, cost, and hallucination rates.
EA Number: 11C4879