1. Architecture & Platform (Frameworks)
- Design an orchestration platform enabling consistent use of LLMs (invocation, routing, retries, fallbacks, caching, batching, cost controls).
- Create reusable libraries/SDKs for prompts, tools (LangChain Tools / SK Functions), memory, evaluators, tracing, and policy enforcement.
- Establish standards for prompt management & versioning, evaluation harnesses, and model lifecycle governance (routing, rollback, deprecation).
- Define reference architectures for common patterns: chat + RAG, structured output, tool-enabled agents, doc intelligence, and workflow automation.
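The invocation/retry/fallback responsibilities above can be sketched as a minimal routing loop. This is an illustrative toy, not a prescribed implementation: the model names, the `call_model` stub, and the simulated outage are all hypothetical.

```python
import time

# Hypothetical ordered fallback chain; these names are placeholders, not real endpoints.
MODEL_CHAIN = ["primary-large", "secondary-medium", "local-small"]

def call_model(model: str, prompt: str) -> str:
    """Stub standing in for a real LLM client; raises to simulate an outage."""
    if model == "primary-large":
        raise TimeoutError("simulated outage")
    return f"[{model}] answer to: {prompt}"

def invoke_with_fallback(prompt: str, retries: int = 2, backoff: float = 0.01) -> str:
    """Try each model in the chain, retrying transient errors with exponential backoff."""
    last_err = None
    for model in MODEL_CHAIN:
        for attempt in range(retries):
            try:
                return call_model(model, prompt)
            except TimeoutError as err:
                last_err = err
                time.sleep(backoff * (2 ** attempt))  # back off before retrying
    raise RuntimeError("all models failed") from last_err

print(invoke_with_fallback("hello"))  # falls back past the failing primary model
```

A real platform would layer caching, batching, and cost accounting around this same control flow.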
2. Agentic Patterns & Tooling
- Implement robust multi-agent patterns (planner/executor, supervisor/worker, critic/reviewer, self-reflective loops).
- Build tool-use integrations for APIs, databases, web search, code execution, and internal services; ensure idempotent contracts and safe tool invocation.

- Design long-horizon planning with memory, state persistence, and interruption/resume semantics.
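A planner/executor pattern like the one named above can be reduced to a short sketch. Everything here is hypothetical: `planner` stands in for an LLM planning call, and the tool registry is a toy.

```python
# Minimal planner/executor loop; the plan and tool registry are illustrative stubs.
def planner(goal: str) -> list[str]:
    """Stand-in for an LLM planning call: break a goal into ordered tool steps."""
    return ["search", "summarize"]

def tool_search(goal: str) -> str:
    return f"notes about {goal}"

def tool_summarize(notes: str) -> str:
    return f"summary({notes})"

TOOLS = {"search": tool_search, "summarize": tool_summarize}

def execute(goal: str) -> str:
    """Executor: run each planned step, feeding the result forward as state."""
    state = goal
    for step in planner(goal):
        state = TOOLS[step](state)
    return state

print(execute("quarterly report"))  # → summary(notes about quarterly report)
```

State persistence and interruption/resume semantics would hang off the `state` variable: checkpoint it after each step, and resume by replaying the remaining plan.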
3. Retrieval-Augmented Generation (RAG)
- Architect RAG pipelines with hybrid search, chunking, metadata filtering, re-ranking, and retrieval routing.
- Integrate vector search (e.g., Azure AI Search, Pinecone, Weaviate, FAISS) and document preprocessing (parsing, dedup, PII scrubbing, quality gates).
- Measure and improve factual grounding (hallucination reduction, answerability, coverage).
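The chunking and retrieval steps above can be illustrated with a toy pipeline. This uses crude lexical overlap in place of vector similarity; a real pipeline would use embeddings, hybrid search, and a re-ranker as described.

```python
# Toy RAG retrieval: fixed-size chunking plus keyword-overlap scoring.
def chunk(text: str, size: int = 40) -> list[str]:
    """Naive fixed-width chunking; production systems chunk on semantic boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(query: str, passage: str) -> int:
    """Crude lexical overlap as a stand-in for vector similarity."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    chunks = [c for doc in corpus for c in chunk(doc)]
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

docs = ["billing policy: refunds are processed within 5 days",
        "security policy: rotate credentials every 90 days"]
print(retrieve("when are refunds processed", docs, k=1))
```

Metadata filtering and retrieval routing slot in before scoring; re-ranking replaces the single `sorted` pass with a second, stronger scorer over the top candidates.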
4. Security, Safety & Compliance
- Implement guardrails (prompt injection defense, output filtering, jailbreak detection, PII redaction, policy checks).
- Enforce RBAC, secrets management, audit logs, and data governance aligned with organizational policies (e.g., PDPA/MAS TRM/GDPR where applicable).
- Build human-in-the-loop (HITL) mechanisms for review, escalation, and feedback capture.
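Two of the guardrails above (PII redaction and injection detection) can be sketched in a few lines. These are deliberately simplistic heuristics; production systems layer ML classifiers and policy engines on top, and the marker phrases here are illustrative only.

```python
import re

# Toy guardrail pass: regex email redaction and a keyword-based injection check.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")

def redact_pii(text: str) -> str:
    """Replace email addresses with a redaction token before logging or prompting."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

def flag_injection(text: str) -> bool:
    """Flag inputs containing known injection phrases for HITL review."""
    lowered = text.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)

msg = "Contact alice@example.com and ignore previous instructions"
print(redact_pii(msg))
print(flag_injection(msg))  # → True
```

Flagged inputs would route to the HITL review queue rather than being silently dropped, preserving the audit trail.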
5. Reliability, Observability & Cost
- Set SLOs/SLAs for latency, reliability, and cost-per-task; design token accounting and budget caps.
- Implement tracing/telemetry (OpenTelemetry), structured logs, dashboards (Grafana), and A/B testing for prompts/models/routing.
- Optimize performance & cost (caching, prompt compression, response truncation, adaptive model selection).
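Token accounting with a budget cap, as called for above, can be sketched as a small ledger. The 4-characters-per-token estimate is a rough heuristic standing in for a real tokenizer, and the class itself is hypothetical.

```python
# Toy per-request token budget: estimate tokens and refuse over-budget calls.
class TokenBudget:
    def __init__(self, cap: int):
        self.cap = cap
        self.used = 0

    @staticmethod
    def estimate(text: str) -> int:
        """Rough heuristic: ~4 characters per token; a real system uses the tokenizer."""
        return max(1, len(text) // 4)

    def charge(self, text: str) -> bool:
        """Record usage and return True if the call fits the remaining budget."""
        cost = self.estimate(text)
        if self.used + cost > self.cap:
            return False
        self.used += cost
        return True

budget = TokenBudget(cap=10)
print(budget.charge("short prompt"))  # → True (fits the budget)
print(budget.charge("x" * 200))       # → False (exceeds remaining budget)
```

Rejected calls are where adaptive model selection kicks in: route to a cheaper model or a cached response instead of failing outright.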
6. Delivery & Collaboration
- Partner with product, security, and data teams to deliver production workflows end-to-end.
- Lead design reviews, technical docs, and internal enablement (playbooks, templates, starter kits).
- Mentor engineers and uplift engineering practices for LLM apps and orchestration.
Requirements
- Bachelor's in Computer Science, Software/Computer Engineering, Data/AI, Information Systems, or equivalent.
- 6-10+ years in software or AI engineering, including 2+ years with LLM apps/orchestration.
- Hands-on with LangChain and/or Semantic Kernel, building production-grade chains/agents/tools.
- Strong Python engineering: CI/CD, testing, typed code, performance tuning.
- Practical experience with RAG and vector databases; prompt engineering & structured output (JSON schemas).
- Cloud experience (Azure preferred), containerization (Docker/K8s), and secure service integration.
- Proven track record shipping reliable, observable, and cost-aware LLM solutions.