We're building production conversational AI — LLM agents that hold real conversations at scale, answer from a grounded knowledge base, and know their limits. We're looking for a hands-on Lead AI Engineer to own it end to end: the AI behavior, the infrastructure, and the full stack around it.
We're a small, focused team tackling a big problem — making AI that's accurate, grounded, and trustworthy in production. That means high ownership, fast decisions, and direct impact: what you build ships and runs live.
This is a builder-leader role. You write code, make the architecture calls, and set the technical direction for the team.
What you'll do
- Own the AI system — LLM agents, retrieval-augmented generation (RAG), and guardrails. Drive answer quality, retrieval relevance, and safe behavior.
- Debug and improve AI behavior — diagnose why a model hallucinated, mis-routed, or responded incorrectly, and design evaluations (LLM-as-judge and others) to measure and prevent it. Turn vague "it answered badly" reports into measured fixes.
- Lead platform engineering — design and ship the services, data pipelines, and tooling that let the product scale reliably.
- Own infrastructure as code — Terraform across multiple environments, CI/CD, and a serverless cloud footprint.
- Build full-stack — backend services, an internal web app, and data-processing pipelines.
- Own security and data protection — data isolation, least-privilege access, encryption, and careful handling of credentials and sensitive user data. Security is a core requirement of everything we ship, not an afterthought.
- Set technical direction — review designs and code, define quality bars, and keep production healthy.
Must have
- You've shipped an LLM application to production — agents and/or retrieval-augmented generation (RAG), with guardrails and a vector/embeddings layer. Not just prototypes.
- Strong prompt engineering and AI debugging — you can reason about model behavior and build evaluations to measure it.
- Terraform / infrastructure as code and solid AWS depth (serverless compute, NoSQL, event-driven flows, IAM).
- Full-stack engineering — TypeScript/Node, a modern web framework (React/Next.js), and Python for data work.
- Strong security and data-protection fundamentals — you build systems that handle sensitive user data safely: data isolation, least-privilege access, encryption, and secrets management.
- Ownership and technical leadership — able to lead a project and keep a production system reliable.
Nice to have
- Experience with Amazon Bedrock (agents, knowledge bases, guardrails) — a strong advantage.
- SaaS / multi-tenant platform design.
- Production observability and cost optimization for AI workloads.
What success looks like
- The AI answers more questions correctly, stays grounded in the knowledge base, and stays within its limits, all measured rather than guessed.
- The platform scales smoothly, with safe and repeatable infrastructure changes.