Okara is a private, faster, and safer chat interface for accessing the latest open-source AI models.
What you'll do
- Integrate and serve open-source LLMs; build a fast, reliable chat pipeline.
- Tune prompts/parameters and add guardrails, evals, and fallbacks.
- Build retrieval + memory (RAG) with clean context handling.
- Add research tools and productivity connectors; handle auth/rate limits safely.
- Instrument performance and costs; drive continuous improvements in latency and reliability.
- Ship features end-to-end with product/design; maintain clear APIs and docs.
You're a fit if you
- Have shipped LLM features to production and can debug messy edge cases.
- Understand model behavior (sampling, context windows, function/tool use).
- Know how to measure and reduce latency, errors, and token spend.
- Are comfortable with data stores, queues, and observability basics.
- Care deeply about privacy, security, and responsible data handling.
- Move fast, own outcomes, and communicate crisply.
Nice to have
- Experience with model optimization/quantization and batching.
- Familiarity with agent patterns and retrieval best practices.
- Background in security/permissions for multi-user systems.
Location: Singapore or remote (±5h SGT)