Senior Backend Engineer, Large Language Models

Zark Lab

Singapore

4-6 Years

Job Description

Zark Lab, Full-time, Remote

About Zark Lab

Zark Lab builds analytical foundation models that deliver real-time intelligence and conversational insights from massive, multi-modal datasets. Our platform processes documents, video, images, and structured data at scale, powering enterprise customers across Web3, fintech, and sports. We're a fast-moving team pushing the boundaries of what's possible when you combine advanced AI with large-scale data infrastructure.

The Role

We're looking for a Senior Backend Engineer who will own the full lifecycle of LLM integration across our platform, from model deployment and fine-tuning to designing the orchestration layers that tie multiple models together into production-grade systems.

This isn't a research role. You'll be shipping infrastructure that handles real workloads: routing queries to the right model, managing context windows, building skill-based agent architectures, optimizing token usage for cost and latency, and ensuring everything runs reliably at scale.

What You'll Do

Deploy, serve, and maintain LLMs in production environments, managing model versioning, A/B testing, and rollback strategies.

Design and build orchestration layers that coordinate across multiple LLMs (e.g., routing between specialized models based on task type and cost).

Fine-tune foundation models on domain-specific data and evaluate performance against production benchmarks.

Develop prompt engineering frameworks, skill definitions, and model-driven schemas (MDS) that standardize how the platform interacts with LLMs.

Architect agent-based systems with tool use, retrieval-augmented generation (RAG), and multi-step reasoning pipelines.

Build and optimize backend services in Python and Node.js, with performance-critical components in C++.

Write and optimize complex SQL queries against large-scale analytical databases (Snowflake, Databricks, PostgreSQL).

Work with containerized deployments (Docker, Kubernetes) and GPU-accelerated infrastructure for model inference.

Instrument systems for observability: latency tracking, token usage monitoring, error rates, and cost attribution per model call.

Collaborate directly with the founding team to shape technical architecture and product direction.

What We're Looking For

4+ years of backend engineering experience, with at least 1-2 years working directly with LLMs in production.

Strong proficiency in Python and Node.js; working knowledge of C++ is a plus.

Solid SQL skills: comfortable writing complex analytical queries and optimizing query performance.

Hands-on experience deploying and serving ML/LLM models (vLLM, TGI, Triton, or similar inference servers).

Familiarity with prompt engineering patterns, function calling, structured output parsing, and agent orchestration frameworks (LangChain, LlamaIndex, or custom).

Experience with containers (Docker, Kubernetes), GPU infrastructure (NVIDIA CUDA, cloud GPU instances), and distributed computing frameworks (Ray, Dask, Spark, or similar).

Understanding of model fine-tuning workflows (LoRA, QLoRA, PEFT) and evaluation methodologies.

Experience with cloud platforms (GCP, AWS) and infrastructure-as-code.

Strong fundamentals in distributed systems, concurrency, and performance optimization.

Bias toward shipping: you prototype fast, iterate based on real data, and don't over-engineer.

Nice to Have

Experience with blockchain data, on-chain analytics, or Web3 infrastructure.

Familiarity with ClickHouse or columnar databases for high-volume analytical workloads.

Background in multi-modal AI (video, image, audio processing pipelines).

Contributions to open-source ML/infrastructure projects.

Experience at an early-stage startup.

Why Zark Lab

Work directly with the founders on hard, high-impact problems.