LLM / AI Quality Engineer

QUESS SELECTION & SERVICES PTE. LTD.

Singapore, Robinson

3-5 Years

SGD 9,000 - 10,000 per month

Save

Posted 2 hours ago
Be among the first 10 applicants

Early Applicant

Job Description

Job Description:

As a LLM / AI Quality Engineer, lead the end-to-end evaluation of AI applications

- LLM features, RAG systems, and multi-agent workflows

- To ensure they meet business outcomes, safety requirements, and platform standards. Own test design, execution, and reporting across offline, pre-prod, and in-prod stages, integrating with CI/CD and working closely with product, data, and platform teams.

1) AI/LLM Evaluation & Test Design - Define evaluation strategies (golden sets, adversarial suites, regressions), pass/fail gates, and SLOs for quality, safety, latency, and cost. - Establish rubric-based human reviews (usefulness, faithfulness, safety, clarity) and calibrate annotators. - Instrument LLM-as-judge where appropriate with calibration and spot checks.

2) RAG, Retrieval, & Grounding - Measure retrieval precision/recall, MRR/nDCG, and answer faithfulness to sources detect hallucination and citation errors. - Test chunking, prompt templates, filters, and policy chains monitor stale/poisoned content.

3) Agentic & Tool-Use Scenarios - Validate multi-step plans, tool selection, error recovery, retries, and idempotency for functions with side effects. - Contract-test JSON schemas and structured outputs across services.

4) Non-Functional, Performance & Cost - Run token-aware load/soak tests (context length, temperature, batching) track p50/p95/p99, throughput, timeouts, cache hit rate, and cost per successful task. - Recommend optimizations (prompt/policy changes, retrieval tweaks, caching).

5) Security, Privacy & Safety - Red-team for prompt injection, data exfiltration, indirect injections via retrieved content validate guardrails pre/post inference. - Enforce PII controls, data-residency, and compliance checks align with organizational security testing practices.

6) Observability & CI/CD Integration - Implement prompt/dataset/version lineage and trace-based evals automate in CI (pre-merge golden tests, nightly adversarials) with canary/A-B in prod and rollback criteria. - Produce clear, decision-ready reports with risk assessments and release recommendations.

7) Project Delivery & Collaboration - Analyze requirements, enhance test plans with additional cases, prepare environments (including cloud), execute tests per plan, and drive defect resolution. - Provide regular status updates manage test activities to schedule support SIT/UAT and production readiness.

8) Performance, API & Platform Testing (Carry-over) - Execute API, performance, and load testing for microservices/web services that underpin AI features integrate automated testing into CI/CD.

9) Team & Standards - Adopt and improve test standards/methodology share practices, train teams, participate in peer reviews, and pursue self-directed learning.

Qualifications

The ideal candidate should possess:

- 3+ years in software testing/QA with strong test methodology and tooling hands-on API testing and performance testing.

- Programming familiarity (e.g., Python/TypeScript) and experience with CI/CD and version control.

- Cloud basics (AWS/Azure/GCP) and microservices fundamentals.

- Degree/Diploma in CS/IT or equivalent.

- Preferred (AI/ML Focus)

- Understanding of ML concepts and MLOps experience with model validation and monitoring in production.

- Experience with AI-specific security testing and vulnerability assessment.

- Familiarity with evaluation/observability tools (any of): LangSmith, Weights & Biases, RAGAS, TruLens, Promptfoo, DeepEval, Guardrails/LlamaGuard, Presidio plus OpenTelemetry-style LLM traces.

- Practical exposure to Azure OpenAI/Bedrock/Vertex and model gateways quota & token accounting know-how.

Tooling & Automation

- Modern automation frameworks (e.g., Playwright, Cypress, Selenium), API test tools (Postman/REST Assured), performance tools (k6/JMeter), and CI/CD integration.

- Data evaluation pipelines for RAG (embedding validation, filtering, drift detection).

- Traits

- Outcome-oriented, high standards strong communication and collaboration customer-focused proficient in written and spoken English.

- Telco Context (Nice-to-Have)

- Experience testing copilots/agents for BSS/OSS, NOC analytics, and enterprise care ability to tie eval KPIs to CSAT, AHT, FCR, MTTR.

Additional Information

- Lead high-impact Data & AI advisory programs for major enterprises and public sector clients.

- Shape enterprise strategies and governance frameworks that drive real transformation.

- Work with a talented, multidisciplinary team in a collaborative environment.

- Competitive compensation and strong professional development support.