Argyll Scott

AI Site Reliability Engineer - Contract

3-8 Years
  • Posted 22 hours ago

Job Description

AI Site Reliability Engineer | Singapore | 1-Year Contract

Our client, a leading consulting services organisation, is hiring an AI SRE to own the reliability of their AI platform — from the ground up.

The Role

You'll embed SLOs, observability, deployment safety, and incident response into AI platform services as they're built. You'll also own the enterprise AI gateway (LLM + MCP), set reliability standards across all AI products, and partner with platform engineering and security to ensure everything ships production-ready.

What We're Looking For

  • 3–8 years in SRE or software engineering
  • Deep SRE expertise — SLOs, error budgets, chaos engineering, incident management
  • Experience owning a critical gateway or high-throughput API at ≥99.9% availability
  • Hands-on with AI/ML in production — LLM workloads, agent loops, provider outages
  • AWS/Kubernetes, Terraform/CDK, CI/CD pipelines
  • Observability tools — Datadog, Grafana, OpenTelemetry or equivalent
  • Python proficiency

Job ID: 147382023