
TEKsystems

Site Reliability Engineer

Experience: 3-5 years

Job Description

A leading global gaming and technology company is seeking a highly capable Site Reliability Engineer (SRE) to join its team in Singapore. This is a mission-critical role where you'll own the reliability, scalability, and performance of complex distributed systems supporting a global platform. You'll work at the intersection of software development and operations: designing robust systems, responding to live incidents, and driving automation across infrastructure and CI/CD processes.

The Position

  • Monitor production systems using tools like Grafana and New Relic to detect performance issues and security vulnerabilities.
  • Respond to live incidents and outages, perform root cause analysis, and drive postmortem documentation and learning.
  • Maintain up-to-date operational runbooks for common issues and workflows.
  • Collaborate closely with developers to streamline production releases, patches, and deployment workflows.
  • Manage infrastructure across cloud environments (primarily AWS) and optimize CI/CD pipelines for reliability and efficiency.
  • Handle capacity planning and system performance tuning, and implement infrastructure-as-code using tools like Terraform.

The Candidate

  • Comes from a backend or full-stack development background and is comfortable coding in languages such as Java, JavaScript/TypeScript, or Bash.
  • Has experience running services at scale in cloud environments like AWS, with a strong understanding of Linux.
  • Thinks like a software engineer, but with the mindset of an operator: proactively preventing outages and continuously improving systems.
  • Is adept at debugging under pressure, analyzing logs/metrics, and communicating clearly during incidents.
  • Is passionate about automation, observability, and creating self-healing systems.

Preferred Qualifications

  • 3+ years of experience in site reliability engineering, DevOps, or software engineering roles.
  • Proven skills in:
      • Monitoring & alerting tools (Grafana, New Relic)
      • CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.)
      • Containerization and orchestration (Docker, Kubernetes)
      • Infrastructure-as-code (Terraform, CloudFormation, Ansible)
      • Managing and securing AWS environments
  • Understanding of authentication/authorization protocols (OAuth, JWT, OpenID)
  • Familiarity with SQL/NoSQL databases (PostgreSQL, Redis, MongoDB)
  • Strong interpersonal skills and a collaborative approach to working with cross-functional teams.

We regret to inform you that only shortlisted candidates will be notified or contacted.

EA Registration No: R22105541, TAY ZHIHENG, DARIUS

Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544

More Info

Date Posted: 19/09/2025

Job ID: 126550229

Last Updated: 29-09-2025 11:53:48 PM