Allegis Group Singapore Pte Ltd

Site Reliability Engineer

7-10 Years
SGD 10,000 - 15,000 per month

Job Description

OVERVIEW

We're hiring a Site Reliability Engineer to support a key global technology client. You'll join a modern, cloud-native engineering environment and partner closely with development teams to improve the reliability, scalability, and automation of distributed platforms. The role blends software engineering with reliability ownership: you'll design and build internal services and tooling, streamline CI/CD, implement Infrastructure-as-Code at scale, and strengthen observability so issues are found and fixed before they impact users.

This position offers high autonomy and visibility. You'll work across well-documented systems and established tooling, prepare proofs of concept to influence change, and drive pragmatic automation (in Go or Python) that reduces manual effort and makes releases safer and faster. If you enjoy hands-on engineering, diagnosing complex problems, and landing improvements in real production environments, this is an opportunity to make a clear and measurable impact.

DESCRIPTION

As a Site Reliability Engineer, you will:

  • Build internal platforms, services, and APIs that enable self-service provisioning, safe deployments, and efficient day-to-day operations.
  • Enhance CI/CD workflows (e.g., Jenkins or similar) to increase deployment reliability, add guardrails, and improve developer experience and velocity.
  • Implement and evolve Infrastructure-as-Code using Terraform (and related patterns) to standardize environments, reduce configuration drift, and improve repeatability.
  • Define and operationalize SLIs/SLOs and error budgets, build actionable dashboards, and tune alerts to reflect user experience and business risk.
  • Operate Kubernetes workloads at scale; improve resilience, performance, and cost-efficiency through sound engineering and automation.
  • Strengthen observability (metrics, logs, traces) using Prometheus and complementary platforms; drive root-cause analysis and preventative fixes.
  • Automate routine work and periodic upgrade cycles (preferably in Go/Python) to eliminate toil and reduce change risk.
  • Troubleshoot complex incidents across compute, networking, containers, and deployments; participate in a shared on-call rotation and contribute to post-incident reviews.
  • Collaborate with engineers, architects, and product stakeholders to translate requirements into secure, observable, and scalable infrastructure solutions.
  • Document patterns and best practices; mentor teams on reliability-first ways of working and platform standards.

QUALIFICATIONS

  • Strong hands-on experience with AWS (production environments) and cloud-native architectures; familiarity with hybrid or multi-cloud concepts is a plus.
  • Practical expertise operating Kubernetes (deployments, day-2 operations, and troubleshooting).
  • Solid CI/CD skills with Jenkins or similar tools (pipeline design, release safety, rollbacks).
  • Proficiency in Infrastructure-as-Code (Terraform) and Git-based workflows for environment management.
  • Programming/automation in Go and/or Python (production-quality code; tooling and services, not just scripts).
  • Observability experience with Prometheus and dashboards/alerting tuned to SLIs/SLOs; familiarity with platforms such as Grafana, Datadog, or CloudWatch is welcome.
  • Networking fundamentals for distributed systems: DNS, load balancing, VPC design, security groups, and layer-7 routing/proxies.
  • Sound understanding of secure system design (least privilege, secrets management, change control) and performance/reliability trade-offs.
  • Excellent communication skills and the ability to operate independently in distributed, asynchronous teams while influencing stakeholders through clear proposals and POCs.
  • 7+ years in SRE/DevOps/Infrastructure/Software Engineering with a track record of operating production-grade systems at scale.

PROFESSIONAL ATTRIBUTES

  • Ownership: You're accountable across both build and run; you close the loop with measurable outcomes.
  • Automation first: You remove toil with durable solutions, not quick fixes.
  • Engineering rigor: You apply design patterns, testing, and code reviews to platform work.
  • Influence without authority: You use documentation, POCs, and calm communication to align teams.
  • Proactive and visible: You work independently across time zones and keep stakeholders informed.

We regret that only shortlisted candidates will be contacted.

EA Registration No: R21103843, Andrew Jonas Matthew

Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544


About Company

Allegis Group is the global leader in talent solutions focused on working harder and caring more than any other provider. We'll go further to understand the needs of our people (our clients, our candidates, and our employees) and to consistently deliver on our promise of an unsurpassed quality experience. That's the Allegis Group difference, and it's consistent across every Allegis Group company. With more than US$11 billion in annual revenues and over 500 locations across the globe, our network provides businesses with a comprehensive suite of talent solutions without sacrificing the niche expertise required to ensure a successful partnership. Our specialised group of companies includes Aerotek, TEKsystems, Allegis Global Solutions, Aston Carter, Major, Lindsey & Africa, Allegis Partners, MarketSource, and EASi. Visit www.AllegisGroup.com to learn more.


Job ID: 143730709