Senior Site Reliability Engineer

7-9 Years
SGD 8,000 - 11,000 per month
  • Posted 4 hours ago

Job Description

As a Senior Site Reliability Engineer at LANDI Global, you will play a critical role in defining and advancing the reliability, scalability, and performance of our platform infrastructure. You will work closely with cross-functional teams to establish reliability standards, drive automation strategy, and lead continuous improvement initiatives across our environments.

Infrastructure & Platform Operations

  • Design, build, and optimize LANDI Global's platform infrastructures across development, staging, and production environments, with a focus on scalability and resilience.
  • Collaborate with R&D and platform teams to define architecture patterns and reliability standards that ensure availability and operational excellence.
  • Lead platform readiness for new client onboarding, ensuring scalability, repeatability, and operational sustainability.

Monitoring, Reliability & Incident Management

  • Define and drive improvements in monitoring, logging, and alerting systems to ensure high signal quality and proactive issue detection.
  • Lead incident response for high-severity events, and drive high-quality root cause analysis (RCA) with a focus on systemic improvements.
  • Design, evolve, and validate Disaster Recovery (DR) and business continuity strategies, ensuring systems meet recovery objectives.
  • Participate in and help evolve the 24/7 standby model to improve operational effectiveness and sustainability.

Performance, Optimization & Automation

  • Analyze platform performance metrics and lead optimization strategies across cloud and on-prem environments.
  • Drive improvements in automated testing, CI/CD pipelines, and deployment workflows to enhance release safety, speed, and reliability.
  • Identify and eliminate operational toil through automation and engineering solutions.
  • Establish and standardize operational runbooks and procedures across services.

Operational Support

  • Provide advanced troubleshooting and support for complex production issues, guiding teams toward effective resolution.
  • Lead continuous improvement initiatives to enhance platform resilience, scalability, and operational efficiency.
  • Act as a key escalation point for critical platform issues and reliability concerns.

Technical Leadership & Collaboration

  • Mentor Associate SREs and SREs through guidance, reviews, and knowledge sharing.
  • Influence engineering teams without direct authority to adopt best practices in reliability and operations.
  • Act as a bridge between SRE, platform, and R&D teams to align on scalable and sustainable engineering practices.

REQUIREMENTS & QUALIFICATIONS

  • Bachelor's degree in Computer Science, Software Engineering or a related field.
  • Minimum 7 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
  • Strong verbal and written communication skills in English and Mandarin.
  • Strong experience designing and operating distributed systems at scale.
  • Proven ability to improve reliability across multiple services or platforms.
  • Deep understanding of system failure modes, scalability, and performance trade-offs.
  • Experience defining and implementing SLOs, SLIs, and observability practices.
  • Ability to lead incident response and drive systemic improvements.
  • Strong communication skills with the ability to influence without authority.

More Info

Job ID: 145449055
