About the Role
My client is a fast-growing digital payments firm building next-generation financial infrastructure that enables seamless, real-time transactions across markets. As they continue to scale, they are looking for a Site Reliability Engineer to drive reliability, performance, and scalability across their platform.
What You'll Do
- Maintain and enhance the reliability, availability, and performance of production systems
- Design and implement monitoring, alerting, and incident response processes
- Support and optimise CI/CD pipelines, driving automation across deployment workflows
- Troubleshoot issues across infrastructure, applications, and network layers
- Collaborate with engineering teams to improve system design, scalability, and resilience
- Manage and optimise cloud infrastructure (AWS/GCP) and containerised environments (Docker/Kubernetes)
- Drive root cause analysis (RCA) and implement long-term preventive solutions
What They're Looking For
- 2+ years of experience in SRE, DevOps, or infrastructure engineering
- Strong hands-on experience with cloud platforms (AWS, GCP, or Azure)
- Experience with containerisation and orchestration (Docker, Kubernetes)
- Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI) and scripting (Python, Bash, or similar)
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK)
- Solid understanding of Linux/Unix systems and networking fundamentals
- Strong troubleshooting and problem-solving skills
- Mandarin proficiency to collaborate with regional stakeholders
- Experience in fast-paced, high-growth environments is highly preferred