Site Reliability Engineer

Radley James

Singapore

Fresher

Posted 19 hours ago

Job Description

The company is a technology-driven trading firm operating in global financial markets. It is built on robust, low-latency systems, disciplined engineering, and a culture that values ownership, precision, and continuous improvement. Technology is at the core of everything they do.

The Role

As a Site Reliability Engineer, you will be responsible for the reliability, performance, and scalability of mission-critical trading systems. You will work closely with software engineers, traders, and infrastructure teams to ensure the platforms operate with high availability and predictable performance in a fast-paced, real-time environment.

This role is hands-on and suited to engineers who enjoy deep technical challenges, production ownership, and building resilient systems at scale.

Responsibilities

Design, build, and operate highly reliable and scalable production systems
Monitor, troubleshoot, and resolve production issues across trading and market data platforms
Improve system observability through metrics, logging, and alerting
Automate operational workflows and reduce manual toil
Partner with engineering teams to influence system design for reliability and operability
Participate in on-call rotations and incident response, including post-incident reviews
Continuously improve deployment, capacity planning, and disaster recovery practices

Requirements

Strong experience in Site Reliability Engineering, Systems Engineering, or Production Engineering
Solid understanding of Linux systems, networking, and distributed systems
Proficiency in at least one programming or scripting language (e.g. Python, Go, Java, C++, Bash)
Experience with monitoring and observability tools (e.g. Prometheus, Grafana, ELK, Datadog)
Familiarity with containerisation and orchestration technologies (e.g. Docker, Kubernetes)
Experience operating systems in high-availability, low-latency, or high-throughput environments
Ability to debug complex issues under pressure and communicate clearly during incidents

Nice to Have

Experience in trading, financial services, or other latency-sensitive environments
Knowledge of cloud platforms (AWS, GCP, Azure) and hybrid infrastructure
Experience with infrastructure-as-code tools (e.g. Terraform, Ansible)
Understanding of TCP/IP performance tuning and kernel-level optimisations