
Title: Site Reliability Engineer (SRE)
About the Role:
We are looking for a Site Reliability Engineer to help build and maintain reliable, scalable infrastructure for our financial services platform. You will work closely with development and DevOps teams across the US and China to improve system reliability and operational excellence.
Responsibilities:
- Design, build, and maintain highly available, scalable infrastructure on AWS
- Define and maintain SLOs/SLIs, error budgets, and monitoring/alerting systems
- Automate operational tasks through tooling and infrastructure-as-code
- Participate in the on-call rotation, incident response, and post-incident reviews
- Collaborate with development and globally distributed DevOps teams to improve service reliability and performance
- Perform capacity planning and performance optimization
- Contribute to CI/CD pipeline improvements and deployment safety
Requirements:
- BS in Computer Science, Engineering, or equivalent practical experience
- 1 to 3+ years of experience in SRE, DevOps, or systems engineering roles
- Experience with the AWS cloud platform
- Proficiency in at least one scripting/programming language (e.g., Shell or Python)
- Strong knowledge of Linux systems and networking (TCP/IP, DNS, HTTP, load balancing)
- Experience with infrastructure-as-code tools (Terraform, Ansible, or Pulumi)
- Experience with observability tools (Prometheus, Grafana, Datadog, or ELK)
- Experience with CI/CD pipelines and deployment automation
Job ID: 146501701