Site Reliability Engineer

3-5 Years
SGD 6,500 - 7,500 per month
  • Posted 10 days ago

Job Description

We are looking for a proactive and skilled Site Reliability Engineer (SRE) with hands-on experience in the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) to join our engineering team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our critical systems and services, with a focus on managing and optimizing Elastic deployments. This role combines software engineering and systems engineering to build and maintain highly available infrastructure.

Key Responsibilities

  • Design, deploy, and maintain scalable and highly available Elastic Stack environments to support logging, monitoring, and analytics needs.

  • Monitor system health, performance, and capacity of Elastic clusters and related infrastructure, proactively identifying and resolving issues.

  • Automate deployment, configuration, and management of Elastic components using infrastructure-as-code tools and scripting.

  • Collaborate with development, operations, and security teams to ensure reliability, security, and compliance of Elastic-based solutions.

  • Develop and maintain monitoring, alerting, and incident response processes to minimize downtime and improve system resilience.

  • Troubleshoot and resolve complex issues related to Elastic performance, indexing, query optimization, and cluster stability.

  • Participate in on-call rotations to provide 24/7 support for critical systems and respond to incidents promptly.

  • Continuously improve system reliability through capacity planning, performance tuning, and root cause analysis.

  • Document system architecture, operational procedures, and best practices related to Elastic and overall infrastructure.

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.

  • 3+ years of experience as a Site Reliability Engineer, DevOps engineer, or in a similar role.

  • Strong hands-on experience with Elastic Stack components: Elasticsearch, Logstash, Kibana, and Beats.

  • Proficient in Linux system administration and networking fundamentals.

  • Experience with automation and configuration management tools such as Ansible, Terraform, or similar tools.

  • Solid scripting skills in languages like Python, Bash, or Go.

  • Familiarity with containerization (Docker) and orchestration platforms (Kubernetes) is a plus.

  • Experience with monitoring and alerting tools such as Prometheus, Grafana, or equivalent.

  • Strong problem-solving skills and ability to work under pressure during incidents.

More Info

Job ID: 143487263