Site Reliability Engineer

3-5 Years
SGD 5,000 - 6,000 per month
  • Posted 11 hours ago

Job Description

We are seeking a Site Reliability Engineer to work closely with IT teams in providing both operational and project support. The ideal candidate will play a key role in ensuring system reliability, improving observability, automating operational tasks, and supporting critical production environments.
This role requires strong technical expertise, effective communication skills, and the ability to collaborate well with both internal teams and external partners.
Key Responsibilities:

1. Observability and Proactive Monitoring

  • Monitor log files, system health, and application performance to ensure service reliability and availability.
  • Install, configure, and manage monitoring tools.
  • Implement, enhance, and integrate monitoring solutions to enable proactive monitoring and improve business and operational processes.
  • Analyze monitoring data and generate dashboards and reports to provide operational insights and support decision-making.

2. Automation of Day-to-Day Operational Activities (20%)

  • Automate routine operational tasks using tools and scripting languages such as:
  1. Ansible
  2. Jenkins
  3. Shell scripting
  4. PowerShell
  5. Python

3. Reliability Engineering Support (20%)

  • Support service requests related to the reliability engineering function.
  • Provide operational support for reliability-related activities.
  • Participate in after-office-hours support when required, including:
  1. Immediate response to and resolution of production incidents
  2. Support for project cutovers and implementation activities

Requirements:

  • Minimum 3 years of experience in IT operations, automation, and monitoring solutions.
  • At least 2 years of scripting experience, preferably with:
  1. Ansible
  2. Shell scripting
  3. Python
  • Familiarity with platforms and technologies such as:
  1. Windows
  2. Linux
  3. Unix
  4. Cloud platforms (preferably AWS)
  5. Databases
  6. Middleware
  • Site Reliability Engineering (SRE) certification or an equivalent certification.
  • Strong communication and interpersonal skills, with the ability to work effectively with both internal and external stakeholders.

Preferred Skills:

  • Experience in proactive incident monitoring and operational support.
  • Strong analytical skills with the ability to interpret system data and produce meaningful insights.
  • Hands-on experience in automation and process improvement.
  • Ability to work in a fast-paced environment and provide support for critical systems.

To apply, simply click the Apply button or send your updated profile to [Confidential Information]

EA Licence No.: 18S9405 / EA Reg. No.: R1330864

Percept Solutions is expanding and actively seeking talented individuals. We encourage applicants to follow Percept Solutions on LinkedIn at https://www.linkedin.com/company/percept-solutions/ to stay informed about new opportunities and events.

More Info


Job ID: 144961881
