K&K Global Talents is an international recruiting agency that has been providing technical resources globally since 1993. This position is with one of our clients in Singapore who is actively hiring candidates to expand their teams.
Title: Site Reliability Engineer
Location: (Changi), Singapore
Employment Type: Full-time Permanent
Mode of Operation: Onsite
Notice Period: 0-1 month
Responsibilities:
- Engage with product teams, architects, developers, and the certification, project management, operations, and infrastructure teams from the start of the SDLC.
- Become a subject matter expert for the assigned product verticals. Analyze complex systems from a reliability and resilience perspective.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Understand the end-to-end product topology from both infrastructure and application perspectives.
- Identify sources of instability in large-scale distributed systems and drive operational excellence. Dive deep into every issue that occurs and own it completely through to end-to-end closure.
- Perform functional analysis of products by gathering and analyzing metrics from both operating systems and applications to assist in performance tuning and fault finding, including integration and operational challenges.
- Fix code bugs in production and recommend architectural improvements during issue and incident analysis.
- Work closely with development and product teams to suggest new features and enhancements based on live issues.
- Drive down the burden of toil with tooling and automation to achieve operational efficiency and a smoother customer experience.
- Provide technical consultancy for monitoring, incident, and problem management. Lead technical bridges and interact with both technical staff and management during the incident and change management processes.
- Engage with technical and non-technical partners on a regular basis to analyze functional and technical solutions in depth.
- Understand new changes to production systems and assess their risk from an application perspective to drive reliability and availability.
- Provide guidance and technical expertise to junior team members.