We are looking for a skilled and reliability-driven Site Reliability Engineer (SRE) to strengthen our engineering team. In this hybrid role, you will combine hands-on 2nd level support responsibilities with monitoring, automation, and reliability engineering. You will play a key role in ensuring the stability, observability, and continuous improvement of our production systems supporting real-time financial data processing.
What You Will Do
Operational Support & Incident Management
- Provide 2nd level support for production systems and critical business applications.
- Investigate, troubleshoot, and resolve incidents and performance issues.
- Perform root cause analysis (RCA) and document findings in a structured manner.
- Collaborate closely with development teams to ensure sustainable issue resolution.
- Contribute to post-incident reviews and continuous improvement initiatives.
Monitoring, Observability & Automation
- Design, implement, and maintain monitoring dashboards.
- Improve alert quality and reduce noise through effective threshold and metric design.
- Analyze logs, metrics, and system behavior to proactively detect anomalies.
- Automate operational processes using Ansible and scripting.
- Contribute to CI/CD and deployment reliability improvements.
- Continuously optimize system reliability, availability, and operational efficiency.
What You Bring
Operational Mindset & Collaboration
- Proven experience in Site Reliability Engineering, DevOps, or 2nd level production support.
- Strong analytical and troubleshooting skills in complex distributed environments.
- Structured, solution-oriented approach with a strong ownership mindset.
- Effective communication skills and the ability to work with cross-functional teams.
- Motivation to reduce manual effort through automation and process improvements.
Technical Skills
- Hands-on experience with Elastic Stack and Grafana for monitoring and logging.
- Experience with Ansible for configuration management and automation.
- Experience with Git/GitLab.
- Familiarity with scripting languages.
Tooling & Ecosystem (Nice to Have)
- Good understanding of networking fundamentals (TCP/IP, DNS, HTTP).
- Experience with Linux systems and shell scripting.
Language Skills
- Strong verbal and written English.