Job Scope
The DevOps SRE, with a strong focus on automation, will be instrumental in driving the reliability, efficiency, and scalability of critical infrastructure and applications. This role champions a culture of automation, continuous improvement, and infrastructure-as-code to build robust SRE tools and enhance CI/CD pipelines. While prioritizing DevOps and SRE principles, the engineer will also possess foundational knowledge of messaging and middleware technologies such as Kafka and MQ to ensure seamless integration and support across the ecosystem.
Key Responsibilities
- Design, develop, and implement advanced SRE tooling and automation solutions using Python/Java to improve system reliability, operational efficiency, and developer productivity across the infrastructure lifecycle.
- Drive the adoption and enhancement of CI/CD pipelines, leveraging expertise in DevOps practices and tools to enable rapid, secure, and reliable software delivery and infrastructure provisioning.
- Proactively monitor, troubleshoot, and optimize system performance and reliability, identifying and resolving complex incidents, and implementing preventative measures through automation and root cause analysis.
- Collaborate with development and operations teams to embed SRE principles, establish Service Level Objectives (SLOs), and ensure robust system architecture, scalability, and disaster recovery readiness.
- Provide secondary support and expertise for messaging and middleware platforms, including Kafka and MQ, assisting with administration, configuration, performance tuning, and incident resolution as needed.
- Conduct architectural reviews, identify infrastructure gaps, remediate network vulnerabilities, and guide application teams on best practices for operational excellence, security, and compliance.
Requirements
- 3 to 6 years of progressive experience in DevOps, SRE, or technical infrastructure roles, preferably within a banking environment supporting mission-critical systems.
- Expert-level proficiency in Python/Java for automation, scripting, and SRE tool development, coupled with hands-on experience in Linux administration.
- Extensive experience with CI/CD practices, DevOps toolchains (e.g., Jenkins, GitLab CI), version control (Git), project management (Jira), and agile methodologies.
- Strong understanding of SRE principles, including SLOs, error budgets, monitoring, alerting, and incident management, with a proven track record of improving system reliability.
- Foundational to intermediate knowledge of, and hands-on experience with, messaging and middleware technologies, specifically Kafka and MQ, including their administration, configuration, and troubleshooting.
- Exceptional written and verbal communication skills, a good understanding of ITIL processes, and demonstrated leadership in collaborating across teams to drive operational excellence.
Key Skills
- DevOps SRE (Primary)
- MQ & Kafka (Secondary)