Responsibilities
- Design, develop, and implement advanced SRE tooling and automation solutions using Python and Java to improve system reliability and operational efficiency across the infrastructure lifecycle
- Drive the adoption and continuous improvement of CI/CD pipelines by applying DevOps best practices and tools to enable rapid, secure, and reliable software delivery and infrastructure provisioning
- Proactively monitor, troubleshoot, and optimize system performance and reliability by resolving complex incidents, performing root cause analysis, and implementing preventive automation
- Collaborate with development and operations teams to embed SRE principles, establish and maintain Service Level Objectives (SLOs), and ensure scalable, robust system architecture and disaster recovery readiness
- Provide secondary support and technical expertise for messaging and middleware platforms, including Kafka and MQ, by assisting with administration, configuration, performance tuning, and incident resolution
- Conduct architectural reviews to identify infrastructure gaps, remediate network vulnerabilities, and guide application teams on operational excellence, security, and compliance best practices
Required competencies and certifications
- 3 to 6 years of progressive experience in DevOps, SRE, or technical infrastructure roles supporting mission-critical systems, preferably in a banking environment
- Expert-level proficiency in Python and Java for automation, scripting, and SRE tool development
- Hands-on experience in Linux administration
- Extensive experience with CI/CD practices and DevOps toolchains such as Jenkins and GitLab CI
- Proficiency with version control systems (Git), project management tools (Jira), and agile methodologies
- Strong understanding and practical application of SRE principles, including SLOs, error budgets, monitoring, alerting, and incident management, with demonstrated success in improving system reliability
- Foundational to intermediate hands-on experience with messaging and middleware technologies, specifically Kafka and MQ, including administration, configuration, and troubleshooting
- Exceptional written and verbal communication skills and a solid understanding of ITIL processes
- Demonstrated leadership in cross-team collaboration to drive operational excellence