Responsibilities
- Implement and maintain resilient, highly available clusters for data engineering, monitoring, and analytics applications in production environments
- Set up and operate server infrastructure and software (Linux, Elasticsearch, Logstash, Grafana, Kibana, Kafka, Nginx) following bank and industry security standards
- Perform continuous platform improvements including capacity planning, observability, monitoring, reliability, and resiliency enhancements
- Design and develop data engineering pipelines to support data processing and analytics needs
- Automate repetitive tasks and optimize processes to improve efficiency and quality, performing thorough testing to validate changes
- Create and maintain comprehensive software documentation for platform components and processes
- Perform system maintenance, including patching and upgrades, to ensure platform stability and security
Required competencies and qualifications
- Minimum 4 years of IT work experience
- Hands-on experience in system administration or system software support
- Experience operating and supporting Linux, Elasticsearch, Logstash, Grafana, Kibana, Kafka, and Nginx software
- Proficient in shell and Python scripting on Unix/Linux for automation and system management
- Self-driven, committed, and reliable team player with a passion for learning new technologies
Preferred competencies and qualifications
- Knowledge of Site Reliability Engineering (SRE) practices including monitoring, observability, performance management, automation, and resiliency
- Experience with object-oriented programming, web application development, Node.js, Spring Boot, and Kafka
- Familiarity with configuration-management tools such as Ansible, Chef, or Puppet, and with DevOps pipelines
- Experience with data ingestion processes (extraction, cleansing, and parsing) and with data analytics