Job Description:
We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.
As a DevOps / Site Reliability Engineer at JPMorganChase within the Corporate & Investment Bank (CIB), Electronic Trading Technology, you are an integral part of an agile team that builds and operates resilient, secure, and scalable platforms supporting Equity Trading Data Analytics. You will partner closely with engineering and front office stakeholders to deliver reliable AWS EKS-based services and on-prem components, driving upgrades, automation, compliance, observability, and incident excellence across a growing set of data and trading analytics workloads.
Job responsibilities
- Own platform reliability for Equity Trading Data Analytics services across cloud-native (AWS) and on-prem environments, ensuring secure, stable, and scalable operations.
- Lead service lifecycle operations including upgrades, patching, capacity planning, and production readiness for critical platforms (e.g., Kubernetes/EKS and core OS).
- Drive automation and operational excellence by reducing manual toil, standardizing repeatable processes, and improving CI/CD and release hygiene.
- Build and mature observability and resilience (monitoring, alerting, SLOs, DR/failover testing) to proactively prevent incidents and ensure timely recovery.
- Partner with engineering and business stakeholders to translate onboarding and delivery needs into actionable run plans and reliable execution across regions and teams.
- Lead incident management and problem solving, driving root-cause analysis and durable remediation to prevent recurrence and improve platform health.
- Ensure compliance and risk controls for platform operations (e.g., version currency, certificate/secrets hygiene, audit-ready evidence).
- Maintain high-quality documentation (runbooks, SOPs, onboarding guides) and contribute to a culture of resilience, continuous improvement, and operational ownership.
Required qualifications, capabilities, and skills
- Bachelor's degree in Computer Science, Engineering, Mathematics, or related discipline.
- 5+ years of experience in DevOps, Site Reliability Engineering, platform engineering, or adjacent infrastructure/software roles.
- Hands-on experience operating Kubernetes (EKS preferred) in production, including upgrade planning, rollout strategy, and reliability validation.
- Strong Linux administration experience, including enterprise OS upgrades across environments.
- Proficiency in scripting and automation (e.g., Python, Bash), and building operational tooling to eliminate toil.
- Strong understanding of SDLC/Agile with emphasis on CI/CD, resiliency, and security.
- Experience with observability (monitoring/alerting), incident response, and root-cause analysis in high-availability environments.
- Experience with AWS storage, networking, and access controls relevant to platform operations (e.g., IAM roles, NLB, node groups, EFS/FSx).
- Ability to communicate clearly with engineering and business partners, translating operational risk and platform constraints into execution plans.