SRE Manager

10-12 Years

Job Description

Summary

We're looking for a Site Reliability Engineering (SRE) Manager with strong architectural experience to join the JMET SRE Team. You'll play a key role in leading SRE teams, designing and scaling reliable, secure, and high-performance infrastructure across our cloud and hybrid environments. You'll be responsible for establishing reliability patterns, driving large-scale systems design, and building automation frameworks to support production systems at scale.

Description

This is a hands-on leadership role with architectural ownership, strategic influence, and deep technical impact across multiple domains — including application and infrastructure security, incident response engineering, and resilience automation.

Responsibilities

  • Architect Scalable Infrastructure: Design, evolve, and review highly reliable, performant, and cost-efficient cloud-native and hybrid infrastructure using Infrastructure as Code (IaC), containers, and microservices principles.
  • Support Cryptographic Systems at Scale: Design and operationalize scalable, secure integrations with Hardware Security Modules (HSMs) for sensitive workloads, key management, and cryptographic operations.
  • Drive SRE Best Practices: Define and implement service-level indicators (SLIs), objectives (SLOs), and agreements (SLAs) to guide engineering teams toward reliability and observability goals.
  • Incident Architecture and Prevention: Serve as a technical lead during major incidents. Partner with security and platform teams to conduct thorough post-incident reviews, drive systemic improvements, and establish preventive architectural controls.
  • System Design and Tooling: Build and maintain reusable tooling, automation frameworks, and reliability platforms — including observability, alerting, chaos testing, auto-scaling, and failover.
  • Reliability as Code: Champion resilience engineering through automation pipelines, CI/CD integrations, canary releases, and chaos engineering principles.
  • Multi-Cloud and Hybrid Systems: Design, assess, and guide architecture decisions across AWS, GCP, AliCloud, and on-premises infrastructure. Ensure consistency, interoperability, and regulatory compliance.
  • Security and Compliance: Ensure architectural patterns align with security standards, compliance requirements, and audit readiness.

Minimum Qualifications

  • 10 or more years of experience in SRE, DevOps, or Infrastructure Engineering roles, with 2 or more years in a managerial capacity.
  • Deep expertise in cloud infrastructure (AWS, GCP, or AliCloud) and container orchestration (Kubernetes, EKS).
  • Proven experience with Infrastructure as Code tools such as Terraform and CloudFormation.
  • Strong understanding of distributed systems, networking, and systems design at scale.
  • Proficiency in at least one programming or scripting language, such as Python, Go, or Bash.

Preferred Qualifications

  • Solid background in CI/CD tools and modern deployment strategies, for example Spinnaker and GitOps.
  • Familiarity with security best practices in cloud and containerized environments.
  • Experience with HSMs and cryptographic operations at scale is a plus.

Apple is an equal opportunity employer that is committed to inclusion and diversity. Apple provides reasonable accommodations to applicants with disabilities, in accordance with local requirements. Apple is a drug-free workplace.

Job ID: 146641821