Job Summary
We are seeking a highly skilled DevSecOps Site Reliability Engineer to join our world-class product engineering SRE team. The ideal candidate will be proficient in managing highly available infrastructure for SaaS application development and operations, skilled in AWS cloud architecture, and experienced in embedding security into DevOps automation tooling (shift-left). You will drive the convergence of reliability and security by delivering high availability, secure configuration management, and robust infrastructure as code.
Responsibilities
- Engage in operational and security incident resolution, build mitigation plans, and leverage AI-driven automation to reduce toil and improve Mean Time to Detect/Resolve (MTTD/MTTR).
- Design, implement, and manage secure Kubernetes clusters on platforms such as AWS EKS and Rancher, enforcing zero-trust architecture and strict RBAC.
- Provision, configure, and harden AWS infrastructure resources (VPCs, subnets, security groups, IAM roles) following the principle of least privilege.
- Automate infrastructure provisioning and configuration using Terraform/Terragrunt, embedding Policy-as-Code and security checks directly into the provisioning lifecycle.
- Deploy and manage containerized applications, ensuring secure container lifecycles, image signing, and runtime threat defense.
- Implement and maintain CI/CD pipelines, integrating continuous security scanning (SAST, DAST, SCA) and open-source license compliance checks.
- Implement and maintain modern, open-standards observability platforms (e.g., OpenTelemetry, Grafana) to provide deep visibility into both operational performance and security events.
- Identify, troubleshoot, and remediate availability, performance, and security issues across multiple layers of the deployment stack.
- Improve overall developer experience by ensuring that tools are available and efficient and that security guardrails are seamless and developer-friendly.
- Integrate systems and build configurations on public cloud platforms, and assist with secure migration strategies from legacy environments.
Tools We Use
Java, Python, GitHub Actions, JFrog, Nexus, SonarQube, Trivy/Snyk (or similar SCA/container scanning), Splunk, Honeycomb, Grafana, OpenTelemetry, Prometheus, InfluxDB, Redis, Kafka, Oracle, Postgres, Kubernetes, Rancher, Istio, Terraform, HashiCorp Vault / AWS Secrets Manager, AWS services, Rundeck, Temporal, LaunchDarkly, PagerDuty.
Qualifications Expected
- Strong understanding of Kubernetes concepts (pods, services, deployments) and Kubernetes security best practices (Network Policies, Pod Security Admission, OPA/Gatekeeper).
- Hands-on experience with Rancher and AWS EKS, including secure cluster creation, configuration, and lifecycle management.
- Proficiency in scripting languages like Python or Bash for operational and security automation.
- Deep experience with Infrastructure as Code (Terraform) and applying secure coding practices to infrastructure deployment.
- Extensive knowledge of containerization technologies, secure base images, and vulnerability scanning in container registries.
- Familiarity with CI/CD pipelines (GitHub Actions, GitLab CI/CD) and integrating automated security testing into the build process.
- Strong understanding of open-source software governance, dependency management, and enterprise licensing compliance.
- Excellent communication and collaboration skills to act as a security champion among product engineering teams.
Software Engineering & Security
- Advanced Kubernetes concepts like custom resource definitions (CRDs) and building secure operators.
- Setting up, maintaining, and securing CI/CD pipelines against software supply chain attacks.
- Configuration management using tools like Terraform, Puppet, or Ansible with a focus on configuration drift detection.
- Experience running production microservices and distributed systems architectures at scale with security baked in.
- Background in managing cloud-based workloads on leading public cloud platforms (AWS preferred), with a focus on cloud security posture management (CSPM).
Containerization & Orchestration
- Build and operate Docker containers with a focus on minimal attack surface (e.g., distroless images), secure builds, and image optimization.
- Experience defining and managing applications operating on orchestration platforms.
- Working experience with service mesh configuration (Istio) for mTLS, traffic encryption, and secure service-to-service communication.
Observability & Monitoring
- Experience deploying and tuning observability tools to capture infrastructure telemetry, application metrics, and security audit logs.
- Ability to correlate security anomalies with operational metrics to detect potential breaches or DDoS attempts.
API Gateway Engineering (Nice to have)
- Understand common API concepts, data storage, service status, and secure authentication and authorization (OAuth, JWT).
- Familiarity with API management systems for high availability, rate limiting, resilience, and recovery.
- Deploy, configure, tune, and monitor API gateways, including WAF rule configuration.
- Apply API policies and standards for enterprise security and standardization.
Additional Expectations
- Curiosity and ability to quickly learn new operational and security tools.
- Willingness to mentor others and openly explain your technical reasoning while solving complex issues.
- Excellent communication skills, both verbal and written, with English proficiency for complex technical and security discussions.
- Experience debugging complex, multi-layered distributed system problems.
- Proven track record of driving DevSecOps cultural change and proposing company-wide improvements to existing tools and processes.
- Experience collaborating with distributed, cross-cultural teams spread across the globe in different time zones.
Certifications: AWS Certified DevOps Engineer - Professional, AWS Certified Security - Specialty, or Certified Kubernetes Security Specialist (CKS) preferred but not required.