Senior SRE, Infrastructure & Platform

5-7 Years
SGD 9,000 - 18,000 per month

Job Description

We are looking for a Senior Site Reliability Engineer who leads with kindness and has a strong software development background to join our Infrastructure Engineering team. Your primary focus will be building the automation, tooling, and internal platforms that enable our team to operate a global, multi-datacenter infrastructure spanning a growing number of Points of Presence (PoPs).

You will work within a PCI-DSS compliant environment and participate in a 24x7 on-call rotation.

What You'll Do

Internal Tooling & Application Development

  • Design and develop internal tools, CLIs, and APIs (primarily in Go and Python) that enable infrastructure self-service, automate complex workflows, and improve operational efficiency
  • Build integrations between infrastructure systems -- connecting CMDB/IPAM (NetBox), secrets management (HashiCorp Vault), hypervisor APIs (Proxmox), monitoring platforms, and CI/CD pipelines into cohesive automated workflows
  • Develop and maintain API clients and libraries for interacting with infrastructure services (Proxmox API, Vault API, NetBox API, iLO Redfish, container registries)
  • Write well-tested, documented, and maintainable code with proper versioning, release processes, and code review practices

Infrastructure as Code & Ansible Development

  • Architect, develop, and refactor Ansible roles and playbooks across a large-scale inventory spanning 30+ datacenters, 80+ group variable files, and 40+ roles
  • Design reusable, composable Ansible role patterns that scale cleanly as the DC footprint grows -- new DCs should be deployable with minimal variable additions
  • Improve idempotency, error handling, and test coverage across the existing Ansible codebase
  • Develop custom Ansible modules and plugins (including lookup plugins) where upstream modules are insufficient (e.g., custom Vault integration, Proxmox API interactions, iLO automation)
  • Automate bare-metal server lifecycle end-to-end: from iLO bootstrap through OS installation, hypervisor configuration, VM provisioning, and service deployment

CI/CD Pipeline Engineering

  • Design, write, and maintain GitLab CI pipelines for infrastructure automation, including multi-stage deployment workflows with linting, validation, canary testing, and regional rollout
  • Build pipeline patterns for safe infrastructure changes: staged rollouts, automated rollback, drift detection, and change validation
  • Create reusable pipeline templates and shared CI components that standardise how infrastructure changes are tested and deployed
  • Implement automated testing for Ansible roles and infrastructure changes (Molecule, ansible-lint, integration testing in ephemeral environments)

Kubernetes & Container Platform Automation

  • Develop automation for self-hosted Kubernetes cluster lifecycle management: provisioning, upgrades, scaling, and disaster recovery
  • Build and maintain container image build pipelines, registry management, and image promotion workflows
  • Create Kubernetes operators or controllers (in Go) where custom automation of cluster-level concerns is needed
  • Automate workload deployment patterns, including Helm chart development and GitOps workflows

Cloud Infrastructure Automation

  • Develop IaC and automation for AWS and Azure resources, integrating cloud infrastructure with on-premises systems
  • Build automation that spans hybrid environments -- coordinating deployments across bare-metal, virtualized, and cloud targets from a unified pipeline

Observability & Reliability Engineering

  • Instrument internal tools and automation with proper logging, metrics, and tracing
  • Build automated remediation workflows that respond to monitoring alerts and reduce mean time to recovery
  • Develop reporting and dashboards that provide visibility into infrastructure state, automation success rates, and toil metrics
  • Identify and automate away recurring operational toil, and track and quantify toil reduction over time

Security & Compliance Automation

  • Automate PCI-DSS compliance workflows, including CIS benchmark hardening, audit evidence collection, and configuration drift detection
  • Build automated secret rotation pipelines using HashiCorp Vault
  • Integrate security scanning into CI/CD pipelines (container image scanning, infrastructure configuration validation)

What We're Looking For

  • 5+ years of experience in an SRE, DevOps, or Infrastructure Engineering role with a strong emphasis on writing code and building automation
  • Proficiency in Python, with experience building CLI tools, APIs (Flask/FastAPI or equivalent), and automation frameworks
  • Expert-level Ansible skills: custom role development, module/plugin authorship, complex Jinja2 templating, inventory management at scale, and CI/CD integration
  • Solid Linux systems knowledge (RHEL/CentOS) -- you need to understand the systems you're automating at a depth that lets you debug failures and design robust automation
  • Experience building and maintaining CI/CD pipelines (GitLab CI preferred) for infrastructure automation, not just application builds
  • Production experience with self-hosted Kubernetes: cluster operations, controller/operator development, and workload automation
  • Practical AWS and Azure experience with an IaC mindset -- provisioning and managing cloud resources through automation, not console clicks
  • Experience with API-driven infrastructure management (RESTful APIs, Redfish/iLO, hypervisor APIs)
  • Familiarity with HashiCorp Vault or equivalent secrets management platforms, including programmatic integration
  • Understanding of PCI-DSS requirements as they apply to automated infrastructure management -- audit trails, change control, hardening automation
  • Strong software engineering fundamentals: version control workflows, code review, testing practices, documentation, and release management

This job description is intended to be a general representation of the responsibilities and requirements of the job. However, it may not be all-inclusive, and responsibilities and requirements are subject to change.

Please note that F5 only contacts candidates through F5 email addresses (ending with @f5.com) or automated email notifications from Workday (ending with f5.com or @myworkday.com).

Equal Employment Opportunity

It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. Request by contacting [Confidential Information].

More Info


Job ID: 146470503
