Job Description
We're seeking a hands-on DevOps Engineer to design and operate a hybrid cloud observability stack across on-prem and AWS. You'll own telemetry (logs, metrics, traces, alerts, and dashboards) to enable teams to detect, troubleshoot, and prevent issues at scale.
Key Responsibilities
- Design and maintain a unified observability platform across AWS and on-prem environments.
- Build and manage logging pipelines (ELK/OpenSearch, Logstash, Fluent Bit, Filebeat).
- Develop dashboards and alerts in Grafana, Kibana, and CloudWatch.
- Deploy and scale Prometheus, Alertmanager, and Splunk for metrics, alerting, tracing, and analytics.
- Automate infrastructure with Terraform, Helm, and CI/CD pipelines.
- Enforce RBAC, data retention, and cost optimization for observability.
- Drive SRE practices: SLIs, SLOs, error budgets, and post-incident reviews.
Required Skills
- 4-8+ years of experience in DevOps, SRE, or Platform Engineering.
- Strong hands-on experience with Prometheus, Grafana, ELK/OpenSearch, Splunk, and AWS CloudWatch.
- Expertise in Terraform, Kubernetes, Python/Go scripting, and Git-based CI/CD.
- Solid understanding of Linux, networking, containers, and distributed systems.
Preferred
- Experience with SRE and observability-as-code concepts.
- AWS, Kubernetes, or Terraform certification is a plus.