Responsibilities
Own and evolve the Datadog-based observability platform (collection, pipelines, analytics, alerting, dashboards, and SLOs) to deliver real-time visibility and faster incident response.
As a secondary capability, apply asset-discovery knowledge to publish high-quality discovery feeds and support the CMDB team with accurate, timely inventory data. This is a Tier-0 role (admin access restricted to FTEs only).
1) Datadog platform engineering (primary)
. Operate Datadog orgs/projects, RBAC, log pipelines/indexes/archives, metrics, traces/APM, Synthetics, RUM, and DBM at enterprise scale.
. Drive tagging standards and ownership metadata to enable service-aligned dashboards and alert routing.
. Optimize cost/performance (sampling, routing, tiering/archives, retention, metric cardinality).
2) Monitoring-as-Code (MaC) & CI/CD (primary)
. Define monitors, dashboards, SLOs, synthetics, notebooks, service catalog entries, and RBAC as code using Terraform/OpenTofu (Datadog provider) and datadog-ci.
. Build gated pipelines: linting, query/unit tests, cost/volume guardrails, PII/residency checks, drift detection, and promotion (dev → staging → prod) with automated rollback.
. Maintain change evidence (who/what/when), versioning, and approvals; rotate tokens/secrets via vault.
3) Telemetry ingestion & data quality (primary)
. Engineer unified ingest via the Datadog Agent, APIs, and gateways; integrate OpenTelemetry where appropriate.
. Enforce schema contracts and mandatory tags (e.g., service, env, tier, owner, cost_center); implement validation, deduplication, lineage, and freshness checks.
4) Asset discovery support for CMDB (secondary)
. Apply discovery expertise across datacenter/VM, containers/K8s, multi-cloud (AWS/Azure/GCP), network devices, endpoints, and key SaaS.
. Publish curated discovery feeds (coverage, freshness, deltas) and support reconciliation/exception workflows.
Requirements
. 6-10+ years in Observability/SRE/Platform Engineering; deep, hands-on expertise with Datadog (logs, metrics, traces/APM, Synthetics, RUM, DBM).
. Proven Monitoring-as-Code experience with Terraform/OpenTofu (Datadog provider) and datadog-ci; strong Git/GitOps and CI/CD skills (e.g., GitHub Actions/Azure DevOps).
. Automation proficiency (Python/PowerShell); YAML/JSON schema design; API integration.
. Experience with tagging schemes, schema/version management, lineage, and cost governance.
. Exposure to asset discovery patterns and how discovery feeds support CMDB reconciliation.
. Comfortable operating Tier-0 platforms with audit rigor.
Preferred qualifications
. Mixed-estate exposure (on-prem/VMware, K8s, AWS/Azure/GCP, network, endpoints, SaaS).
. Building self-service onboarding patterns (API/CLI/portal) with policy gates.
. SLO/burn-rate alerting and service catalog adoption at scale.
Licence no: 12C6060
Date Posted: 18/09/2025
Job ID: 126313975