SLOs & error budgets - Define, track, and evangelize latency and availability targets for our payment APIs.
Observability - Deploy Cloud Monitoring, Cloud Trace, Error Reporting, and dashboards; integrate alerts via Incident.io and Slack for on-call.
Incident lifecycle - Establish blameless postmortems, guardrails, and runbooks to drive learning and prevent recurrence.
CI/CD golden path - Codify Cloud Build pipelines and automated canary rollouts for Cloud Functions / Cloud Run.
Infrastructure as Code - Manage GCP resources as code, embedding security, IAM least-privilege, and cost controls by default.
Performance & cost tuning - Profile hot paths (BigQuery, Firestore, Pub/Sub) and implement caching or concurrency improvements to keep user latency under 100 ms.
Developer tooling - Eliminate toil by improving local-to-prod parity, streamlining secrets management, and making environments available with a single command.
Culture carrier - Instill reliability thinking across engineering and product as the first platform-focused hire.