Our dashboards are slow. Pipelines break. Analysts can't trust the data. Data scientists wait days for features.
You fix that. As a Senior Data Engineer, you build the pipes, warehouses, and tools that let 2000+ people make data-driven decisions at petabyte scale. If data is late, wrong, or expensive, you feel it. If queries run 10x faster and costs drop 50%, that's also you.
What You'll Do
1. Build & Scale Data Pipelines
- Design, build, and operate batch + real-time pipelines ingesting 1M+ events/sec from apps, logs, and third-party sources
- Own data modeling: dimensional, Data Vault, or lakehouse on Iceberg/Hudi/Delta. You decide, you defend
- Transform messy raw data into clean, tested, documented marts that 1000+ users trust
- Orchestrate 1000+ DAGs with Airflow: SLAs, retries, backfills, dependency management
2. Platform & Infrastructure
- Optimize Spark/Flink jobs that process TBs daily. You know when to cache, when to partition, and when to rewrite in SQL
- Improve query performance: BigQuery, Snowflake, Trino, or ClickHouse. Sub-second SLAs on billion-row tables
- Build self-serve tooling: data ingestion frameworks, CI/CD for data, testing harnesses
- Manage data infra: Kafka, Spark, Airflow, dbt. Upgrades, scaling, cost optimization
3. Data Quality & Reliability
- Implement data contracts, schema evolution, and breaking change detection with producers
- Build observability: freshness, volume, schema, and distribution checks. Great Expectations, Monte Carlo, or custom
- Own lineage and cataloging. If someone asks "where did this number come from?", you can answer in 30 seconds
- On-call for critical pipelines. P0 means P0. You've debugged 3am Airflow failures before
4. Enable the Business
- Partner with Analytics Engineering, Data Science, and Product to understand data needs and ship solutions fast
- Translate vague requests like "I need user data" into versioned, tested, documented datasets
- Mentor mid-level DEs. Review designs and PRs, and raise the bar for data engineering
- Kill tech debt. Deprecate unused tables. Archive cold data. FinOps is part of the job
What You'll Bring
Must-haves:
- 8+ years of experience as a Data Engineer, with production experience at TB-PB scale. We will consider data engineers with fewer years of experience for junior positions.
- Expert SQL: Window functions, CTEs, query plans. You can make a 30min query run in 30s
- Python + Spark: PySpark, DataFrames, UDFs, performance tuning. You know why your job OOMed
- Data modeling: Star schema, slowly changing dimensions, idempotency. You've been burned by bad models before
- Warehouse/Lakehouse: Deep experience with BigQuery, Snowflake, Redshift, or Iceberg/Hudi/Delta
- Orchestration: Airflow, Dagster, or Prefect. You've built complex DAGs and suffered through timezone bugs
- Software engineering: Git, CI/CD, testing, code reviews. You don't ship untested SQL
- Systems thinking: You consider cost, latency, freshness, and downstream impact in every design