Our client operates a high-scale consumer platform reaching millions of users globally. The company is investing heavily in its data foundation to power programmatic advertising, personalized recommendations, and marketplace analytics. The data team is expanding to support billions of daily events and petabyte-scale datasets across advertising and ecommerce.
Open Roles & Teams
The client is hiring across several domains and levels:
- Data Platform - Core lakehouse, real-time streaming, governance, cost optimization
- Programmatic Advertising Data - DSP/SSP logs, conversion modeling, attribution, incrementality, signal quality
- Ecommerce & Marketplace Data - Product catalog, pricing, inventory, seller/buyer analytics, search & recommendations
- ML Platform Data - Feature store, online/offline feature pipelines, training data, experimentation
Key Responsibilities by Level
Data Engineer - Mid to Senior Level
- Design, build, and operate batch + streaming ETL/ELT pipelines ingesting 100TB+ daily from ad servers, mobile SDKs, transactional systems, and third-party APIs
- Develop real-time jobs using Spark Streaming, Flink, or Kafka Streams for fraud detection, bid optimization, dynamic pricing, and personalization
- Model and optimize datasets on the data lakehouse to serve BI, analytics, and ML use cases
- Implement data quality, anomaly detection, observability, and lineage for Tier-0 datasets powering revenue and user experience
- Partner with Data Science, Product, and Engineering to deliver end-to-end data products
Data Architect / Staff+ Data Engineer
- Define the technical vision and multi-year roadmap for a unified batch + streaming platform supporting 10M+ QPS
- Architect core domain models: identity resolution, multi-touch attribution, product taxonomy, and seller/buyer graphs
- Establish standards for data governance, privacy, and compliance including GDPR, CCPA, and consent frameworks for advertising data
- Lead evaluation and adoption of key technologies: Apache Iceberg/Hudi/Delta Lake, Trino, Ray, dbt, Airflow
- Own reliability, performance, and cost SLOs for datasets critical to real-time bidding and executive reporting
- Provide technical leadership and mentorship, and influence architecture across the organization
Required Qualifications
- Proven experience building large-scale data systems for programmatic advertising, digital media, marketplaces, or consumer internet platforms
- Expert-level SQL and strong programming in Python, Scala, or Java
- Hands-on production experience with distributed processing: Spark, Kafka, Flink, or equivalent
- Deep expertise with cloud data platforms: BigQuery, Snowflake, Databricks, Redshift, AWS/GCP/Azure
- Strong data modeling skills: dimensional, Data Vault, or lakehouse architectures for analytics and ML
- Track record handling high-volume, semi-structured, late-arriving event data at TB-PB scale
Preferred Domain Experience
- Programmatic Advertising: DSP/SSP infrastructure, impression/click/conversion pipelines, SKAN, MMM/MTA, conversion APIs, identity resolution, signal loss mitigation
- Ecommerce/Marketplace: Product catalog & taxonomy, pricing experimentation, inventory forecasting, seller performance analytics, search & recommendation data
- ML Data: Feature stores like Feast/Tecton, online-offline feature parity, point-in-time correctness, training data infrastructure
For Architect / Principal Levels
- 8+ years designing distributed data systems with 50+ downstream engineers/analysts as customers
- Experience leading 0-to-1 architecture for critical domains such as real-time advertising, attribution, or marketplace analytics
- Demonstrated impact on data infrastructure cost optimization at $1M+ annual cloud spend
- Knowledge of privacy-enhancing technologies: differential privacy, data clean rooms, secure multi-party compute
Technology Environment
Cloud & Infra: AWS/GCP, Kubernetes, Terraform
Storage & Query: S3/GCS, Apache Iceberg/Delta Lake/Hudi, BigQuery, Snowflake, Trino
Processing & Orchestration: Spark, Flink, Kafka, Airflow, dbt
Real-time Analytics: Druid, Pinot, ClickHouse
ML Infra: Feature Stores, Ray, Kubeflow
Data Observability: Datadog, Monte Carlo, OpenLineage