Data Engineer across all levels

Newbridge Alliance Pte. Ltd.

Anson, Singapore

8-10 Years

SGD 8,000 - 17,000 per month

Posted 23 hours ago

Job Description

Our client operates a high-scale consumer platform reaching millions of users globally. The company is investing heavily in its data foundation to power programmatic advertising, personalized recommendations, and marketplace analytics. The data team is expanding to support billions of daily events and petabyte-scale datasets across advertising and ecommerce.

Open Roles & Teams

The client is hiring across several domains and levels:

Data Platform - Core lakehouse, real-time streaming, governance, cost optimization
Programmatic Advertising Data - DSP/SSP logs, conversion modeling, attribution, incrementality, signal quality
Ecommerce & Marketplace Data - Product catalog, pricing, inventory, seller/buyer analytics, search & recommendations
ML Platform Data - Feature store, online/offline feature pipelines, training data, experimentation

Key Responsibilities by Level

Data Engineer - Mid to Senior Level

Design, build, and operate batch + streaming ETL/ELT pipelines ingesting 100TB+ daily from ad servers, mobile SDKs, transactional systems, and 3rd party APIs
Develop real-time jobs using Spark Streaming, Flink, or Kafka Streams for fraud detection, bid optimization, dynamic pricing, and personalization
Model and optimize datasets on the data lakehouse to serve BI, analytics, and ML use cases
Implement data quality, anomaly detection, observability, and lineage for Tier-0 datasets powering revenue and user experience
Partner with Data Science, Product, and Engineering to deliver end-to-end data products

Data Architect / Staff+ Data Engineer

Define the technical vision and multi-year roadmap for a unified batch + streaming platform supporting 10M+ QPS
Architect core domain models: identity resolution, multi-touch attribution, product taxonomy, and seller/buyer graphs
Establish standards for data governance, privacy, and compliance including GDPR, CCPA, and consent frameworks for advertising data
Lead evaluation and adoption of key technologies: Apache Iceberg/Hudi/Delta Lake, Trino, Ray, dbt, Airflow
Own reliability, performance, and cost SLOs for datasets critical to real-time bidding and executive reporting
Provide technical leadership and mentorship, and influence architecture across the organization

Required Qualifications

Proven experience building large-scale data systems for programmatic advertising, digital media, marketplaces, or consumer internet platforms
Expert-level SQL and strong programming in Python, Scala, or Java
Hands-on production experience with distributed processing: Spark, Kafka, Flink, or equivalent
Deep expertise with cloud data platforms: BigQuery, Snowflake, Databricks, Redshift, AWS/GCP/Azure
Strong data modeling skills: dimensional, Data Vault, or lakehouse architectures for analytics and ML
Track record handling high-volume, semi-structured, late-arriving event data at TB-PB scale

Preferred Domain Experience

Programmatic Advertising: DSP/SSP infrastructure, impression/click/conversion pipelines, SKAN, MMM/MTA, conversion APIs, identity resolution, signal loss mitigation
Ecommerce/Marketplace: Product catalog & taxonomy, pricing experimentation, inventory forecasting, seller performance analytics, search & recommendation data
ML Data: Feature stores like Feast/Tecton, online-offline feature parity, point-in-time correctness, training data infrastructure

For Architect / Principal Levels

8+ years designing distributed data systems with 50+ downstream engineers/analysts as customers
Experience leading 0-to-1 architecture for critical domains such as real-time advertising, attribution, or marketplace analytics
Demonstrated impact on data infrastructure cost optimization at $1M+ annual cloud spend
Knowledge of privacy-enhancing technologies: differential privacy, data clean rooms, secure multi-party compute

Technology Environment

Cloud & Infra: AWS/GCP, Kubernetes, Terraform
Storage & Query: S3/GCS, Apache Iceberg/Delta Lake/Hudi, BigQuery, Snowflake, Trino
Processing & Orchestration: Spark, Flink, Kafka, Airflow, dbt
Real-time Analytics: Druid, Pinot, ClickHouse
ML Infra: Feature Stores, Ray, Kubeflow
Data Observability: Datadog, Monte Carlo, OpenLineage