Data Engineer
Location: Singapore
Team: Data & Analytics
Reports to: Head of Data / Data Engineering Lead
Role Overview
We are looking for a Data Engineer with 2+ years of relevant experience to design, build, and operate scalable, reliable, production-grade data platforms that support analytics, machine learning, and business decision-making.
This role covers both real-time data streaming and batch processing, with a strong focus on
engineering quality, system reliability, and data freshness, primarily on Google Cloud Platform (GCP).
Key Responsibilities
1. Data Pipelines & Platform Engineering (Batch & Streaming)
- Design, build, and maintain batch and real-time data pipelines
- Work with Pub/Sub, Cloud Run, and BigQuery
- Develop data processing logic using Python (pandas, PySpark) and SQL
- Build real-time ingestion services supporting:
  - Low-latency ingestion
  - Idempotency and de-duplication
  - Data validation and schema evolution
- Implement layered data architectures: Raw → Curated → Analytics-ready datasets
- Handle late-arriving data, replays, and historical backfills
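To make the idempotency and de-duplication requirement concrete, here is a minimal pure-Python sketch. It assumes each event carries a unique `event_id` (a hypothetical field name); a production service would back the seen-key store with something bounded and durable (e.g. a TTL cache or a `MERGE` into the target BigQuery table) rather than an in-memory set.

```python
from typing import Iterable, Iterator, Optional


def deduplicate(events: Iterable[dict], seen: Optional[set] = None) -> Iterator[dict]:
    """Yield each event at most once, keyed by its 'event_id'.

    Pub/Sub delivery is at-least-once, so the same message may arrive
    more than once; dropping repeats by key gives effectively-once
    processing downstream.
    """
    seen = set() if seen is None else seen
    for event in events:
        key = event["event_id"]
        if key in seen:
            continue  # duplicate delivery: skip the repeat
        seen.add(key)
        yield event


# Example: event "a" is redelivered; only the first copy passes through.
events = [
    {"event_id": "a", "value": 1},
    {"event_id": "b", "value": 2},
    {"event_id": "a", "value": 1},
]
unique = list(deduplicate(events))
```

The same keyed-dedup idea carries over to replays and backfills: re-running a pipeline over historical data is safe when every write is keyed and repeat keys are discarded or overwritten.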
2. Real-Time Data Streaming & Processing
- Participate in designing event-driven architectures
- Implement streaming logic for:
  - Real-time / near-real-time aggregations
  - Operational and monitoring datasets
- Understand and apply exactly-once or effectively-once processing semantics
- Monitor streaming pipelines for latency, throughput, and failures
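Monitoring a streaming pipeline for latency usually reduces to tracking end-to-end lag against a freshness SLA. A minimal sketch, with hypothetical function names (alerting would hang off a check like this, e.g. via Cloud Monitoring):

```python
import time
from typing import Optional


def pipeline_lag_seconds(latest_event_ts: float, now: Optional[float] = None) -> float:
    """End-to-end lag: wall-clock time minus the newest event timestamp
    observed in the sink table (both as Unix epoch seconds)."""
    now = time.time() if now is None else now
    return max(0.0, now - latest_event_ts)


def freshness_ok(latest_event_ts: float, sla_seconds: float,
                 now: Optional[float] = None) -> bool:
    """True when the pipeline is within its freshness SLA."""
    return pipeline_lag_seconds(latest_event_ts, now) <= sla_seconds


# With a fixed 'now' for reproducibility: the newest event landed 90 s ago
# and the SLA is 120 s, so the pipeline is considered fresh.
within_sla = freshness_ok(latest_event_ts=1_000.0, sla_seconds=120.0, now=1_090.0)
```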
3. Data Modeling & Data Warehousing
- Design and maintain analytics-optimized BigQuery data models
- Apply appropriate partitioning and clustering
- Support high-ingestion-rate tables and high-performance analytical queries
- Ensure schema consistency across development and production environments
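Partitioning and clustering in BigQuery are declared in the table DDL. The sketch below renders such a statement from Python; the table and column names (`analytics.events`, `event_ts`) are illustrative, not from a real schema.

```python
def partitioned_table_ddl(table: str, partition_col: str,
                          cluster_cols: list) -> str:
    """Render a BigQuery DDL statement with daily time partitioning
    and clustering. Partitioning prunes scans by date; clustering
    sorts data within each partition for cheaper filtered queries."""
    return (
        "CREATE TABLE IF NOT EXISTS {t} (\n"
        "  event_id STRING,\n"
        "  {p} TIMESTAMP,\n"
        "  amount NUMERIC\n"
        ")\n"
        "PARTITION BY DATE({p})\n"
        "CLUSTER BY {c}"
    ).format(t=table, p=partition_col, c=", ".join(cluster_cols))


ddl = partitioned_table_ddl("analytics.events", "event_ts", ["event_id"])
```

Keeping DDL generated from one place like this (or from checked-in migration files) is one way to satisfy the schema-consistency requirement across development and production.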
4. Analytics & Machine Learning Enablement
- Build high-quality datasets for:
  - Reporting and dashboards
  - Time-series analysis
  - Machine learning feature generation
- Collaborate with analysts and data scientists to:
  - Understand data requirements
  - Validate data accuracy and freshness
5. Cloud Infrastructure & Engineering Practices
- Containerize data services using Docker
- Build and deploy via Cloud Build and Artifact Registry
- Operate Cloud Run services and scheduled jobs
- Assist with:
  - Service accounts and IAM roles
  - Secrets and environment configuration
- Contribute to CI/CD automation and deployment workflows
6. Data Quality, Governance & Reliability
- Implement data quality checks for both streaming and batch pipelines
- Help identify and resolve:
  - Data delays
  - Missing or duplicate data
  - Breaking schema changes
- Maintain documentation, including:
  - Data dictionaries
  - Streaming architecture diagrams
  - Operational runbooks
- Ensure pipelines are auditable, reproducible, and reliable
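The quality checks above can be sketched in a few lines. This pure-Python version (field names are illustrative) summarises duplicate keys and missing required values for a batch of rows; in practice the same checks would run per load in pandas or PySpark and fail or alert on violations.

```python
def quality_report(rows: list, key: str, required: list) -> dict:
    """Summarise basic data-quality signals for a batch of row dicts:
    how many rows repeat an already-seen key, and how many are missing
    a required field."""
    seen = set()
    duplicates = 0
    missing = 0
    for row in rows:
        k = row.get(key)
        if k in seen:
            duplicates += 1
        seen.add(k)
        if any(row.get(col) is None for col in required):
            missing += 1
    return {
        "rows": len(rows),
        "duplicate_keys": duplicates,
        "rows_missing_required": missing,
    }


report = quality_report(
    [
        {"id": 1, "amount": 10},
        {"id": 1, "amount": 10},   # duplicate key
        {"id": 2, "amount": None}, # missing required field
    ],
    key="id",
    required=["amount"],
)
```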
Required Qualifications
Minimum Requirements
- Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related technical field
- 2+ years of experience in data engineering, backend engineering, or data platform roles
- Strong Python skills (pandas and/or PySpark)
- Solid SQL skills (BigQuery experience preferred)
- Hands-on experience building or maintaining production data pipelines
- Understanding of batch vs real-time streaming data processing concepts
Technical Competencies
- Familiarity with event-driven architectures
- Understanding of data modeling and data warehouse design
- Experience handling schema evolution and historical backfills
- Basic performance, scalability, and cost-optimization awareness
Engineering & DevOps Skills
- Experience with Docker and containerized applications
- Familiarity with Git-based development workflows
- Exposure to CI/CD pipelines
- Ability to troubleshoot and debug production issues
Nice to Have
- Experience with real-time streaming systems (Pub/Sub, Kafka, Dataflow)
- Exposure to time-series or near-real-time analytics
- Familiarity with:
  - Dataflow / Apache Beam
  - Vertex AI
  - BI tools such as Tableau or Looker
- Experience working with multi-region or multi-currency datasets
What Success Looks Like
- Data pipelines run reliably and with low latency
- Streaming and batch datasets are consistent and trustworthy
- Data freshness SLAs are met
- Downstream analytics and ML teams confidently rely on the data platform
Why Join Us
- Work on modern real-time data platforms
- Clear growth path toward Senior Data Engineer
- Strong engineering ownership and technical depth
- Cloud-native environment focused on long-term maintainability