
Background:
The SG-Microbiome / GLOW programme is a multi-site effort to generate an integrated, multi-modal dataset spanning population cohort data, mental health and behavioural insights, host molecular phenotyping, microbiome phenotyping, and other multi-omics. Work Package 5 (WP5) provides the data integration and platform backbone for the programme by establishing a secure, scalable compute-and-data environment, implementing reproducible processing pipelines, and enabling harmonized, analysis-ready datasets for downstream statistical and machine learning analyses across WP1-WP4.
WP5 will adopt an AWS-native approach in the first instance, dovetailing into PRECISE's existing data infrastructure to benefit from shared capabilities and economies of scale. In parallel, WP5 will evaluate (and, if approved, integrate with) Trusted Research Environments (TREs) such as MOH TRUST and BC Platforms to support governed access, secure collaboration, and potential federation workflows.
By joining our multidisciplinary team, you will:
Contribute to a national-scale precision health programme that combines cohort phenotyping, microbiome, and multi-omics data.
Build and operate real-world, production-grade data pipelines and platforms on AWS.
Work closely with clinicians, bioinformaticians, and engineers to translate scientific requirements into robust infrastructure and data products.
Develop transferable expertise in governed data platforms, scalable analytics, and cross-institution collaboration models.
Job Description and scope of the specific project:
The role focuses on building and operating the WP5 data platform, with emphasis on early pilot readiness and progressive scaling. Key responsibilities include:
Establish AWS-native data environment components (secure storage, compute, identity/access controls, logging) suitable for pilot-scale ingestion and iterative expansion.
Implement and maintain initial processing pipelines for multi-modal data streams, including microbiome sequencing outputs and associated metadata, host molecular omics outputs, and structured phenotyping/behavioural data as they become available from partner work packages.
Develop harmonization and QC frameworks that standardize data representations across modalities (sample metadata, batch/QC metrics, phenotype dictionaries), and generate routine QC summaries for project leadership and work package owners.
Produce interim analysis-ready datasets and curated data releases aligned with WP1-WP4 timelines, including documentation (data dictionaries, processing provenance, versioning) and governed export processes where required.
Prepare for full-scale processing and potential federation by designing for portability (workflow management, containerization) and by supporting evaluation/integration with TRE options (e.g., MOH TRUST, BC Platforms), subject to programme approval.
Contribute to technical documentation, reproducibility practices (version control, change logs), and internal knowledge transfer to ensure sustainable operations beyond individual contributors.
Key Qualifications:
Required:
PhD (or equivalent experience) in Bioinformatics, Computational Biology, Data Science, or related quantitative discipline.
Demonstrated experience building and/or operating data pipelines for high-throughput biological data (e.g., sequencing-derived outputs, omics matrices, metadata).
Proficiency in Python and/or R for automation, QC, and data integration tasks.
Familiarity with workflow management (e.g., Nextflow, Snakemake, WDL/CWL) and containerized execution (Docker/Singularity).
Working knowledge of cloud computing concepts; AWS experience strongly preferred (IAM/RBAC, S3, EC2, managed analytics services).
Strong communication skills and ability to translate scientific needs into technical deliverables across teams.
Preferred:
Experience with multi-omics integration, microbiome analytical data structures, and/or large-scale cohort phenotyping.
Experience with data governance concepts in regulated environments (auditing, access controls, output checking).
Experience with modern engineering practices: Git/GitHub, CI/CD, infrastructure-as-code, and collaborative documentation (Confluence/Notion).
The Agency for Science, Technology and Research (A*STAR) is a statutory board under the Ministry of Trade and Industry of Singapore. The agency supports R&D that is aligned to areas of competitive advantage and national needs for Singapore. These span the four technology domains set out under the nation's five-year R&D plan (RIE2025): Manufacturing, Trade and Connectivity; Human Health and Potential; Urban Solutions and Sustainability; and Smart Nation and Digital Economy.
Job ID: 143500309