Data Engineer (CDC & Legal ETL)

2-4 Years
SGD 4,000 - 4,500 per month

Job Description

Role Overview

As our Data Engineer, you will own the lifeblood of our legal AI platform: the data pipeline that keeps our knowledge graph accurate and current. You will build CDC (Change Data Capture) pipelines to synchronize our Neo4j knowledge graph with Singapore's official legal publications, implement data quality validation frameworks, and ensure version control across statutory amendments. When a new Gazette is published, your pipeline is what makes our AI know about it within days, not months.

Key Responsibilities

  • Design and implement CDC pipelines to capture incremental changes from the Singapore Gazette, AGC consolidated statutes, and other authoritative legal sources

  • Build automated data ingestion workflows: scraping, parsing, and structural analysis of legal documents (Act → Part → Division → Section → Subsection)

  • Implement temporal metadata extraction: effective dates, repeal dates, amendment lineage, and version tracking for every statutory provision

  • Develop a data quality validation framework: automated checks for temporal conflicts, missing citations, entity mismatches, and cross-jurisdiction inconsistencies

  • Manage Neo4j graph data loading and incremental updates, changing specific nodes without full graph rebuilds

  • Build monitoring dashboards for data pipeline health: ingestion latency, error rates, coverage metrics

  • Implement data versioning and rollback capabilities for audit compliance

  • Collaborate with the Backend Engineer on ETL-to-KAG integration and with QA on data accuracy validation

Requirements

  • 2+ years of experience in data engineering, ETL development, or data pipeline architecture

  • Proficiency in Python and/or Go for data processing scripts and pipeline orchestration

  • Experience with graph databases (Neo4j preferred) or relational databases (PostgreSQL)

  • Hands-on experience with CDC tools or patterns (Debezium, Kafka Connect, or custom CDC)

  • Understanding of data quality frameworks and validation methodologies

  • Familiarity with AWS data services (S3, Glue, Lambda, or Step Functions)

  • Proficiency in English; Mandarin is a strong plus

  • Singapore Citizen or Permanent Resident (PR) required

Nice-to-Have

  • Experience with web scraping and document parsing (BeautifulSoup, Scrapy, or similar)

  • Background in legal data, regulatory data, or structured document processing

  • Experience with workflow orchestration tools (Airflow, Dagster, Prefect)

  • Knowledge of NLP-based entity extraction or named entity recognition

More Info

Job ID: 146074857
