For a successful POC, the candidate should ideally be a mid-to-senior level Data Engineer (3-5+ years) with the following must-haves:
Technical Core
- Databricks Mastery: Expert-level knowledge of Delta Lake and the Medallion Architecture (Bronze/Silver/Gold layers).
- Apache Spark (PySpark/SQL): Ability to write optimized Spark code. For the upcoming POC, Python is usually preferred over Scala/R for its flexibility and ecosystem.
- AWS Infrastructure: Deep understanding of S3 (bucket policies/storage classes) and IAM (roles/policies) for secure Databricks access; VPC/networking knowledge is good to have.
- Data Ingestion: Experience with Databricks Auto Loader for incremental ingestion, plus Unity Catalog for managed data governance.
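To make the Auto Loader expectation concrete, here is a minimal sketch of incremental bronze-layer ingestion from S3. All paths, table names, and the JSON source format are illustrative assumptions, not requirements from this document:

```python
# Sketch: incremental bronze ingestion with Databricks Auto Loader (cloudFiles).
# Paths and table names are hypothetical placeholders.

def autoloader_options(source_format: str, schema_location: str) -> dict:
    """Build the Auto Loader reader options."""
    return {
        "cloudFiles.format": source_format,
        # Auto Loader persists the inferred schema (and its evolution) here.
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.inferColumnTypes": "true",
    }

def ingest_bronze(spark, source_path: str, checkpoint: str, target_table: str):
    """Stream newly arrived files from S3 into a bronze Delta table."""
    opts = autoloader_options("json", f"{checkpoint}/_schema")
    return (
        spark.readStream.format("cloudFiles")
        .options(**opts)
        .load(source_path)                       # e.g. s3://landing-bucket/events/
        .writeStream.option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)              # process the backlog, then stop: POC-friendly
        .toTable(target_table)                   # e.g. bronze.events_raw
    )
```

The `availableNow` trigger runs the stream as a self-terminating batch, which keeps POC costs predictable while preserving Auto Loader's exactly-once file tracking.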
POC-Specific Skills
- Prototyping Speed: The ability to set up a working end-to-end pipeline (Source → S3 → Databricks → OOTB BI Tool) in weeks, not months.
- Cost Management: Knowledge of how to configure Databricks Clusters (Autoscaling, Spot Instances) to prevent the POC from blowing your AWS budget.
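As a sketch of the cost-management point, the cluster spec below targets the Databricks Clusters API (`clusters/create`) with autoscaling, auto-termination, and spot workers. Instance type, runtime version, and sizing are assumed values for illustration:

```python
# Sketch of a cost-conscious POC cluster spec for the Databricks Clusters API.
# All sizing values are illustrative assumptions.

def poc_cluster_spec() -> dict:
    return {
        "cluster_name": "poc-etl",
        "spark_version": "15.4.x-scala2.12",            # assumed LTS runtime
        "node_type_id": "m5.xlarge",                    # assumed instance type
        "autoscale": {"min_workers": 1, "max_workers": 4},  # shrink when idle
        "autotermination_minutes": 30,                  # auto-kill idle clusters
        "aws_attributes": {
            "first_on_demand": 1,                       # keep the driver on-demand
            "availability": "SPOT_WITH_FALLBACK",       # spot workers, fall back if reclaimed
            "spot_bid_price_percent": 100,
        },
    }
```

`first_on_demand: 1` protects the driver from spot reclamation, while `SPOT_WITH_FALLBACK` lets workers fail over to on-demand instances rather than stalling the pipeline.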
Job Description
Focus: Hands-on ETL/ELT, connecting various data sources, and setting up the platform with technical leadership.
- Role Summary:
- We are seeking a hands-on Data Engineer to spearhead our Databricks POC on AWS. You will be responsible for the initial environment setup, security configuration, and designing the framework for our future data platform.
- You will connect diverse AWS and external data sources into a unified Databricks environment.
- Key Responsibilities:
- Configure Databricks workspace integration with AWS (S3, IAM, VPC).
- Cleanse and transform raw data from S3, RDS, and APIs into Delta tables.
- Design and implement a scalable Medallion Architecture using Delta Lake.
- Build automated ingestion pipelines using Databricks Autoloader.
- Optimize Spark jobs for performance and reliability.
- Establish data governance standards using Unity Catalog. (Good to have)
- Evaluate POC success metrics (performance, cost, ease of use).
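The Medallion and governance responsibilities above can be sketched as a bronze-to-silver promotion plus a Unity Catalog grant. The catalog name, `event_id` business key, and `analysts` group are hypothetical, assuming a standard three-level namespace (catalog.schema.table):

```python
# Sketch: bronze -> silver promotion in the Medallion flow, with a Unity
# Catalog read grant. Catalog, key column, and group names are assumptions.

def uc_name(catalog: str, layer: str, table: str) -> str:
    """Fully qualified Unity Catalog table name, e.g. poc.silver.events."""
    return f"{catalog}.{layer}.{table}"

def promote_to_silver(spark, catalog: str = "poc"):
    # Lazy import so the pure helper above is usable without a Spark install.
    from pyspark.sql import functions as F

    bronze = spark.read.table(uc_name(catalog, "bronze", "events_raw"))
    silver = (
        bronze.dropDuplicates(["event_id"])              # assumed business key
              .filter(F.col("event_id").isNotNull())     # basic quality gate
              .withColumn("processed_at", F.current_timestamp())
    )
    silver.write.format("delta").mode("overwrite") \
          .saveAsTable(uc_name(catalog, "silver", "events"))

    # Governance: read-only access for a consumer group via Unity Catalog.
    spark.sql(
        f"GRANT SELECT ON TABLE {uc_name(catalog, 'silver', 'events')} TO `analysts`"
    )
```

Keeping the cleansing rules in one function per layer makes the POC's transformations easy to review against the success metrics (performance, cost, ease of use).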
- Requirements: 3-5+ years in Data Engineering with strong PySpark/SQL experience; AWS Glue or EMR experience is a plus. Databricks Certified Data Engineer Professional preferred.