Data Engineer

8-11 years of experience

SGD 9,000 - 12,000 per month

Job Description

Key Responsibilities:

Design and develop scalable data pipelines across Hadoop (Hive, Impala, Spark, Kafka, Iceberg) and Teradata environments.
Build ingestion and transformation frameworks using Java, Spark, Python, and shell scripts.
Develop full-stack applications and internal tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
Create APIs and microservices to expose data and ML models securely to downstream systems and user interfaces.
Collaborate with data scientists to operationalize ML models using Cloudera Machine Learning (CML).
Build and deploy GenAI/LLM-powered applications for intelligent data interaction, summarization, and automation.
Implement enterprise-grade security controls, including RBAC, LDAP, Kerberos, Apache Ranger, and row-level access control.
Tune and optimize data applications for performance across Hadoop and Teradata, ensuring efficient resource utilization.
Support sandbox environments for prototyping, enabling users to build ML models, dashboards, and data pipelines.

Required Skills & Experience:

Data Engineering: Strong experience with the Hadoop ecosystem (Hive, Impala, Spark, Kafka, Iceberg, Ranger, Atlas), Teradata, and data pipeline orchestration.
Full-Stack Development: Proficiency in Python, shell scripting, REST APIs, and web frameworks (Flask, React, etc.).
Machine Learning & AI: Hands-on experience with ML platforms (e.g., CML), Spark MLlib, Python ML libraries (scikit-learn, XGBoost), and model deployment.
GenAI/LLM Applications: Familiarity with building enterprise applications on large language models and related tooling (e.g., OpenAI, Hugging Face, LangChain).
Security & Governance: Experience with enterprise data security (LDAP, Kerberos, RBAC), data masking, and access control.
Performance Tuning: Proven ability to optimize data applications and queries in large-scale environments (Hadoop, Teradata).
Tools & Platforms: Cloudera Data Platform (CDP), Informatica, QlikSense, Apache Oozie, Git, CI/CD pipelines.
Soft Skills: Strong analytical and problem-solving skills, excellent communication, and ability to work in cross-functional teams.