We are looking for an experienced and highly skilled Hadoop Data Engineer to join our dynamic team. The ideal candidate will have hands-on expertise in developing optimized data pipelines using Python, PySpark, Scala, Spark-SQL, Hive, and other big data technologies. You will be responsible for translating complex business and technical requirements into efficient data pipelines and ensuring high-quality code delivery through collaboration and code reviews.
Roles & Responsibilities:
Data Transformation & Pipeline Development:
- Design and implement optimized data pipelines using PySpark, Python, Scala, and Spark-SQL (an illustrative sketch, with a matching unit test, follows this list).
- Build complex data transformation logic and ensure data ingestion from source systems to Data Lakes (Hive, HBase, Parquet).
- Produce unit tests for Spark transformations and helper methods.
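To give a flavour of the day-to-day work, here is a minimal, hypothetical PySpark sketch of a transformation helper and a pytest-style unit test for it. The column names, business key, and lake path are illustrative assumptions, not an actual pipeline.

```python
# Minimal, hypothetical sketch of a PySpark transformation helper and its unit test.
# All table columns, keys, and paths below are illustrative assumptions.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def enrich_transactions(raw: DataFrame) -> DataFrame:
    """Apply basic cleansing and derive a load date for downstream partitioning."""
    return (
        raw.dropDuplicates(["txn_id"])                        # de-duplicate on business key
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .withColumn("load_date", F.current_date())         # partition column for the lake
    )


def write_to_lake(df: DataFrame, path: str) -> None:
    """Write the curated data as partitioned Parquet (Hive-compatible layout)."""
    df.write.mode("overwrite").partitionBy("load_date").parquet(path)


# pytest-style unit test for the transformation, run against a local SparkSession
def test_enrich_transactions_deduplicates():
    spark = SparkSession.builder.master("local[2]").appName("unit-test").getOrCreate()
    raw = spark.createDataFrame(
        [("t1", "100.5"), ("t1", "100.5"), ("t2", "20.0")],
        ["txn_id", "amount"],
    )
    result = enrich_transactions(raw)
    assert result.count() == 2                                # duplicate key removed
    assert "load_date" in result.columns                      # derived column present
```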
Collaboration & Communication:
- Work closely with Business Analysts to review test results and obtain sign-offs.
- Prepare comprehensive design and operational documentation for future reference.
Code Quality & Review:
- Conduct peer code reviews and act as a gatekeeper for quality checks.
- Ensure quality and efficiency in the delivery of code through pair programming and collaboration.
Production Deployment:
- Ensure smooth production deployments and perform post-deployment verification.
Technical Expertise:
- Provide hands-on coding and support in a highly collaborative environment.
- Contribute to development, automation, and continuous improvement practices.
System Knowledge:
- Strong understanding of data structures, data manipulation, distributed processing, and application development.
- Exposure to technologies such as Kafka, Spark Streaming, and ML is a plus (see the streaming sketch after this list).
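As a sketch of the streaming exposure called out above, the snippet below consumes a Kafka topic with Spark Structured Streaming and lands micro-batches as Parquet. The broker address, topic name, and paths are placeholder assumptions.

```python
# Hypothetical sketch: consuming a Kafka topic with Spark Structured Streaming.
# Broker address, topic name, and output paths are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
         .option("subscribe", "transactions")                  # placeholder topic
         .option("startingOffsets", "latest")
         .load()
         .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS payload")
         .withColumn("ingest_ts", F.current_timestamp())
)

query = (
    events.writeStream.format("parquet")
          .option("path", "/data/lake/raw/transactions")          # placeholder lake path
          .option("checkpointLocation", "/data/checkpoints/transactions")
          .trigger(processingTime="1 minute")
          .start()
)
query.awaitTermination()
```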
RDBMS & Database Management:
- Hands-on experience with RDBMS technologies (MariaDB, SQL Server, MySQL, Oracle); a JDBC extract sketch follows this list.
- Knowledge of PL/SQL and stored procedures is an added advantage.
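On the RDBMS side, a common pattern is extracting reference data over JDBC into Spark for use alongside the data lake. The sketch below is illustrative only: the MariaDB URL, table name, and credential handling are assumptions (credentials would normally come from a secrets store, and the JDBC driver must be on the Spark classpath).

```python
# Hypothetical sketch: loading a reference table from an RDBMS over JDBC.
# URL, table name, and credential handling are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-read-sketch").getOrCreate()

customers = (
    spark.read.format("jdbc")
         .option("url", "jdbc:mariadb://db-host:3306/core")   # placeholder MariaDB URL
         .option("dbtable", "customers")                      # placeholder table
         .option("user", "etl_user")                          # normally injected from a vault
         .option("password", "****")
         .option("fetchsize", "10000")                        # tune for large extracts
         .load()
)

# Land the extract in the data lake as Parquet for downstream Hive/Spark-SQL use
customers.write.mode("overwrite").parquet("/data/lake/reference/customers")
```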
Other Responsibilities:
- Exposure to TWS (Tivoli Workload Scheduler) jobs for scheduling.
- Knowledge of and experience with the Hadoop tech stack, Cloudera Distribution, and CI/CD pipelines using Git and Jenkins.
- Experience with Agile Methodologies and DevOps practices.
Technical Requirements:
- Experience: 6-9.5 years with Hadoop, Spark, PySpark, Scala, Hive, Spark-SQL, Python, Impala, CI/CD, and Git.
- Strong understanding of Data Warehousing Methodology and Change Data Capture (CDC); an illustrative CDC merge sketch follows this list.
- In-depth knowledge of the Hadoop and Spark ecosystems, with hands-on experience in PySpark and related Hadoop technologies.
- Proficiency in working with RDBMS such as MariaDB, SQL Server, MySQL, or Oracle.
- Experience with stored procedures and TWS job scheduling.
- Solid experience with Enterprise Data Architectures and Data Models.
- Background in Core Banking or Finance domains is preferred; experience in the AML (Anti-Money Laundering) domain is a plus.
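To make the CDC expectation concrete, the sketch below applies a change feed to a snapshot by keeping the latest record per business key with a window function. The table names, the op_type/change_ts columns, and the delete convention are assumptions; real implementations may instead rely on a table format's native MERGE support.

```python
# Hypothetical CDC sketch: keep the latest change per business key and rebuild the target.
# Table names, op_type/change_ts columns, and the delete convention are assumptions.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-apply-sketch").enableHiveSupport().getOrCreate()

target = spark.table("lake.accounts")              # current snapshot (placeholder table)
changes = spark.table("staging.accounts_cdc")      # CDC feed, assumed to add op_type and change_ts

# Align the snapshot with the change-feed schema, then union the two
combined = target.withColumn("op_type", F.lit("U")).withColumn(
    "change_ts", F.lit(None).cast("timestamp")
).unionByName(changes)

# Most recent row per account_id wins; change rows outrank the null-timestamp snapshot rows
latest = Window.partitionBy("account_id").orderBy(F.col("change_ts").desc_nulls_last())

merged = (
    combined.withColumn("rn", F.row_number().over(latest))
            .filter("rn = 1")
            .filter(F.col("op_type") != "D")       # drop keys whose latest change is a delete
            .drop("rn", "op_type", "change_ts")
)

merged.write.mode("overwrite").saveAsTable("lake.accounts_merged")
```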
Skills & Qualifications:
- Strong hands-on coding skills in Python, PySpark, Scala, Spark-SQL.
- Proficient in the Hadoop ecosystem (Hive, HBase, etc.).
- Knowledge of CI/CD, Agile, and DevOps methodologies.
- Good understanding of data integration, data pipelines, and distributed data systems.
- Experience with Oracle, PL/SQL, and large-scale databases.
- Strong analytical and problem-solving skills, with an ability to troubleshoot complex data issues.