We are looking for an experienced and highly skilled Hadoop Data Engineer to join our dynamic team. The ideal candidate will have hands-on expertise in developing optimized data pipelines using Python, PySpark, Scala, Spark-SQL, Hive, and other big data technologies. You will be responsible for translating complex business and technical requirements into efficient data pipelines and ensuring high-quality code delivery through collaboration and code reviews.
Roles & Responsibilities:
Data Transformation & Pipeline Development:
- Design and implement optimized data pipelines using PySpark, Python, Scala, and Spark-SQL (an illustrative sketch, with a matching unit test, follows this list).
- Build complex data transformation logic and ensure data ingestion from source systems to Data Lakes (Hive, HBase, Parquet).
- Produce unit tests for Spark transformations and helper methods.
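To give a flavour of the day-to-day work, here is a minimal, hypothetical PySpark sketch of a transformation helper and a pytest-style unit test for it. The column names, business key, and lake path are illustrative assumptions, not an actual pipeline.

```python
# Minimal, hypothetical sketch of a PySpark transformation helper and its unit test.
# All table columns, keys, and paths below are illustrative assumptions.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def enrich_transactions(raw: DataFrame) -> DataFrame:
    """Apply basic cleansing and derive a load date for downstream partitioning."""
    return (
        raw.dropDuplicates(["txn_id"])                        # de-duplicate on business key
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .withColumn("load_date", F.current_date())         # partition column for the lake
    )


def write_to_lake(df: DataFrame, path: str) -> None:
    """Write the curated data as partitioned Parquet (Hive-compatible layout)."""
    df.write.mode("overwrite").partitionBy("load_date").parquet(path)


# pytest-style unit test for the transformation, run against a local SparkSession
def test_enrich_transactions_deduplicates():
    spark = SparkSession.builder.master("local[2]").appName("unit-test").getOrCreate()
    raw = spark.createDataFrame(
        [("t1", "100.5"), ("t1", "100.5"), ("t2", "20.0")],
        ["txn_id", "amount"],
    )
    result = enrich_transactions(raw)
    assert result.count() == 2                                # duplicate key removed
    assert "load_date" in result.columns                      # derived column present
```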
Collaboration & Communication:
- Work closely with Business Analysts to review test results and obtain sign-offs.
- Prepare comprehensive design and operational documentation for future reference.
Code Quality & Review:
- Conduct peer code reviews and act as a gatekeeper for quality checks.
- Ensure quality and efficiency in the delivery of code through pair programming and collaboration.
Production Deployment:
- Ensure smooth production deployments and perform post-deployment verification.
Technical Expertise:
- Provide hands-on coding and support in a highly collaborative environment.
- Contribute to development, automation, and continuous improvement practices.
System Knowledge:
- Strong understanding of data structures, data manipulation, distributed processing, and application development.
- Exposure to technologies such as Kafka, Spark Streaming, and ML is a plus (see the streaming sketch after this list).
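As a sketch of the streaming exposure called out above, the snippet below consumes a Kafka topic with Spark Structured Streaming and lands micro-batches as Parquet. The broker address, topic name, and paths are placeholder assumptions.

```python
# Hypothetical sketch: consuming a Kafka topic with Spark Structured Streaming.
# Broker address, topic name, and output paths are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
         .option("subscribe", "transactions")                  # placeholder topic
         .option("startingOffsets", "latest")
         .load()
         .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS payload")
         .withColumn("ingest_ts", F.current_timestamp())
)

query = (
    events.writeStream.format("parquet")
          .option("path", "/data/lake/raw/transactions")          # placeholder lake path
          .option("checkpointLocation", "/data/checkpoints/transactions")
          .trigger(processingTime="1 minute")
          .start()
)
query.awaitTermination()
```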
RDBMS & Database Management:
- Hands-on experience with RDBMS technologies (MariaDB, SQL Server, MySQL, Oracle); a JDBC extract sketch follows this list.
- Knowledge of PL/SQL and stored procedures is an added advantage.
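On the RDBMS side, a common pattern is extracting reference data over JDBC into Spark for use alongside the data lake. The sketch below is illustrative only: the MariaDB URL, table name, and credential handling are assumptions (credentials would normally come from a secrets store, and the JDBC driver must be on the Spark classpath).

```python
# Hypothetical sketch: loading a reference table from an RDBMS over JDBC.
# URL, table name, and credential handling are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-read-sketch").getOrCreate()

customers = (
    spark.read.format("jdbc")
         .option("url", "jdbc:mariadb://db-host:3306/core")   # placeholder MariaDB URL
         .option("dbtable", "customers")                      # placeholder table
         .option("user", "etl_user")                          # normally injected from a vault
         .option("password", "****")
         .option("fetchsize", "10000")                        # tune for large extracts
         .load()
)

# Land the extract in the data lake as Parquet for downstream Hive/Spark-SQL use
customers.write.mode("overwrite").parquet("/data/lake/reference/customers")
```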
Other Responsibilities:
- Exposure to TWS (Tivoli Workload Scheduler) jobs for scheduling.
- Knowledge of and experience with the Hadoop tech stack, Cloudera Distribution, and CI/CD pipelines using Git and Jenkins.
- Experience with Agile Methodologies and DevOps practices.
Technical Requirements:
- Experience: 6-9.5 years with Hadoop, Spark, PySpark, Scala, Hive, Spark-SQL, Python, Impala, CI/CD, and Git.
- Strong understanding of Data Warehousing Methodology and Change Data Capture (CDC); an illustrative CDC merge sketch follows this list.
- In-depth knowledge of the Hadoop and Spark ecosystems, with hands-on experience in PySpark and related Hadoop technologies.
- Proficiency in working with RDBMS such as MariaDB, SQL Server, MySQL, or Oracle.
- Experience with stored procedures and TWS job scheduling.
- Solid experience with Enterprise Data Architectures and Data Models.
- Background in Core Banking or Finance domains is preferred; experience in the AML (Anti-Money Laundering) domain is a plus.
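To make the CDC expectation concrete, the sketch below applies a change feed to a snapshot by keeping the latest record per business key with a window function. The table names, the op_type/change_ts columns, and the delete convention are assumptions; real implementations may instead rely on a table format's native MERGE support.

```python
# Hypothetical CDC sketch: keep the latest change per business key and rebuild the target.
# Table names, op_type/change_ts columns, and the delete convention are assumptions.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-apply-sketch").enableHiveSupport().getOrCreate()

target = spark.table("lake.accounts")              # current snapshot (placeholder table)
changes = spark.table("staging.accounts_cdc")      # CDC feed, assumed to add op_type and change_ts

# Align the snapshot with the change-feed schema, then union the two
combined = target.withColumn("op_type", F.lit("U")).withColumn(
    "change_ts", F.lit(None).cast("timestamp")
).unionByName(changes)

# Most recent row per account_id wins; change rows outrank the null-timestamp snapshot rows
latest = Window.partitionBy("account_id").orderBy(F.col("change_ts").desc_nulls_last())

merged = (
    combined.withColumn("rn", F.row_number().over(latest))
            .filter("rn = 1")
            .filter(F.col("op_type") != "D")       # drop keys whose latest change is a delete
            .drop("rn", "op_type", "change_ts")
)

merged.write.mode("overwrite").saveAsTable("lake.accounts_merged")
```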
Skills & Qualifications:
- Strong hands-on coding skills in Python, PySpark, Scala, Spark-SQL.
- Proficient in the Hadoop ecosystem (Hive, HBase, etc.).
- Knowledge of CI/CD, Agile, and DevOps methodologies.
- Good understanding of data integration, data pipelines, and distributed data systems.
- Experience with Oracle, PL/SQL, and large-scale databases.
- Strong analytical and problem-solving skills, with an ability to troubleshoot complex data issues.