Job Description
We are looking for a skilled Data Engineer with strong hands-on experience in PySpark, Python, Big Data, and performance tuning. The role involves building and optimizing scalable data pipelines, collaborating with business users and technical stakeholders, and ensuring high-quality data delivery in a distributed environment.
Responsibilities
- Develop and optimize ETL/ELT pipelines using PySpark and Python
- Work directly with business users and cross-functional teams
- Ensure performance, reliability, and scalability of data workflows
- Apply debugging and performance-tuning techniques
- Use Git, CI/CD, and Agile practices
- Support data modeling and PySpark code migration efforts
Qualifications
- 4+ years of experience in Data Engineering/PySpark
- Strong skills in PySpark, Python, SQL, and Big Data frameworks
- Good understanding of distributed computing (RDDs, DataFrames, partitions)
- Experience with Git, CI/CD, Agile
- Good to have: data modeling, code migration, cloud experience
EA Number: 11C4879