We are seeking an experienced Data Engineer with a minimum of 7 years of hands-on expertise in Big Data technologies, specifically Hadoop, Python, and Spark. The ideal candidate will play a critical role in designing, building, and maintaining scalable data processing systems that enable advanced analytics and business intelligence across the organization.
Responsibilities
- Design, implement, and optimize large-scale data pipelines using Hadoop, Spark, and related Big Data frameworks.
- Develop robust ETL processes to ingest, transform, and store structured and unstructured data from multiple sources.
- Write efficient, reusable, and well-documented code in Python for data processing and automation tasks.
- Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver actionable solutions.
- Monitor, troubleshoot, and enhance the performance of data systems to ensure reliability and scalability.
- Ensure data quality, integrity, and security throughout the data lifecycle.
- Stay up to date with emerging Big Data technologies and best practices to continuously improve the data platform.
Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.
- Minimum of 7 years of professional experience in data engineering, with a strong focus on Hadoop and Spark ecosystems.
- Advanced proficiency in Python programming for data manipulation and automation.
- Demonstrated expertise in designing and managing large-scale distributed data systems.
- Hands-on experience with data modeling, ETL development, and data warehousing concepts.
- Proficiency in SQL; experience with NoSQL databases is a plus.
- Strong problem-solving skills and the ability to work independently or as part of a team.
- Excellent communication and interpersonal skills.
Preferred Skills
- Experience with cloud-based data platforms (e.g., AWS, Azure, GCP).
- Familiarity with containerization technologies such as Docker and orchestration tools like Kubernetes.
- Knowledge of data governance, data privacy, and data security best practices.
- Exposure to real-time data processing frameworks (e.g., Kafka, Flink).