Your Profile
Our client is seeking a skilled and motivated Data Engineer with expertise in Hadoop, Spark, OpenShift Container Platform (OCP), and DevOps practices. As a Data Engineer, you will be responsible for designing, developing, and maintaining efficient data pipelines and for processing large-scale datasets.
Responsibilities:
- Implement data transformation, aggregation, and enrichment processes to support various data analytics and machine learning initiatives
- Collaborate with cross-functional teams to understand data requirements and translate them into effective data engineering solutions
- Design and deploy data engineering solutions on OpenShift Container Platform (OCP) using containerization and orchestration techniques
- Optimize data engineering workflows for containerized deployment and efficient resource utilization
- Collaborate with DevOps teams to streamline deployment processes, implement CI/CD pipelines, and ensure platform stability
- Implement data governance practices, data lineage, and metadata management to ensure data accuracy, traceability, and compliance
- Monitor and optimize data pipeline performance, troubleshoot issues, and implement necessary enhancements
- Implement monitoring and logging mechanisms to ensure the health, availability, and performance of the data infrastructure
- Document data engineering processes, workflows, and infrastructure configurations for knowledge sharing and reference
- Provide technical leadership, mentorship, and guidance to junior team members
Requirements:
- At least 6 years of experience as a Data Engineer, working with Hadoop, Spark, and data processing technologies in large-scale environments
- Strong expertise in designing and developing data infrastructure using Hadoop, Spark, and related tools (HDFS, Hive, Pig, etc.)
- Experience with containerization platforms such as OpenShift Container Platform (OCP) and container orchestration using Kubernetes
- Proficiency in programming languages commonly used in data engineering, such as Python, Scala, or Java, including their use with Spark
- Knowledge of DevOps practices, CI/CD pipelines, and infrastructure automation tools (e.g., Docker, Jenkins, Ansible, Bitbucket)
- Experience with job schedulers such as Control-M
- Experience with Grafana, Prometheus, or Splunk is an added benefit
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services is a plus
- Key Skills: Scala, Python, Spark, Java, SQL, Shell scripting, Hadoop / Cloudera, Elasticsearch, Red Hat OCP
Interested applicants are invited to submit their applications via Apply Now. Kindly note that only shortlisted candidates will be notified.