Key Responsibilities:
- Design, deploy, and maintain HDFS clusters to support enterprise-scale data applications.
- Monitor cluster performance, manage storage capacity, and ensure high availability and fault tolerance.
- Implement data security, access controls, and encryption for HDFS data.
- Troubleshoot and resolve issues related to HDFS, including data node failures, replication issues, and performance bottlenecks.
- Manage data ingestion pipelines and optimize data storage formats (e.g., Parquet, Avro).
- Collaborate with data engineering and analytics teams to ensure reliable data delivery and transformation workflows.
- Automate cluster operations using scripting (e.g., Bash, Python) and orchestration tools.
- Conduct upgrades and patching of Hadoop ecosystem components (HDFS, YARN, Hive, etc.).
- Maintain documentation of architecture, configurations, and best practices.
- Ensure compliance with data governance and data privacy policies.
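To illustrate the kind of automation this role involves, here is a minimal Python sketch of a datanode health check built around the standard `hdfs dfsadmin -report` CLI command. The summary-line patterns ("Live datanodes (N):" / "Dead datanodes (N):") are an assumption based on recent Hadoop releases; adjust them if your distribution formats the report differently.

```python
import re
import subprocess

def datanode_counts(report_text):
    """Parse 'hdfs dfsadmin -report' output and return (live, dead) counts.

    Assumes summary lines of the form 'Live datanodes (N):' and
    'Dead datanodes (N):'; the exact wording may vary by Hadoop version.
    """
    live = re.search(r"Live datanodes \((\d+)\)", report_text)
    dead = re.search(r"Dead datanodes \((\d+)\)", report_text)
    return (int(live.group(1)) if live else 0,
            int(dead.group(1)) if dead else 0)

def check_cluster():
    # Shell out to the HDFS CLI; requires a configured Hadoop client
    # on the PATH and appropriate cluster credentials.
    report = subprocess.run(["hdfs", "dfsadmin", "-report"],
                            capture_output=True, text=True,
                            check=True).stdout
    live, dead = datanode_counts(report)
    if dead > 0:
        print(f"ALERT: {dead} dead datanode(s), {live} live")
    else:
        print(f"OK: {live} live datanode(s)")
```

In practice a script like this would run on a cron schedule or under an orchestration tool and feed an alerting system such as Nagios rather than printing to stdout.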
Qualifications and Requirements:
- Bachelor's degree in Computer Science, Information Systems, or a related field.
- 3-5+ years of experience with the Hadoop ecosystem, particularly HDFS administration.
- Strong understanding of HDFS architecture, replication, and fault tolerance.
- Experience with Cloudera, Hortonworks, or Apache Hadoop distributions.
- Proficiency in Linux/Unix system administration and scripting (Bash, Python, etc.).
- Familiarity with related components: YARN, Hive, HBase, Spark, Oozie, and Zookeeper.
- Experience with monitoring tools like Ambari, Cloudera Manager, or Nagios.
Advantages to Have:
- Hadoop certification (e.g., Cloudera Certified Administrator for Apache Hadoop - CCAH).
- Knowledge of cloud-based big data platforms (AWS EMR, Azure HDInsight, GCP Dataproc).
- Experience with containerization (Docker/Kubernetes) for big data workloads.
- Exposure to data lake architectures and data governance tools.