Roles & Responsibilities
We are seeking a highly skilled Data Architect with deep expertise in Kubernetes (on-premises and Azure AKS) and Apache Spark to lead the design, deployment, and optimization of a scalable hybrid big data platform. The successful candidate will play a key role in setting up the Global Data Platform (GDP), enabling secure and efficient data management, observability, and integration with enterprise systems. The role requires close collaboration with the DevOps, Security, and Data Engineering teams.
Key Responsibilities:
- Architect and deploy Apache Spark, Apache Kafka, and Apache Ranger on Kubernetes clusters (on-prem and Azure AKS); see the Spark submission sketch after this list.
- Lead the setup of the GDP on hybrid infrastructure.
- Automate deployments using Azure DevOps (ADO) pipelines or other recommended CI/CD tooling.
- Configure and optimize Azure Kubernetes Service (AKS) for scalable and resilient operations.
- Integrate Azure Data Lake Storage Gen2 (ADLS Gen2) for a hot and cold storage architecture, using S3 protocols where required.
- Implement data encryption at rest using Transparent Data Encryption (TDE) or equivalent technologies.
- Enable TLS encryption for all intra- and inter-cluster communication; a Kafka client TLS sketch appears after this list.
- Configure and enforce Role-Based Access Control (RBAC) using Apache Ranger (see the policy API sketch after this list).
- Coordinate with the central security team for platform control assessments and compliance readiness.
- Support tokenization of personal data using Protegrity.
- Integrate platform observability with the organization's Central Observability Platform (COP).
- Design and maintain Grafana dashboards for monitoring Spark jobs and Kubernetes pods (a Prometheus query sketch appears after this list).
- Design tools and frameworks for transferring 5 PB of data between logical partitions in existing Hadoop clusters.
- Lead the migration of 1 PB+ of data from Azure HDInsight (ADLS Gen2) to the GDP on AKS (ADLS Gen2); see the DistCp sketch after this list.
- Develop and maintain backup and disaster recovery strategies for both ADLS and Isilon storage environments.
- Ensure high availability and failover strategies are in place and tested.
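The sketches below illustrate, in hedged form, several of the responsibilities above; every endpoint, image, account, and name in them is an assumed placeholder rather than a detail of this engagement.

First, a minimal sketch of submitting a Spark application to a Kubernetes cluster such as AKS with spark-submit in cluster mode:

```python
"""Minimal sketch: submitting a Spark job to Kubernetes (e.g. AKS) in
cluster mode. The API server URL, container image, namespace, and
service account are illustrative placeholders."""
import subprocess

K8S_MASTER = "k8s://https://example-aks.hcp.eastus.azmk8s.io:443"  # placeholder AKS API server
SPARK_IMAGE = "myregistry.azurecr.io/spark:3.5.1"                  # placeholder Spark image

def submit_spark_job(app_path: str, app_name: str, executors: int = 4) -> None:
    """Shell out to spark-submit with Spark-on-Kubernetes settings."""
    subprocess.run(
        [
            "spark-submit",
            "--master", K8S_MASTER,
            "--deploy-mode", "cluster",
            "--name", app_name,
            "--conf", f"spark.executor.instances={executors}",
            "--conf", f"spark.kubernetes.container.image={SPARK_IMAGE}",
            "--conf", "spark.kubernetes.namespace=gdp",  # assumed namespace
            "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=spark",
            app_path,
        ],
        check=True,
    )

if __name__ == "__main__":
    # The Pi example ships inside the standard Spark image (local:// path).
    submit_spark_job("local:///opt/spark/examples/src/main/python/pi.py", "pi-smoke-test")
```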
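Next, a TLS-encrypted Kafka client using the kafka-python library; the broker address and certificate paths are placeholders, and the broker listeners themselves would need matching SSL configuration:

```python
"""Minimal sketch: a Kafka producer over TLS/mTLS with kafka-python.
Broker address and certificate paths are assumed placeholders."""
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka-0.gdp.svc:9093"],  # assumed TLS listener
    security_protocol="SSL",
    ssl_cafile="/etc/tls/ca.crt",        # cluster CA certificate
    ssl_certfile="/etc/tls/client.crt",  # client certificate (mTLS)
    ssl_keyfile="/etc/tls/client.key",   # client private key
)
producer.send("gdp-events", b"encrypted-in-transit payload")
producer.flush()
```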
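For Ranger-based RBAC, policies can be managed through Ranger's public REST API; the Ranger endpoint, service name, and group below are assumptions for illustration:

```python
"""Minimal sketch: creating an Apache Ranger access policy via the
public v2 REST API. Host, service, and group names are placeholders."""
import requests

RANGER_URL = "https://ranger.gdp.example.com:6182"  # placeholder Ranger admin endpoint

policy = {
    "service": "gdp_hive",                # assumed Ranger service definition
    "name": "analysts-read-curated",
    "resources": {
        "database": {"values": ["curated"]},
        "table": {"values": ["*"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [
        {
            "groups": ["data-analysts"],  # assumed directory group
            "accesses": [{"type": "select", "isAllowed": True}],
        }
    ],
}

resp = requests.post(
    f"{RANGER_URL}/service/public/v2/api/policy",
    json=policy,
    auth=("admin", "change-me"),          # use a dedicated service account in practice
    verify="/etc/tls/ca.crt",
)
resp.raise_for_status()
```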
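Grafana dashboards for Spark and Kubernetes typically sit on top of Prometheus; this sketch queries Prometheus's HTTP API for executor pod memory, with the endpoint, namespace, and pod-name pattern all assumed:

```python
"""Minimal sketch: reading Spark executor pod memory from Prometheus,
the usual data source behind Grafana dashboards. Endpoint, namespace,
and pod-name pattern are assumed placeholders."""
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # placeholder in-cluster endpoint

# cAdvisor working-set memory for Spark executor pods in an assumed 'gdp' namespace.
query = 'container_memory_working_set_bytes{namespace="gdp", pod=~".*-exec-.*"}'

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    pod = result["metric"].get("pod", "unknown")
    _, value = result["value"]  # [timestamp, value-as-string]
    print(f"{pod}: {float(value) / 2**20:.0f} MiB")
```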
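Finally, for the petabyte-scale migration, Hadoop DistCp between abfss:// paths is one common approach; the storage accounts and paths below are placeholders, and ADLS credentials are assumed to be configured in the Hadoop client (e.g. core-site.xml):

```python
"""Minimal sketch: driving an incremental Hadoop DistCp copy between two
ADLS Gen2 accounts (HDInsight source -> GDP target). Account, container,
and path names are placeholders; auth is configured outside this script."""
import subprocess

SRC = "abfss://data@hdinsightacct.dfs.core.windows.net/warehouse"  # placeholder source
DST = "abfss://data@gdpacct.dfs.core.windows.net/warehouse"        # placeholder target

def run_distcp(src: str, dst: str, mappers: int = 200) -> None:
    """Incremental copy suitable for repeated catch-up runs before cutover."""
    subprocess.run(
        [
            "hadoop", "distcp",
            "-update",           # copy only new/changed files on re-runs
            "-m", str(mappers),  # parallel map tasks; tune to cluster capacity
            src, dst,
        ],
        check=True,
    )

if __name__ == "__main__":
    run_distcp(SRC, DST)
```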
Required Qualifications:
- Proven experience with Kubernetes (on-prem and Azure AKS) and Apache Spark in production environments.
- Strong understanding of big data ecosystems including Kafka, Ranger, HDFS/ADLS, and Hadoop.
- Hands-on experience with Azure DevOps, CI/CD pipelines, and infrastructure automation.
- Solid knowledge of data encryption, tokenization, and security compliance frameworks.
- Experience in setting up observability and monitoring tools such as Grafana, Prometheus, or similar.
- Background in large-scale data migration projects (petabyte scale).
- Proficiency with scripting and automation tools (e.g., Bash, Python, Terraform, Helm).
Preferred Qualifications:
- Experience with financial services or regulated enterprise environments.
- Familiarity with Protegrity or other data protection/tokenization tools.
- Strong stakeholder engagement skills and ability to work across infrastructure, data engineering, and security teams.