
Job Description: Senior Big Data Performance Engineer
Role Overview
We are seeking a highly experienced Senior Big Data Performance Engineer with 7+ years of hands-on expertise in large-scale data platforms, Spark/Hadoop tuning, and enterprise-grade data pipelines. This role is responsible for conducting platform-wide performance analysis, optimizing AML data pipelines, and ensuring the stability, scalability, and efficiency of Cloudera-based applications and integrations.
Technology Stack: Development & Tuning
Apache Spark (Scala, Python)
Spark performance optimization & troubleshooting
Hive query and table optimization
HBase data modeling & tuning
Scope of Work
Perform application-aware performance analysis for AML data pipelines.
Analyze and optimize pipelines running on Cloudera, Spark, Hive, HBase, JBoss, and MariaDB.
Tune Spark, Hive, and HBase jobs, queries, and tables for performance and scalability.
Review ETL job design and data pipeline architecture for efficiency, resilience, and scalability.
Identify misconfigurations or misuse causing performance degradation or recurring issues.
Assess and recommend tuning for Cloudera, Phoenix, YARN, and related platform configurations.
Support large-scale data ingestion, transformation, and downstream data delivery.
Participate in joint Root Cause Analysis (RCA) with internal and vendor teams.
Support performance benchmarking and validation of tuning recommendations.
Support application and system integrations with Cloudera and AML platforms.
Build and refine data validation dashboards and data growth monitoring tools.
Troubleshoot slow-running jobs, data skew, shuffle issues, garbage collection (GC) overhead, YARN resource pressure, and cluster constraints.
Improve performance of API/UI components interacting with Big Data pipelines.
Deliverables
Formal assessment report covering pipeline, application, and platform interaction analysis.
Identified performance bottlenecks and inefficiencies.
Documented risks, constraints, and recurring issue patterns.
Tuning and optimization recommendations for Spark, Hive, HBase, and Cloudera/Phoenix configurations.
Corrective and preventive action plan categorized into immediate remediation, configuration changes, and long-term improvements.
Performance benchmarking and validation results post-tuning.
RCA documentation for recurring application-impacting issues.
Governance-ready documentation aligned with CRQ and JIRA standards.
Job ID: 141756863