Search by job, company or skills

Exasoft

Data Engineer (Performance Tuning, ETL, Hive, Cloudera/Spark)

7-9 Years

This job is no longer accepting applications

new job description bg glownew job description bg glownew job description bg svg
  • Posted a month ago

Job Description

Job Description Senior Big Data Performance Engineer

Role Overview

We are seeking a highly experienced Senior Big Data Performance Engineer with 7+ years of hands-on expertise in large scale data platforms, Spark/Hadoop tuning, and enterprise-grade data pipelines. This role is responsible for conducting platform-wide performance analysis, optimizing AML data pipelines, and ensuring the stability, scalability, and efficiency of Cloudera-based applications and integrations.

Technology Stack Development & Tuning

Apache Spark (Scala, Python)

Spark performance optimization & troubleshooting

Hive query and table optimization

HBase data modeling & tuning

Scope of Work

Perform application-aware performance analysis for AML data pipelines.

Analyze and optimize pipelines running on Cloudera, Spark, Hive, HBase, JBoss, and MariaDB.

Tune Spark, Hive, and HBase jobs, queries, and tables for performance and scalability.

Review ETL job design and data pipeline architecture for efficiency, resilience, and scalability.

Identify misconfigurations or misuse causing performance degradation or recurring issues.

Assess and recommend tuning for Cloudera, Phoenix, YARN, and related platform configurations.

Support large-scale data ingestion, transformation, and downstream data delivery.

Participate in joint Root Cause Analysis (RCA) with internal and vendor teams.

Support performance benchmarking and validation of tuning recommendations.

Support application and system integrations with Cloudera and AML platforms.

Build and refine data validation dashboards and data growth monitoring tools.

Troubleshoot slow-running jobs, data skews, shuffle issues, GC challenges, YARN resource pressure, and cluster constraints.

Improve performance of API/UI components interacting with Big Data pipelines.

Deliverables

Formal assessment report covering pipeline, application, and platform interaction analysis.

Identified performance bottlenecks and inefficiencies.

Documented risks, constraints, and recurring issue patterns.

Tuning and optimization recommendations for Spark, Hive, HBase, and Cloudera/Phoenix configurations.

Corrective and preventive action plan categorized into immediate remediation, configuration changes, and long-term

improvements.

Performance benchmarking and validation results post-tuning.

RCA documentation for recurring application-impacting issues.

Governance-ready documentation aligned with CRQ and JIRA standards.

More Info

Job Type:
Industry:
Function:
Employment Type:

About Company

Job ID: 141756863