
The Databricks Engineer is responsible for designing, implementing, and optimizing big data solutions using Databricks and Apache Spark. This role involves developing scalable ETL pipelines, data lakes, and analytics solutions on cloud platforms such as AWS, Azure, or GCP. The engineer works closely with data scientists, analysts, and software engineers to enable high-performance data processing and machine learning workflows.
Key Responsibilities:
1. Databricks Development & Implementation:
Design and develop scalable data processing pipelines using Databricks, Apache Spark, and Delta Lake.
Optimize ETL jobs, batch processing, and real-time streaming workloads.
Implement data ingestion strategies using tools such as Kafka and Informatica IDMC.
Develop SQL-based transformations and data models in Databricks.
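The pipeline work above typically centers on upserting ingested records into Delta tables. In Databricks this would be a single MERGE INTO statement or a DeltaTable.merge() call; the merge semantics can be sketched in plain Python without a cluster (the table contents and key column here are illustrative, not from the posting):

```python
# Plain-Python sketch of Delta Lake MERGE (upsert) semantics:
# match incoming records to the target by key, update the matches,
# and insert the rest. On Databricks this is one MERGE INTO statement;
# the records and key name below are illustrative only.

def merge_upsert(target, updates, key="id"):
    """Upsert `updates` into `target`; both are lists of dicts keyed by `key`."""
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return sorted(merged.values(), key=lambda r: r[key])

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
incoming = [{"id": 2, "name": "Grace H."}, {"id": 3, "name": "Edsger"}]
print(merge_upsert(customers, incoming))
# → [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Grace H.'}, {'id': 3, 'name': 'Edsger'}]
```

The same matched/not-matched split maps directly onto the WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT clauses of a SQL merge.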
2. Cloud Data Architecture & Integration:
Design and implement data lakehouse architectures on AWS.
Integrate Databricks with cloud storage (S3) and data warehouses (Redshift, BigQuery).
Work with Terraform and CloudFormation to automate Databricks deployments.
Implement CI/CD pipelines for Databricks notebooks using GitHub Actions.
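A CI/CD pipeline for notebooks usually lints them before deployment. One common pre-merge check — sketched here assuming notebooks are stored in Jupyter's JSON format (Databricks can also export notebooks as `.py` source) — is verifying that no execution outputs were committed to version control:

```python
import json

def has_committed_outputs(notebook_json: str) -> bool:
    """Return True if any code cell in a Jupyter-format notebook still
    contains execution outputs (these should be stripped before commit).
    Assumes the standard Jupyter JSON layout; a CI job would fail the
    build when this returns True."""
    nb = json.loads(notebook_json)
    return any(cell.get("outputs") for cell in nb.get("cells", [])
               if cell.get("cell_type") == "code")

clean = json.dumps({"cells": [{"cell_type": "code", "source": ["1+1"], "outputs": []}]})
dirty = json.dumps({"cells": [{"cell_type": "code", "source": ["1+1"],
                               "outputs": [{"text": ["2"]}]}]})
print(has_committed_outputs(clean), has_committed_outputs(dirty))  # → False True
```

In a GitHub Actions workflow, a step would run a check like this over every notebook changed in the pull request.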
3. Performance Optimization & Troubleshooting:
Optimize Spark jobs, cluster configurations, and query performance.
Monitor and debug Databricks jobs, workflows, and runtime errors.
Tune Delta Lake tables for efficient data processing and storage.
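Delta Lake tuning often comes down to compacting many small files into fewer large ones, which is what the OPTIMIZE command does. The bin-packing idea behind it can be sketched in plain Python (the 128 MB target is an illustrative number; real defaults depend on table configuration):

```python
# Sketch of the bin-packing behind Delta Lake's OPTIMIZE command:
# group many small files into batches near a target size so each
# rewritten file is efficient to scan. Target size is illustrative.

def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily pack file sizes into batches no larger than target_mb
    (a file already over the target gets a batch of its own)."""
    batches, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches

print(plan_compaction([10, 90, 40, 5, 200]))
# → [[200], [90], [40, 10, 5]]
```

Each batch corresponds to one rewritten file; fewer, larger files mean fewer tasks and less metadata overhead per query.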
4. Security & Data Governance:
Implement RBAC (Role-Based Access Control), Unity Catalog, and data masking.
Ensure compliance with GDPR, HIPAA, and SOC 2 requirements.
Manage IAM roles, permissions, and encryption settings for Databricks environments.
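Role-based access control reduces to mapping principals to roles and roles to privileges on securable objects. The core check behind systems like Unity Catalog can be sketched as follows (the role names, table name, and privilege strings are illustrative; Unity Catalog's real model adds catalogs, schemas, and privilege inheritance):

```python
# Minimal sketch of an RBAC authorization check: users hold roles,
# roles grant privileges on objects. All names are illustrative.

ROLE_GRANTS = {
    "analyst": {("sales.orders", "SELECT")},
    "engineer": {("sales.orders", "SELECT"), ("sales.orders", "MODIFY")},
}
USER_ROLES = {"maria": {"analyst"}, "dev_team": {"engineer"}}

def is_authorized(user, obj, privilege):
    """True if any of the user's roles grants `privilege` on `obj`."""
    return any((obj, privilege) in ROLE_GRANTS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(is_authorized("maria", "sales.orders", "SELECT"))  # → True
print(is_authorized("maria", "sales.orders", "MODIFY"))  # → False
```

Keeping grants on roles rather than on individual users is what makes access auditable and revocable in one place.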
5. Collaboration & Support:
Work closely with data scientists, analysts, and DevOps teams to enable advanced analytics and machine learning workloads.
Provide technical guidance and best practices on Databricks development.
Document technical designs, processes, and troubleshooting guides.
Required Skills & Qualifications:
Technical Skills:
Strong experience with Databricks, Apache Spark (PySpark, Scala), and Delta Lake.
Proficiency in Python, SQL, and Scala for data processing.
Experience with cloud platforms (AWS, Azure, or GCP) and data services.
Hands-on knowledge of ETL, data warehousing, and lakehouse architectures.
Familiarity with Airflow, dbt, or similar workflow orchestration tools.
Knowledge of machine learning frameworks (MLflow, TensorFlow, PyTorch) is a plus.
Soft Skills:
Strong problem-solving and analytical skills.
Ability to work independently and within cross-functional teams.
Excellent communication and documentation skills.
Ability to manage multiple projects and prioritize tasks effectively.
Job ID: 143487085