
Responsibilities
Design, develop, and maintain efficient ETL pipelines for ingesting, processing, and transforming large-scale data from multiple sources (batch and near real-time).
Design and implement scalable data architectures and analytics pipelines, leveraging modern data platforms such as Databricks and Spark-based ecosystems.
Design and implement efficient data models (dimensional, normalized, and curated layers) to support analytics and operational use cases.
Define and enforce data quality, validation, and observability frameworks.
Drive performance tuning and cost optimization across compute and storage layers.
Collaborate with engineering, platform, and product teams to operationalize data-driven insights within production environments.
Drive the exploration and adoption of AI/ML use cases, defining architecture, selecting best-fit tools and frameworks, and enabling scalable, production-grade data-driven intelligence across platforms.
Requirements
Strong background in analyzing large and complex datasets using distributed data processing frameworks such as Spark, Databricks, or similar platforms.
Experience designing and implementing data architectures, ETL/ELT pipelines, and scalable data processing solutions.
Experience working with the Databricks ecosystem (Delta Lake, Spark SQL, Databricks Workflows) is highly desirable.
Proficiency in SQL and Python for data processing and transformation.
Demonstrated ability to solve multidisciplinary, data-driven problems.
Strong understanding of data modeling, data warehousing concepts, and lakehouse architecture.
Experience with cloud platforms (AWS, Azure).
Experience with Kubernetes (K8s), containerized workloads, and microservices infrastructure is a plus.
Experience designing and enabling secure, scalable data sharing using open standards (e.g., Delta Sharing) to support cross-organization data access is a plus.
Experience building or supporting data pipelines for AI/ML use cases, including feature engineering, data preparation, and integration with tools such as MLflow, LLM frameworks, or vector databases is a plus.
Relevant certifications such as Databricks Certified Data Engineer Associate/Professional and Python Institute certifications (e.g., PCEP, PCAP) are a plus.
Ability to work independently and as part of a team.
Job ID: 146930575