
Job Description
We are looking for a Lead Data Engineer specializing in AWS, Databricks, and Informatica IDMC to design, build, and maintain a robust, integrated, and governed data infrastructure. You will enable data-driven decision-making while ensuring high-quality, secure, and compliant data management.
Key Responsibilities:
Design and architect data storage solutions (databases, data lakes, data warehouses) using AWS services (S3, RDS, Redshift, DynamoDB) and Databricks Delta Lake. Integrate Informatica IDMC for metadata management and data cataloging.
Develop, manage, and optimize data pipelines for data ingestion, processing, and transformation using AWS (Glue, Data Pipeline, Lambda), Databricks, and Informatica IDMC.
Integrate and transform data from internal and external sources while ensuring data consistency, quality, and governance.
Build ETL processes to cleanse, enrich, and prepare data for analytics using Spark (Databricks) and Informatica IDMC.
Monitor, optimize, and troubleshoot data processing and queries for performance, scalability, and cost efficiency.
Implement data security best practices and comply with data privacy regulations.
Automate routine workflows for ingestion, transformation, and monitoring using AWS, Databricks Jobs, and Informatica IDMC.
Maintain clear documentation of infrastructure, pipelines, and configurations with metadata management.
Collaborate with cross-functional teams (data scientists, analysts, software engineers) to deliver reliable data solutions.
Stay current with AWS, Databricks, Informatica IDMC, and data engineering best practices.
Requirements:
Bachelor's or master's degree in Computer Science, Data Engineering, or a related field.
Minimum 10 years of data engineering experience, including AWS, Databricks, and Informatica IDMC.
Proficiency in Python, Java, or Scala for building data pipelines.
Strong knowledge of SQL and NoSQL databases, data modeling, and schema design.
Experience with ETL/ELT processes, data integration, and performance optimization.
Strong analytical, problem-solving, and communication skills.
AWS, Databricks, and/or Informatica certifications are a plus.
Preferred Skills:
PySpark experience on Databricks
Knowledge of data governance and cataloging tools, especially Informatica IDMC
Familiarity with Tableau or other data visualization tools
Experience with containerization (Docker) and orchestration (Kubernetes)
Understanding of DevOps principles for CI/CD in data pipelines
Experience with Git or other version control systems
Job ID: 138343875