Data Storage & Architecture
Design and architect data storage solutions, including databases, data lakes, and data warehouses, using AWS services such as Amazon S3, Amazon RDS, Amazon Redshift, and Amazon DynamoDB, along with Databricks Delta Lake.
Integrate Informatica IDMC for metadata management and data cataloging.
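For context, a minimal sketch of what standing up a partitioned Delta Lake table on S3 from a Databricks notebook can look like; the bucket path, schema, and sample rows are hypothetical placeholders, not a prescribed design:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-storage-sketch").getOrCreate()

# Toy data standing in for an ingested feed; schema is illustrative only.
events = spark.createDataFrame(
    [(1, "signup", "2024-01-01"), (2, "login", "2024-01-02")],
    ["user_id", "event_type", "event_date"],
)

# Write a date-partitioned Delta table to a placeholder S3 path.
(
    events.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("s3://example-data-lake/bronze/events")
)
```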
Data Pipelines & Processing
Create, manage, and optimize data pipelines for ingestion, processing, and transformation using AWS services (AWS Glue, AWS Data Pipeline, AWS Lambda), Databricks for advanced data processing, and Informatica IDMC for integration and data quality.
Develop ETL processes to cleanse, transform, and enrich data for analytical use, leveraging Databricks Spark capabilities and Informatica IDMC.
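As an illustration of the kind of ETL this covers, a short PySpark sketch that cleanses, transforms, and enriches raw data before loading it to a curated zone; the S3 paths, column names, and banding rule are assumptions for the example:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw JSON from a placeholder landing path.
raw = spark.read.json("s3://example-raw-zone/orders/")

# Cleanse: drop duplicates and rows missing the business key.
clean = raw.dropDuplicates(["order_id"]).filter(F.col("order_id").isNotNull())

# Transform/enrich: normalize timestamps and derive an order-value band.
enriched = (
    clean.withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn(
        "value_band",
        F.when(F.col("amount") >= 1000, "high").otherwise("standard"),
    )
)

# Load: append to a curated Delta table (placeholder path).
enriched.write.format("delta").mode("append").save("s3://example-curated/orders")
```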
Data Integration & Governance
Integrate data from internal and external sources into AWS and Databricks environments, ensuring consistency and quality.
Utilize Informatica IDMC for data integration, transformation, governance, and compliance with data privacy regulations.
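In practice, Informatica IDMC rules would carry much of the quality enforcement; purely as a sketch of the underlying idea, a pipeline-side completeness gate with an assumed table path, column, and 1% threshold:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-gate-sketch").getOrCreate()

df = spark.read.format("delta").load("s3://example-curated/customers")

# Completeness check on a required column (both names are placeholders).
total = df.count()
missing = df.filter(F.col("email").isNull()).count()

# Fail the run if more than 1% of rows lack an email address.
if total and missing / total > 0.01:
    raise ValueError(f"Quality gate failed: {missing}/{total} rows missing email")
```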
Performance Monitoring & Optimization
Monitor and optimize performance of data processing and queries across AWS and Databricks environments.
Utilize Informatica IDMC for optimizing workflows and ensuring scalability and performance efficiency.
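On the Databricks side, one routine optimization is compacting a Delta table's small files and Z-ordering on a hot filter column (OPTIMIZE/ZORDER is Databricks-specific Delta SQL); the path and column names here are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("optimize-sketch").getOrCreate()

# Compact small files and co-locate rows often filtered by user_id.
spark.sql("OPTIMIZE delta.`s3://example-curated/orders` ZORDER BY (user_id)")

# Inspect a hot query's physical plan to confirm partition pruning.
(
    spark.read.format("delta")
    .load("s3://example-curated/orders")
    .filter("event_date = '2024-01-01'")
    .explain()
)
```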
Security & Compliance
Implement security best practices and encryption methods to protect sensitive data across AWS and Databricks.
Leverage Informatica IDMC for data governance and compliance with privacy standards.
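As one concrete instance of such practices, a boto3 sketch that enforces SSE-KMS default encryption on a data lake bucket; the bucket name and KMS key alias are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Make SSE-KMS the default for all new objects in a placeholder bucket.
s3.put_bucket_encryption(
    Bucket="example-data-lake",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-data-key",
                }
            }
        ]
    },
)
```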
Automation
Implement automation for data ingestion, transformation, monitoring, and other routine tasks using AWS Step Functions, AWS Lambda, Databricks Jobs, and Informatica IDMC workflow automation.
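An event-driven flavor of this automation might look like the Lambda handler below, which starts a hypothetical AWS Glue job whenever a new object lands in S3; the job name and argument keys are placeholders:

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """S3-triggered entry point: kicks off a (placeholder) Glue ETL job."""
    record = event["Records"][0]["s3"]
    run = glue.start_job_run(
        JobName="example-ingest-job",  # hypothetical job name
        Arguments={
            "--source_bucket": record["bucket"]["name"],
            "--source_key": record["object"]["key"],
        },
    )
    return {"JobRunId": run["JobRunId"]}
```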
Documentation
Maintain comprehensive documentation for data infrastructure, pipelines, and configurations within AWS and Databricks environments.
Ensure metadata management through Informatica IDMC.
Cross-Functional Collaboration
Collaborate with data scientists, analysts, and software engineers to understand data requirements and deliver end-to-end solutions across AWS, Databricks, and Informatica IDMC.
Issue Resolution & Support
Identify and resolve data-related issues to maintain data availability and integrity across AWS, Databricks, and Informatica environments.
Cost Optimization
Optimize resource usage across AWS, Databricks, and Informatica IDMC to control costs while meeting scalability and performance needs.
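One everyday cost lever is tiering aged raw data into cheaper S3 storage classes; a boto3 sketch with an assumed bucket, prefix, and transition schedule:

```python
import boto3

s3 = boto3.client("s3")

# Move raw order data to Infrequent Access after 30 days, Glacier after 90.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-raw-zone",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-aged-raw-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "orders/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```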
Continuous Improvement & Innovation
Stay up to date with AWS, Databricks, and Informatica IDMC features and with data engineering best practices.
Recommend and implement new technologies and techniques as needed.
Requirements / Qualifications
Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
Minimum 5 years of experience in data engineering with hands-on expertise in AWS services, Databricks, and/or Informatica IDMC.
Proficiency in programming languages such as Python, Java, or Scala for pipeline development.
Ability to evaluate technical solutions and recommend fixes for data issues, particularly performance problems in complex transformations and long-running data processes.
Strong knowledge of SQL and NoSQL databases.
Familiarity with data modeling and schema design.
Excellent analytical, problem-solving, communication, and collaboration skills.
Relevant certifications (AWS Certified Data Analytics - Specialty, Databricks, Informatica) are a strong plus.
Preferred Skills
Experience with big data technologies such as Apache Spark and Hadoop on Databricks.
Knowledge of containerization and orchestration tools like Docker and Kubernetes.
Familiarity with data visualization tools such as Tableau or Power BI.
Understanding of DevOps principles for managing and deploying data pipelines.
Experience with version control systems (Git) and CI/CD pipelines.
Knowledge of data governance and cataloging tools, especially Informatica IDMC.