About the Role
We are looking for a Data Engineer to design, build, and maintain scalable data pipelines that support the Medical Datastore and enterprise analytics initiatives.
In this role, you will develop robust ETL workflows to ingest and transform data from multiple healthcare and operational systems, enabling downstream reporting, analytics, and decision-making.
You will work closely with data analysts, business stakeholders, and technical teams to ensure data is accurate, timely, and reliable.
Key Responsibilities
- Design, develop, and maintain ETL/ELT pipelines to ingest data from source systems such as NGEMR, MSI, and OSIT.
- Build and manage data workflows using AWS Glue, Amazon EMR, and other AWS services.
- Develop and optimize data models and transformations for the Medical Datastore.
- Tune and enhance existing pipelines for performance, scalability, and reliability.
- Monitor data jobs, troubleshoot failures, and implement fixes to keep pipelines running reliably.
- Perform validation and quality checks to confirm data accuracy and completeness.
- Optimize SQL queries and ETL logic for performance and cost efficiency.
- Collaborate with business and analytics teams to understand data requirements and deliver fit-for-purpose datasets.
- Document data flows, transformation logic, and technical specifications.
- Support production deployments and ongoing maintenance of data engineering solutions.
Required Skills & Experience
- 3+ years of experience in data engineering or ETL development.
- Strong hands-on experience with AWS Glue, Amazon EMR, Amazon Redshift, and Amazon S3.
- Proficiency in SQL and Python for data transformation and automation.
- Experience building and maintaining large-scale ETL/ELT pipelines.
- Strong understanding of data warehousing concepts and dimensional modeling.
- Experience working with structured and semi-structured data.
- Familiarity with data quality, validation, and monitoring practices.
- Ability to troubleshoot and optimize data pipelines and SQL queries.
Preferred Qualifications
- Experience working with healthcare or medical data platforms.
- Knowledge of AWS IAM, CloudWatch, and Step Functions.
- Familiarity with DevOps practices and CI/CD for data pipelines.
- Experience with Apache Spark or PySpark.
- Understanding of data governance and security best practices.
Tech Stack
- AWS Glue
- Amazon EMR
- Amazon Redshift
- Amazon S3
- Python
- SQL
- PySpark
Why Join Us
- Work on a large-scale healthcare data platform.
- Build data solutions that support analytics and operational decision-making.
- Collaborate with experienced engineers and domain experts.
- Gain exposure to modern cloud-based data engineering technologies.