- 1-year contract, renewable
- Government project
- Hybrid work arrangement
RESPONSIBILITIES
- Design, develop, and maintain data pipelines that extract data from various sources and formats, transform it according to business requirements, and load it into target systems.
- Perform data extraction, cleaning, transformation, and loading.
- Design, build, launch, and maintain efficient and reliable large-scale batch and real-time data pipelines using data processing frameworks.
- Integrate and collate data silos in a manner that is both scalable and compliant.
- Collaborate with the Project Manager, Data Architect, Business Analysts, Frontend Developers, Designers, and Data Analysts to build scalable data-driven products.
- Work in an Agile environment that practices Continuous Integration and Delivery.
- Work closely with fellow developers through pair programming and code reviews.
EXPERIENCE AND SKILLS NEEDED
- Bachelor's degree in Computer Science, Software Engineering, or related field.
- At least 3-5 years of experience in ETL/data integration projects.
- Proficient in general data cleaning and transformation using scripting languages (mandatory: SQL, Python) to ensure data accuracy and consistency. Knowledge of R will be an advantage.
- Proficient in building ETL pipelines (mandatory: SQL Server Integration Services (SSIS), Python, Snowflake; added advantages: AWS Lambda, ECS container tasks, EventBridge, AWS Glue, Spring, etc.). Proven hands-on experience with Microsoft SSIS and Snowflake.
- Proficient in database design and various databases (mandatory: SQL, AWS S3, RDS; added advantages: PostgreSQL, Athena, MongoDB, PostGIS, MySQL, SQLite, VoltDB, Apache Cassandra, etc.).
- Experience in and passion for data engineering in a big data environment using cloud platforms such as GCC and GCC+ (i.e. AWS, Azure).
- Experience in building production-grade data pipelines, ETL/ELT data integration.
- Experience in CI/CD pipelines and DevOps tools (e.g. GitLab).
- Experience in automated provisioning tools (Ansible, Terraform, Puppet, Vagrant) will be an advantage.
- Familiar with data modelling, data access, and data storage infrastructure such as data marts, data lakes, data virtualisation, and data warehouses for efficient storage and retrieval.
- Familiar with REST APIs and web requests/protocols in general.
- Familiar with data governance policies, access control and security best practices.
- Knowledge of system design, data structure and algorithms.
- Knowledge of AI/ML concepts such as RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol).
- Comfortable in both Windows and Linux development environments.
- Interest in being the bridge between engineering and analytics.