We are seeking a highly skilled and motivated Data Engineer to design, build, and maintain scalable data pipelines and infrastructure. This is a 12-month contract role to start with.
You will work closely with data analysts, data scientists, and software engineers to ensure data is reliable, available, and optimized for performance. The ideal candidate is passionate about data architecture, automation, and creating robust systems for large-scale data processing.
Responsibilities
- Design, develop, and maintain robust and scalable data pipelines using tools such as Apache Spark, Kafka, Airflow, or similar.
- Build and manage ETL/ELT processes to ingest data from various structured and unstructured sources.
- Optimize and monitor data workflows and pipeline performance.
- Create and manage data models and schemas in modern data warehouses (e.g., Snowflake, BigQuery, Redshift).
- Collaborate with stakeholders to define data requirements and ensure the availability and quality of data for analytics and reporting.
- Implement data governance, data quality, and data security best practices.
- Develop infrastructure as code (IaC) for deploying and managing data services in cloud environments (AWS, GCP, or Azure).
- Maintain and improve the performance, scalability, and availability of data platforms.
- Write clear and maintainable code, following software engineering best practices (version control, testing, CI/CD).
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in a data engineering or software development role.
- Strong programming skills in Python, Java, or Scala.
- Proficiency in SQL and experience with relational databases (e.g., PostgreSQL, MySQL).
- Experience with big data processing frameworks like Apache Spark, Flink, or Beam.
- Hands-on experience with workflow orchestration tools like Apache Airflow, Luigi, or Prefect.
- Familiarity with data warehousing platforms such as Snowflake, Amazon Redshift, Google BigQuery, or Databricks.
- Experience with cloud platforms (AWS/GCP/Azure), especially data services like S3, Lambda, Glue, Dataflow, BigQuery, etc.
- Solid understanding of data modeling concepts (OLAP, OLTP, Star/Snowflake schemas).
Preferred Qualifications
- Experience with real-time data streaming tools (Kafka, Kinesis, Pub/Sub).
- Familiarity with containerization and orchestration (Docker, Kubernetes).
- Knowledge of DevOps practices and CI/CD pipelines.
- Background in data security, compliance, and privacy (GDPR, HIPAA, etc.).
- Exposure to machine learning workflows or MLOps is a plus.
Tools & Technologies You Might Use
- Languages: Python, SQL, Java, Scala
- Platforms: AWS, GCP, Azure
- Data tools: Spark, Kafka, Airflow, dbt, Snowflake, Redshift, BigQuery
- Orchestration: Airflow, Prefect
- Infrastructure: Terraform, Docker, Kubernetes
- Version Control: Git, GitHub/GitLab
Argyll Scott Consulting Pte Ltd