Key Responsibilities
- Design, build, and maintain scalable data pipelines, ETL/ELT workflows, and data integration processes.
- Develop and optimize SQL queries, data models, and data structures for analytics and operational use cases.
- Work with diverse data storage systems, including relational, NoSQL, and distributed file systems.
- Implement data processing solutions using distributed computing frameworks such as Apache Spark or Hadoop.
- Collaborate with cross-functional teams to deliver cloud-native data engineering solutions on AWS or Azure.
- Develop automated workflows using orchestration tools such as Apache Airflow or Azure Data Factory.
- Apply DevOps practices including CI/CD automation, infrastructure as code (IaC), and containerization.
- Contribute to the design of resilient, scalable, and secure data architecture.
Requirements
Experience: Minimum 3 years of professional experience in data engineering, including at least 2 years working with cloud-native data services.
Programming: Proficiency in at least one programming language such as Python, Java, or C#/.NET.
Data Fundamentals:
- Strong SQL expertise in relational databases and data warehouses.
- Solid understanding of data modeling, data structures, and access patterns.
- Hands-on experience with relational databases (e.g., PostgreSQL, MySQL), NoSQL systems (e.g., DynamoDB, Cosmos DB), and distributed storage.
Cloud Platforms: Practical experience with a major cloud platform (AWS or Azure), including services such as:
- AWS: S3, RDS/Aurora, EMR, Glue, Athena, Redshift, Lambda
- Azure: Data Lake Storage, Azure SQL, Cosmos DB, Data Factory, Synapse
Data Processing: Experience with distributed processing frameworks such as Apache Spark (including PySpark) or Hadoop, and with data libraries such as Pandas.
Orchestration: Familiarity with Airflow, Azure Data Factory, or similar tools.
DevOps & Governance:
- Experience with CI/CD pipeline development.
- Knowledge of Git, Docker, and IaC tools such as Terraform.
- Understanding of system design principles for data platforms.