Roles & Responsibilities
- Design, develop, and maintain ETL/ELT pipelines for ingesting data from a variety of sources, including internal systems, emails, and the web.
- Ensure high-quality, reliable data is available for analytics and machine learning initiatives.
- Optimize data infrastructure for cost-efficiency and performance in cloud environments.
- Implement data quality checks, validation rules, and monitoring systems to ensure data accuracy.
- Optimize data warehouse performance through strategies like partitioning and clustering.
- Collaborate with data scientists to productionize preprocessing logic and support feature engineering.
- Automate workflow orchestration using tools such as Airflow or Luigi and maintain comprehensive documentation.
- Establish data governance and quality assurance frameworks.
Requirements
- Bachelor's degree or above in Computer Science, Data Engineering, or a related field.
- 1-3 years of experience in data engineering roles. Experience with streaming data processing (PySpark, Dataflow) is an advantage.
- Strong Python and SQL programming skills for large-scale data processing.
- Hands-on experience with cloud data warehouses such as BigQuery, Redshift, or Snowflake is essential.
- Proficiency in ETL/ELT frameworks and orchestration tools like Airflow, Luigi, or Cloud Composer.
- Knowledge of data governance, quality assurance, and validation frameworks.
- Solid foundation in data structures, algorithms, and distributed computing principles.
- Strong understanding of data modeling, warehousing concepts, and optimization techniques.
- Excellent communication skills and ability to translate between business requirements and technical solutions.
- Strong problem-solving skills with a focus on data quality and reliability.
We regret that only shortlisted candidates will be notified. Thank you.