As a Senior Data Engineer, you will design, build, and optimize high-performance data pipelines and platforms powering analytics, dashboards, and AI models across the enterprise. Your mission is to deliver accessible, reliable, and production-ready data, freeing Data Scientists and Analysts from manual engineering. You will champion automation, scalability, and best practices that accelerate the company's data and AI maturity.
Key Responsibilities
1. Data Pipeline Engineering & Automation
- Design, build, and maintain scalable, end-to-end pipelines for data ingestion, transformation, and delivery.
- Automate ETL/ELT workflows (Airflow, Glue, Step Functions, Prefect) to eliminate manual intervention and improve reliability.
- Implement validation, version control, and rollback mechanisms for reliability and traceability.
- Build self-healing, auto-scaling pipelines that ensure near-zero downtime and operational resilience.
2. Data Infrastructure & Performance Optimization
- Develop and optimize lakehouse and warehouse architectures using Databricks, Snowflake, Redshift, S3, EMR, Glue, and Lake Formation.
- Apply best practices in data partitioning, indexing, and caching to improve query speed and control compute costs.
- Integrate monitoring, alerting, and logging (CloudWatch, Prometheus, Grafana) for proactive issue resolution.
- Collaborate with the Data Architect to ensure scalability, efficiency, and alignment with enterprise standards.
3. AI & Analytics Enablement
- Build data foundations for forecasting, segmentation, retention, and KPI decomposition models.
- Partner with Data Scientists to develop model-serving pipelines with automated retraining and versioning.
- Create reusable feature stores, model registries, and tracking frameworks supporting the full MLOps lifecycle.
- Enable AI-assisted analytics through natural language query, LLM integration, and automated insights.
4. Data Quality, Governance & Documentation
- Maintain detailed documentation of pipelines, lineage, and metadata.
- Enforce access control, encryption, and compliance with PDPA, GDPR, and internal governance.
- Develop automated quality checks, anomaly detection, and audit trails to ensure trust in data.
- Deliver data that is ready for consumption without revalidation or major manual cleanup.
5. Business Collaboration & Delivery
- Partner with cross-functional teams (Product, DS&A, Engineering) to ensure data readiness aligns with business timelines.
- Build reusable data assets supporting recurring analytics (marketing funnel, retention, revenue, segmentation).
- Translate analytical and AI use cases into resilient data engineering workflows that deliver measurable value.
6. Innovation & Continuous Improvement
- Implement CI/CD for pipelines, Infrastructure-as-Code, and containerized ETL.
- Evaluate emerging technologies to enhance performance, automation, and observability.
- Champion modular design, code reusability, and reliability as team-wide standards.
Qualifications
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
- 6+ years in data engineering, pipeline design, or infrastructure operations.
- Proven experience managing large-scale (multi-terabyte) datasets with high uptime.
- Expert in SQL, Python, and frameworks such as Spark, Hadoop, dbt, and Airflow.
- Strong knowledge of AWS stack (Redshift, Glue, S3, EMR, Athena, Lambda, Lake Formation).
- Familiar with Databricks, Snowflake, and MLOps tools (SageMaker, MLflow, Vertex AI).
- Skilled in data modelling, performance tuning, and cost optimization.
- Understanding of governance, PDPA/GDPR, and data security.
- AWS Certified Data Engineer / Solutions Architect or equivalent preferred.