
Responsibilities:
Pipeline Development & Orchestration
- Design, build, and maintain data pipelines using Python, SQL, and orchestration tools
- Develop and manage Directed Acyclic Graphs (DAGs) / flows using orchestration tools such as Apache Airflow and Prefect
- Ensure pipelines are idempotent, scalable, and fault-tolerant
- Implement logging, monitoring, and alerting for pipeline observability
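As an illustration of the idempotency, retry, and logging expectations above, here is a minimal stdlib-only sketch (all names hypothetical; in a real pipeline, Airflow or Prefect would persist task state and handle retries):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_idempotent_step(name, fn, completed, retries=3, delay=0.0):
    """Run a pipeline step at most once, with retries and logging.

    `completed` is a set standing in for a persisted state store, so
    re-running the whole pipeline does not re-execute finished steps.
    """
    if name in completed:
        log.info("step %s already done, skipping (idempotency)", name)
        return
    for attempt in range(1, retries + 1):
        try:
            fn()
            completed.add(name)
            log.info("step %s succeeded on attempt %d", name, attempt)
            return
        except Exception as exc:
            log.warning("step %s failed (attempt %d): %s", name, attempt, exc)
            time.sleep(delay)
    raise RuntimeError(f"step {name} failed after {retries} attempts")

# Demo: the second call is a no-op because the step already completed.
state, calls = set(), []
run_idempotent_step("extract", lambda: calls.append(1), state)
run_idempotent_step("extract", lambda: calls.append(1), state)
assert calls == [1]
```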
Package & Dependency Management
- Install, upgrade, and manage Python packages in controlled environments
- Maintain dependency manifests (e.g., requirements.txt) with version pinning
- Resolve dependency conflicts and ensure compatibility across environments (dev, UAT, prod)
- Support deployments in restricted or air-gapped environments where required
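The version-pinning duties above can be sketched with the standard library's `importlib.metadata`: a small check that compares installed package versions against the exact pins of a `package==version` manifest (function and package names here are hypothetical):

```python
from importlib import metadata

def check_pins(pins):
    """Compare installed package versions against exact pins.

    `pins` maps package name -> pinned version, as in a requirements.txt
    of `package==version` lines. Returns a list of mismatch messages,
    empty when every pin is satisfied.
    """
    problems = []
    for pkg, wanted in pins.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            problems.append(f"{pkg}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{pkg}: installed {installed}, pinned {wanted}")
    return problems
```

A check like this can run in CI before promoting an environment from dev to UAT to prod, catching drift before it reaches production.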
Security Remediation & Library Fixes
- Analyse vulnerability reports from security scanning tools (e.g., CVE findings)
- Upgrade or replace vulnerable libraries while maintaining pipeline stability
- Fix broken imports, deprecated APIs, and compatibility issues arising from library updates
- Collaborate with security teams to ensure compliance with organisational standards
Code Refactoring & Optimization
- Refactor legacy code across:
  - Data ingestion APIs
  - Data transformation (Pandas/SQL)
  - Model training and inference pipelines
  - Orchestration workflows
- Improve code modularity, readability, and performance
- Ensure backward compatibility and minimal disruption to production systems
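One common refactoring pattern behind these bullets is extracting inline transformation logic into pure, testable functions; a minimal hypothetical sketch:

```python
def normalize_record(rec):
    """Pure transformation extracted from a (hypothetical) legacy inline loop.

    Keeping transforms as pure functions supports backward compatibility:
    the old and new code paths can be compared record by record before
    cutting production over to the refactored version.
    """
    return {
        "id": int(rec["id"]),
        "name": rec["name"].strip().lower(),
        "amount": round(float(rec["amount"]), 2),
    }

raw = [{"id": "1", "name": "  Alice ", "amount": "10.50"}]
cleaned = [normalize_record(r) for r in raw]
assert cleaned == [{"id": 1, "name": "alice", "amount": 10.5}]
```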
Data Processing & Integration
- Perform data transformation and validation using Pandas and SQL
- Integrate streaming data pipelines using Kafka (producers/consumers)
- Ensure schema consistency and data quality across pipeline stages
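The schema-consistency and data-quality checks above can be illustrated with a stdlib-only validator (the schema and field names are hypothetical; in practice this sits alongside Pandas/SQL transforms):

```python
# Hypothetical expected schema for one pipeline stage.
EXPECTED_SCHEMA = {"id": int, "event": str, "value": float}

def validate_rows(rows, schema=EXPECTED_SCHEMA):
    """Flag missing fields and wrong types before data moves downstream."""
    errors = []
    for i, row in enumerate(rows):
        for field, typ in schema.items():
            if field not in row:
                errors.append(f"row {i}: missing {field}")
            elif not isinstance(row[field], typ):
                errors.append(f"row {i}: {field} should be {typ.__name__}")
    return errors

assert validate_rows([{"id": 1, "event": "click", "value": 0.5}]) == []
```

Running a check like this between pipeline stages surfaces schema drift early instead of letting bad rows propagate.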
Testing, Deployment & Support
- Implement unit and integration tests for pipelines
- Support deployment workflows for data pipelines
- Troubleshoot pipeline failures and perform root cause analysis
- Provide production support and continuous improvement of data workflows
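A unit test for a pipeline transform, of the kind called for above, might look like this `unittest` sketch (transform and test names are hypothetical):

```python
import unittest

def dedupe_keep_latest(events):
    """Keep only the latest event per key; a typical pipeline transform.

    Assumes `events` is ordered oldest to newest, so later entries
    overwrite earlier ones for the same key.
    """
    latest = {}
    for ev in events:
        latest[ev["key"]] = ev
    return list(latest.values())

class DedupeTest(unittest.TestCase):
    def test_keeps_latest_per_key(self):
        events = [
            {"key": "a", "v": 1},
            {"key": "a", "v": 2},
            {"key": "b", "v": 3},
        ]
        out = dedupe_keep_latest(events)
        self.assertEqual(len(out), 2)
        self.assertEqual(next(e["v"] for e in out if e["key"] == "a"), 2)

# In a pipeline repo this would live under tests/ and run in CI
# before any deployment.
```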
Streaming and Integration Skills
- Working knowledge of Kafka (topics, partitions, consumers, producers)
- Experience handling schema evolution and message serialization/deserialization
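Schema evolution and serialization/deserialization, as described above, can be sketched with plain JSON and consumer-side defaults (a hypothetical v1/v2 message pair; a real Kafka deployment would typically formalize this with Avro and a schema registry):

```python
import json

# Hypothetical evolution: v1 producers omit "source"; v2 producers add it.
# The consumer fills defaults so both generations remain readable.
DEFAULTS = {"source": "unknown"}

def serialize(event: dict) -> bytes:
    """Encode an event the way a producer would before publishing."""
    return json.dumps(event).encode("utf-8")

def deserialize(payload: bytes) -> dict:
    """Decode a message and backfill fields missing from older schemas."""
    event = json.loads(payload.decode("utf-8"))
    return {**DEFAULTS, **event}

v1 = serialize({"id": 1})                   # old-schema message
v2 = serialize({"id": 2, "source": "web"})  # new-schema message
assert deserialize(v1)["source"] == "unknown"
assert deserialize(v2)["source"] == "web"
```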
Security & Reliability
- Experience resolving vulnerabilities from security scans
- Understanding of secure coding practices
- Experience working in regulated or high-security environments
Core Mandatory Skills:
- Strong Python programming (modular design, error handling, logging)
- Advanced SQL (joins, window functions, optimization)
- Hands-on experience with Pandas and Kafka for data processing
- Experience with orchestration tools: Apache Airflow, Prefect (or equivalent)
- Experience with package and dependency management (pip, virtual environments)
Experience Requirements:
- Preferably at least 2-3 years of experience in data engineering
- Prior experience working with production data pipelines
- Prior experience handling dependency conflicts, library upgrades, and refactoring in live systems
- Ability to work across multiple layers (API / data / orchestration / ML)
Interested candidates may send their CV to MAC (Reg No. R1221300) [Confidential Information] quoting the job title in the Subject line. We regret that only shortlisted candidates will be notified.
Job ID: 147057407