Overview
Build and maintain scalable data pipelines and platforms to support data-driven applications and analytics.
Key Responsibilities
- Design, develop and deploy data tables, views and marts in data warehouses, operational data stores, data lakes and data virtualization layers.
- Perform data extraction, cleaning and transformation, and manage the flow of data between systems.
- Design, build, launch and maintain efficient and reliable large-scale batch and real-time data pipelines with data processing frameworks.
- Integrate and consolidate siloed data in a way that is both scalable and compliant.
- Collaborate with Product Managers, Data Architects, Business Analysts, Frontend Developers, Designers and Data Analysts to build scalable data-driven products.
- Develop backend APIs and work on databases to support the applications.
- Work in an Agile environment that practices Continuous Integration and Continuous Delivery.
- Work closely with fellow developers through pair programming and code reviews.
Key Requirements
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field
- Minimum 4 years of experience in data engineering, data platform development, or related roles
- Strong experience in data engineering and pipeline development
- Proficiency in databases, SQL, and data processing frameworks
- Experience with cloud data platforms
- Familiarity with Agile and CI/CD practices
- Familiarity with GIS platforms (ArcGIS Server, PostGIS)
- Ability to use spatial Python libraries (GeoPandas, Shapely)
- Experience building scalable geospatial data pipelines for ingestion, transformation, storage and quality checks across multiple government data sources
- Experience implementing data cataloging, metadata, lineage and security, and optimizing storage and performance for GIS analytics (e.g., PostGIS, data lakes)
- Experience providing reliable data services and data contracts to support downstream GIS analytics and frontend visualizations, working closely with GIS and frontend teams
- Geospatial data management skills, including validation, cleansing, reprojection, metadata management and data lineage