Design and build scalable ETL/ELT pipelines for real-time streaming of exchange and on-chain data feeds (native binary protocols, SBE, WebSocket, FIX, gRPC, etc.).
Engineer high-performance storage and querying layers using ClickHouse (or equivalent columnar/time-series DBs) optimized for billions of rows of tick and order-book data.
Implement real-time data processing, transformation, and enrichment pipelines (Kafka / Kinesis / Flink or equivalent).
Ensure data quality, observability, backfilling, replay capabilities, and low-latency delivery to trading engines, quant researchers, and risk systems.
Own infrastructure-as-code, monitoring, alerting, and cost optimization across the full data platform.
Collaborate closely with quants, traders, and software engineers to translate trading and research needs into production data infrastructure.
Build and optimize data pipelines for feature engineering and dataset generation from high-frequency time-series data to support machine learning model training and evaluation.
Design real-time data serving layers (e.g., low-latency queries from ClickHouse) to power online inference and live AI-driven trading signals or risk models.
Implement data versioning, lineage, and quality controls to enable reliable, reproducible ML experimentation and production deployment.
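As a rough illustration of the ClickHouse storage work described above, the sketch below shows a hypothetical tick-table schema expressed as a DDL string in Python. The database, table, and column names are assumptions for illustration only, not the team's actual schema; the point is the MergeTree layout (daily partitions, ordering by symbol and timestamp) that keeps point-in-time queries over billions of rows cheap.

```python
# Hypothetical ClickHouse DDL for an exchange tick table -- a sketch under
# assumed names (market.ticks, ts, symbol, ...), not a real production schema.
TICKS_DDL = """
CREATE TABLE IF NOT EXISTS market.ticks (
    ts      DateTime64(9, 'UTC'),        -- nanosecond exchange timestamp
    symbol  LowCardinality(String),      -- small symbol universe compresses well
    side    Enum8('bid' = 0, 'ask' = 1),
    price   Float64,
    qty     Float64
)
ENGINE = MergeTree
PARTITION BY toDate(ts)                  -- prune whole days at query time
ORDER BY (symbol, ts)                    -- range scans per symbol stay sequential
"""

def ticks_ddl() -> str:
    """Return the DDL; against a live server you could run it with a
    ClickHouse client, e.g. clickhouse_driver.Client(host=...).execute(...)."""
    return TICKS_DDL
```

Ordering by `(symbol, ts)` means a query such as "all ticks for one symbol in one minute" touches only a narrow sorted range of one partition, which is the usual access pattern for replay, backfill, and research queries.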
Requirements
7+ years as a data engineer in a high-volume, low-latency environment (HFT/prop trading, fintech, or large-scale real-time analytics strongly preferred).
Deep, production-level expertise with ClickHouse (or similar: TimescaleDB, Druid, Pinot, etc.) for large-scale time-series workloads.
Proficiency with Python, SQL, and data orchestration tools (Airflow, Dagster, or equivalent).
Prior work building data infrastructure for ML/AI (feature stores, model training pipelines, or real-time inference data layers).
Proven track record of owning end-to-end data platforms as a senior individual contributor or small-team lead: you've been the one-man band before and delivered results under pressure.
Ability to think on your feet and adjust your own priorities to deliver measurable impact for the team.
A genuine interest and passion for Agentic AI Development, Machine Learning, and building Autonomous Workflows.
Reflect on constructive feedback, share knowledge openly, and be prepared both to learn and to unlearn, whilst contributing to transparent decision-making.
Your decision-making is evidence-based: you use data to evaluate success and improve efficiency.