You own the storage layer and cloud infrastructure provisioning. You design the multi-tenant data architecture that makes IP isolation possible and auditable: shared public data stores for market data, and per-tenant isolated stores for proprietary data. Every table, bucket, and index is either explicitly shared or explicitly tenant-scoped. You provision all infrastructure as code and ensure that onboarding a new tenant is a scripted operation, not a manual one.
Responsibilities
- Provision all cloud infrastructure via Terraform: object storage, vector databases, event streaming, Kubernetes, time-series databases, authentication. Everything must be reproducible for new tenants.
- Design multi-tenant storage: shared vector indices for public data, per-tenant indices for proprietary data. Enforce isolation with row-level security or schema-level separation.
- Design the per-tenant object storage structure, with bucket policies that enforce isolation.
- Build the market data storage pipeline: exchange feeds → event bus → time-series database.
- Build monitoring dashboards for data pipeline health across all data sources.
- Design feedback data storage: per-tenant schema for feedback events and training data candidates.
- Build data archival pipelines for cost-efficient long-term storage.
- Automate tenant provisioning: a script that creates a new tenant's storage, network policies, and service accounts.
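
To make the "explicitly shared or explicitly tenant-scoped" rule concrete, here is a minimal Python sketch of the kind of resource plan a provisioning script might generate before handing off to Terraform. Every name in it (the `Resource` type, the naming patterns, the tenant ID format, `can_access`) is a hypothetical illustration, not a description of the actual system.

```python
import re
from dataclasses import dataclass

# Assumed tenant ID format (hypothetical): lowercase letters, digits, hyphens.
_TENANT_ID = re.compile(r"^[a-z][a-z0-9-]{1,30}$")


@dataclass(frozen=True)
class Resource:
    kind: str   # e.g. "bucket", "vector_index", "schema", "service_account"
    name: str
    scope: str  # either "shared" or "tenant:<tenant_id>", never left blank


def shared_resources() -> list[Resource]:
    """Public market data stores, visible to every tenant."""
    return [
        Resource("vector_index", "public-market-data", "shared"),
        Resource("timeseries_db", "market-data", "shared"),
    ]


def tenant_resources(tenant_id: str) -> list[Resource]:
    """Resources created when onboarding one tenant (illustrative names)."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    scope = f"tenant:{tenant_id}"
    schema = "tenant_" + tenant_id.replace("-", "_") + "_feedback"
    return [
        Resource("bucket", f"tenant-{tenant_id}-proprietary", scope),
        Resource("vector_index", f"tenant-{tenant_id}-proprietary", scope),
        Resource("schema", schema, scope),
        Resource("service_account", f"sa-{tenant_id}", scope),
    ]


def can_access(tenant_id: str, resource: Resource) -> bool:
    """A tenant may read shared resources and its own, nothing else."""
    return resource.scope in ("shared", f"tenant:{tenant_id}")
```

Calling `tenant_resources("acme")` yields four resources all scoped to `tenant:acme`, and `can_access("globex", ...)` returns `False` for each of them. Keeping the scope as a required field on every resource is one way to make isolation auditable: a resource without an explicit scope cannot be represented at all.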
Requirements
- 4+ years of data engineering experience, with strong SQL, Python, and cloud infrastructure skills.
- Experience designing multi-tenant data architectures with isolation requirements.
- Infrastructure as Code (Terraform or Pulumi) is mandatory.
- PostgreSQL experience (vector extensions, partitioning, and row-level security are a plus).
- Kafka consumer/producer development.
- Time-series data storage and querying experience.
Nice to have
- Experience with financial data: time-series, tick data, on-chain events.
- Understanding of data sovereignty and compliance requirements.
- Experience with tenant provisioning automation.
- Blockchain or crypto data pipeline experience.