
Key Responsibilities & Scope of Work
A. Architecture Assessment & Strategic Roadmap
● Evaluate the current data engineering framework end-to-end: medallion architecture layering, naming conventions, ingestion patterns, processing logic, security controls, and data quality mechanisms.
● Benchmark the current state against industry best practices and produce a prioritized improvement roadmap with clear effort-vs-impact trade-offs.
B. Data Estate Governance
● Build and maintain a comprehensive inventory of the data estate, cataloging all source systems (onboarded and prospective) and the subject areas each covers (ingested and not yet ingested).
● Establish this inventory as a living artifact that informs onboarding decisions, coverage analysis, and platform planning.
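To make the inventory concrete, here is a minimal Python sketch of one way it could be modeled so that coverage analysis and an onboarding backlog fall out directly. The `SourceSystem` class, its fields, and the two helper functions are hypothetical illustrations, not an existing tool or schema on the platform.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSystem:
    """One entry in a hypothetical data estate inventory."""
    name: str
    onboarded: bool  # True if the source is already connected to the platform
    # Subject area -> True if already ingested, False if not yet ingested.
    subject_areas: dict[str, bool] = field(default_factory=dict)


def coverage_report(inventory: list[SourceSystem]) -> dict[str, float]:
    """Fraction of each source's known subject areas that are already ingested."""
    report = {}
    for source in inventory:
        total = len(source.subject_areas)
        ingested = sum(source.subject_areas.values())  # True counts as 1
        report[source.name] = ingested / total if total else 0.0
    return report


def onboarding_backlog(inventory: list[SourceSystem]) -> list[tuple[str, str]]:
    """(source, subject_area) pairs not yet ingested, onboarded sources first."""
    backlog = []
    # sorted() is stable; key False (onboarded) sorts before True (prospective).
    for source in sorted(inventory, key=lambda s: not s.onboarded):
        for area, ingested in sorted(source.subject_areas.items()):
            if not ingested:
                backlog.append((source.name, area))
    return backlog


inventory = [
    SourceSystem("crm", onboarded=True, subject_areas={"customers": True, "contacts": False}),
    SourceSystem("erp", onboarded=False, subject_areas={"orders": False, "invoices": False}),
]
print(coverage_report(inventory))     # {'crm': 0.5, 'erp': 0.0}
print(onboarding_backlog(inventory))  # [('crm', 'contacts'), ('erp', 'invoices'), ('erp', 'orders')]
```

In practice the inventory would more likely live in a data catalog or a metadata table than in code; the sketch only shows the shape of the data and the questions it should answer.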
C. Standards Definition & Enforcement
● Design, integrate, or refactor naming conventions for schemas, tables, views, orchestration jobs, and pipelines, along with the migration approach for transitioning to the new standards where needed.
● Define standardized ingestion and processing patterns spanning the full medallion architecture, including sub-layering strategy, format standardization (Parquet, Avro, Delta), secure PII ingestion, data normalization, technical data quality tracking, row- and column-level access controls, late-arriving dimension management, and data export workflows.
● Establish clear pattern selection criteria so engineers know which approach to apply for a given source type or use case.
● Define and operationalize the exception management process for handling justified deviations from established standards.
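As an illustration of how a naming standard and its exception process could be enforced automatically (for example as a CI check), a minimal sketch follows. The `<layer>_<domain>_<entity>` convention, the regular expression, and the `approved_exceptions` set are assumptions made for the example; the actual standard is something this role would define.

```python
import re

# Hypothetical convention: <layer>_<domain>_<entity> in lower snake_case,
# where <layer> is one of the medallion layers.
TABLE_NAME = re.compile(
    r"(?P<layer>bronze|silver|gold)"                 # medallion layer
    r"_(?P<domain>[a-z][a-z0-9]*)"                   # single-token business domain
    r"_(?P<entity>[a-z][a-z0-9]*(?:_[a-z0-9]+)*)"    # no trailing or doubled underscores
)


def check_table_name(name: str, approved_exceptions: frozenset[str] = frozenset()) -> list[str]:
    """Return violations for one table name; empty if compliant or formally excepted."""
    if name in approved_exceptions:
        # Deviation already justified and recorded through the exception process.
        return []
    if TABLE_NAME.fullmatch(name) is None:
        return [f"{name!r} does not match <layer>_<domain>_<entity> in lower snake_case"]
    return []


def audit(names: list[str], approved_exceptions: frozenset[str] = frozenset()) -> dict[str, list[str]]:
    """Map each non-compliant name to its violations, e.g. to fail a CI check."""
    results = {name: check_table_name(name, approved_exceptions) for name in names}
    return {name: errors for name, errors in results.items() if errors}


names = ["silver_sales_orders", "gold_finance_revenue_daily",
         "Sales_Orders", "bronze_crm_customers_", "legacy_tbl"]
print(sorted(audit(names, frozenset({"legacy_tbl"}))))  # ['Sales_Orders', 'bronze_crm_customers_']
```

Keeping approved deviations in an explicit list, rather than loosening the pattern, means every exception stays visible and reviewable.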
D. Hands-On Implementation
● Build production-grade boilerplate code for each standardized pattern using the existing GCP toolchain (BigQuery, Cloud SQL, Cloud Composer, Dataflow, Dataproc, Cloud Storage, Pub/Sub, and related services).
● Ensure templates are modular, well-documented, and immediately adoptable by the engineering team.
E. CI/CD & Developer Experience
● Support the integration of data engineering pipelines with the CI/CD solution, aligning with the broader CI/CD modernization initiative's timeline and tooling decisions.
● Contribute to developer experience improvements that reduce friction in pipeline development, testing, and deployment.
F. Knowledge Transfer & Enablement
● Author the Source Onboarding Playbook: a repeatable, step-by-step guide for bringing new data sources into the platform, covering initial assessment, pattern selection, naming convention application, quality gates, access control setup, and production release.
● Mentor and upskill data engineers on the new standards, patterns, and tooling through documentation, walkthroughs, and hands-on pairing.
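The playbook's stages form an ordered sequence, and one way to keep onboarding repeatable is to track them as gates that must be passed in order. The sketch below assumes exactly the six stages named in the playbook description and a simple set of completed stages per source; both the `Stage` enum and `next_stage` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Hypothetical onboarding gates, in the order the playbook applies them."""
    ASSESSMENT = 1
    PATTERN_SELECTION = 2
    NAMING = 3
    QUALITY_GATES = 4
    ACCESS_CONTROL = 5
    PRODUCTION_RELEASE = 6


def next_stage(completed: set[Stage]) -> Optional[Stage]:
    """Return the first stage still open, or None once the source is released.

    Raises ValueError if a later gate was marked complete while an earlier
    one is still open, which would mean a step was skipped.
    """
    pending = None
    for stage in Stage:  # Enum iteration follows definition order
        if stage not in completed:
            if pending is None:
                pending = stage
        elif pending is not None:
            raise ValueError(f"{stage.name} marked complete before {pending.name}")
    return pending


print(next_stage({Stage.ASSESSMENT, Stage.PATTERN_SELECTION}))  # Stage.NAMING
```

Rejecting out-of-order completion, instead of silently skipping ahead, is what makes the guide enforceable rather than advisory.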
Resource Requirements (What We're Looking For)
Must-Have
● Substantial progressive experience in data engineering, data architecture, or analytics platform development, with a significant portion spent in hands-on, code-level roles — not purely advisory or managerial positions.
● Deep, demonstrable expertise in designing and operating large-scale analytical solutions (data warehouses, data lakes, lakehouses) serving enterprise-grade workloads.
● Strong hands-on proficiency with GCP data services: BigQuery, Cloud SQL (including BigQuery federated queries), Cloud Composer (Airflow), Dataflow (Apache Beam), Dataproc (Spark), Cloud Storage, and Pub/Sub.
● Proven track record of implementing medallion architecture (Bronze/Silver/Gold) or equivalent layered data platform patterns at scale.
● Experience defining and enforcing data engineering standards, naming conventions, and governance frameworks across multiple teams and workstreams.
● Experience with dbt, Apache Iceberg, Delta Lake, or similar transformation and open table format technologies.
● Practical experience with PII handling, data masking, tokenization, and implementing row- and column-level security in cloud data platforms.
● Strong background in CI/CD for data pipelines (Terraform, Cloud Build, GitHub Actions, dbt, or equivalent).
● A track record of building reusable templates, frameworks, and boilerplate code that engineering teams actually adopt and rely on.
● Solid understanding of data quality frameworks, data contracts, and pipeline observability.
Job ID: 148686559
Skills:
Amazon DynamoDB, PySpark, AWS Lambda, Amazon S3, AWS Glue, Tableau, AWS Data Pipeline, Amazon Redshift, Databricks, Amazon RDS, AWS, Informatica IDMC, Databricks Delta Lake, Informatica OAS