Create and manage a single master record for each business entity to ensure data consistency, accuracy, and reliability.
Implement data governance processes, including quality management, profiling, remediation, and automated data lineage.
Build and maintain robust, high-performance data processing pipelines across cloud, private data center, and hybrid environments.
Assemble and process large, complex datasets from diverse data sources.
Collaborate with Data Scientists, ML Engineers, Business Analysts, and other stakeholders to deliver actionable insights and improve business performance.
Develop, deploy, and maintain microservices, REST APIs, and reporting services.
Design and automate internal processes to streamline workflows, optimize data delivery, and scale infrastructure.
Troubleshoot and analyze large-scale distributed systems to ensure reliability and performance.
Work closely with cross-functional teams in a dynamic and fast-paced environment.
Key Requirements
Proven experience building and operating large-scale data lakes and data warehouses.
Strong knowledge of the Hadoop ecosystem and big data tools, including Spark and Kafka.
Hands-on experience with Master Data Management (MDM) platforms such as Informatica MDM, Talend Data Catalog, Semarchy xDM, IBM PIM & IKC, or Profisee.
Familiarity with MDM processes (golden record creation, survivorship, reconciliation, enrichment, and data quality).
Experience in data governance, including data profiling, remediation, and automated lineage.
Knowledge of stream-processing systems (e.g., Spark Streaming).
Proficiency with cloud services (Azure, GCP, AWS) and platforms such as Delta Lake and Databricks.
Advanced experience with relational and NoSQL data stores (e.g., Postgres, Hive, HBase).
Strong SQL optimization skills.
Proficiency in programming/scripting languages such as Python, Java, or Scala.
Proven ability to manipulate, process, and extract value from large, disconnected datasets.
Familiarity with modern development practices (Scrum, TDD, CI/CD, code reviews).
Strong teamwork skills, with proven success in cross-functional collaboration.