
Responsibilities
. Integrate data from multiple sources, such as databases, APIs, or streaming platforms, to provide a unified view of the data
. Implement data quality checks and validation processes to ensure the accuracy, completeness, and consistency of data
. Identify and resolve data quality issues, monitor data pipelines for errors, and implement data governance and data quality frameworks
. Enforce data security and compliance with relevant regulations and industry-specific standards
. Implement data access controls and encryption mechanisms, and monitor data privacy and security risks
. Optimize data processing and query performance by tuning database configurations, implementing indexing strategies, and leveraging distributed computing frameworks
. Optimize data structures for efficient querying and develop data dictionaries and metadata repositories
. Identify and resolve performance bottlenecks in data pipelines and systems
. Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders
. Document data pipelines, data schemas, and system configurations, making it easier for others to understand and work with the data infrastructure
. Monitor data pipelines, databases, and data infrastructure for errors, performance issues, and system failures
. Set up monitoring tools, alerts, and logging mechanisms to proactively identify and resolve issues to ensure the availability and reliability of data
. A software engineering background is a plus
Requirements
. Bachelor's or Master's degree in computer science, information technology, data engineering, or a related field
. Strong knowledge of databases, data structures, and algorithms
. Proficiency with data engineering tools and technologies, including data integration tools (e.g., Apache Kafka, Azure IoT Hub, Azure Event Hubs), ETL/ELT frameworks (e.g., Apache Spark, Azure Synapse), big data platforms (e.g., Apache Hadoop), and cloud platforms (e.g., Amazon Web Services, Google Cloud Platform, Microsoft Azure)
. Expertise in working with relational databases (e.g., MySQL, PostgreSQL, Azure SQL, Azure Data Explorer) and data warehousing concepts
. Familiarity with data modeling, schema design, indexing, and optimization techniques is valuable for building efficient and scalable data systems
. Proficiency in languages such as Python, SQL, KQL, Java, and Scala
. Experience with scripting languages like Bash or PowerShell for automation and system administration tasks
. Strong knowledge of data processing frameworks like Apache Spark, Apache Flink, or Apache Beam for efficiently handling large-scale data processing and transformation tasks
. Understanding of data serialization formats (e.g., JSON, Avro, Parquet) and data serialization libraries (e.g., Apache Avro, Apache Parquet) is valuable
. Experience with CI/CD and GitHub, demonstrating the ability to work in a collaborative and iterative development environment
. Experience with visualization tools (e.g., Power BI, Plotly, Grafana, Redash) is beneficial
Preferred Skills & Characteristics
Consistently displays dynamic, independent work habits; a goal-oriented, self-motivated professional with a passion for growth. Self-driven and proactive in keeping up with new technologies and programming
Job ID: 140584291