
Role Overview
Python-focused Data Engineer with strong hands-on coding skills in data-intensive systems. The role focuses on building scalable data pipelines, processing large datasets, and enabling AI/Generative AI applications through well-structured data infrastructure.
Key Responsibilities
. Build and maintain scalable data pipelines using Python
. Write production-grade Python code specifically for data processing, transformation, and ETL workflows
. Perform data cleaning, preprocessing, and feature preparation for analytics and AI use cases
. Use data analysis and manipulation tools to handle large datasets efficiently
. Develop reusable Python modules for data ingestion and pipeline automation
. Perform exploratory data analysis (EDA) to understand data patterns and quality issues
. Optimize data workflows for performance, scalability, and reliability
. Support data requirements for AI/ML and Generative AI systems
. Build data services and APIs to support downstream AI applications
. Ensure data quality, consistency, and observability across pipelines
Required Python & Data Libraries (Hands-on Experience Mandatory)
Candidates must have strong practical experience with:
. pandas - data manipulation, transformation, and analysis
. NumPy - numerical operations and array-based processing
. Matplotlib - data visualization and reporting
. scikit-learn - basic ML workflows and model evaluation
. PyTorch - deep learning and AI model experimentation
AI/Generative AI Enablement
. Prepare and structure datasets for ML and LLM-based systems
. Support integration of AI models into data pipelines and applications
. Enable workflows for Generative AI use cases (RAG systems, agent workflows)
. Work with multiple AI model providers: OpenAI, Anthropic, LLaMA, Mistral
. Exposure to AI orchestration frameworks such as LangChain, AutoGen, and CrewAI
Core Requirements
. Strong hands-on Python coding expertise focused on data systems (critical requirement)
. Ability to write clean, efficient, production-grade Python code
. Strong understanding of data structures, ETL pipelines, and data workflows
. Experience working with large-scale structured and unstructured data
. Strong SQL skills for data extraction and manipulation
. Understanding of data modeling and analytics workflows
. Ability to support end-to-end data-to-AI pipelines
Preferred/Good to Have
. Experience with big data or distributed processing systems
. Understanding of vector databases and embedding-based retrieval systems
. Experience building APIs or services for data/AI systems
. Familiarity with cloud platforms (AWS, Azure, GCP)
. Exposure to production monitoring and data observability tools
What Success Looks Like
. High-quality Python code powering scalable data pipelines
. Reliable, clean, and well-structured datasets for AI systems
. Efficient ETL workflows with minimal manual intervention
. Seamless support for ML and GenAI applications in production
Job ID: 148618551
Skills:
Denodo, Power BI, AWS Glue, Tableau, GitLab, RESTful APIs, SQL, Python, ER Studio, VQL, AWS SageMaker
