
Responsibilities:
. Develop robust data ingestion, transformation, and loading processes across batch and near-real-time workflows.
. Implement distributed processing with Apache Spark (PySpark/Scala) for large-scale data transformations and analytics.
. Create and maintain logical and physical data models (dimensional/star schemas, data vault, or wide tables) optimized for analytics and reporting.
. Write optimized SQL and HiveQL queries; manage tables, partitions, and storage formats.
. Schedule and monitor pipelines using Control-M, ensuring SLA adherence and timely delivery.
. Tune Spark jobs, SQL/Hive queries, and storage strategies for scalability and cost efficiency.
. Implement validation, reconciliation, and lineage using checks, unit tests, and metadata frameworks.
. Build operational dashboards and alerts; diagnose failures and drive root-cause analysis and remediation.
. Maintain clear runbooks, architecture diagrams, data dictionaries, and coding standards.
. Apply best practices for data privacy and access control as applicable; execute continuous service and process improvement plans.
. Prepare unit test cases and work closely with the Testing team during SIT and UAT.
. Build packages and migrate code drops through environments (e.g., DEV, QA, PROD) with audit trails and workflow governance.
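As a small illustration of the validation and reconciliation checks mentioned above, here is a minimal, hypothetical sketch in plain Python (the function name and tolerance parameter are assumptions for illustration; in practice such checks would typically run inside a Spark job or a metadata-driven framework):

```python
def reconcile_counts(source_count: int, target_count: int,
                     tolerance: float = 0.0) -> bool:
    """Return True when the target row count is within the allowed
    relative tolerance of the source row count."""
    if source_count == 0:
        # An empty source should produce an empty target.
        return target_count == 0
    drift = abs(source_count - target_count) / source_count
    return drift <= tolerance

# A load that drops 2 of 1,000 rows fails a zero-tolerance check
# but passes with a 0.5% tolerance.
print(reconcile_counts(1000, 1000))                  # True
print(reconcile_counts(1000, 998))                   # False
print(reconcile_counts(1000, 998, tolerance=0.005))  # True
```

Such a check would normally be wired into the pipeline as a post-load gate, failing the Control-M job and raising an alert when reconciliation does not pass.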
Education:
. Bachelor's degree/University degree in Computer Science, Engineering, or equivalent experience
Essential:
. Adopt an uncompromising attitude when it comes to quality and help raise the bar for products and team members
. Be a team player who communicates effectively and professionally with both internal and external customers
. Identify ideas to improve system performance and availability
. Embrace tackling and resolving complex technical design issues
. Possess strong problem solving and decision-making skills while exercising good judgment
. Strong analytical and problem-solving skills
. Ability to work on multiple projects at a time
. Be able to work under pressure and manage deadlines or unexpected changes in expectations or requirements
. Good communication skills - ability to convey technical information to non-technical audience
. Ability to understand the big picture
. Ability to develop long lasting relationships with all levels
. Deep understanding and experience in software development cycle, including Agile based rapid delivery
. Collaborate with business and IT to analyse, elicit and review business requirements
. Facilitate communication between vendor, project team, business stakeholders and internal IT team
. Ability to work in a team distributed across multiple locations
Key Domain/ Technical Skills:
. Programming Languages: Proficiency in Python for data processing and automation, plus Unix shell scripting.
. Big Data Frameworks: Hands-on experience with Apache Spark for distributed data processing and analytics.
. Database & Querying: Strong knowledge of SQL for relational databases and Hive for querying large datasets in Hadoop ecosystems.
. ETL Development: Expertise in designing and implementing ETL pipelines for data ingestion, transformation, and loading.
. Workflow Orchestration: Familiarity with Control-M or similar scheduling tools for batch job automation and monitoring.
. Data Warehousing: Understanding of data modeling and optimization techniques for large-scale data storage and retrieval.
. Performance Tuning: Ability to optimize queries and jobs for efficiency and scalability.
. Version Control & CI/CD: Experience with Git and deployment pipelines for data engineering workflows.
. BI/Analytics Integration: Familiarity with how downstream tools (Power BI/Tableau) consume curated datasets.
. Security: IAM, secrets management, encryption at rest/in transit, PII handling.
. Languages: Python (PySpark), SQL, Unix Shell Scripting
. Frameworks: Spark, Hive, Sqoop
. Orchestration: Control-M
. Storage & Files: HDFS, Parquet, ORC
. Version Control & CI/CD: Git, GitHub/GitLab
. Release and Deployment: Aldon, Jenkins
. Issue Tracking: Jira
. Documentation: Confluence/Wiki
. Optional: Qlik Sense, Tableau, or any reporting dashboard tool
Job ID: 138851303