
Responsibilities
. Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance
. Implement comprehensive logging, alerting, and monitoring systems using application monitoring tools
. Perform regular health checks on pipeline performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively
. Manage incident response procedures for pipeline failures, including root cause analysis, resolution, and post-incident reviews
. Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment
. Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency
. Maintain comprehensive documentation for operational procedures, runbooks, and troubleshooting guides
. Coordinate scheduled maintenance windows and system upgrades with minimal business impact
. Manage user access controls, workspace configurations, and security policies within Application environments
Requirements
. Degree in Computer Science or Computer Engineering
. Minimum of 5 years' working experience in system operations compliance and management
. Hands-on project experience with the AWS platform (primary requirement), cloud operations, or cloud architecture
. Must hold an AWS cloud certification
. Proficiency in the Databricks platform, including workspace management, cluster configuration, and job orchestration
. Strong expertise in Apache Spark within the Databricks environment, including Spark SQL, DataFrames, and RDDs
. In-depth understanding of data warehouse concepts, data profiling, data verification, and advanced analytics techniques
. Strong knowledge of monitoring, incident management, and cloud cost control
. Technology stack experience:
  . Databricks
  . AWS cloud services and architecture
  . IDMC (Informatica Data Management Cloud)
  . Tableau for data visualization
  . Oracle Database management
  . MLOps practices within the Databricks environment
  . Stata for statistical analysis (an advantage)
  . Amazon SageMaker integration with Databricks
  . DataRobot platform integration
. Good interpersonal skills with the ability to work with different groups of stakeholders
. Strong problem-solving skills and ability to work independently in a fast-paced environment with minimal supervision
. Excellent communication skills for technical documentation and cross-team collaboration
Licence no: 12C6060
Job ID: 147358993