Senior Infra Reliability Engineer

Amazon Asia-Pacific Resources Private Limited

Singapore, Robinson

8-10 Years

SGD 12,000 - 18,000 per month

Posted 2 days ago

Job Description

As a Senior Infrastructure Reliability Engineer, you will proactively drive reliability risk identification, assessment, and mitigation for datacenter infrastructure equipment (for example, air handling units, LV generators, MV transformers, LV switchgear (SWGR), breakers, UPS, and chillers). You will also be responsible for root cause analysis of critical equipment failures and will drive continuous improvements to datacenter availability for AWS customers. You will work closely with internal and external partners, including suppliers, to drive key aspects of product specification and of risk identification planning and execution. To succeed in an open, collaborative environment, you must be ownership-minded, independent, and action- and results-oriented.

Our Senior Reliability Engineers have experience using a Physics-of-Failure based approach to develop and implement both analytical and empirical methods for product quality and reliability risk identification and assessment during the product design, manufacture, and deployment stages. They drive AWS application-specific requirements in carrying out lifecycle risk analysis driven by both environmental and operational stresses, including thermal, electrical, chemical, and mechanical stresses, so as to identify overstress and fatigue-related product weaknesses. They also evaluate product design quality and reliability risks and assess quality and reliability issues related to the electronics manufacturing process.

They drive critical component identification and the associated vendor selection and qualification requirements. Using their knowledge of process capability for electronic component production and of system-level performance requirements, they establish critical-to-quality and reliability metrics. They also develop datacenter system-level reliability models, along with the related reliability quantification and risk analysis, for datacenter configuration optimization.

During the sustaining stage, you will monitor product performance in the field and drive root cause analysis of any critical failures, along with the associated corrective and preventive actions. You will also drive an effective vendor auditing and quarterly review process to continuously improve datacenter availability.

As a subject matter expert (SME) in reliability engineering and product reliability leadership, as well as in business negotiation and program management, you will analyze and solve problems and communicate with vendors.

In this role, you will be required to travel within APAC and internationally.

Basic qualifications

Bachelor's degree in Electrical or Mechanical Engineering, Engineering Technology, or Reliability Engineering, or 8+ years of experience managing, analyzing, and communicating results to senior leadership
5+ years of root cause analysis and troubleshooting or problem solving experience
5+ years of product validation experience (shock/drop, cycle testing, environmental testing)

Preferred qualifications

Bachelor's degree in Electrical or Mechanical Engineering, Engineering Technology, or Reliability Engineering, or 10+ years of experience managing, analyzing, and communicating results to senior leadership
Experience in supply chain, commodity, and supplier management in a high-volume, global sourcing and operations manufacturing environment with a global supply base of contract manufacturers
Knowledge of critical data center mechanical and electrical equipment
Experience managing multiple projects, prioritizing, planning, and managing time