- Job Responsibilities
1. Ensure the stability, reliability, and efficient operation of Xiaomi's global business, maintaining high availability of services at all times.
2. Responsible for core operational tasks such as resource provisioning and management, incident response, capacity management, monitoring, and reliability improvements.
3. Review technical architecture designs, assess their soundness, and proactively identify and resolve reliability risks.
4. Conduct in-depth analysis of systemic deficiencies, identify bottlenecks, and develop optimization strategies; plan and execute projects to improve system reliability and ensure the cost-effectiveness and high availability of the systems.
5. Participate in 24/7 on-call rotation, promptly respond to and resolve production incidents to ensure service availability.
6. Analyze and improve processes to build stable, highly available systems; drive continuous automation improvements and minimize manual intervention.
- Job Requirements
1. Bachelor's degree in Computer Science or a related field.
2. Proficiency in at least one of the following: Python, Go, or shell scripting, with demonstrated ability to independently develop modules or platforms.
3. Familiarity with cloud computing; experience managing multi-cloud or hybrid cloud platforms (e.g., Alibaba Cloud, Azure, AWS) is preferred.
4. Strong foundation in computer science, with hands-on experience in Linux, networking, load balancing, and designing high-availability and disaster recovery architectures.
5. A good team player with a strong sense of responsibility who is self-driven and highly motivated.
6. At least 3 years of experience operating and maintaining large-scale web services is preferred; hands-on experience managing large-scale web services or projects is a plus.
7. Fluency in spoken Mandarin is a plus.
Date Posted: 19/09/2025
Job ID: 126549209