Responsible for operating and maintaining the cloud business, keeping platform services stable and reliable, and identifying and resolving performance bottlenecks.
In line with internal company procedures, responsible for incident management, service request management, issue management, and operations change management. Also responsible for platform software updates and deployments, and for building and maintaining core systems.
Respond to unexpected failures, both major and minor, and restore services. Analyze the root causes of incidents and drive the resulting improvements and optimizations.
Develop and maintain automated operations tools to improve operational efficiency and streamline operations processes.
Provide 24/7 on-call technical support, plus regular service during working hours (5 days a week, 8 hours a day).
Requirements:
Bachelor's degree or above, with at least two years of relevant work experience.
Background in computer science or a related field; proficient in Linux system administration and skilled with common operations tooling such as containers, Kubernetes, and Ansible. Prior experience with cloud providers such as Azure or AWS is preferred.
Proficient in at least one scripting language, such as Python or Shell. Experience with Go, C, or C++ is preferred.
Good documentation habits; able to write and update workflows and technical documents promptly as required.
Excellent service awareness, communication skills, learning ability, and initiative.