Responsibilities
Team Introduction

Our team is responsible for the infrastructure systems of hybrid cloud, including products across IaaS, PaaS, SaaS, and AI models. We strive to be a leading Site Reliability Engineering (SRE) team in the industry, driving reliability, scalability, and performance at scale. As part of the SRE team, you will tackle complex, large-scale challenges, leveraging your expertise in coding, algorithms, complexity analysis, and distributed system design. We foster a culture of diversity, intellectual curiosity, and open collaboration. Engineers are empowered with strong ownership, autonomy, and the opportunity to work across a wide range of impactful projects. You will also benefit from a supportive environment with mentorship and resources designed to help you continuously learn and grow.

What you will be doing:
1. Deliver products in hybrid cloud scenarios, including cloud platform planning, software deployment, and resource expansion, collaborating with R&D teams to complete project delivery.
2. Operate cloud platform environments for internal and external customers, including daily alarm handling, on-call support, and change management, and ensure the stability of the cloud platform during important event periods.
3. As an SRE, work with the R&D team on the stability of cloud products, and continuously improve capabilities in high-availability architecture, disaster recovery, alarm monitoring, and related areas, drawing on on-site experience with large-scale systems.
4. Continuously improve hybrid cloud serviceability, participate in the standardized SOW for O&M and delivery of new product versions, and build SRE serviceability acceptance standards to improve implementation efficiency.
Qualifications
Minimum Qualifications:
- Bachelor's or Master's degree in Computer Science or a related major, with at least 5 years of relevant experience.
- Solid foundational knowledge of computer software, with an understanding of the principles of the Linux operating system, networking, middleware, and related areas.
- Familiarity with one or more programming languages, such as Shell, Python, Go, or Java, and the ability to build scripts or tools to solve a range of problems.
- Experience in the operation and maintenance of one or more areas, including virtual machines, containers, K8s, load balancing, middleware, and AI models.

Preferred Qualifications:
- Experience in the operation and maintenance of IDC equipment such as switches and GPU servers.
- Work experience at cloud platform vendors.