Hybrid Cloud Operation and Delivery Engineer (SRE) - Data Infrastructure

ByteDance

Singapore

5-7 Years


Posted 12 hours ago

Job Description

Responsibilities

Team Introduction

Our team is responsible for the infrastructure systems of hybrid cloud, including products across IaaS, PaaS, SaaS, and AI models. We strive to be a leading Site Reliability Engineering (SRE) team in the industry, driving reliability, scalability, and performance at scale. As part of the SRE team, you will tackle complex, large-scale challenges, leveraging your expertise in coding, algorithms, complexity analysis, and distributed system design.

We foster a culture of diversity, intellectual curiosity, and open collaboration. Engineers are empowered with strong ownership, autonomy, and the opportunity to work across a wide range of impactful projects. You will also benefit from a supportive environment with mentorship and resources designed to help you continuously learn and grow.

What you will be doing:

1. Deliver products in hybrid cloud scenarios, including cloud platform planning, software deployment, and resource expansion, and collaborate with R&D teams to complete project delivery.
2. Operate cloud platform environments for internal and external customers, including daily alarm handling, on-call support, and change management, and ensure the stability of the cloud platform during important event periods.
3. As an SRE, participate in stability work for cloud products together with the R&D team, and continuously improve capabilities in high-availability architecture, disaster recovery, alarm monitoring, and related areas, drawing on on-site experience with large-scale systems.
4. Continuously improve hybrid cloud serviceability, participate in defining the standardized SOW for O&M and delivery of new product versions, and build SRE serviceability acceptance standards to improve implementation efficiency.

Qualifications

Minimum Qualifications:

- Bachelor's or Master's degree in Computer Science or a related major, with at least 5 years of relevant experience.
- Solid foundational knowledge of computer software, with an understanding of the Linux operating system, networking, middleware, and related principles.
- Familiarity with one or more programming languages, such as Shell, Python, Go, or Java, and the ability to build scripts or tools to solve different problems.
- Experience in the operation and maintenance of one or more areas, including virtual machines, containers, K8s, load balancing, middleware, and AI models.

Preferred Qualifications:

- Experience in the operation and maintenance of IDC equipment, such as switches and GPU servers.
- Work experience at cloud platform vendors.

More Info

Job Type:

Permanent Job

Industry:

IT/Computers - Software

Function:

Site Reliability Engineering

Employment Type:

Full time

About Company

ByteDance

Job Source: jobs.bytedance.com

ByteDance is a technology company operating a range of content platforms that inform, educate, entertain and inspire people across languages, cultures, and geographies.
Dedicated to building global platforms of creation and interaction, ByteDance now has a portfolio of applications available in over 150 markets and 75 languages, including TikTok, Helo, Vigo Video, Douyin, and Huoshan.

Job ID: 146027265
