Site Reliability Engineer, Traffic Platform

ByteDance

Singapore

3-5 Years

Posted 5 months ago

Job Description

About the Team

The team builds and operates large-scale, massively distributed infrastructure, applying Site Reliability Engineering (SRE) principles of software and systems engineering to ensure our traffic services are reliable, fault-tolerant, efficiently scalable, and cost-effective. You will have the opportunity to manage a variety of complex systems at scale, including traffic systems serving hyperscale datacenters and public cloud environments, and a global load balancer that handles Tbps of traffic.

We build and operate multi-cloud, large-scale network services around the world to accelerate and optimize network traffic for TikTok and a variety of application services for ByteDance internal customers. These services include, but are not limited to, Layer 4 load balancing, Layer 4/7 acceleration, global ingress, CMAF, FaaS, and WAF. By joining us, you will work with a brilliant team and learn how to build a TikTok-scale network traffic platform serving billions of users globally.

Responsibilities

- Build, expand, and operate ByteDance's global traffic platform, including large-scale systems in public and private clouds and edge data centers.
- Build tools, automation, visualizations, and monitors to facilitate the operation and optimization of the global traffic platform.
- Work in a fast-paced environment. Participate in technical operations and on-call rotations in response to performance and reliability issues.
- Help improve the whole lifecycle of infrastructure services, from inception and design through development, deployment, user support, and refinement.

Qualifications

Minimum Qualifications

- Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, Computer Science, or a related major.
- Proven experience working with Linux systems from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
- At least 3 years of experience in one or more programming languages such as Go, Python, and shell script.
- Familiarity with cloud and CI/CD frameworks and tools such as Git, Docker, and Kubernetes.

Preferred Qualifications

- Experience in designing, analyzing, and building automation and tools for large-scale systems.
- Experience in building solutions with AWS, Google Cloud, Azure, and other cloud services.
- Experience in networking technologies such as TCP/IP, HTTP, and DNS in a carrier-grade environment.
- Experience in developing and operating one or more of the following systems: Kubernetes, Nginx, IPVS, ELK stack, etc.
- Self-driven and capable of coping with ambiguity and moving projects from concept to delivery.
- Strong analytical skills and the ability to solve real-world problems in a fast-moving environment.

More Info

Job Type:

Permanent Job

Industry:

IT/Computers - Software

Function:

Site Reliability Engineering

Employment Type:

Full time

About Company

ByteDance
Job Source: jobs.bytedance.com

ByteDance is a technology company operating a range of content platforms that inform, educate, entertain and inspire people across languages, cultures, and geographies.
Dedicated to building global platforms of creation and interaction, ByteDance now has a portfolio of applications available in over 150 markets and 75 languages, including TikTok, Helo, Vigo Video, Douyin, and Huoshan.

Job ID: 129100943