About Tencent
Tencent is an Internet-based platform company founded in Shenzhen, China, in 1998. We use technology to enrich the lives of Internet users and assist the digital upgrade of enterprises. Our mission is Value for Users, Tech for Good. We embrace a culture of teamwork & creativity and are driven by our values - Integrity, Proactivity, Collaboration and Creativity.
We are rapidly expanding our international operations and are looking for top talent to propel us forward. Combining the results-oriented nature of a start-up with the resources of a profitable and leading Internet company, Tencent offers a unique opportunity for aspiring individuals to thrive.
About WeChat
With over 1.4 billion users worldwide, WeChat is changing the mobile landscape by connecting people, services, and businesses in China and around the world. The WeChat team in Singapore is responsible for managing and growing our core product, including messaging and social networking, for users worldwide (excluding the Chinese mainland).
Join the WeChat team and play an impactful role in keeping people around the world connected, help redefine how people use their mobile devices to communicate and interact online, and understand the behavior and preferences of users worldwide.
About the Role
WeChat is seeking Cloud Engineers to help deploy and scale WeChat's diverse ecosystem of services to over a billion users. The role will work closely with software engineers, data scientists, security specialists, and project managers to help develop internal tools and security systems for keeping WeChat users worldwide safe. You will ensure site reliability by managing the deployment, scaling, and maintenance of new and existing online services for a worldwide network and user base.
- Participate in system architecture and reliability design for new and existing services, balancing high-availability, service capacity, performance, and cost.
- Deploy, manage and maintain new and existing services; manage Kubernetes/container clusters (upgrade, scaling, multi-cluster governance).
- Build and operate observability: metrics/logs/tracing, alert strategy (noise reduction, severity, escalation) and incident handling and resolution.
- Own the CI/CD and release engineering pipeline: pipeline design, canary/gray release, rollback, configuration and change management.
- Drive automation to eliminate repetitive work (scripts/tools, IaC), and improve operational efficiency and quality.
- Handle high-severity incidents focusing on fast detection, mitigation, recovery, and postmortem-driven improvements.
Who We're Looking For
- Bachelor's degree or above in Computer Science, Information Systems, or related fields
- Prior work experience in Cloud Engineering, Site Reliability Engineering (SRE), or DevOps for a major, public-facing internet service
- Strong Linux fundamentals (kernel/memory/process/thread/IPC) and solid networking knowledge (HTTP/DNS/TLS/TCP/IP)
- Cluster management experience with containers and Kubernetes; familiarity with upgrade, troubleshooting and governance
- Experience in large-scale distributed systems and microservices; ability to reason about trade-offs (CAP, consistency, latency)
- Experience with monitoring/alerting and observability tools (e.g., Prometheus, Zabbix) and practical alert hygiene
- Experience with CI/CD, release automation, and change management; familiarity with IaC is a plus
- Hands-on experience with at least one language: Go, Python, Bash; able to write production-quality automation code
- Database operations experience (MySQL/PostgreSQL/Redis) is a plus
- Fluency in both English and Mandarin Chinese to work with international stakeholders and stakeholders based at HQ