Client:
A leading provider of advanced intelligence solutions, empowering R&D and intellectual asset teams to drive innovation.
Key Responsibilities
- Maintain high availability, stability, and optimal performance of business platforms by developing optimization strategies and refining operational standards and procedures.
- Lead the design and deployment of scalable, maintainable DevOps architectures and automation tools to improve operational efficiency.
- Conduct security risk assessments and spearhead the development and execution of security strategies to ensure system safety.
- Assess and review system architecture, process workflows, performance, and stability, collaborating closely with SRE and development teams to resolve challenges effectively.
- Serve as the primary incident commander for production issues, guiding the team through troubleshooting and resolution processes to ensure prompt response and recovery.
Desired Qualifications
- Bachelor's degree in Computer Science or a related field, with a minimum of 5 years' experience in internet system operations or SRE roles.
- Strong understanding of internet technology architecture, including microservices, Kubernetes, Docker, monitoring and alerting tools, CI/CD pipelines, logging frameworks, distributed caching, and databases.
- Proven experience with distributed systems and high-concurrency environments, with excellent fault diagnosis and system optimization skills.
- Hands-on experience with cloud platforms such as AWS or Azure, and knowledge of MySQL, PostgreSQL, and Redis; familiarity with big data technologies and hybrid cloud setups is a plus.
- Proficiency in at least one programming language (e.g., Python, Go, Java) with relevant development background.
- Mandarin is required due to the need to communicate with Mandarin-speaking clients and counterparts.