Site Reliability Engineer

3-5 Years
  • Posted 12 hours ago

Job Description

Key Responsibilities:

1. Lead the development of observability systems, building foundational capabilities across the full technology stack based on the four pillars: Metrics, Logging, Tracing, and Profiling.

2. Design and architect observability-related platforms and services, including monitoring systems, distributed tracing, log services, computing engines (for stream processing, real-time alerting, time-series detection), alerting mechanisms, and eBPF-based solutions.

3. Ensure high performance and availability of core observability services in high-concurrency environments, while continuously driving technology and product optimization.

4. Implement cutting-edge observability solutions for AI infrastructure and AI applications (including Observability AI+), enhancing AI system stability and improving the user experience and efficiency of traditional observability products.

Qualifications:

1. Bachelor's degree or higher in Computer Science or related field, with 3+ years of relevant work experience.

2. Proficiency in Java or Go, with expertise in concurrent programming, distributed systems, and performance optimization. Solid programming skills and strong system design capabilities.

3. Familiarity with mainstream cloud-native observability products and components, including but not limited to: OpenTelemetry, Prometheus, VictoriaMetrics, ELK/EFK, SkyWalking, ClickHouse, eBPF, with practical knowledge of Kubernetes fundamentals and applications.

4. Strong understanding of foundational open-source components including Linux OS, networking, storage, and message queues. Deep knowledge of implementation principles is preferred.

5. Experience with AI-related technology stacks such as PyTorch, LLaMA-Factory, Spring AI, Langfuse, wandb (Weights & Biases) is advantageous. Practical experience in AI observability scenarios is highly valued.

6. Excellent problem identification and resolution skills, with strong analytical and summarization abilities. Outstanding cross-team collaboration, strong sense of responsibility, and ability to work under pressure in a fast-paced international environment. Curiosity and passion for exploring new technologies.

7. Ability to communicate effectively in English (both written and spoken) and collaborate seamlessly with global teams.

Job ID: 144468881
