Site Reliability Engineer

3-5 Years
  • Posted 18 hours ago

Job Description

About the Role

This position focuses on building next-generation observability systems for a global-scale product. You'll design core observability platforms, optimize performance under high concurrency, and help shape the future of AI-driven observability.

Key Responsibilities

  • Develop observability capabilities across metrics, logs, tracing, and profiling.
  • Architect monitoring, log processing, tracing, alerting, and stream-processing platforms, including eBPF-based solutions.
  • Ensure performance, stability, and availability of core observability services.
  • Build and enhance observability solutions for AI infrastructure and applications.

Qualifications

  • Bachelor's degree in CS or a related field; 3+ years of relevant experience.
  • Strong skills in Java or Go; good understanding of concurrency, distributed systems, and performance optimization.
  • Familiar with cloud-native observability tools: OpenTelemetry, Prometheus, VictoriaMetrics, ELK/EFK, SkyWalking, ClickHouse, eBPF, Kubernetes.
  • Solid understanding of Linux, networking, storage, and message queues.
  • Experience with AI frameworks (PyTorch, LLaMA-Factory, Spring AI, Langfuse, W&B) is a plus.
  • Strong problem-solving, communication, and cross-team collaboration skills; comfortable in fast-paced global environments.
  • Good English and Mandarin communication skills.

About the Company

A fast-growing global platform with a strong engineering culture, serving millions of users across markets. You'll join the core team building reliability and observability foundations for global expansion.

Job ID: 144198601