About the Role
This position focuses on building next-generation observability systems for a global-scale product. You'll design core observability platforms, optimize performance under high concurrency, and help shape the future of AI-driven observability.
Key Responsibilities
- Develop observability capabilities across metrics, logs, tracing, and profiling.
- Architect monitoring, log processing, tracing, alerting, and stream-processing platforms, including eBPF-based solutions.
- Ensure performance, stability, and availability of core observability services.
- Build and enhance observability solutions for AI infrastructure and applications.
Qualifications
- Bachelor's degree in CS or a related field; 3+ years of relevant experience.
- Strong skills in Java or Go; good understanding of concurrency, distributed systems, and performance optimization.
- Familiar with cloud-native observability tools: OpenTelemetry, Prometheus, VictoriaMetrics, ELK/EFK, SkyWalking, ClickHouse, eBPF, Kubernetes.
- Solid understanding of Linux, networking, storage, and message queues.
- Experience with AI frameworks (PyTorch, LLaMA-Factory, Spring AI, Langfuse, W&B) is a plus.
- Strong problem-solving, communication, and cross-team collaboration skills; comfortable in fast-paced global environments.
- Good English and Mandarin communication skills.
About the Company
A fast-growing global platform with a strong engineering culture, serving millions of users across markets. You'll join the core team building reliability and observability foundations for global expansion.