Responsibilities
1. Participate in the development and maintenance of market data ingestion and order matching systems with nanosecond-level latency requirements, ensuring stability and performance under extreme load conditions
2. Model and analyze the performance of trading system modules (e.g., data path, memory access patterns, cache hit rate, context switch overhead), and drive the visualization of system metrics
3. Participate in the development and tuning of a custom DPDK-based network protocol stack that leverages kernel bypass technologies
4. Contribute to the design and optimization of low-latency communication frameworks, including:
. Network protocol stacks (TCP/UDP, WebSocket, HTTP/2)
. Inter-process/thread communication (shared memory, lock-free ring buffers)
. User-space timestamp precision and synchronization mechanisms
5. Monitor performance hotspots in the Linux kernel (e.g., in the scheduler, network stack, and I/O subsystems), study their impact on low-latency trading, and apply targeted optimizations based on CPU microarchitecture characteristics
Requirements
Fundamental & General Skills
. Proficient in modern C++ (C++17/20) with strong system-level high-performance programming skills (e.g., cache-friendly data structure design, zero-copy data pipelines, CPU affinity binding)
. Solid understanding of CPU architecture (x86_64), cache hierarchy, branch prediction, pipeline stalls, and other hardware-level behaviors
. In-depth experience in at least one of the following areas:
. Analysis and optimization of core Linux kernel components such as the network stack, scheduler, and memory subsystem
. CPU microarchitecture analysis using tools like perf, Intel VTune, or the Top-Down analysis method
. SIMD vectorization (e.g., AVX2/AVX-512) or custom assembly-level optimization
. Development of system-level latency profiling or microbenchmark toolchains
Networking & Protocol Stack Development
. Deep understanding of the WebSocket protocol with the ability to implement custom frame parsing and construction
. Familiarity with I/O multiplexing mechanisms like epoll and io_uring
. Experience independently building high-performance WebSocket or FIX/SBE protocol clients
. Experience with at least one user-space network stack technology such as netmap or DPDK
High-Frequency Trading & Data Processing
. Familiar with tick data processing workflows in high-frequency trading environments, including market data reconstruction and reordering mechanisms
. Knowledge of PTP/TSC synchronization and system clock stability tuning
. Knowledge of the network topologies and direct-connectivity latency characteristics of major global exchanges is preferred
Bonus Points
. Hands-on experience in end-to-end trading system latency optimization (from drivers and interrupts through protocol framing to user-space scheduling)
. Proficiency with performance analysis tools such as perf, ftrace, bpftrace, VTune, and cachegrind
. Experience designing and tuning in-house user-space lock-free message queues or shared memory pools
Date Posted: 03/09/2025
Job ID: 125407187