Nava is building the world's first Silicon-to-Intent autonomous hyperscaler. Following our $22M Series A, we are aggressively scaling our high-density GPU clusters across Mumbai, Hyderabad, and Singapore. We are looking for a world-class Network Architect to design and implement the end-to-end fabric that powers our large-scale NVIDIA H100 and B200 clusters. In this role, you won't just manage a network; you will re-engineer how compute and fabric intersect to eliminate bottlenecks in distributed training and inference.
THE ROLE
- Architect & Design: Lead the architectural design of end-to-end, non-blocking networking for multi-thousand-GPU clusters.
- Fabric Orchestration: Deploy and optimize NVIDIA Quantum-2 InfiniBand (NDR) and NVIDIA Spectrum-4 (Spectrum-X) Ethernet fabrics to support multi-rail, rail-optimized topologies.
- DPU Integration: Architect offloading strategies using BlueField-3 DPUs to handle security, telemetry, and storage acceleration, ensuring zero-trust hardware-native isolation.
- Performance Tuning: Fine-tune NCCL/UCX collectives, in-network aggregation (SHARP), and congestion control and Adaptive Routing to maximize MFU (Model FLOPs Utilization).
- Infrastructure as Code: Automate the lifecycle of the network fabric in a software-defined, autonomous cloud environment.
TECHNICAL REQUIREMENTS
- NVIDIA Networking Stack: Expert-level experience with NVIDIA Quantum-2 InfiniBand (NDR) switches and NVIDIA Spectrum-4 (Spectrum-X) high-performance Ethernet.
- Deep DPU Knowledge: Hands-on experience with NVIDIA BlueField (DOCA) for network and security offloading.
- Protocol Mastery: Expertise in RDMA/RoCEv2, BGP, EVPN-VXLAN, and advanced congestion control algorithms.
- Scale Experience: Proven track record of building and operating Clos/leaf-spine architectures at a scale of 512+ GPUs.
- Security: Understanding of hardware-native security, including line-rate encryption and zero-trust micro-segmentation.
PREFERRED QUALIFICATIONS
- Experience with liquid-cooled high-density rack networking.
- Contributions to open-source networking projects or OCP (Open Compute Project).
- Familiarity with TCO (total cost of ownership) financial modeling for large-scale hardware deployments.
WHY NAVA
We are a lean, elite engineering team moving at terminal velocity. You will have the autonomy to choose best-in-class gear and the runway ($22M Series A) to build a sovereign, autonomous AI cloud from the ground up.
Skills: design, spectrum, cloud, nvidia, networking, security, building, infiniband, infrastructure, ethernet, density