Solution & Technical Manager
Key responsibilities include:
- Overall Architecture Design & Technology Selection: Clearly understand customer requirements and collaborate with stakeholders to finalize the architecture of the GPU infrastructure platform, including IDC infrastructure, hardware topology, network architecture, and technology stack selection.
- Technical Leadership & Governance: Provide overall technical leadership and management for platform design and delivery. Communicate effectively with the CTO to jointly make decisions regarding the holistic solution and critical issues.
- Critical Technical Problem Solving: Lead the resolution of complex technical challenges. Coordinate with the client, NVIDIA, and other third-party partners to accurately identify and resolve technical issues throughout the implementation lifecycle.
- Project Execution & Technical Management: Oversee technical aspects of project implementation, including site survey, hardware rack installation, cabling, hardware testing, network testing, stress testing, and performance tuning to ensure delivery quality.
- Knowledge Management: Drive knowledge accumulation, conduct post-project reviews (retrospectives), and manage training and knowledge transfer.
Who you are
- Education: Bachelor's degree or higher in Computer Science, Electronic Engineering, Automation, or a related field.
- Experience: 8+ years of experience in cloud platforms, AI infrastructure, or large-scale data center construction. Extensive experience building enterprise-grade cloud and AI platforms.
- Expertise: Extensive hands-on experience in project execution, platform architecture design, and technical management.
- Technical Capacity: In-depth understanding of GPU clusters, networking, storage, and security. Specific areas of expertise include:
  - NVIDIA GPU Architecture: Deep technical knowledge, particularly regarding the Blackwell architecture and NVLink technology.
  - HPC Networking: Expertise in InfiniBand and high-performance computing network technologies.
  - Distributed Systems: Solid understanding of distributed systems principles.
- Familiarity with mainstream hardware (servers/network equipment) and operating systems (Linux), with a deep understanding of virtualization and containerization technologies.
- Infrastructure Knowledge: Strong understanding of data center infrastructure and the specific requirements (power, cooling, layout) for large-scale GPU clusters.
- Problem Solving: Demonstrated ability to diagnose and resolve complex technical challenges.
- Leadership: High resilience under pressure and strong leadership skills, capable of guiding a technical team to deliver projects and build platforms end-to-end.
We regret to inform you that only shortlisted candidates will be notified.
www.dadaconsultants.com
EA Registration Number: R1878287
Business Registration Number: 201735941W.