THE ROLE:
This position is crucial to AMD's Data Center product roadmap. The person in this role will be passionate about providing critical diagnostics for the next generation of AMD's flagship Data Center products. AMD's environment is fast-paced, results-oriented, and built upon a legion of forward-thinking people with a passion for winning technology! We have competitive benefit packages and an award-winning culture. Join us!
THE PERSON:
A successful candidate will have solid software methodologies, software design expertise, deep HW/SW technical knowledge, and leadership skills. The candidate will interact across multiple teams to ensure on-time, high-quality system software and to accelerate AMD's time to market for Data Center GPU products. A candidate must possess the technical proficiency and interpersonal confidence to present design ideas and innovative solutions to complex problems, both to other developers and to non-software-focused silicon and manufacturing teams. Experience in a technical leadership role is preferred.
KEY RESPONSIBILITIES:
- Serve as the SoC Diagnostics Technical Lead for DCGPU programs, providing primary local ownership and global technical leadership for silicon and manufacturing quality issues across Singapore/Tai and other sites, with end‑to‑end accountability for the quality, coverage, and completeness of diagnostics solutions.
- Work closely with the Diagnostics PM to define and drive end‑to‑end diagnostics strategy by translating program and customer requirements into clear priorities and execution plans across pre‑silicon and post‑silicon phases. Proactively articulate diagnostics objectives, strategic direction, risks, and tooling/framework requirements to PMs, managers, IP and framework architects to influence test coverage strategy, planning, and cross‑team alignment.
- Own the diagnostics pre‑silicon emulation strategy and planning across software‑based and FPGA‑based emulation models, including RTL coverage requirements before silicon tape‑out and diagnostics verification requirements before silicon back.
- Own the SoC system‑level feature validation methodology and planning for diagnostics.
- Drive the technical requirements needed to achieve feature coverage and hardware bug capture targets, ensuring that diagnostics content supports both engineering debug and manufacturing/field health checks.
- Lead and coordinate complex SoC/system‑level investigations (e.g., SLT/Board Production failures, field issues), analyze logs and symptoms, form hypotheses, and work with IP, platform, firmware and software teams to converge on root cause and corrective action.
- Exercise horizontal leadership and collaboration with cross‑functional teams such as platform validation, ROCm/SW, HW architects, product engineering, manufacturing, and other stakeholders to achieve key program milestones (bring‑up, feature enablement, performance profiling, production support) with the desired coverage metrics from diagnostics.
- Collaborate with the Product Engineering Organization to deliver a high-quality product to customers, debug defects, and help improve yield, coverage, and test time during NPI and volume production.
- Provide diagnostics support to contract manufacturers and board engineering teams, particularly for SLT/BP and system‑level test flows, and ensure that diagnostics content is usable and effective in manufacturing environments.
PREFERRED EXPERIENCE:
- Proven experience with IP and SoC validation, diagnostics, and system bring-up, with the ability to interact closely with hardware design, validation, manufacturing, and software teams.
- Excellent understanding of SoC architecture, including processor, GPU compute, system IO and memory/HBM, and security blocks, to identify critical areas for SoC & IP verification and diagnostics focus.
- Strong system‑level debugging and testing skills, with the capability to quickly identify problems, perform structured root‑cause analysis, and provide robust solutions.
- Excellent communication and interpersonal skills, with the ability to collaborate effectively across global teams and to clearly explain complex technical issues to both technical and non‑technical stakeholders.
- Demonstrated ability to work under pressure and manage competing priorities within tight project timelines while maintaining professionalism and quality.
- Knowledge and experience in developing or enabling applications on industry compute platforms such as ROCm, OpenCL, or CUDA is an asset.
- Familiarity with Linux; knowledge of and experience with device driver or software development is preferred.
- Knowledge of and experience with Manufacturing ATE/Wafer Sort Test and System Level Test is a bonus.
- Experience with source control systems such as Perforce and Git.
- Hands‑on experience with SoC bring-up and working in lab environments is a plus.
- Prior experience in software development (e.g., object‑oriented C++, modern C++, system software or drivers) and the software development lifecycle, with the ability to read and review code, understand architecture, and guide engineers in debug. Experience developing machine learning, HPC, or general‑purpose GPU compute applications is a bonus.
ACADEMIC CREDENTIALS:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or Electrical Engineering.