We are seeking an experienced PBS Engineer to design, operate, and optimize PBS-based workload scheduling environments supporting High-Performance Computing (HPC) and AI workloads.
Responsibilities:
PBS Scheduler Design & Administration
- Install, configure, and administer PBS Professional / Altair PBS
- Design and manage:
- Queues, partitions, and reservations
- Fair-share and priority policies
- Project- and user-based accounting
- Optimize scheduling policies to balance performance, fairness, and utilization
HPC Workload Optimization
- Tune PBS for large-scale parallel workloads:
- MPI
- OpenMP
- Hybrid workloads
- Support GPU and AI workloads:
- GPU resource definitions and placement
- Multi-node, multi-GPU scheduling
- Analyze workload patterns to improve throughput and time-to-results
Operations & Production Support
- Provide Day-2 operational support for PBS services
- Troubleshoot job failures, scheduling delays, and resource contention
- Act as the escalation point for scheduler-related issues
- Lead incident resolution for PBS-related outages
- Plan and execute PBS upgrades, patches, and configuration changes
User Enablement & Advisory
- Support users with:
- Job submission scripts
- Resource requests
- Queue selection
- Scheduler-related debugging
- Provide training and documentation on PBS best practices
- Advise users on optimal job configurations for performance and efficiency
Integration & Automation
- Integrate PBS with:
- Cluster management tools
- Monitoring systems
- Accounting and reporting platforms
- Identity and access management systems
- Automate scheduler operations using Python and Bash
- Develop scheduler metrics, reporting, and capacity planning inputs
Requirements:
- 4-6+ years of experience in HPC environments, with a strong focus on workload scheduling
- Hands-on experience with PBS Pro / Altair PBS
- Strong knowledge of queues, reservations, fair-share policies, and accounting
- Experience supporting production HPC clusters
- Strong analytical and troubleshooting skills
- Ability to translate user requirements into scheduler policies
- Proactive, detail-oriented, and collaborative mindset
Preferred Certifications:
- Altair PBS Professional certification
- Linux certifications (RHCSA / RHCE)
- HPC or vendor-specific certifications