TTS Direction Algorithm Engineer

2-5 Years
SGD 9,000 - 15,000 per month
  • Posted 4 months ago

Job Description

Key Responsibilities:

  • Design and implement algorithms that direct or steer TTS output, e.g., by controlling prosody, style, voice persona, emotion, pacing, emphasis, accent, dialect, and intonation.

  • Collaborate with researchers and engineers to take breakthroughs in speech synthesis and apply them to production-scale TTS systems.

  • Work on components such as text normalization, phonetic/linguistic feature extraction, alignment modeling (text-to-acoustic), prosody modeling, and vocoder architectures or waveform generation.

  • Build evaluation frameworks and metrics for naturalness, intelligibility, expressiveness, latency, voice persona fidelity, and bias/fairness across languages and dialects.

  • Create data pipelines and tooling for voice collection and labeling, human preference judgments, and A/B testing of voice direction outputs.

  • Optimize system latency, throughput, and memory/compute requirements, and support streaming and real-time constraints for conversational voice.

  • Ensure safe, inclusive, and responsible voice output: avoid inappropriate style shifts, voice likeness issues, unintended biases, and misinterpretations. Collaborate with Safety, Policy, and Product teams.

  • Integrate the directed-TTS algorithms into product platforms (ChatGPT voice, developer API, accessibility features) and work with product/infra teams to ensure scalability and reliability.

Qualifications:

Required:

  • MS or PhD in Computer Science, Electrical Engineering, Speech/Audio Signal Processing, Machine Learning, or equivalent experience.

  • Proven experience designing and shipping production-grade TTS or speech-generation systems, e.g., text-to-speech, voice conversion, or expressive prosody modeling. (For comparison, the Voice AI role at OpenAI requires building and shipping production voice or speech ML systems: TTS, voice cloning, or generative audio.)

  • Deep understanding of the speech synthesis pipeline: text normalization, linguistic/phonetic features, acoustic modeling, vocoder/waveform generation, and prosody modeling.

  • Strong ML engineering skills: Python + PyTorch (or another major framework), experience with data pipelines, model training/evaluation/serving, measurement of MOS/intelligibility/latency.

  • Experience with audio tooling, data augmentation for speech, and evaluation metrics for naturalness/latency/persona fidelity.

  • Excellent collaboration and communication skills: ability to work cross-functionally with research, product, design, infrastructure, and safety/policy teams.

Preferred:

  • Experience with multilingual or dialectal voice systems, low-latency streaming TTS, expressive or adaptive voice personas.

  • Prior publications or contributions in speech synthesis or voice ML research (e.g., papers at top conferences, patents).

  • Experience with large-scale deployment of voice systems, optimizing latency/throughput in production (e.g., real-time voice API processing).

  • Familiarity with voice likeness, speaker identity protection, and consent/licensing issues in voice systems.

  • Experience with prosody control, emotion modeling, or voice style transfer in TTS.

More Info


Job ID: 132927295
