The Text-to-Speech endpoint uses Kokoro-82M (speaches-ai/Kokoro-82M-v1.0-ONNX), a compact multilingual TTS model. It supports 54 voices across 9 languages (American and British English are counted separately).
Naming Convention
Voice IDs follow the pattern <language><gender>_<name>, where <language> and <gender> are single-letter codes:
| Prefix | Language             | Gender |
|--------|----------------------|--------|
| af_    | American English     | Female |
| am_    | American English     | Male   |
| bf_    | British English      | Female |
| bm_    | British English      | Male   |
| jf_    | Japanese             | Female |
| jm_    | Japanese             | Male   |
| zf_    | Mandarin Chinese     | Female |
| zm_    | Mandarin Chinese     | Male   |
| ef_    | Spanish              | Female |
| em_    | Spanish              | Male   |
| ff_    | French               | Female |
| hf_    | Hindi                | Female |
| hm_    | Hindi                | Male   |
| if_    | Italian              | Female |
| im_    | Italian              | Male   |
| pf_    | Brazilian Portuguese | Female |
| pm_    | Brazilian Portuguese | Male   |
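The naming convention can be decoded mechanically. The following is a minimal sketch, not part of the service itself: the function name parse_voice_id and the two lookup tables are hypothetical helpers built from the prefixes listed above. It only decodes the prefix and does not check that the named voice actually exists.

```python
# Single-letter codes taken from the prefix list above.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Mandarin Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Brazilian Portuguese",
}
GENDERS = {"f": "Female", "m": "Male"}


def parse_voice_id(voice_id):
    """Split a voice ID such as 'af_heart' into (language, gender, name).

    Hypothetical helper: it decodes the two-letter prefix only and does
    not verify that the voice is one of the 54 listed voices.
    """
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"not a valid voice ID: {voice_id!r}")
    lang_code, gender_code = prefix
    if lang_code not in LANGUAGES or gender_code not in GENDERS:
        raise ValueError(f"unknown prefix in voice ID: {voice_id!r}")
    return LANGUAGES[lang_code], GENDERS[gender_code], name
```

For example, parse_voice_id("bm_george") yields the language "British English", the gender "Male", and the name "george".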
American English
Female (11)
Voice ID
af_heart
af_alloy
af_aoede
af_bella
af_jessica
af_kore
af_nicole
af_nova
af_river
af_sarah
af_sky
Male (9)
Voice ID
am_adam
am_echo
am_eric
am_fenrir
am_liam
am_michael
am_onyx
am_puck
am_santa
British English
Female (4)
Voice ID
bf_alice
bf_emma
bf_isabella
bf_lily
Male (4)
Voice ID
bm_daniel
bm_fable
bm_george
bm_lewis
Japanese
Female (4)
Voice ID
jf_alpha
jf_gongitsune
jf_nezumi
jf_tebukuro
Male (1)
Voice ID
jm_kumo
Mandarin Chinese
Female (4)
Voice ID
zf_xiaobei
zf_xiaoni
zf_xiaoxiao
zf_xiaoyi
Male (4)
Voice ID
zm_yunjian
zm_yunxi
zm_yunxia
zm_yunyang
Spanish
Female (1)
Voice ID
ef_dora
Male (2)
Voice ID
em_alex
em_santa
French
Female (1)
Voice ID
ff_siwis
Hindi
Female (2)
Voice ID
hf_alpha
hf_beta
Male (2)
Voice ID
hm_omega
hm_psi
Italian
Female (1)
Voice ID
if_sara
Male (1)
Voice ID
im_nicola
Brazilian Portuguese
Female (1)
Voice ID
pf_dora
Male (2)
Voice ID
pm_alex
pm_santa
Usage
Pass any voice ID as the voice parameter in a Text-to-Speech request:
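As a minimal sketch, assuming the service exposes an OpenAI-compatible speech route: the base URL, the /v1/audio/speech path, and the model and input field names are assumptions not confirmed by this page, so check them against your deployment. Only the voice parameter and the model identifier come from the text above. The helper names build_speech_request and synthesize are hypothetical.

```python
import json
import urllib.request

# Model identifier as given at the top of this page.
MODEL_ID = "speaches-ai/Kokoro-82M-v1.0-ONNX"


def build_speech_request(base_url, text, voice="af_heart"):
    """Build (but do not send) a POST request carrying the chosen voice ID.

    Assumption: the route and the 'model'/'input' field names follow the
    OpenAI-style speech API; adjust them if your deployment differs.
    """
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, text, voice, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(base_url, text, voice)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

Calling synthesize with a placeholder base URL such as "http://localhost:8000", some text, the voice "bf_emma", and an output path would request British English female speech. The audio format of the response depends on the server configuration.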