Voices

The Text-to-Speech endpoint uses Kokoro-82M (speaches-ai/Kokoro-82M-v1.0-ONNX), a compact multilingual TTS model. It provides 54 voices across 9 languages, counting American English and British English separately.

Naming Convention

Voice IDs follow the pattern <language><gender>_<name>, where <language> is a one-letter language code and <gender> is f (female) or m (male):

Prefix  Language              Gender
af_     American English      Female
am_     American English      Male
bf_     British English       Female
bm_     British English       Male
jf_     Japanese              Female
jm_     Japanese              Male
zf_     Mandarin Chinese      Female
zm_     Mandarin Chinese      Male
ef_     Spanish               Female
em_     Spanish               Male
ff_     French                Female
hf_     Hindi                 Female
hm_     Hindi                 Male
if_     Italian               Female
im_     Italian               Male
pf_     Brazilian Portuguese  Female
pm_     Brazilian Portuguese  Male
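Each prefix encodes the language and gender directly, so a client can decode a voice ID without any lookup call. The Python sketch below mirrors the prefix table. describe_voice is a hypothetical helper, not part of the API. It only checks the pattern, so an ID such as fm_x would decode as French/Male even though no French male voice is listed.

```python
# Hypothetical client-side decoder mirroring the prefix table above:
# first letter = language code, second letter = gender code.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Mandarin Chinese",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "p": "Brazilian Portuguese",
}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split a voice ID such as 'af_nova' into (language, gender, name).

    Only the <language><gender>_<name> pattern is checked; this does not
    confirm that the voice actually exists.
    """
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"not a <language><gender>_<name> ID: {voice_id!r}")
    lang_code, gender_code = prefix[0], prefix[1]
    if lang_code not in LANGUAGES or gender_code not in GENDERS:
        raise ValueError(f"unknown prefix {prefix!r} in {voice_id!r}")
    return LANGUAGES[lang_code], GENDERS[gender_code], name


print(describe_voice("af_nova"))  # ('American English', 'Female', 'nova')
print(describe_voice("jm_kumo"))  # ('Japanese', 'Male', 'kumo')
```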

American English

Female (11)

Voice ID
af_heart
af_alloy
af_aoede
af_bella
af_jessica
af_kore
af_nicole
af_nova
af_river
af_sarah
af_sky

Male (9)

Voice ID
am_adam
am_echo
am_eric
am_fenrir
am_liam
am_michael
am_onyx
am_puck
am_santa

British English

Female (4)

Voice ID
bf_alice
bf_emma
bf_isabella
bf_lily

Male (4)

Voice ID
bm_daniel
bm_fable
bm_george
bm_lewis

Japanese

Female (4)

Voice ID
jf_alpha
jf_gongitsune
jf_nezumi
jf_tebukuro

Male (1)

Voice ID
jm_kumo

Mandarin Chinese

Female (4)

Voice ID
zf_xiaobei
zf_xiaoni
zf_xiaoxiao
zf_xiaoyi

Male (4)

Voice ID
zm_yunjian
zm_yunxi
zm_yunxia
zm_yunyang

Spanish

Female (1)

Voice ID
ef_dora

Male (2)

Voice ID
em_alex
em_santa

French

Female (1)

Voice ID
ff_siwis

Hindi

Female (2)

Voice ID
hf_alpha
hf_beta

Male (2)

Voice ID
hm_omega
hm_psi

Italian

Female (1)

Voice ID
if_sara

Male (1)

Voice ID
im_nicola

Brazilian Portuguese

Female (1)

Voice ID
pf_dora

Male (2)

Voice ID
pm_alex
pm_santa
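For client-side validation before a request is sent, the lists above can be transcribed into a lookup set. This sketch assumes the catalog stays as documented. SUPPORTED_VOICES and check_voice are illustrative names, and a hard-coded copy will drift if the service adds or removes voices. The two counts printed at the end match the 54-voice, 9-language totals stated at the top of this page.

```python
# Illustrative snapshot of the voice lists above, keyed by two-letter prefix.
VOICES_BY_PREFIX = {
    "af": ["heart", "alloy", "aoede", "bella", "jessica", "kore",
           "nicole", "nova", "river", "sarah", "sky"],
    "am": ["adam", "echo", "eric", "fenrir", "liam", "michael",
           "onyx", "puck", "santa"],
    "bf": ["alice", "emma", "isabella", "lily"],
    "bm": ["daniel", "fable", "george", "lewis"],
    "jf": ["alpha", "gongitsune", "nezumi", "tebukuro"],
    "jm": ["kumo"],
    "zf": ["xiaobei", "xiaoni", "xiaoxiao", "xiaoyi"],
    "zm": ["yunjian", "yunxi", "yunxia", "yunyang"],
    "ef": ["dora"],
    "em": ["alex", "santa"],
    "ff": ["siwis"],
    "hf": ["alpha", "beta"],
    "hm": ["omega", "psi"],
    "if": ["sara"],
    "im": ["nicola"],
    "pf": ["dora"],
    "pm": ["alex", "santa"],
}

# Flatten into full voice IDs such as "af_heart".
SUPPORTED_VOICES = frozenset(
    f"{prefix}_{name}"
    for prefix, names in VOICES_BY_PREFIX.items()
    for name in names
)


def check_voice(voice_id: str) -> str:
    """Return voice_id unchanged, or raise before any request is sent."""
    if voice_id not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice_id!r}")
    return voice_id


print(len(SUPPORTED_VOICES))                       # 54
print(len({prefix[0] for prefix in VOICES_BY_PREFIX}))  # 9
```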

Usage

Pass any voice ID as the voice parameter in a Text-to-Speech request:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/audio/speech' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro-82m",
    "input": "Threat intelligence report ready.",
    "voice": "af_nova"
  }' --output report.mp3
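The same request can also be made from Python using only the standard library. This minimal sketch assumes the endpoint behaves as the curl example implies: it accepts the JSON body shown and returns the audio bytes directly in the response. build_speech_request and synthesize are hypothetical helper names, and <token> remains a placeholder.

```python
import json
import urllib.request

# Endpoint URL and model name copied from the curl example above.
SPEECH_URL = "https://apis.threatwinds.com/api/ai/v1/audio/speech"


def build_speech_request(token: str, text: str, voice: str) -> urllib.request.Request:
    """Build the same POST request as the curl example without sending it."""
    body = json.dumps({"model": "kokoro-82m", "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(token: str, text: str, voice: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(token, text, voice)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


# Usage (requires a valid token and network access):
# synthesize("<token>", "Threat intelligence report ready.", "af_nova", "report.mp3")
```

Building the request separately from sending it keeps the payload easy to inspect. urlopen raises urllib.error.HTTPError for non-2xx responses, such as a rejected token.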

Note: Voices perform best on inputs of 100–200 tokens. Very short inputs (under 10 tokens) and very long inputs (over 400 tokens) may produce lower-quality audio.
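One way to stay near that range is to bundle short sentences together and split long passages at sentence boundaries, then synthesize each chunk as a separate request. The sketch below is only a rough approximation. It counts words rather than model tokens, so the max_words default of 40 is an arbitrary placeholder that you should tune by listening to the output. chunk_text is a hypothetical helper.

```python
import re


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Group whole sentences into chunks of at most max_words words.

    Word count is only a crude stand-in for the model's tokens. A single
    sentence longer than max_words is kept whole rather than cut mid-sentence.
    """
    # Split after '.', '!' or '?' followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk once adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("Alert one. Alert two. Alert three.", max_words=4))
# ['Alert one. Alert two.', 'Alert three.']
```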