Audio

The Audio API provides speech-to-text transcription and text-to-speech synthesis using ThreatWinds-hosted audio models. All endpoints are OpenAI-compatible. On the unified /models API, these audio models are reported under the threatwinds provider, while the owned_by field identifies the original model maintainer (e.g. OpenAI for Whisper).

Speech-to-Text (Transcriptions)

Transcribe audio files into text using the Whisper model.

Endpoint: https://apis.threatwinds.com/api/ai/v1/audio/transcriptions

Method: POST

Parameters

Headers

Header	Type	Required	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication
Content-Type	string	Yes	Must be `multipart/form-data`

Note: Headers marked Optional* are alternatives. Send either the Authorization header or both api-key and api-secret.
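The choice between the two authentication methods can be sketched as a small client-side helper. This is a minimal sketch, not part of the API: the function name `build_auth_headers` is hypothetical, and rejecting a request that supplies both methods at once is an assumption made here for clarity, since the documentation only states that one of the two is required.

```python
from typing import Dict, Optional


def build_auth_headers(
    token: Optional[str] = None,
    api_key: Optional[str] = None,
    api_secret: Optional[str] = None,
) -> Dict[str, str]:
    """Return headers for exactly one of the two documented auth methods."""
    has_token = token is not None
    has_any_key = api_key is not None or api_secret is not None

    # Assumption: mixing both methods is treated as a caller mistake.
    if has_token and has_any_key:
        raise ValueError("use a bearer token or api-key/api-secret, not both")
    if has_token:
        return {"Authorization": f"Bearer {token}"}
    # Key-based authentication needs both halves of the pair.
    if api_key is not None and api_secret is not None:
        return {"api-key": api_key, "api-secret": api_secret}
    raise ValueError("provide a bearer token, or both api_key and api_secret")
```

The returned dictionary can be merged into the headers of any request to the audio endpoints.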

Form Fields

Field	Type	Required	Description
file	file	Yes	Audio file to transcribe. Supported formats: mp3, wav, ogg, flac
model	string	Yes	Model ID to use for transcription
language	string	No	Language of the audio in ISO-639-1 format (e.g. `en`, `es`, `fr`). Auto-detected if omitted
response_format	string	No	Format of the transcription output. `json` (default), `text`, `srt`, `verbose_json`, `vtt`
temperature	float	No	Sampling temperature between 0.0 and 1.0. Lower values produce more deterministic output
prompt	string	No	Optional text to guide the model’s style or provide context
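The constraints in the table above can be checked on the client before uploading. The following is a minimal sketch under stated assumptions: `transcription_fields` is a hypothetical helper, and checking the file extension is only a local approximation of the supported-format list, since the server may inspect the file content instead.

```python
from pathlib import Path
from typing import Dict, Optional

# Values taken from the Form Fields table.
SUPPORTED_AUDIO = {".mp3", ".wav", ".ogg", ".flac"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def transcription_fields(
    filename: str,
    model: str = "whisper-large-v3",
    language: Optional[str] = None,
    response_format: str = "json",
    temperature: Optional[float] = None,
    prompt: Optional[str] = None,
) -> Dict[str, str]:
    """Validate the documented fields and return the non-file ones as strings."""
    if Path(filename).suffix.lower() not in SUPPORTED_AUDIO:
        raise ValueError(f"unsupported audio format: {filename}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    # Optional fields are omitted entirely when not set.
    fields = {"model": model, "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if temperature is not None:
        fields["temperature"] = str(temperature)
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```

The result holds the text fields of the multipart form; the `file` field is attached separately by whichever HTTP client sends the request.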

Available Models

Model	Provider	Owned By	Description
whisper-large-v3	threatwinds	OpenAI	High-accuracy multilingual speech recognition model

Example

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/audio/transcriptions' \
  -H 'Authorization: Bearer <token>' \
  -F 'file=@recording.mp3' \
  -F 'model=whisper-large-v3' \
  -F 'language=en' \
  -F 'response_format=json'

Returns

{
  "text": "The attacker exploited a buffer overflow vulnerability in the authentication module.",
  "duration": 4.2,
  "language": "en"
}

Response Schema

Field	Type	Description
text	string	The transcribed text
duration	float	Duration of the audio in seconds (when available)
language	string	Detected or specified language code (when available)
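Because `duration` and `language` are only present when available, a client should treat them as optional. The sketch below parses a JSON response body into a small record. The `Transcription` class and `parse_transcription` function are hypothetical names, and the sketch assumes the `json` response format (other formats such as `text` or `srt` are not JSON).

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    duration: Optional[float] = None  # seconds, when the server reports it
    language: Optional[str] = None    # language code, when the server reports it


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON transcription response, tolerating missing optional fields."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],
        duration=data.get("duration"),
        language=data.get("language"),
    )
```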

Text-to-Speech

Convert text into spoken audio using the Kokoro-82M text-to-speech model.

Endpoint: https://apis.threatwinds.com/api/ai/v1/audio/speech

Method: POST

Parameters

Headers

Header	Type	Required	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication
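Content-Type	string	Yes	Must be `application/json`

Note: Headers marked Optional* are alternatives. Send either the Authorization header or both api-key and api-secret.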

Request Body

{
  "model": "kokoro-82m",
  "input": "Critical vulnerability detected in your network perimeter.",
  "voice": "af_heart",
  "response_format": "mp3",
  "speed": 1.0
}

Request Parameters

Parameter	Type	Required	Description
model	string	Yes	Model ID to use for speech generation
input	string	Yes	The text to convert to speech
voice	string	Yes	Voice ID to use (e.g. `af_heart`)
response_format	string	No	Audio format. `mp3` (default), `wav`, `flac`, `pcm`
speed	float	No	Speech speed multiplier. Range: 0.25–4.0, default: 1.0
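The parameter rules above can be applied when assembling the request body. This is a minimal sketch: `speech_request_body` is a hypothetical helper, and the empty-input check is a client-side assumption rather than documented server behavior.

```python
from typing import Any, Dict

# Values taken from the Request Parameters table.
AUDIO_FORMATS = {"mp3", "wav", "flac", "pcm"}


def speech_request_body(
    input_text: str,
    voice: str,
    model: str = "kokoro-82m",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Validate the documented parameters and return the JSON-ready body."""
    if not input_text:
        raise ValueError("input must not be empty")
    if response_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": input_text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Serializing the returned dictionary with `json.dumps` yields a body in the shape shown under Request Body.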

Available Models

Model	Provider	Owned By	Description
kokoro-82m	threatwinds	hexgrad	Compact, high-quality text-to-speech model

Available Voices

Kokoro-82M supports 54 voices across 8 languages. See the full Voice Reference for all voice IDs grouped by language and gender.

Example

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/audio/speech' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "kokoro-82m",
  "input": "Alert: suspicious lateral movement detected on host 192.168.1.45.",
  "voice": "af_heart"
}' \
  --output alert.mp3

Returns

The response body is binary audio data. The Content-Type response header reflects the chosen format (e.g. audio/mpeg for mp3, audio/wav for wav).
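Since the response is raw audio rather than JSON, a client writes the bytes straight to disk. The sketch below uses only the Python standard library and bearer-token authentication. The function names are hypothetical, and `save_speech` performs a real network call, so it is shown for illustration and needs a valid token to run.

```python
import json
import urllib.request

SPEECH_URL = "https://apis.threatwinds.com/api/ai/v1/audio/speech"


def build_speech_request(token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the speech endpoint."""
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> str:
    """Send the request, write the binary audio to path, return the Content-Type."""
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get("Content-Type", "")
        with open(path, "wb") as out:
            out.write(response.read())
    return content_type
```

The returned Content-Type can be compared with the requested format, for example `audio/mpeg` when `response_format` is `mp3`.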

Error Response Headers

All error responses include the following custom headers:

Header	Description
x-error	Human-readable error message describing what went wrong
x-error-id	Unique identifier for error tracking and support

Error Codes

Status Code	Description	Possible Cause
400	Bad Request	Missing required field (`file`, `model`, `input`, or `voice`), or an invalid `temperature` or `speed` value
401	Unauthorized	Missing or invalid authentication credentials
403	Forbidden	Insufficient permissions for AI API access
500	Internal Server Error	Audio service unavailable or server-side error
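The error headers and status codes above can be combined into a single exception on the client. This is a minimal sketch: `AudioAPIError` and `error_from_response` are hypothetical names, and the fallback message used when `x-error` is absent is an assumption.

```python
from typing import Mapping, Optional


class AudioAPIError(Exception):
    """Carries the HTTP status plus the x-error and x-error-id header values."""

    def __init__(self, status: int, message: str, error_id: Optional[str]):
        super().__init__(f"HTTP {status}: {message} (error id: {error_id})")
        self.status = status
        self.message = message
        self.error_id = error_id


def error_from_response(
    status: int, headers: Mapping[str, str]
) -> Optional[AudioAPIError]:
    """Return an AudioAPIError for 4xx/5xx responses, or None on success."""
    if status < 400:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return AudioAPIError(
        status,
        lowered.get("x-error", "unknown error"),
        lowered.get("x-error-id"),
    )
```

Logging `error_id` alongside the message makes it easier to reference a specific failure when contacting support.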
