Audio
The Audio API provides speech-to-text transcription and text-to-speech synthesis using ThreatWinds-hosted audio models. All endpoints are OpenAI-compatible.
Speech-to-Text (Transcriptions)
Transcribe audio files into text using the Whisper model.
Endpoint: https://apis.threatwinds.com/api/ai/v1/audio/transcriptions
Method: POST
Parameters
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be multipart/form-data |
Note: Headers marked Optional* are alternatives. Send either the Authorization header or both api-key and api-secret.
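The either/or rule above can be sketched as a small client-side helper. This is a minimal illustration, not part of the API: the function name `build_auth_headers` is hypothetical, and preferring the bearer token when both kinds of credentials are supplied is an assumption of this sketch.

```python
from typing import Optional


def build_auth_headers(
    token: Optional[str] = None,
    api_key: Optional[str] = None,
    api_secret: Optional[str] = None,
) -> dict:
    """Return headers for exactly one of the two documented auth schemes."""
    if token:
        # Session authentication: a single Authorization header.
        # Assumption: if a token is given, any key/secret pair is ignored.
        return {"Authorization": f"Bearer {token}"}
    if api_key and api_secret:
        # Key-based authentication: both headers are sent together.
        return {"api-key": api_key, "api-secret": api_secret}
    raise ValueError("provide a bearer token, or both api_key and api_secret")
```

Raising on a lone `api_key` or `api_secret` surfaces the problem before the request is sent, rather than waiting for a 401 response.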
Form Fields
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file to transcribe. Supported formats: mp3, wav, ogg, flac |
| model | string | Yes | Model ID to use for transcription |
| language | string | No | Language of the audio in ISO-639-1 format (e.g. en, es, fr). Auto-detected if omitted |
| response_format | string | No | Format of the transcription output. One of json (default), text, srt, verbose_json, or vtt |
| temperature | float | No | Sampling temperature between 0.0 and 1.0. Lower values produce more deterministic output |
| prompt | string | No | Optional text to guide the model’s style or provide context |
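As a rough sketch of how the form fields above might be assembled without third-party libraries, the following validates the optional fields and encodes a multipart/form-data body by hand. The helper names are hypothetical, the client-side checks simply mirror the table, and the `application/octet-stream` part type for the file is an assumption about what the server accepts.

```python
import uuid
from typing import Optional

RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(
    model: str,
    language: Optional[str] = None,
    response_format: str = "json",
    temperature: Optional[float] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Check the documented constraints and return the text form fields."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    fields = {"model": model, "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if temperature is not None:
        fields["temperature"] = str(temperature)
    if prompt is not None:
        fields["prompt"] = prompt
    return fields


def encode_multipart(fields: dict, filename: str, audio: bytes) -> tuple:
    """Encode the text fields plus the `file` part; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    )
    chunks.append(file_header.encode("utf-8") + audio + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The returned content type string carries the boundary, so it should be sent as the Content-Type header together with the body.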
Available Models
| Model | Description |
|---|---|
| whisper-large-v3 | High-accuracy multilingual speech recognition model by OpenAI |
Example
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/audio/transcriptions' \
-H 'Authorization: Bearer <token>' \
-F 'file=@recording.mp3' \
-F 'model=whisper-large-v3' \
-F 'language=en' \
-F 'response_format=json'
Returns
{
  "text": "The attacker exploited a buffer overflow vulnerability in the authentication module.",
  "duration": 4.2,
  "language": "en"
}
Response Schema
| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text |
| duration | float | Duration of the audio in seconds (when available) |
| language | string | Detected or specified language code (when available) |
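Because duration and language are only present when available, a client should treat them as optional. A minimal parsing sketch for the json response format follows; the `Transcription` class and `parse_transcription` function are hypothetical names, not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    duration: Optional[float] = None  # seconds, when the server reports it
    language: Optional[str] = None    # language code, when the server reports it


def parse_transcription(raw: str) -> Transcription:
    """Parse a JSON transcription response, tolerating absent optional fields."""
    data = json.loads(raw)
    return Transcription(
        text=data["text"],
        duration=data.get("duration"),
        language=data.get("language"),
    )
```

Using `dict.get` for the optional fields means a response that contains only text still parses cleanly.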
Text-to-Speech
Convert text into spoken audio using Kokoro, a high-quality TTS model.
Endpoint: https://apis.threatwinds.com/api/ai/v1/audio/speech
Method: POST
Parameters
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
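| Content-Type | string | Yes | Must be application/json |
Note: Headers marked Optional* are alternatives. Send either the Authorization header or both api-key and api-secret.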
Request Body
{
  "model": "kokoro-82m",
  "input": "Critical vulnerability detected in your network perimeter.",
  "voice": "af_heart",
  "response_format": "mp3",
  "speed": 1.0
}
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for speech generation |
| input | string | Yes | The text to convert to speech |
| voice | string | Yes | Voice ID to use (e.g. af_heart) |
| response_format | string | No | Audio format of the output. One of mp3 (default), wav, opus, or flac |
| speed | float | No | Speech speed multiplier. Range: 0.25–4.0, default: 1.0 |
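The constraints in the table above can be checked before a request is sent. The sketch below builds the JSON request body and rejects values outside the documented ranges; `build_speech_payload` is a hypothetical helper, and rejecting empty input or voice on the client is a precaution assumed here rather than documented server behavior.

```python
import json

SPEECH_FORMATS = {"mp3", "wav", "opus", "flac"}


def build_speech_payload(
    text: str,
    voice: str,
    model: str = "kokoro-82m",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> bytes:
    """Validate the parameters and return the UTF-8 encoded JSON body."""
    if not text:
        raise ValueError("input must not be empty")
    if not voice:
        raise ValueError("voice must not be empty")
    if response_format not in SPEECH_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return json.dumps(payload).encode("utf-8")
```

The resulting bytes can be sent as the POST body with the Content-Type header set to application/json.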
Available Models
| Model | Description |
|---|---|
| kokoro-82m | Compact, high-quality text-to-speech model by hexgrad |
Available Voices
Kokoro-82M supports 54 voices across 8 languages. See the full Voice Reference for all voice IDs grouped by language and gender.
Example
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/audio/speech' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro-82m",
"input": "Alert: suspicious lateral movement detected on host 192.168.1.45.",
"voice": "af_heart"
}' \
--output alert.mp3
Returns
The response body is binary audio data. The Content-Type response header reflects the chosen format (e.g. audio/mpeg for mp3, audio/wav for wav).
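One way to handle the binary response is to choose a file extension from the Content-Type header before writing the bytes to disk. The sketch below maps only the two media types named above; the fallback to `.bin` for anything else, and the helper names, are assumptions of this example.

```python
from pathlib import Path

# Only the media types named in this section; others fall back to ".bin".
KNOWN_EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav"}


def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to a file extension."""
    # Drop any parameters (e.g. "; charset=...") and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return KNOWN_EXTENSIONS.get(media_type, ".bin")


def save_audio(body: bytes, content_type: str, stem: str) -> Path:
    """Write the response bytes to '<stem><extension>' and return the path."""
    path = Path(stem + extension_for(content_type))
    path.write_bytes(body)
    return path
```

For example, `save_audio(body, "audio/mpeg", "alert")` would write the bytes to alert.mp3 in the current directory.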
Error Response Headers
All error responses include the following custom headers:
| Header | Description |
|---|---|
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique MD5 hash identifier for error tracking and support |
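A client can surface both headers when a request fails, which makes it easier to quote the error ID to support. The following is a sketch under the assumption that the status code and response headers are already available as a number and a mapping (with Python's urllib, for instance, an `HTTPError` exposes them as `.code` and `.headers`); `describe_error` is a hypothetical helper.

```python
from typing import Mapping


def describe_error(status: int, headers: Mapping[str, str]) -> str:
    """Build a log-friendly message from the custom error headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    message = lowered.get("x-error", "no x-error header present")
    error_id = lowered.get("x-error-id", "unknown")
    return f"HTTP {status}: {message} (error id: {error_id})"
```

The fallbacks keep the helper usable for failures that never reached the API, such as a proxy error that carries neither header.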
Error Codes
| Status Code | Description | Possible Cause |
|---|---|---|
| 400 | Bad Request | Missing required field (file, model, input, or voice), invalid temperature, or invalid speed |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Insufficient permissions for AI API access |
| 500 | Internal Server Error | Audio service unavailable or server-side error |