Audio

The Audio API provides speech-to-text transcription and text-to-speech synthesis using ThreatWinds-hosted audio models. All endpoints are OpenAI-compatible.

Speech-to-Text (Transcriptions)

Transcribe audio files into text using the Whisper model.

Endpoint: https://apis.threatwinds.com/api/ai/v1/audio/transcriptions

Method: POST

Parameters

Headers

| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be multipart/form-data |

* Note: Authenticate with either the Authorization header or the api-key and api-secret pair. One of the two is required.
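The two authentication options can be captured in a small helper. This is a minimal sketch, not part of any official client: the function name `build_auth_headers` is hypothetical, and rejecting requests that mix both schemes is an assumption, since the text does not say how the server treats that case.

```python
from typing import Optional


def build_auth_headers(
    token: Optional[str] = None,
    api_key: Optional[str] = None,
    api_secret: Optional[str] = None,
) -> dict:
    """Return headers for exactly one of the two documented auth schemes."""
    # Assumption: mixing both schemes is treated as a caller error here;
    # the documentation does not describe the server's behavior in that case.
    if token and (api_key or api_secret):
        raise ValueError("use a bearer token or api-key/api-secret, not both")
    if token:
        return {"Authorization": f"Bearer {token}"}
    if api_key and api_secret:
        return {"api-key": api_key, "api-secret": api_secret}
    raise ValueError("provide a bearer token or both api-key and api-secret")
```

The returned dictionary can be merged into the headers of any request to the endpoints on this page.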

Form Fields

| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file to transcribe. Supported formats: mp3, wav, ogg, flac |
| model | string | Yes | Model ID to use for transcription |
| language | string | No | Language of the audio in ISO-639-1 format (e.g. en, es, fr). Auto-detected if omitted |
| response_format | string | No | Format of the transcription output: json (default), text, srt, verbose_json, vtt |
| temperature | float | No | Sampling temperature between 0.0 and 1.0. Lower values produce more deterministic output |
| prompt | string | No | Optional text to guide the model’s style or provide context |
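The constraints in the table can be checked on the client before uploading. The sketch below is illustrative only: `transcription_fields` is a hypothetical helper, and judging the audio format by file extension is an assumption (the server may inspect the file content instead).

```python
from typing import Optional

# Allowed values copied from the Form Fields table.
AUDIO_FORMATS = {"mp3", "wav", "ogg", "flac"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def transcription_fields(
    filename: str,
    model: str,
    language: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Validate inputs and return the non-file form fields as strings."""
    # Assumption: the extension is a good enough proxy for the audio format.
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {filename!r}")
    if not model:
        raise ValueError("model is required")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {"model": model}
    optional = {
        "language": language,
        "response_format": response_format,
        "temperature": temperature,
        "prompt": prompt,
    }
    # Leave unset fields out so the server applies its documented defaults.
    fields.update({k: str(v) for k, v in optional.items() if v is not None})
    return fields
```

Omitting unset fields, rather than sending empty strings, keeps the request aligned with the defaults listed in the table.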

Available Models

| Model | Description |
|---|---|
| whisper-large-v3 | High-accuracy multilingual speech recognition model by OpenAI |

Example

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/audio/transcriptions' \
  -H 'Authorization: Bearer <token>' \
  -F 'file=@recording.mp3' \
  -F 'model=whisper-large-v3' \
  -F 'language=en' \
  -F 'response_format=json'
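For callers who prefer Python to curl, roughly the same request can be built with the standard library alone. This is a sketch under stated assumptions: `encode_multipart` and `transcribe` are hypothetical helper names, the file part is labeled `application/octet-stream` (whether the server requires a more specific type is not documented), and filenames containing double quotes are not handled.

```python
import json
import os
import urllib.request
import uuid

TRANSCRIPTIONS_URL = "https://apis.threatwinds.com/api/ai/v1/audio/transcriptions"


def encode_multipart(fields: dict, filename: str, file_bytes: bytes) -> tuple:
    """Encode text fields plus one file part; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    # Assumption: a generic binary type is acceptable for the file part.
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, token: str, model: str = "whisper-large-v3") -> dict:
    """POST the audio file at `path` and return the parsed JSON response."""
    with open(path, "rb") as handle:
        audio = handle.read()
    body, content_type = encode_multipart(
        {"model": model, "response_format": "json"}, os.path.basename(path), audio
    )
    request = urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `transcribe("recording.mp3", token)` would mirror the curl command, minus the explicit `language` field.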

Returns

{
  "text": "The attacker exploited a buffer overflow vulnerability in the authentication module.",
  "duration": 4.2,
  "language": "en"
}

Response Schema

| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text |
| duration | float | Duration of the audio in seconds (when available) |
| language | string | Detected or specified language code (when available) |
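Because `duration` and `language` are only present "when available", client code should treat them as optional. A minimal parsing sketch for the `json` response format follows; the `Transcription` class and `parse_transcription` function are hypothetical names, not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    duration: Optional[float] = None  # seconds; may be absent
    language: Optional[str] = None    # language code; may be absent


def parse_transcription(payload: str) -> Transcription:
    """Parse a response_format=json body into a Transcription."""
    data = json.loads(payload)
    return Transcription(
        text=data["text"],                # always expected per the schema
        duration=data.get("duration"),    # None when the field is missing
        language=data.get("language"),
    )
```

Using `dict.get` for the two optional fields avoids a `KeyError` on responses that carry only `text`.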

Text-to-Speech

Convert text into spoken audio using Kokoro, a high-quality TTS model.

Endpoint: https://apis.threatwinds.com/api/ai/v1/audio/speech

Method: POST

Parameters

Headers

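| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be application/json |

* Note: As with transcriptions, authenticate with either the Authorization header or the api-key and api-secret pair.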

Request Body

{
  "model": "kokoro-82m",
  "input": "Critical vulnerability detected in your network perimeter.",
  "voice": "af_heart",
  "response_format": "mp3",
  "speed": 1.0
}

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for speech generation |
| input | string | Yes | The text to convert to speech |
| voice | string | Yes | Voice ID to use (e.g. af_heart) |
| response_format | string | No | Audio format: mp3 (default), wav, opus, flac |
| speed | float | No | Speech speed multiplier. Range: 0.25–4.0, default: 1.0 |
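The parameter rules above translate directly into a request-body builder. The following is a sketch only; `speech_request_body` is a hypothetical helper, and it validates on the client what the server is documented to reject with a 400.

```python
import json
from typing import Optional

# Allowed values copied from the Request Parameters table.
SPEECH_FORMATS = {"mp3", "wav", "opus", "flac"}


def speech_request_body(
    model: str,
    text: str,
    voice: str,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> bytes:
    """Validate the parameters and return the JSON body as UTF-8 bytes."""
    for name, value in (("model", model), ("input", text), ("voice", voice)):
        if not value:
            raise ValueError(f"{name} is required")
    if response_format is not None and response_format not in SPEECH_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    body = {"model": model, "input": text, "voice": voice}
    # Optional parameters are sent only when set, so server defaults apply.
    if response_format is not None:
        body["response_format"] = response_format
    if speed is not None:
        body["speed"] = speed
    return json.dumps(body).encode("utf-8")
```

The resulting bytes can be sent as the POST body with `Content-Type: application/json`.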

Available Models

| Model | Description |
|---|---|
| kokoro-82m | Compact, high-quality text-to-speech model by hexgrad |

Available Voices

Kokoro-82M supports 54 voices across 8 languages. See the full Voice Reference for all voice IDs grouped by language and gender.

Example

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/audio/speech' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "kokoro-82m",
  "input": "Alert: suspicious lateral movement detected on host 192.168.1.45.",
  "voice": "af_heart"
}' \
  --output alert.mp3

Returns

The response body is binary audio data. The Content-Type response header reflects the chosen format (e.g. audio/mpeg for mp3, audio/wav for wav).
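When saving the audio, the `Content-Type` response header can drive the file extension. The sketch below maps only the two types named above; the `.bin` fallback for anything else (including opus and flac, whose header values are not stated here) and the helper names `extension_for` and `save_audio` are assumptions.

```python
from pathlib import Path

# Only the two mappings given in the text; other types fall back to ".bin".
EXTENSION_BY_CONTENT_TYPE = {"audio/mpeg": ".mp3", "audio/wav": ".wav"}


def extension_for(content_type: str) -> str:
    """Pick a file extension from a Content-Type header value."""
    # Drop any parameters after ";" and normalize case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSION_BY_CONTENT_TYPE.get(media_type, ".bin")


def save_audio(audio: bytes, content_type: str, stem: str = "speech") -> Path:
    """Write the binary response body to `<stem><extension>` and return the path."""
    path = Path(stem + extension_for(content_type))
    path.write_bytes(audio)
    return path
```

For example, `save_audio(body, "audio/mpeg", "alert")` writes `alert.mp3` in the current directory.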


Error Response Headers

All error responses include the following custom headers:

| Header | Description |
|---|---|
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique MD5 hash identifier for error tracking and support |

Error Codes

| Status Code | Description | Possible Cause |
|---|---|---|
| 400 | Bad Request | Missing required field (file, model, input, or voice), invalid temperature, invalid speed |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Insufficient permissions for AI API access |
| 500 | Internal Server Error | Audio service unavailable or server-side error |
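The status code and the two custom headers can be combined into a single exception that is convenient to log or hand to support. This is a minimal sketch: `AudioApiError` and `error_from_response` are hypothetical names, and the placeholder message used when `x-error` is missing is an assumption.

```python
from typing import Mapping, Optional

# Descriptions copied from the Error Codes table.
STATUS_DESCRIPTIONS = {
    400: "Bad Request",
    401: "Unauthorized",
    403: "Forbidden",
    500: "Internal Server Error",
}


class AudioApiError(Exception):
    """Carries the HTTP status plus the x-error and x-error-id header values."""

    def __init__(self, status: int, message: str, error_id: Optional[str]):
        description = STATUS_DESCRIPTIONS.get(status, "Error")
        super().__init__(f"{status} {description}: {message} (error id: {error_id})")
        self.status = status
        self.message = message
        self.error_id = error_id


def error_from_response(status: int, headers: Mapping[str, str]) -> AudioApiError:
    """Build an AudioApiError from a status code and response headers."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return AudioApiError(
        status,
        lowered.get("x-error", "no x-error header present"),
        lowered.get("x-error-id"),
    )
```

With `urllib.request`, a failed call raises `urllib.error.HTTPError`, whose `code` and `headers` attributes can be passed to `error_from_response`.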