AI API
The ThreatWinds AI API provides unified access to multiple AI providers (OpenAI, Gemini, Claude, and ThreatWinds self-hosted models) through a standardized OpenAI-compatible interface. It handles model management, chat completions with streaming, token counting, embeddings generation, and audio processing.
Overview
ThreatWinds AI API allows you to:
| Feature | Description | Documentation |
|---|---|---|
| Model Management | List and query available AI models across providers | Models |
| Chat Completions | Generate AI responses with streaming and tool support | Chat Completions |
| Token Counting | Count tokens before making inference requests | Token Counting |
| Embeddings | Generate vector embeddings for semantic search and similarity | Embeddings |
| Audio | Speech-to-text transcription and text-to-speech synthesis | Audio |
| Voices | Reference list of all available TTS voice IDs | Voices |
| Data Store | Multi-tenant structured storage for AI workflows | Store |
Key Features
- OpenAI-compatible API - Drop-in replacement for OpenAI’s Chat Completions API
- SSE Streaming - Real-time streaming responses for better UX
- Function Calling - Tool/function support for external integrations
- Multi-provider - Access OpenAI, Gemini, Claude, and ThreatWinds self-hosted models through one API
- Multimodal - Support for text, images, and audio inputs
- Extended Reasoning - Enable step-by-step reasoning for complex tasks
Authentication
The AI API supports two authentication methods:
| Authentication Method | Description |
|---|---|
| Bearer Token | Session-based authentication using Authorization: Bearer <token> header |
| API Key | API key authentication using api-key and api-secret headers |
For details on how to obtain authentication credentials, see the Authentication section.
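As a minimal sketch, the two authentication methods above translate into HTTP headers like this (the helper name is our own; header names come from the table):

```python
def auth_headers(bearer_token=None, api_key=None, api_secret=None):
    """Build request headers for either AI API authentication method."""
    if bearer_token:
        # Session-based authentication
        return {"Authorization": f"Bearer {bearer_token}"}
    if api_key and api_secret:
        # API key authentication
        return {"api-key": api_key, "api-secret": api_secret}
    raise ValueError("provide a bearer token or an api-key/api-secret pair")
```

Pass the returned dict as headers on every request to the AI API.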
API Endpoints
The base URL for the AI API is:
https://apis.threatwinds.com/api/ai/v1
For detailed information about each endpoint, please refer to the specific documentation pages.
Supported Providers
The AI API aggregates models from multiple providers. Use the /models endpoint to discover all currently available models and their capabilities.
| Provider | Description | Capabilities |
|---|---|---|
| OpenAI | External provider | Chat, tools-use, reasoning, code-generation, vision |
| Gemini (Google) | External provider | Chat, tools-use, reasoning, vision |
| Claude (Anthropic) | External provider | Chat, tools-use, reasoning, code-generation, vision |
| ThreatWinds | Self-hosted models (chat, embeddings, audio) | Chat, tools-use, text/code generation, text/vision embeddings, stt/tts |
Note: All self-hosted backends (chat, embeddings, audio) are reported under the single `threatwinds` provider ID on the unified API, regardless of the underlying serving technology. Each model's `owned_by` field still identifies the original maintainer (e.g. `Alibaba` for Qwen3, `ThreatWinds` for Silas, `OpenAI` for Whisper).
Discover Models Dynamically
Model availability changes frequently. Query the /models endpoint for the current catalog:
curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
-H 'Authorization: Bearer <token>'
The response includes all active models with their provider, capabilities, and token limits.
Common Use Cases
Simple Chat Completion
Generate AI responses using any chat-capable model:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-5",
"messages": [{"role": "user", "content": "Explain XDR"}]
}'
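The same request can be assembled from Python with only the standard library; this sketch just builds the URL, headers, and JSON body shown in the curl example (the helper name is ours, and sending it with `urllib.request` is left to the caller):

```python
import json

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"

def build_chat_request(token, model, messages, **options):
    """Assemble URL, headers, and JSON body for a chat completion call.

    Extra keyword options (e.g. stream=True) are merged into the payload.
    """
    payload = {"model": model, "messages": messages, **options}
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return f"{BASE_URL}/chat/completions", headers, json.dumps(payload)
```

Send the result with any HTTP client, e.g. `urllib.request.Request(url, data=body.encode(), headers=headers, method="POST")`.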
Streaming Response
Enable real-time streaming for better UX:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-pro",
"messages": [{"role": "user", "content": "Explain XDR"}],
"stream": true,
"stream_options": {"include_usage": true}
}'
Count Tokens Before Request
Estimate costs by counting tokens:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-pro",
"messages": [{"role": "user", "content": "Long message..."}]
}'
Note: Token counting is supported for OpenAI, Gemini, and ThreatWinds chat models. Claude models are not supported on this endpoint; read `usage` from `/chat/completions` responses instead. Embedding and audio models use their dedicated endpoints.
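When a quick local estimate is enough (or for models the count endpoint does not cover), a rough characters-per-token heuristic can gate requests before calling the API. This is an approximation of our own, not part of the API; use `/chat/count` for authoritative numbers:

```python
def rough_token_estimate(messages):
    """Very rough pre-flight estimate: ~4 characters per token for English text.

    Only a sanity check; the /chat/count endpoint gives authoritative counts
    for supported models.
    """
    chars = sum(len(m.get("content", "")) for m in messages)
    return max(1, chars // 4)
```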
List Available Models
Discover all available models:
curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
-H 'Authorization: Bearer <token>'
Error Response Headers
All error responses include the following custom headers:
| Header | Description |
|---|---|
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique identifier for error tracking and support |
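A client can surface these headers directly in error messages; the `x-error-id` value is what support will ask for. A small formatting helper (our own sketch, assuming headers arrive as a dict):

```python
def describe_error(status, headers):
    """Format an API error using the custom x-error / x-error-id headers."""
    message = headers.get("x-error", "unknown error")
    error_id = headers.get("x-error-id", "n/a")
    return f"HTTP {status}: {message} (error id: {error_id})"
```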
Error Codes
| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Request successful with data (model list, chat response, token count) |
| 400 | Bad Request | Invalid parameters, validation error, empty messages, or malformed JSON |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Insufficient permissions for AI API access |
| 404 | Not Found | Model or provider not found |
| 500 | Internal Server Error | Provider error, AI service unavailable, or server-side error |
Model Capabilities
AI models expose various capabilities:
| Capability | Description |
|---|---|
| chat | Text-based conversation |
| text-generation | General text generation |
| code-generation | Code generation and completion |
| tools-use | Function/tool calling |
| reasoning | Extended reasoning capabilities |
| image | Image understanding (vision) |
| transcription | Audio-to-text speech recognition |
| speech | Text-to-speech synthesis |
| embeddings | Vector embedding generation |
| vision-embeddings | Multimodal embeddings (text + images) |
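Since the `/models` response advertises each model's capabilities, clients can pick models by capability instead of hard-coding IDs. A minimal sketch, assuming an OpenAI-style list response with a `data` array and a `capabilities` field per model (the exact field names are an assumption):

```python
def models_with(capability, models_response):
    """Return IDs of models advertising a given capability.

    Assumes a /models response shaped like {"data": [{"id": ..., "capabilities": [...]}]}.
    """
    return [
        m["id"]
        for m in models_response.get("data", [])
        if capability in m.get("capabilities", [])
    ]
```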
Token Limits
Each model has defined token limits:
- max_input_tokens: Maximum tokens in input messages
- max_completion_tokens: Maximum tokens the model can generate
- max_total_tokens: Maximum combined input + output tokens
Check model details to see specific limits for each model.
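A pre-flight check against these three limits can reject oversized requests before they hit the API. A sketch using the limit field names above (passed in as a dict from the model's details):

```python
def fits_limits(input_tokens, max_completion, limits):
    """Check a planned request against a model's token limits.

    `limits` carries max_input_tokens, max_completion_tokens, max_total_tokens.
    """
    if input_tokens > limits["max_input_tokens"]:
        return False
    if max_completion > limits["max_completion_tokens"]:
        return False
    # Input and output must also fit the combined budget.
    return input_tokens + max_completion <= limits["max_total_tokens"]
```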
Response Formats
The AI API supports various response formats:
Text Response (Default)
Standard text completion response.
Streaming Response (SSE)
Enable stream: true for Server-Sent Events streaming:
{
"stream": true,
"stream_options": {"include_usage": true}
}
JSON Object
Structured JSON output:
{
"response_format": {
"type": "json_object"
}
}
JSON Schema
JSON output matching a specific schema:
{
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "answer_schema",
"strict": true,
"schema": {
"type": "object",
"properties": {
"answer": {"type": "string"}
},
"required": ["answer"],
"additionalProperties": false
}
}
}
}
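Even with `strict` schema enforcement, it is cheap to re-check the returned JSON locally before trusting it. This is a minimal checker of our own (required keys present, no extras when `additionalProperties` is false), not a full JSON Schema validator:

```python
import json

def validate_answer(raw, response_format_schema):
    """Lightweight local check of a json_schema-constrained response body."""
    obj = json.loads(raw)
    schema = response_format_schema["schema"]
    props = schema.get("properties", {})
    required = schema.get("required", [])
    missing = [k for k in required if k not in obj]
    extra = (
        [k for k in obj if k not in props]
        if not schema.get("additionalProperties", True)
        else []
    )
    return not missing and not extra
```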
Advanced Features
Streaming (SSE)
Enable real-time streaming for responsive UX:
{
"stream": true,
"stream_options": {"include_usage": true}
}
Response chunks are sent as Server-Sent Events:
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]
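A streaming client accumulates the `delta.content` fragments from each `data:` line until the `[DONE]` sentinel. A minimal parser sketch over already-decoded SSE lines (real clients would read the response incrementally):

```python
import json

def parse_sse_chunks(lines):
    """Join streamed content from chat.completion.chunk SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content", "") or "")
    return "".join(parts)
```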
Function Calling (Tools)
Enable models to call external functions:
{
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {"type": "object", "properties": {...}}
}
}],
"tool_choice": "auto"
}
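When the model decides to call a tool, the assistant message carries `tool_calls` that the client must execute and answer with `tool`-role messages. A dispatch sketch assuming the OpenAI-compatible tool-call shape (`id`, `function.name`, JSON-encoded `function.arguments`):

```python
import json

def handle_tool_calls(assistant_message, registry):
    """Run each requested tool and build the 'tool' role replies to send back."""
    replies = []
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        # Arguments arrive as a JSON string per the OpenAI-compatible format.
        result = registry[fn["name"]](**json.loads(fn["arguments"]))
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return replies
```

Append these replies to the conversation and call `/chat/completions` again so the model can use the results.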
Reasoning Effort
Control AI reasoning depth (models with reasoning capability):
| Value | Description | Token Budget |
|---|---|---|
| low | Low reasoning effort | 25% of max tokens |
| medium | Moderate reasoning | 33% of max tokens |
| high | Extended reasoning | 50% of max tokens |
Note: Provider behavior varies when `reasoning_effort` is omitted: some providers enable reasoning by default, while others disable it.
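The effort-to-budget mapping in the table above is straightforward to compute client-side when estimating how many tokens a reasoning pass may consume:

```python
# Fractions from the reasoning effort table: low 25%, medium 33%, high 50%.
REASONING_BUDGET = {"low": 0.25, "medium": 0.33, "high": 0.50}

def reasoning_budget(effort, max_tokens):
    """Approximate reasoning-token budget for a given effort level."""
    return int(max_tokens * REASONING_BUDGET[effort])
```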
Service Tier
Control request priority (provider-specific):
| Value | Description |
|---|---|
| auto | Automatic tier selection (default) |
| default | Standard processing |
| flex | Flexible scheduling |
| priority | Priority processing |
Note: Service tier support varies by provider and model.
Best Practices
Cost Optimization
- Count Tokens First: Use token counting endpoint before large requests
- Set Max Tokens: Always set
max_completion_tokensto control costs - Choose Appropriate Models: Use smaller models for simple tasks
- Monitor Usage: Track token usage via billing API limits
Performance
- Use Streaming: Enable streaming for better perceived latency
- Choose Appropriate Providers:
- OpenAI/Gemini/Claude for general-purpose reasoning and complex problem-solving
- ThreatWinds (Silas) for cybersecurity-specific tasks
- Minimize Context: Send only necessary message history
- Batch Requests: Process multiple items in parallel when possible
- Cache Results: Cache frequent queries to reduce API calls
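The caching tip above can be as simple as memoizing identical (model, prompt) pairs in memory; this sketch keeps the real API call behind a caller-supplied `send` function (all names here are our own):

```python
_cache = {}

def cached_completion(model, prompt, send):
    """Memoize identical (model, prompt) requests; `send` performs the real call.

    Suitable only for deterministic-enough use cases; add TTL/eviction for
    production workloads.
    """
    key = (model, prompt)
    if key not in _cache:
        _cache[key] = send(model, prompt)
    return _cache[key]
```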
Security
- Validate Input: Always validate user input before sending to AI
- Sanitize Output: Sanitize AI responses before displaying to users
- Monitor Usage: Track unusual patterns via logs
- Rotate Keys: Regularly rotate API keys and secrets
Error Handling
- Handle Provider Errors: Be prepared for provider-specific errors
- Implement Retries: Add exponential backoff for transient errors
- Check Token Limits: Validate against model token limits before requests
- Log Errors: Log all errors with interaction IDs for debugging
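The retry guidance above can be sketched as exponential backoff with jitter around any callable that reports an HTTP status; here only 500-class responses are retried, and the callable/signature is our own convention:

```python
import random
import time

def with_retries(call, attempts=3, base_delay=1.0, retryable=(500,)):
    """Retry a callable returning (status, body) with exponential backoff + jitter."""
    status, body = None, None
    for attempt in range(attempts):
        status, body = call()
        if status not in retryable:
            return status, body  # success or non-retryable error
        if attempt < attempts - 1:
            # 1x, 2x, 4x... base delay, plus jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return status, body
```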