AI API

The ThreatWinds AI API provides unified access to multiple AI providers (OpenAI, Gemini, Claude, and ThreatWinds self-hosted models) through a standardized OpenAI-compatible interface. It handles model management, chat completions with streaming, token counting, embeddings generation, and audio processing.

Overview

ThreatWinds AI API allows you to:

| Feature | Description | Documentation |
|---|---|---|
| Model Management | List and query available AI models across providers | Models |
| Chat Completions | Generate AI responses with streaming and tool support | Chat Completions |
| Token Counting | Count tokens before making inference requests | Token Counting |
| Embeddings | Generate vector embeddings for semantic search and similarity | Embeddings |
| Audio | Speech-to-text transcription and text-to-speech synthesis | Audio |
| Voices | Reference list of all available TTS voice IDs | Voices |
| Data Store | Multi-tenant structured storage for AI workflows | Store |

Key Features

  • OpenAI-compatible API - Drop-in replacement for OpenAI’s Chat Completions API
  • SSE Streaming - Real-time streaming responses for better UX
  • Function Calling - Tool/function support for external integrations
  • Multi-provider - Access OpenAI, Gemini, Claude, and ThreatWinds self-hosted models through one API
  • Multimodal - Support for text, images, and audio inputs
  • Extended Reasoning - Enable step-by-step reasoning for complex tasks

Authentication

The AI API supports two authentication methods:

| Authentication Method | Description |
|---|---|
| Bearer Token | Session-based authentication using the Authorization: Bearer <token> header |
| API Key | API key authentication using the api-key and api-secret headers |

For details on how to obtain authentication credentials, see the Authentication section.
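Either method is ultimately just a pair of HTTP headers. A minimal sketch of building them in Python, with header names taken from the table above:

```python
# Build request headers for either authentication method.
def bearer_headers(token: str) -> dict:
    """Session-based authentication via the Authorization header."""
    return {"Authorization": f"Bearer {token}"}

def api_key_headers(api_key: str, api_secret: str) -> dict:
    """API key authentication via the api-key / api-secret headers."""
    return {"api-key": api_key, "api-secret": api_secret}
```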

API Endpoints

The base URL for the AI API is:

https://apis.threatwinds.com/api/ai/v1

For detailed information about each endpoint, see its dedicated documentation page.

Supported Providers

The AI API aggregates models from multiple providers. Use the /models endpoint to discover all currently available models and their capabilities.

| Provider | Description | Capabilities |
|---|---|---|
| OpenAI | External provider | Chat, tools-use, reasoning, code-generation, vision |
| Gemini (Google) | External provider | Chat, tools-use, reasoning, vision |
| Claude (Anthropic) | External provider | Chat, tools-use, reasoning, code-generation, vision |
| ThreatWinds | Self-hosted models (chat, embeddings, audio) | Chat, tools-use, text/code generation, text/vision embeddings, stt/tts |

Note: All self-hosted backends (chat, embeddings, audio) are reported under the single threatwinds provider ID on the unified API, regardless of the underlying serving technology. Each model’s owned_by field still identifies the original maintainer (e.g. Alibaba for Qwen3, ThreatWinds for Silas, OpenAI for Whisper).

Discover Models Dynamically

Model availability changes frequently. Query the /models endpoint for the current catalog:

curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'Authorization: Bearer <token>'

The response includes all active models with their provider, capabilities, and token limits.
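A common pattern is to filter the catalog by capability before picking a model. A sketch in Python; the list shape follows the OpenAI-compatible format, but the exact field names (data, capabilities) are assumptions to verify against a live response:

```python
# Filter a /models response down to model IDs with a given capability.
def models_with_capability(models_response: dict, capability: str) -> list[str]:
    return [
        m["id"]
        for m in models_response.get("data", [])
        if capability in m.get("capabilities", [])
    ]

# Illustrative response shape (not real catalog contents).
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-5", "capabilities": ["chat", "tools-use", "vision"]},
        {"id": "whisper", "capabilities": ["transcription"]},
    ],
}
```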

Common Use Cases

Simple Chat Completion

Generate AI responses using any chat-capable model:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Explain XDR"}]
  }'
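On the client side, the assistant's reply sits under choices[0].message.content, as in the standard OpenAI Chat Completions response format:

```python
import json

# Pull the assistant's reply out of a non-streaming chat completion response.
def assistant_text(response_body: str) -> str:
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

# Trimmed example body (real responses carry id, usage, etc.).
body = '{"choices": [{"message": {"role": "assistant", "content": "XDR is..."}}]}'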

Streaming Response

Enable real-time streaming for better UX:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-pro",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Count Tokens Before Request

Estimate costs by counting tokens:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-pro",
    "messages": [{"role": "user", "content": "Long message..."}]
  }'

Note: Token counting is supported for OpenAI, Gemini, and ThreatWinds chat models. Claude models are not supported on this endpoint — read usage from /chat/completions responses instead. Embedding and audio models use their dedicated endpoints.


Error Response Headers

All error responses include the following custom headers:

| Header | Description |
|---|---|
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique identifier for error tracking and support |

Error Codes

| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Request successful with data (model list, chat response, token count) |
| 400 | Bad Request | Invalid parameters, validation error, empty messages, or malformed JSON |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Insufficient permissions for AI API access |
| 404 | Not Found | Model or provider not found |
| 500 | Internal Server Error | Provider error, AI service unavailable, or server-side error |

Model Capabilities

AI models expose various capabilities:

| Capability | Description |
|---|---|
| chat | Text-based conversation |
| text-generation | General text generation |
| code-generation | Code generation and completion |
| tools-use | Function/tool calling |
| reasoning | Extended reasoning capabilities |
| image | Image understanding (vision) |
| transcription | Audio-to-text speech recognition |
| speech | Text-to-speech synthesis |
| embeddings | Vector embedding generation |
| vision-embeddings | Multimodal embeddings (text + images) |

Token Limits

Each model has defined token limits:

  • max_input_tokens: Maximum tokens in input messages
  • max_completion_tokens: Maximum tokens the model can generate
  • max_total_tokens: Maximum combined input + output tokens

Check model details to see specific limits for each model.
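Combined with the /chat/count endpoint, these limits allow a pre-flight check before sending a request. A sketch, where the limit field names come from the list above and the input count would come from /chat/count:

```python
# Check a planned request against a model's token limits before sending it.
def fits_limits(input_tokens: int, completion_tokens: int, limits: dict) -> bool:
    return (
        input_tokens <= limits["max_input_tokens"]
        and completion_tokens <= limits["max_completion_tokens"]
        and input_tokens + completion_tokens <= limits["max_total_tokens"]
    )

# Illustrative limits, not those of any real model.
limits = {
    "max_input_tokens": 4000,
    "max_completion_tokens": 1000,
    "max_total_tokens": 4500,
}
```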

Response Formats

The AI API supports various response formats:

Text Response (Default)

Standard text completion response.

Streaming Response (SSE)

Enable stream: true for Server-Sent Events streaming:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

JSON Object

Structured JSON output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

JSON output matching a specific schema:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"}
        },
        "required": ["answer"],
        "additionalProperties": false
      }
    }
  }
}
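With strict json_schema output, the message content is a JSON string guaranteed to match the schema, so it can be parsed directly. A minimal sketch for the answer_schema example above:

```python
import json

# Parse the structured content produced under the answer_schema above.
def parse_structured_answer(message_content: str) -> str:
    return json.loads(message_content)["answer"]
```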

Advanced Features

Streaming (SSE)

Enable real-time streaming for responsive UX:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

Response chunks are sent as Server-Sent Events:

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]
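Reassembling the streamed text on the client means reading data: lines, skipping keep-alives, and stopping at [DONE]. A sketch assuming the chunk shape shown above (with include_usage, a final usage-only chunk carries no choices):

```python
import json

# Collect the streamed completion text from raw SSE lines.
def collect_stream(sse_lines) -> str:
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            continue  # usage-only chunks carry no delta
        parts.append(chunk["choices"][0].get("delta", {}).get("content") or "")
    return "".join(parts)

# Illustrative stream with a trailing usage-only chunk.
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: {"choices":[],"usage":{"total_tokens":5}}',
    "data: [DONE]",
]
```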

Function Calling (Tools)

Enable models to call external functions:

{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {"type": "object", "properties": {...}}
    }
  }],
  "tool_choice": "auto"
}
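One round trip of tool calling looks like this on the client: when the assistant message contains tool_calls instead of text, run each named local function and send its result back as a tool message. A sketch with a stand-in get_weather implementation (the message shape follows the standard tool-calling format):

```python
import json

def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # placeholder implementation

TOOLS = {"get_weather": get_weather}

# Execute each tool call and build the follow-up "tool" messages.
def run_tool_calls(assistant_message: dict) -> list[dict]:
    follow_ups = []
    for call in assistant_message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        follow_ups.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return follow_ups

# Illustrative assistant message containing one tool call.
msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Havana"}'},
    }],
}
```

The follow-up messages are appended to the conversation and the request is re-sent so the model can produce its final answer.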

Reasoning Effort

Control AI reasoning depth (models with reasoning capability):

| Value | Description | Token Budget |
|---|---|---|
| low | Low reasoning effort | 25% of max tokens |
| medium | Moderate reasoning | 33% of max tokens |
| high | Extended reasoning | 50% of max tokens |

Note: Provider behavior varies when reasoning_effort is omitted. Some providers may enable default reasoning, others disable it.
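The budgets above as arithmetic; the fractions come from the table, while truncating to a whole number of tokens is an assumption:

```python
# Reasoning-token budget as a fraction of the model's max tokens.
BUDGET_FRACTION = {"low": 0.25, "medium": 0.33, "high": 0.50}

def reasoning_budget(effort: str, max_tokens: int) -> int:
    return int(max_tokens * BUDGET_FRACTION[effort])
```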

Service Tier

Control request priority (provider-specific):

| Value | Description |
|---|---|
| auto | Automatic tier selection (default) |
| default | Standard processing |
| flex | Flexible scheduling |
| priority | Priority processing |

Note: Service tier support varies by provider and model.

Best Practices

Cost Optimization

  1. Count Tokens First: Use token counting endpoint before large requests
  2. Set Max Tokens: Always set max_completion_tokens to control costs
  3. Choose Appropriate Models: Use smaller models for simple tasks
  4. Monitor Usage: Track token usage via billing API limits

Performance

  1. Use Streaming: Enable streaming for better perceived latency
  2. Choose Appropriate Providers:
    • OpenAI/Gemini/Claude for general-purpose reasoning and complex problem-solving
    • ThreatWinds (Silas) for cybersecurity-specific tasks
  3. Minimize Context: Send only necessary message history
  4. Batch Requests: Process multiple items in parallel when possible
  5. Cache Results: Cache frequent queries to reduce API calls
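Point 5 can be as simple as an in-memory map keyed by a hash of the request payload, so identical (model, messages) requests reuse the earlier answer. A minimal sketch, where send stands in for the real API call:

```python
import hashlib
import json

_cache: dict[str, str] = {}

# Return a cached completion for an identical payload, calling send() otherwise.
def cached_completion(payload: dict, send) -> str:
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = send(payload)  # send() performs the real API call
    return _cache[key]
```

In production a bounded or time-limited cache (e.g. an LRU with TTL) is preferable to an unbounded dict.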

Security

  1. Validate Input: Always validate user input before sending to AI
  2. Sanitize Output: Sanitize AI responses before displaying to users
  3. Monitor Usage: Track unusual patterns via logs
  4. Rotate Keys: Regularly rotate API keys and secrets

Error Handling

  1. Handle Provider Errors: Be prepared for provider-specific errors
  2. Implement Retries: Add exponential backoff for transient errors
  3. Check Token Limits: Validate against model token limits before requests
  4. Log Errors: Log all errors with interaction IDs for debugging
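Points 1-2 above can be sketched as a small retry wrapper with exponential backoff, doubling the delay between attempts and re-raising once the attempts are exhausted:

```python
import time

# Retry a callable on transient errors with exponential backoff.
def with_retries(call, attempts: int = 3, base_delay: float = 0.01):
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```

A real client would retry only transient statuses (e.g. 500) and log the x-error-id header on each failure.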
