AI API

The ThreatWinds AI API provides unified access to multiple AI providers (OpenAI, Gemini, Claude, and ThreatWinds self-hosted models) through a standardized OpenAI-compatible interface. It handles model management, chat completions with streaming, token counting, embeddings generation, and audio processing.

Overview

ThreatWinds AI API allows you to:

| Feature | Description | Documentation |
|---|---|---|
| Model Management | List and query available AI models across providers | Models |
| Chat Completions | Generate AI responses with streaming and tool support | Chat Completions |
| Token Counting | Count tokens before making inference requests | Token Counting |
| Embeddings | Generate vector embeddings for semantic search and similarity | Embeddings |
| Audio | Speech-to-text transcription and text-to-speech synthesis | Audio |
| Voices | Reference list of all available TTS voice IDs | Voices |
| Data Store | Multi-tenant structured storage for AI workflows | Store |

Key Features

  • OpenAI-compatible API - Drop-in replacement for OpenAI’s Chat Completions API
  • SSE Streaming - Real-time streaming responses for better UX
  • Function Calling - Tool/function support for external integrations
  • Multi-provider - Access OpenAI, Gemini, Claude, and ThreatWinds self-hosted models through one API
  • Multimodal - Support for text, images, and audio inputs
  • Extended Reasoning - Enable step-by-step reasoning for complex tasks

Authentication

The AI API supports two authentication methods:

| Authentication Method | Description |
|---|---|
| Bearer Token | Session-based authentication using the Authorization: Bearer <token> header |
| API Key | API key authentication using the api-key and api-secret headers |

For details on how to obtain authentication credentials, see the Authentication section.
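Either method is ultimately just a pair of HTTP headers. A minimal sketch of building them in Python, with header names taken from the table above:

```python
# Build request headers for either authentication method.
def bearer_headers(token: str) -> dict:
    """Session-based authentication via the Authorization header."""
    return {"Authorization": f"Bearer {token}"}

def api_key_headers(api_key: str, api_secret: str) -> dict:
    """API key authentication via the api-key / api-secret headers."""
    return {"api-key": api_key, "api-secret": api_secret}
```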

API Endpoints

The base URL for the AI API is:

https://apis.threatwinds.com/api/ai/v1

For detailed information about each endpoint, see its dedicated documentation page.

Supported Providers

The AI API aggregates models from multiple providers. Use the /models endpoint to discover all currently available models and their capabilities.

| Provider | Description | Capabilities |
|---|---|---|
| OpenAI | External provider | Chat, tools-use, reasoning, code-generation, vision |
| Gemini (Google) | External provider | Chat, tools-use, reasoning, vision |
| Claude (Anthropic) | External provider | Chat, tools-use, reasoning, code-generation, vision |
| ThreatWinds | Self-hosted models (chat, embeddings, audio) | Chat, tools-use, text/code generation, text/vision embeddings, stt/tts |

Note: All self-hosted backends (chat, embeddings, audio) are reported under the single threatwinds provider ID on the unified API, regardless of the underlying serving technology. Each model’s owned_by field still identifies the original maintainer (e.g. Alibaba for Qwen3, ThreatWinds for Silas, OpenAI for Whisper).

Discover Models Dynamically

Model availability changes frequently. Query the /models endpoint for the current catalog:

curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'Authorization: Bearer <token>'

The response includes all active models with their provider, capabilities, and token limits.
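A common pattern is to filter the catalog by capability before picking a model. A sketch in Python; the list shape follows the OpenAI-compatible format, but the exact field names (data, capabilities) are assumptions to verify against a live response:

```python
# Filter a /models response down to model IDs with a given capability.
def models_with_capability(models_response: dict, capability: str) -> list[str]:
    return [
        m["id"]
        for m in models_response.get("data", [])
        if capability in m.get("capabilities", [])
    ]

# Illustrative response shape (not real catalog contents).
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-5", "capabilities": ["chat", "tools-use", "vision"]},
        {"id": "whisper", "capabilities": ["transcription"]},
    ],
}
```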

Common Use Cases

Simple Chat Completion

Generate AI responses using any chat-capable model:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Explain XDR"}]
  }'
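On the client side, the assistant's reply sits under choices[0].message.content, as in the standard OpenAI Chat Completions response format:

```python
import json

# Pull the assistant's reply out of a non-streaming chat completion response.
def assistant_text(response_body: str) -> str:
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

# Trimmed example body (real responses carry id, usage, etc.).
body = '{"choices": [{"message": {"role": "assistant", "content": "XDR is..."}}]}'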

Streaming Response

Enable real-time streaming for better UX:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-pro",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Count Tokens Before Request

Estimate costs by counting tokens:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-pro",
    "messages": [{"role": "user", "content": "Long message..."}]
  }'

Note: Token counting is supported for OpenAI, Gemini, and ThreatWinds chat models. Claude models are not supported on this endpoint — read usage from /chat/completions responses instead. Embedding and audio models use their dedicated endpoints.


Error Response Headers

All error responses include the following custom headers:

| Header | Description |
|---|---|
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique identifier for error tracking and support |

Error Codes

| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Request successful with data (model list, chat response, token count) |
| 400 | Bad Request | Invalid parameters, validation error, empty messages, or malformed JSON |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Insufficient permissions for AI API access |
| 404 | Not Found | Model or provider not found |
| 500 | Internal Server Error | Provider error, AI service unavailable, or server-side error |

Model Capabilities

AI models expose various capabilities:

| Capability | Description |
|---|---|
| chat | Text-based conversation |
| text-generation | General text generation |
| code-generation | Code generation and completion |
| tools-use | Function/tool calling |
| reasoning | Extended reasoning capabilities |
| image | Image understanding (vision) |
| transcription | Audio-to-text speech recognition |
| speech | Text-to-speech synthesis |
| embeddings | Vector embedding generation |
| vision-embeddings | Multimodal embeddings (text + images) |

Token Limits

Each model has defined token limits:

  • max_input_tokens: Maximum tokens in input messages
  • max_completion_tokens: Maximum tokens the model can generate
  • max_total_tokens: Maximum combined input + output tokens

Check model details to see specific limits for each model.
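Combined with the /chat/count endpoint, these limits allow a pre-flight check before sending a request. A sketch, where the limit field names come from the list above and the input count would come from /chat/count:

```python
# Check a planned request against a model's token limits before sending it.
def fits_limits(input_tokens: int, completion_tokens: int, limits: dict) -> bool:
    return (
        input_tokens <= limits["max_input_tokens"]
        and completion_tokens <= limits["max_completion_tokens"]
        and input_tokens + completion_tokens <= limits["max_total_tokens"]
    )

# Illustrative limits, not those of any real model.
limits = {
    "max_input_tokens": 4000,
    "max_completion_tokens": 1000,
    "max_total_tokens": 4500,
}
```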

Response Formats

The AI API supports various response formats:

Text Response (Default)

Standard text completion response.

Streaming Response (SSE)

Enable stream: true for Server-Sent Events streaming:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

JSON Object

Structured JSON output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

JSON output matching a specific schema:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"}
        },
        "required": ["answer"],
        "additionalProperties": false
      }
    }
  }
}
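With strict json_schema output, the message content is a JSON string guaranteed to match the schema, so it can be parsed directly. A minimal sketch for the answer_schema example above:

```python
import json

# Parse the structured content produced under the answer_schema above.
def parse_structured_answer(message_content: str) -> str:
    return json.loads(message_content)["answer"]
```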

Advanced Features

Streaming (SSE)

Enable real-time streaming for responsive UX:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

Response chunks are sent as Server-Sent Events:

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]
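Reassembling the streamed text on the client means reading data: lines, skipping keep-alives, and stopping at [DONE]. A sketch assuming the chunk shape shown above (with include_usage, a final usage-only chunk carries no choices):

```python
import json

# Collect the streamed completion text from raw SSE lines.
def collect_stream(sse_lines) -> str:
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            continue  # usage-only chunks carry no delta
        parts.append(chunk["choices"][0].get("delta", {}).get("content") or "")
    return "".join(parts)

# Illustrative stream with a trailing usage-only chunk.
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: {"choices":[],"usage":{"total_tokens":5}}',
    "data: [DONE]",
]
```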

Function Calling (Tools)

Enable models to call external functions:

{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {"type": "object", "properties": {...}}
    }
  }],
  "tool_choice": "auto"
}
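One round trip of tool calling looks like this on the client: when the assistant message contains tool_calls instead of text, run each named local function and send its result back as a tool message. A sketch with a stand-in get_weather implementation (the message shape follows the standard tool-calling format):

```python
import json

def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # placeholder implementation

TOOLS = {"get_weather": get_weather}

# Execute each tool call and build the follow-up "tool" messages.
def run_tool_calls(assistant_message: dict) -> list[dict]:
    follow_ups = []
    for call in assistant_message.get("tool_calls", []):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        follow_ups.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return follow_ups

# Illustrative assistant message containing one tool call.
msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Havana"}'},
    }],
}
```

The follow-up messages are appended to the conversation and the request is re-sent so the model can produce its final answer.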

Reasoning Effort

Control AI reasoning depth (models with reasoning capability):

| Value | Description | Token Budget |
|---|---|---|
| low | Low reasoning effort | 25% of max tokens |
| medium | Moderate reasoning | 33% of max tokens |
| high | Extended reasoning | 50% of max tokens |

Note: Provider behavior varies when reasoning_effort is omitted. Some providers may enable default reasoning, others disable it.
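The budgets above as arithmetic; the fractions come from the table, while truncating to a whole number of tokens is an assumption:

```python
# Reasoning-token budget as a fraction of the model's max tokens.
BUDGET_FRACTION = {"low": 0.25, "medium": 0.33, "high": 0.50}

def reasoning_budget(effort: str, max_tokens: int) -> int:
    return int(max_tokens * BUDGET_FRACTION[effort])
```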

Service Tier

Control request priority (provider-specific):

| Value | Description |
|---|---|
| auto | Automatic tier selection (default) |
| default | Standard processing |
| flex | Flexible scheduling |
| priority | Priority processing |

Note: Service tier support varies by provider and model.

Best Practices

Cost Optimization

  1. Count Tokens First: Use token counting endpoint before large requests
  2. Set Max Tokens: Always set max_completion_tokens to control costs
  3. Choose Appropriate Models: Use smaller models for simple tasks
  4. Monitor Usage: Track token usage via billing API limits

Performance

  1. Use Streaming: Enable streaming for better perceived latency
  2. Choose Appropriate Providers:
    • OpenAI/Gemini/Claude for general-purpose reasoning and complex problem-solving
    • ThreatWinds (Silas) for cybersecurity-specific tasks
  3. Minimize Context: Send only necessary message history
  4. Batch Requests: Process multiple items in parallel when possible
  5. Cache Results: Cache frequent queries to reduce API calls
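Point 5 can be as simple as an in-memory map keyed by a hash of the request payload, so identical (model, messages) requests reuse the earlier answer. A minimal sketch, where send stands in for the real API call:

```python
import hashlib
import json

_cache: dict[str, str] = {}

# Return a cached completion for an identical payload, calling send() otherwise.
def cached_completion(payload: dict, send) -> str:
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = send(payload)  # send() performs the real API call
    return _cache[key]
```

In production a bounded or time-limited cache (e.g. an LRU with TTL) is preferable to an unbounded dict.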

Security

  1. Validate Input: Always validate user input before sending to AI
  2. Sanitize Output: Sanitize AI responses before displaying to users
  3. Monitor Usage: Track unusual patterns via logs
  4. Rotate Keys: Regularly rotate API keys and secrets

Error Handling

  1. Handle Provider Errors: Be prepared for provider-specific errors
  2. Implement Retries: Add exponential backoff for transient errors
  3. Check Token Limits: Validate against model token limits before requests
  4. Log Errors: Log all errors with interaction IDs for debugging
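Points 1-2 above can be sketched as a small retry wrapper with exponential backoff, doubling the delay between attempts and re-raising once the attempts are exhausted:

```python
import time

# Retry a callable on transient errors with exponential backoff.
def with_retries(call, attempts: int = 3, base_delay: float = 0.01):
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```

A real client would retry only transient statuses (e.g. 500) and log the x-error-id header on each failure.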
