AI API

The ThreatWinds AI API provides unified access to multiple AI providers (Claude, Groq, vLLM) through a standardized OpenAI-compatible interface. It handles model management, chat completions with streaming, token counting, and gateway proxying.

Overview

ThreatWinds AI API allows you to:

Feature Description Documentation
Model Management List and query available AI models across providers Models
Chat Completions Generate AI responses with streaming and tool support Chat Completions
Token Counting Count tokens before making inference requests Token Counting
Gateway Proxy Direct access to provider APIs Gateway

Key Features

  • OpenAI-compatible API - Drop-in replacement for OpenAI’s Chat Completions API
  • SSE Streaming - Real-time streaming responses for better UX
  • Function Calling - Tool/function support for external integrations
  • Multi-provider - Access Claude, Groq, and vLLM through one API
  • Multimodal - Support for text, images, audio, and video inputs
  • Extended Reasoning - Enable step-by-step reasoning for complex tasks

Authentication

The AI API supports two authentication methods:

Authentication Method Description
Bearer Token Session-based authentication using Authorization: Bearer <token> header
API Key API key authentication using api-key and api-secret headers

For details on how to obtain authentication credentials, see the Authentication section.
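
For illustration, a minimal Python sketch (using the third-party requests library and placeholder credentials) showing both methods against the models endpoint:

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"

# Option 1: session-based Bearer token
bearer_headers = {"Authorization": "Bearer <token>"}

# Option 2: API key pair
api_key_headers = {"api-key": "<key>", "api-secret": "<secret>"}

# Either header set works on any AI API endpoint, e.g. listing models
response = requests.get(f"{BASE_URL}/models", headers=bearer_headers)
response.raise_for_status()
print(response.json())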

API Endpoints

The base URL for the AI API is:

https://apis.threatwinds.com/api/ai/v1

For detailed information about each endpoint, please refer to the specific documentation pages.

Supported Providers

The AI API aggregates 9 models from 3 providers:

Provider Models Capabilities
Claude (Anthropic) Sonnet 4.5, Opus 4.5, Haiku 4.5 (3 models) Chat, tools-use, reasoning, code-generation, vision
Groq GPT OSS 20B/120B, Qwen 3 32B, LLaMA 4 Maverick/Scout (5 models) Fast inference, chat, code-generation, tools-use, reasoning
vLLM (ThreatWinds) Silas 1.0 (1 model) Cybersecurity, pentesting, threat intelligence

Model Summary

  • Total Models: 9
  • Claude Models: 3 (claude-sonnet-4, claude-opus-4, claude-haiku-4)
  • Groq Models: 5 (gpt-oss-20b, gpt-oss-120b, qwen3-32b, llama4-maverick, llama4-scout)
  • vLLM Models: 1 (silas-1.0)

Common Use Cases

Simple Chat Completion

Generate AI responses from user messages:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}]
  }'

Streaming Response

Enable real-time streaming for better UX:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Count Tokens Before Request

Estimate costs by counting tokens:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Long message..."}]
  }'

Note: Token counting is only supported for Claude and vLLM models. Groq models will return a 400 error.

List Available Models

Discover all available models:

curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'Authorization: Bearer <token>'

Direct Provider Access

Use the gateway for provider-specific features:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/gateway/claude/v1/messages' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{...provider-specific payload...}'

Error Response Headers

All error responses include the following custom headers:

Header Description
x-error Human-readable error message describing what went wrong
x-error-id Unique MD5 hash identifier for error tracking and support
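
A short Python sketch of how a client might surface these headers; the deliberately empty messages array triggers a 400 error, as noted under Error Codes below:

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={"model": "claude-sonnet-4", "messages": []},  # empty messages -> 400
)

if not resp.ok:
    print("Error:", resp.headers.get("x-error"))
    print("Error ID (quote this to support):", resp.headers.get("x-error-id"))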

Error Codes

Status Code Description Possible Cause
200 OK Request successful with data (model list, chat response, token count)
400 Bad Request Invalid parameters, validation error, empty messages, or malformed JSON
401 Unauthorized Missing or invalid authentication credentials
403 Forbidden Insufficient permissions for AI API access
404 Not Found Model or provider not found
500 Internal Server Error Provider error, AI service unavailable, or server-side error

Model Capabilities

AI models expose various capabilities:

Capability Description
chat Text-based conversation
text-generation General text generation
code-generation Code generation and completion
tools-use Function/tool calling
reasoning Extended reasoning capabilities
image Image understanding (vision)
audio Audio processing
video Video processing

Token Limits

Each model has defined token limits:

  • max_input_tokens: Maximum tokens in input messages
  • max_completion_tokens: Maximum tokens the model can generate
  • max_total_tokens: Maximum combined input + output tokens

Check model details to see specific limits for each model.
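
As a sketch, the limits can be read from the models endpoint. The response shape used here (an OpenAI-style data envelope with the limit fields on each entry) is an assumption; adjust the field access to the actual payload:

import requests

headers = {"Authorization": "Bearer <token>"}
models = requests.get(
    "https://apis.threatwinds.com/api/ai/v1/models", headers=headers
).json()

# Assumes an OpenAI-style list envelope; adjust to the actual payload
for model in models.get("data", []):
    print(model.get("id"),
          model.get("max_input_tokens"),
          model.get("max_completion_tokens"),
          model.get("max_total_tokens"))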

Response Formats

The AI API supports various response formats:

Text Response (Default)

Standard text completion response.

Streaming Response (SSE)

Enable stream: true for Server-Sent Events streaming:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

JSON Object

Structured JSON output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

JSON output matching a specific schema:

{
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "answer": {"type": "string"}
      }
    }
  }
}
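
Because the structured output arrives as the message content string, a client parses it after the request. A sketch assuming the OpenAI-compatible choices[0].message.content response path:

import json
import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Explain XDR as JSON"}],
        "response_format": {
            "type": "json_object",
            "json_schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
            },
        },
    },
)
resp.raise_for_status()

# The message content is a JSON string matching the schema
content = resp.json()["choices"][0]["message"]["content"]
print(json.loads(content)["answer"])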

Advanced Features

Streaming (SSE)

Enable real-time streaming for responsive UX:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

Response chunks are sent as Server-Sent Events:

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]
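
A Python sketch of consuming the stream; it assumes the OpenAI-compatible chunk layout shown above, and loops over choices because the final usage chunk (with include_usage) may carry none:

import json
import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Explain XDR"}],
        "stream": True,
        "stream_options": {"include_usage": True},
    },
    stream=True,  # do not buffer the whole body
)

for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip keep-alives and blank separator lines
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    chunk = json.loads(data)
    for choice in chunk.get("choices", []):
        print(choice.get("delta", {}).get("content", ""), end="", flush=True)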

Function Calling (Tools)

Enable models to call external functions:

{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {"type": "object", "properties": {...}}
    }
  }],
  "tool_choice": "auto"
}
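
A sketch of a full tool round trip, assuming the OpenAI-compatible tool_calls response shape; get_weather is a hypothetical tool for illustration:

import json
import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
headers = {"Authorization": "Bearer <token>"}
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Havana?"}]
first = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                      json={"model": "claude-sonnet-4", "messages": messages,
                            "tools": tools, "tool_choice": "auto"}).json()
message = first["choices"][0]["message"]
messages.append(message)

for call in message.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])
    result = f"22 C and sunny in {args['location']}"  # run the real tool here
    # Feed the result back so the model can produce the final answer
    messages.append({"role": "tool", "tool_call_id": call["id"],
                     "content": result})

final = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                      json={"model": "claude-sonnet-4", "messages": messages,
                            "tools": tools}).json()
print(final["choices"][0]["message"]["content"])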

Reasoning Effort

Control AI reasoning depth (models with reasoning capability):

Value Description Token Budget
low Low reasoning effort 25% of max tokens
medium Moderate reasoning 33% of max tokens
high Extended reasoning 50% of max tokens

Note: When reasoning is enabled, temperature is automatically forced to 1.0 for Claude.

Provider Behavior: When reasoning_effort is omitted, Claude explicitly disables reasoning, but Groq and vLLM use the model’s default behavior (which may include reasoning).

Groq Limitation: qwen3-32b always uses “default” reasoning internally regardless of the value sent.
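
A minimal sketch of a request with reasoning enabled (per the note above, setting temperature alongside it is pointless for Claude):

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Plan an incident response"}],
        "reasoning_effort": "high",  # low | medium | high
    },
)
print(resp.json()["choices"][0]["message"]["content"])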

Service Tier

Control request priority (provider-specific):

Value Description
auto Automatic tier selection (default)
default Standard processing
flex Flexible scheduling
priority Priority processing

Note: Service tier support varies by provider and model.
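
For illustration, a request carrying a tier. The service_tier field name is an assumption following the OpenAI-compatible convention; confirm it in the Chat Completions documentation:

import requests

payload = {
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "service_tier": "priority",  # assumed field name; verify before use
}
resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)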

Best Practices

Cost Optimization

  1. Count Tokens First: Use the token counting endpoint before large requests (see the sketch after this list)
  2. Set Max Tokens: Always set max_completion_tokens to control costs
  3. Choose Appropriate Models: Use smaller models for simple tasks
  4. Monitor Usage: Track token usage against your billing API limits
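
As referenced in item 1, a sketch combining points 1 and 2: count input tokens first (Claude and vLLM models only, per the earlier note), then cap the completion:

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
headers = {"Authorization": "Bearer <token>"}
messages = [{"role": "user", "content": "Summarize this report ..."}]

# 1. Count input tokens (Claude and vLLM models only)
count = requests.post(f"{BASE_URL}/chat/count", headers=headers,
                      json={"model": "claude-sonnet-4",
                            "messages": messages}).json()
print("Token count response:", count)  # exact response shape not assumed here

# 2. Cap the completion to bound spend
resp = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                     json={"model": "claude-sonnet-4", "messages": messages,
                           "max_completion_tokens": 512})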

Performance

  1. Use Streaming: Enable streaming for better perceived latency
  2. Use Appropriate Providers:
    • Groq for ultra-fast inference (optimized hardware)
    • Claude for highest quality reasoning and complex problem-solving
    • Silas for cybersecurity-specific tasks
  3. Minimize Context: Send only necessary message history
  4. Batch Requests: Process multiple items in parallel when possible (see the sketch after this list)
  5. Cache Results: Cache frequent queries to reduce API calls
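
A sketch of point 4 using a thread pool; claude-haiku-4 is used here on the assumption that it is the smaller, cheaper Claude model:

from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
headers = {"Authorization": "Bearer <token>"}

def complete(prompt: str) -> str:
    resp = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                         json={"model": "claude-haiku-4",
                               "messages": [{"role": "user", "content": prompt}],
                               "max_completion_tokens": 256})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

prompts = ["Classify this alert ...", "Summarize this log ..."]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(complete, prompts))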

Security

  1. Validate Input: Always validate user input before sending to AI
  2. Sanitize Output: Sanitize AI responses before displaying to users
  3. Monitor Usage: Track unusual patterns via logs
  4. Rotate Keys: Regularly rotate API keys and secrets

Error Handling

  1. Handle Provider Errors: Be prepared for provider-specific errors
  2. Implement Retries: Add exponential backoff for transient errors (see the sketch after this list)
  3. Check Token Limits: Validate against model token limits before requests
  4. Log Errors: Log all errors with interaction IDs for debugging
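
A sketch of point 2: retry only server-side failures with exponential backoff, logging the x-error-id for support:

import time

import requests

def post_with_retries(url, payload, headers, attempts=4):
    """Retry 5xx responses with exponential backoff; 4xx errors indicate
    request problems (parameters, auth, unknown model) and are not retried."""
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=payload, timeout=120)
        if resp.status_code < 500:
            return resp
        print("Transient error, id:", resp.headers.get("x-error-id"))
        if attempt < attempts - 1:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s between attempts
    return resp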
