AI API

The ThreatWinds AI API provides unified access to multiple AI providers (Claude, Groq, vLLM) through a standardized OpenAI-compatible interface. It handles model management, chat completions with streaming, token counting, and gateway proxying.

Overview

ThreatWinds AI API allows you to:

Feature Description Documentation
Model Management List and query available AI models across providers Models
Chat Completions Generate AI responses with streaming and tool support Chat Completions
Token Counting Count tokens before making inference requests Token Counting
Gateway Proxy Direct access to provider APIs Gateway

Key Features

  • OpenAI-compatible API - Drop-in replacement for OpenAI’s Chat Completions API
  • SSE Streaming - Real-time streaming responses for better UX
  • Function Calling - Tool/function support for external integrations
  • Multi-provider - Access Claude, Groq, and vLLM through one API
  • Multimodal - Support for text, images, audio, and video inputs
  • Extended Reasoning - Enable step-by-step reasoning for complex tasks

Authentication

The AI API supports two authentication methods:

Authentication Method Description
Bearer Token Session-based authentication using Authorization: Bearer <token> header
API Key API key authentication using api-key and api-secret headers

For details on how to obtain authentication credentials, see the Authentication section.
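
For illustration, a minimal Python sketch (using the third-party requests library and placeholder credentials) showing both methods against the models endpoint:

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"

# Option 1: session-based Bearer token
bearer_headers = {"Authorization": "Bearer <token>"}

# Option 2: API key pair
api_key_headers = {"api-key": "<key>", "api-secret": "<secret>"}

# Either header set works on any AI API endpoint, e.g. listing models
response = requests.get(f"{BASE_URL}/models", headers=bearer_headers)
response.raise_for_status()
print(response.json())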

API Endpoints

The base URL for the AI API is:

https://apis.threatwinds.com/api/ai/v1

For detailed information about each endpoint, please refer to the specific documentation pages.

Supported Providers

The AI API aggregates 9 models from 3 providers:

Provider Models Capabilities
Claude (Anthropic) Sonnet 4.5, Opus 4.5, Haiku 4.5 (3 models) Chat, tools-use, reasoning, code-generation, vision
Groq GPT OSS 20B/120B, Qwen 3 32B, LLaMA 4 Maverick/Scout (5 models) Fast inference, chat, code-generation, tools-use, reasoning
vLLM (ThreatWinds) Silas 1.0 (1 model) Cybersecurity, pentesting, threat intelligence

Model Summary

  • Total Models: 9
  • Claude Models: 3 (claude-sonnet-4, claude-opus-4, claude-haiku-4)
  • Groq Models: 5 (gpt-oss-20b, gpt-oss-120b, qwen3-32b, llama4-maverick, llama4-scout)
  • vLLM Models: 1 (silas-1.0)

Common Use Cases

Simple Chat Completion

Generate AI responses from user messages:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}]
  }'

Streaming Response

Enable real-time streaming for better UX:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Count Tokens Before Request

Estimate costs by counting tokens:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Long message..."}]
  }'

Note: Token counting is only supported for Claude and vLLM models. Groq models will return a 400 error.

List Available Models

Discover all available models:

curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'Authorization: Bearer <token>'

Direct Provider Access

Use the gateway for provider-specific features:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/gateway/claude/v1/messages' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{...provider-specific payload...}'

Error Response Headers

All error responses include the following custom headers:

Header Description
x-error Human-readable error message describing what went wrong
x-error-id Unique MD5 hash identifier for error tracking and support
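
A short Python sketch of how a client might surface these headers; the deliberately empty messages array triggers a 400 error, as noted under Error Codes below:

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={"model": "claude-sonnet-4", "messages": []},  # empty messages -> 400
)

if not resp.ok:
    print("Error:", resp.headers.get("x-error"))
    print("Error ID (quote this to support):", resp.headers.get("x-error-id"))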

Error Codes

Status Code Description Possible Cause
200 OK Request successful with data (model list, chat response, token count)
400 Bad Request Invalid parameters, validation error, empty messages, or malformed JSON
401 Unauthorized Missing or invalid authentication credentials
403 Forbidden Insufficient permissions for AI API access
404 Not Found Model or provider not found
500 Internal Server Error Provider error, AI service unavailable, or server-side error

Model Capabilities

AI models expose various capabilities:

Capability Description
chat Text-based conversation
text-generation General text generation
code-generation Code generation and completion
tools-use Function/tool calling
reasoning Extended reasoning capabilities
image Image understanding (vision)
audio Audio processing
video Video processing

Token Limits

Each model has defined token limits:

  • max_input_tokens: Maximum tokens in input messages
  • max_completion_tokens: Maximum tokens the model can generate
  • max_total_tokens: Maximum combined input + output tokens

Check model details to see specific limits for each model.
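
As a sketch, the limits can be read from the models endpoint. The response shape used here (an OpenAI-style data envelope with the limit fields on each entry) is an assumption; adjust the field access to the actual payload:

import requests

headers = {"Authorization": "Bearer <token>"}
models = requests.get(
    "https://apis.threatwinds.com/api/ai/v1/models", headers=headers
).json()

# Assumes an OpenAI-style list envelope; adjust to the actual payload
for model in models.get("data", []):
    print(model.get("id"),
          model.get("max_input_tokens"),
          model.get("max_completion_tokens"),
          model.get("max_total_tokens"))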

Response Formats

The AI API supports various response formats:

Text Response (Default)

Standard text completion response.

Streaming Response (SSE)

Enable stream: true for Server-Sent Events streaming:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

JSON Object

Structured JSON output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

JSON output matching a specific schema:

{
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "answer": {"type": "string"}
      }
    }
  }
}
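
Because the structured output arrives as the message content string, a client parses it after the request. A sketch assuming the OpenAI-compatible choices[0].message.content response path:

import json
import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Explain XDR as JSON"}],
        "response_format": {
            "type": "json_object",
            "json_schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
            },
        },
    },
)
resp.raise_for_status()

# The message content is a JSON string matching the schema
content = resp.json()["choices"][0]["message"]["content"]
print(json.loads(content)["answer"])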

Advanced Features

Streaming (SSE)

Enable real-time streaming for responsive UX:

{
  "stream": true,
  "stream_options": {"include_usage": true}
}

Response chunks are sent as Server-Sent Events:

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]
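
A Python sketch of consuming the stream; it assumes the OpenAI-compatible chunk layout shown above, and loops over choices because the final usage chunk (with include_usage) may carry none:

import json
import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Explain XDR"}],
        "stream": True,
        "stream_options": {"include_usage": True},
    },
    stream=True,  # do not buffer the whole body
)

for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip keep-alives and blank separator lines
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    chunk = json.loads(data)
    for choice in chunk.get("choices", []):
        print(choice.get("delta", {}).get("content", ""), end="", flush=True)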

Function Calling (Tools)

Enable models to call external functions:

{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {"type": "object", "properties": {...}}
    }
  }],
  "tool_choice": "auto"
}
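
A sketch of a full tool round trip, assuming the OpenAI-compatible tool_calls response shape; get_weather is a hypothetical tool for illustration:

import json
import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
headers = {"Authorization": "Bearer <token>"}
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Havana?"}]
first = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                      json={"model": "claude-sonnet-4", "messages": messages,
                            "tools": tools, "tool_choice": "auto"}).json()
message = first["choices"][0]["message"]
messages.append(message)

for call in message.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])
    result = f"22 C and sunny in {args['location']}"  # run the real tool here
    # Feed the result back so the model can produce the final answer
    messages.append({"role": "tool", "tool_call_id": call["id"],
                     "content": result})

final = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                      json={"model": "claude-sonnet-4", "messages": messages,
                            "tools": tools}).json()
print(final["choices"][0]["message"]["content"])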

Reasoning Effort

Control AI reasoning depth (models with reasoning capability):

Value Description Token Budget
low Low reasoning effort 25% of max tokens
medium Moderate reasoning 33% of max tokens
high Extended reasoning 50% of max tokens

Note: When reasoning is enabled, temperature is automatically forced to 1.0 for Claude.

Provider Behavior: When reasoning_effort is omitted, Claude explicitly disables reasoning, but Groq and vLLM use the model’s default behavior (which may include reasoning).

Groq Limitation: qwen3-32b always uses “default” reasoning internally regardless of the value sent.
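
A minimal sketch of a request with reasoning enabled (per the note above, setting temperature alongside it is pointless for Claude):

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Plan an incident response"}],
        "reasoning_effort": "high",  # low | medium | high
    },
)
print(resp.json()["choices"][0]["message"]["content"])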

Service Tier

Control request priority (provider-specific):

Value Description
auto Automatic tier selection (default)
default Standard processing
flex Flexible scheduling
priority Priority processing

Note: Service tier support varies by provider and model.
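
For illustration, a request carrying a tier. The service_tier field name is an assumption following the OpenAI-compatible convention; confirm it in the Chat Completions documentation:

import requests

payload = {
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "service_tier": "priority",  # assumed field name; verify before use
}
resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)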

Best Practices

Cost Optimization

  1. Count Tokens First: Use the token counting endpoint before large requests (see the sketch after this list)
  2. Set Max Tokens: Always set max_completion_tokens to control costs
  3. Choose Appropriate Models: Use smaller models for simple tasks
  4. Monitor Usage: Track token usage against your billing API limits
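
As referenced in item 1, a sketch combining points 1 and 2: count input tokens first (Claude and vLLM models only, per the earlier note), then cap the completion:

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
headers = {"Authorization": "Bearer <token>"}
messages = [{"role": "user", "content": "Summarize this report ..."}]

# 1. Count input tokens (Claude and vLLM models only)
count = requests.post(f"{BASE_URL}/chat/count", headers=headers,
                      json={"model": "claude-sonnet-4",
                            "messages": messages}).json()
print("Token count response:", count)  # exact response shape not assumed here

# 2. Cap the completion to bound spend
resp = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                     json={"model": "claude-sonnet-4", "messages": messages,
                           "max_completion_tokens": 512})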

Performance

  1. Use Streaming: Enable streaming for better perceived latency
  2. Use Appropriate Providers:
    • Groq for ultra-fast inference (optimized hardware)
    • Claude for highest quality reasoning and complex problem-solving
    • Silas for cybersecurity-specific tasks
  3. Minimize Context: Send only necessary message history
  4. Batch Requests: Process multiple items in parallel when possible (see the sketch after this list)
  5. Cache Results: Cache frequent queries to reduce API calls
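
A sketch of point 4 using a thread pool; claude-haiku-4 is used here on the assumption that it is the smaller, cheaper Claude model:

from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
headers = {"Authorization": "Bearer <token>"}

def complete(prompt: str) -> str:
    resp = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
                         json={"model": "claude-haiku-4",
                               "messages": [{"role": "user", "content": prompt}],
                               "max_completion_tokens": 256})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

prompts = ["Classify this alert ...", "Summarize this log ..."]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(complete, prompts))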

Security

  1. Validate Input: Always validate user input before sending to AI
  2. Sanitize Output: Sanitize AI responses before displaying to users
  3. Monitor Usage: Track unusual patterns via logs
  4. Rotate Keys: Regularly rotate API keys and secrets

Error Handling

  1. Handle Provider Errors: Be prepared for provider-specific errors
  2. Implement Retries: Add exponential backoff for transient errors (see the sketch after this list)
  3. Check Token Limits: Validate against model token limits before requests
  4. Log Errors: Log all errors with interaction IDs for debugging
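
A sketch of point 2: retry only server-side failures with exponential backoff, logging the x-error-id for support:

import time

import requests

def post_with_retries(url, payload, headers, attempts=4):
    """Retry 5xx responses with exponential backoff; 4xx errors indicate
    request problems (parameters, auth, unknown model) and are not retried."""
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=payload, timeout=120)
        if resp.status_code < 500:
            return resp
        print("Transient error, id:", resp.headers.get("x-error-id"))
        if attempt < attempts - 1:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s between attempts
    return resp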
