AI API

The ThreatWinds AI API provides unified access to multiple AI providers (Claude, Groq) through a standardized interface. It handles model management, chat completions, token counting, and gateway proxying.

Overview

The ThreatWinds AI API provides the following features:

  • Model Management: List and query available AI models across providers (see Models)
  • Chat Completions: Generate AI responses with multi-provider support (see Chat Completions)
  • Token Counting: Count tokens before making inference requests (see Token Counting)
  • Gateway Proxy: Direct access to provider APIs (see Gateway)

Authentication

The AI API supports two authentication methods:

  • Bearer Token: Session-based authentication using the Authorization: Bearer <token> header
  • API Key: API key authentication using the api-key and api-secret headers

For details on how to obtain authentication credentials, see the Authentication section.
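As a minimal sketch, either credential style is just a set of HTTP headers. The header names come from the table above; the environment variable names are illustrative:

import os

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"

# Option 1: session-based bearer token
bearer_headers = {"Authorization": f"Bearer {os.environ['TW_TOKEN']}"}

# Option 2: API key and secret headers
key_headers = {
    "api-key": os.environ["TW_API_KEY"],
    "api-secret": os.environ["TW_API_SECRET"],
}

# Either header set works on any AI API endpoint
resp = requests.get(f"{BASE_URL}/models", headers=bearer_headers)
resp.raise_for_status()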

API Endpoints

The base URL for the AI API is:

https://apis.threatwinds.com/api/ai/v1

For detailed information about each endpoint, please refer to the specific documentation pages.

Supported Providers

The AI API aggregates seven models from two providers:

  • Claude (Anthropic): Sonnet 4 and Opus 4 (2 models). Capabilities: chat, tools-use, reasoning, code-generation
  • Groq: GPT OSS 20B/120B, Qwen 3 32B, LLaMA 4 Maverick/Scout (5 models). Capabilities: fast inference, chat, code-generation, tools-use, reasoning

Model Summary

  • Total Models: 7
  • Claude Models: 2 (claude-sonnet-4, claude-opus-4)
  • Groq Models: 5 (gpt-oss-20b, gpt-oss-120b, qwen3-32b, llama4-maverick, llama4-scout)

Common Use Cases

Simple Chat Completion

Generate AI responses from user messages:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}]
  }'
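The same request from Python. The parsing below assumes an OpenAI-style choices array in the response, which should be verified against the Chat Completions reference:

import os

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TW_TOKEN']}"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Explain XDR"}],
    },
)
resp.raise_for_status()
data = resp.json()
# Assumed OpenAI-style response shape; adjust to the documented schema
print(data["choices"][0]["message"]["content"])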

Count Tokens Before Request

Estimate costs by counting tokens:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Long message..."}]
  }'

Note: Token counting is only supported for Claude models. Groq models will return a 400 error.
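A hedged helper that uses the counting endpoint for Claude models and catches the documented 400 for Groq models, falling back to a rough characters-per-token estimate. The fallback heuristic and the input_tokens field name are illustrative assumptions:

import os

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TW_TOKEN']}"}

def estimate_tokens(model: str, messages: list) -> int:
    """Count tokens via /chat/count; fall back to a crude estimate
    for Groq models, which return a 400 on this endpoint."""
    resp = requests.post(
        f"{BASE_URL}/chat/count",
        headers=HEADERS,
        json={"model": model, "messages": messages},
    )
    if resp.status_code == 400:
        # Rough heuristic: ~4 characters per token (assumption)
        text = " ".join(m["content"] for m in messages)
        return len(text) // 4
    resp.raise_for_status()
    # Field name assumed; check the Token Counting reference
    return resp.json()["input_tokens"]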

List Available Models

Discover all available models:

curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'Authorization: Bearer <token>'

Direct Provider Access

Use the gateway for provider-specific features:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/gateway/claude/v1/messages' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{...provider-specific payload...}'

Error Response Headers

All error responses include the following custom headers:

  • x-error: Human-readable error message describing what went wrong
  • x-error-id: Unique MD5 hash identifier for error tracking and support
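For example, a client can surface both headers whenever a request fails:

import os

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TW_TOKEN']}"},
    json={"model": "claude-sonnet-4", "messages": []},  # empty messages -> 400
)
if not resp.ok:
    # Both headers are present on every error response
    print("error:", resp.headers.get("x-error"))
    print("error id:", resp.headers.get("x-error-id"))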

Error Codes

  • 200 OK: Request successful with data (model list, chat response, or token count)
  • 400 Bad Request: Invalid parameters, validation error, empty messages, or malformed JSON
  • 401 Unauthorized: Missing or invalid authentication credentials
  • 403 Forbidden: Insufficient permissions for AI API access
  • 404 Not Found: Model or provider not found
  • 500 Internal Server Error: Provider error, AI service unavailable, or other server-side error

Model Capabilities

AI models expose various capabilities:

  • chat: Text-based conversation
  • text-generation: General text generation
  • code-generation: Code generation and completion
  • embeddings: Vector embeddings for semantic search
  • audio: Audio processing and transcription
  • image: Image understanding and generation
  • video: Video processing
  • tools-use: Function/tool calling
  • vision: Image understanding in chat
  • reasoning: Extended reasoning capabilities

Token Limits

Each model has defined token limits:

  • max_input_tokens: Maximum tokens in input messages
  • max_completion_tokens: Maximum tokens the model can generate
  • max_total_tokens: Maximum combined input + output tokens

Check each model's details for its specific limits.
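As a sketch of a pre-flight check against these limits (the shape of the /models response is an assumption; adjust to the Models reference):

import os

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TW_TOKEN']}"}

# Assumed shape: a list of model objects keyed by "id"
models = requests.get(f"{BASE_URL}/models", headers=HEADERS).json()
limits = {m["id"]: m for m in models}

def fits(model_id: str, input_tokens: int, completion_tokens: int) -> bool:
    """Check a planned request against the model's documented limits."""
    m = limits[model_id]
    return (
        input_tokens <= m["max_input_tokens"]
        and completion_tokens <= m["max_completion_tokens"]
        and input_tokens + completion_tokens <= m["max_total_tokens"]
    )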

Response Formats

The AI API supports various response formats:

Text Response (Default)

Standard text completion response.

JSON Object

Structured JSON output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

JSON output matching a specific schema:

{
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "answer": {"type": "string"}
      }
    }
  }
}
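For instance, the schema-constrained format slots into a chat completion request like this (field placement follows the snippets above):

import os

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TW_TOKEN']}"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "What is XDR? Answer as JSON."}],
        "response_format": {
            "type": "json_object",
            "json_schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
            },
        },
    },
)
resp.raise_for_status()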

Advanced Features

Reasoning Effort

Control AI reasoning depth (extended reasoning and the token budgets below apply to Claude models):

  • auto: Automatic reasoning level (default; disables extended reasoning). Token budget: N/A
  • low: Low reasoning effort. Token budget: 25% of max tokens
  • medium: Moderate reasoning effort. Token budget: 33% of max tokens
  • high: Extended reasoning. Token budget: 50% of max tokens

Note: When reasoning is enabled (low/medium/high), temperature is automatically forced to 1.0.

Groq Limitation: qwen3-32b only accepts "none" or "default" for the reasoning_effort parameter.
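A request opting into extended reasoning might look like the following. Treating reasoning_effort as a top-level request field is an assumption to confirm against the Chat Completions reference:

import os

import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TW_TOKEN']}"},
    json={
        "model": "claude-opus-4",
        "messages": [{"role": "user", "content": "Plan an incident response."}],
        # 50% of max tokens budgeted for reasoning; temperature is forced to 1.0
        "reasoning_effort": "high",
    },
)
resp.raise_for_status()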

Service Tier

Control request priority (provider-specific):

  • auto: Automatic tier selection (default)
  • default: Standard processing

Note: Service tier support varies by provider and model. Check provider documentation for details.
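If exposed as a request field, it would presumably be set alongside the other payload options; the service_tier field name below is hypothetical and should be confirmed against the Chat Completions reference:

# Hypothetical "service_tier" request field name
payload = {
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Summarize this alert."}],
    "service_tier": "auto",  # or "default" for standard processing
}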

Best Practices

Cost Optimization

  1. Count Tokens First: Use the token counting endpoint before large requests (combined with a completion cap in the sketch after this list)
  2. Set Max Tokens: Always set max_completion_tokens to control costs
  3. Choose Appropriate Models: Use smaller models for simple tasks
  4. Monitor Usage: Track token usage via billing API limits
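A minimal sketch combining practices 1 and 2: count tokens first, then cap the completion length. Using max_completion_tokens as a request field mirrors the limit name above and should be confirmed against the Chat Completions reference:

import os

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TW_TOKEN']}"}
messages = [{"role": "user", "content": "Long analysis prompt..."}]

# 1. Count tokens first (Claude models only)
count = requests.post(
    f"{BASE_URL}/chat/count",
    headers=HEADERS,
    json={"model": "claude-sonnet-4", "messages": messages},
)
count.raise_for_status()
print("token count:", count.json())

# 2. Cap completion length to control cost
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=HEADERS,
    json={
        "model": "claude-sonnet-4",
        "messages": messages,
        "max_completion_tokens": 512,
    },
)
resp.raise_for_status()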

Performance

  1. Use Appropriate Providers:
    • Groq for ultra-fast inference (optimized hardware)
    • Claude for highest quality reasoning and complex problem-solving
  2. Minimize Context: Send only necessary message history
  3. Batch Requests: Process multiple items in parallel when possible (see the sketch after this list)
  4. Cache Results: Cache frequent queries to reduce API calls
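A sketch of parallel batching with a thread pool, using a Groq model for fast inference on small, independent tasks (the prompts and model choice are illustrative):

import os
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TW_TOKEN']}"}

def complete(prompt: str) -> dict:
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers=HEADERS,
        json={
            "model": "qwen3-32b",  # small, fast model for simple tasks
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    resp.raise_for_status()
    return resp.json()

prompts = ["Classify alert A", "Classify alert B", "Classify alert C"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(complete, prompts))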

Security

  1. Validate Input: Always validate user input before sending to AI
  2. Sanitize Output: Sanitize AI responses before displaying to users
  3. Monitor Usage: Track unusual patterns via logs
  4. Rotate Keys: Regularly rotate API keys and secrets

Error Handling

  1. Handle Provider Errors: Be prepared for provider-specific errors
  2. Implement Retries: Add exponential backoff for transient errors (sketched after this list)
  3. Check Token Limits: Validate against model token limits before requests
  4. Log Errors: Log all errors with interaction IDs for debugging
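A sketch of retry handling with exponential backoff, retrying only 500-class responses and surfacing the x-error and x-error-id headers for everything else:

import os
import time

import requests

BASE_URL = "https://apis.threatwinds.com/api/ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TW_TOKEN']}"}

def chat_with_retries(payload: dict, attempts: int = 4) -> dict:
    """Retry transient 5xx errors with exponential backoff; raise
    immediately on 4xx, including the error id for support."""
    for attempt in range(attempts):
        resp = requests.post(
            f"{BASE_URL}/chat/completions", headers=HEADERS, json=payload
        )
        if resp.status_code < 500:
            if not resp.ok:
                raise RuntimeError(
                    f"{resp.headers.get('x-error')} "
                    f"(id: {resp.headers.get('x-error-id')})"
                )
            return resp.json()
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s
    raise RuntimeError("AI service unavailable after retries")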
