AI API
The ThreatWinds AI API provides unified access to multiple AI providers (Claude, Groq) through a standardized interface. It handles model management, chat completions, token counting, and gateway proxying.
Overview
The ThreatWinds AI API allows you to:
| Feature | Description | Documentation |
|---|---|---|
| Model Management | List and query available AI models across providers | Models |
| Chat Completions | Generate AI responses with multi-provider support | Chat Completions |
| Token Counting | Count tokens before making inference requests | Token Counting |
| Gateway Proxy | Direct access to provider APIs | Gateway |
Authentication
The AI API supports two authentication methods:
| Authentication Method | Description |
|---|---|
| Bearer Token | Session-based authentication using Authorization: Bearer <token> header |
| API Key | API key authentication using api-key and api-secret headers |
For details on how to obtain authentication credentials, see the Authentication section.
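For example, the same request can be authenticated with either method (the token and key values are placeholders):

curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'Authorization: Bearer <token>'

curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'api-key: <key>' \
  -H 'api-secret: <secret>'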
API Endpoints
The base URL for the AI API is:
https://apis.threatwinds.com/api/ai/v1
For detailed information about each endpoint, please refer to the specific documentation pages.
Supported Providers
The AI API aggregates seven models from two providers:
| Provider | Models | Capabilities |
|---|---|---|
| Claude (Anthropic) | Sonnet 4, Opus 4 (2 models) | Chat, tools-use, reasoning, code-generation |
| Groq | GPT OSS 20B/120B, Qwen 3 32B, LLaMA 4 Maverick/Scout (5 models) | Fast inference, chat, code-generation, tools-use, reasoning |
Model Summary
- Total Models: 7
- Claude Models: 2 (claude-sonnet-4, claude-opus-4)
- Groq Models: 5 (gpt-oss-20b, gpt-oss-120b, qwen3-32b, llama4-maverick, llama4-scout)
Common Use Cases
Simple Chat Completion
Generate AI responses from user messages:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [{"role": "user", "content": "Explain XDR"}]
}'
Count Tokens Before Request
Estimate costs by counting tokens:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [{"role": "user", "content": "Long message..."}]
}'
Note: Token counting is only supported for Claude models. Groq models will return a 400 error.
List Available Models
Discover all available models:
curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models' \
-H 'Authorization: Bearer <token>'
Direct Provider Access
Use the gateway for provider-specific features:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/gateway/claude/v1/messages' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{...provider-specific payload...}'
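As a sketch, a Claude-native payload might look like the following. The field names follow Anthropic's public Messages API; whether the gateway expects the aggregated model ID shown here or a provider-native ID is an assumption to verify against the Gateway documentation.

# Payload shape follows Anthropic's Messages API; the model ID handling is an assumption.
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/gateway/claude/v1/messages' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Explain XDR"}]
  }'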
Error Response Headers
All error responses include the following custom headers:
| Header | Description |
|---|---|
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique MD5 hash identifier for error tracking and support |
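To inspect these headers, include response headers in the curl output; for instance, a deliberately unauthorized request (a minimal sketch):

# -D - dumps response headers to stdout; -o /dev/null discards the body.
curl -sS -D - -o /dev/null 'https://apis.threatwinds.com/api/ai/v1/models' \
  -H 'Authorization: Bearer invalid-token'

The 401 response should carry x-error and x-error-id; quote the x-error-id value when contacting support.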
Status Codes
| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Request successful with data (model list, chat response, token count) |
| 400 | Bad Request | Invalid parameters, validation error, empty messages, or malformed JSON |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Insufficient permissions for AI API access |
| 404 | Not Found | Model or provider not found |
| 500 | Internal Server Error | Provider error, AI service unavailable, or server-side error |
Model Capabilities
AI models expose various capabilities:
| Capability | Description |
|---|---|
| chat | Text-based conversation |
| text-generation | General text generation |
| code-generation | Code generation and completion |
| embeddings | Vector embeddings for semantic search |
| audio | Audio processing and transcription |
| image | Image understanding and generation |
| video | Video processing |
| tools-use | Function/tool calling |
| vision | Image understanding in chat |
| reasoning | Extended reasoning capabilities |
Token Limits
Each model has defined token limits:
- max_input_tokens: Maximum tokens in input messages
- max_completion_tokens: Maximum tokens the model can generate
- max_total_tokens: Maximum combined input + output tokens
Check model details to see specific limits for each model.
Response Formats
The AI API supports various response formats:
Text Response (Default)
Standard text completion response.
JSON Object
Structured JSON output:
{
  "response_format": {
    "type": "json_object"
  }
}
JSON Schema
JSON output matching a specific schema:
{
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "answer": {"type": "string"}
      }
    }
  }
}
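Putting it together, the response_format object is sent alongside the other chat completion fields; for example (the prompt and model are illustrative):

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Summarize XDR in JSON"}],
    "response_format": {"type": "json_object"}
  }'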
Advanced Features
Reasoning Effort
Control AI reasoning depth (the extended reasoning token budgets below apply to Claude models):
| Value | Description | Token Budget (Claude) |
|---|---|---|
| auto | Default; extended reasoning disabled | N/A |
| low | Low reasoning effort | 25% of max tokens |
| medium | Moderate reasoning | 33% of max tokens |
| high | Extended reasoning | 50% of max tokens |
Note: When reasoning is enabled (low/medium/high), temperature is automatically forced to 1.0.
Groq Limitation: qwen3-32b only accepts "none" or "default" for the reasoning_effort parameter.
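A request with extended reasoning might look like this (assuming reasoning_effort is a top-level field of the chat completion request, as the qwen3-32b note above suggests; remember that temperature is forced to 1.0 while reasoning is enabled):

# reasoning_effort as a top-level request field is an assumption based on the notes above.
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [{"role": "user", "content": "Analyze this attack chain step by step"}],
    "reasoning_effort": "high"
  }'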
Service Tier
Control request priority (provider-specific):
| Value | Description |
|---|---|
| auto | Automatic tier selection (default) |
| default | Standard processing |
Note: Service tier support varies by provider and model. Check provider documentation for details.
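As a sketch, assuming the request field is named service_tier (the field name is not confirmed above):

# service_tier as the field name is an assumption; check the provider documentation.
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "service_tier": "default"
  }'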
Best Practices
Cost Optimization
- Count Tokens First: Use token counting endpoint before large requests
- Set Max Tokens: Always set max_completion_tokens to control costs (see the example after this list)
- Choose Appropriate Models: Use smaller models for simple tasks
- Monitor Usage: Track token usage against billing API limits
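For example, capping output length (assuming max_completion_tokens is passed as a top-level request field; the limit shown is arbitrary):

# max_completion_tokens as a top-level request field is an assumption.
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Explain XDR"}],
    "max_completion_tokens": 500
  }'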
Performance
- Use Appropriate Providers:
  - Groq for ultra-fast inference (optimized hardware)
  - Claude for highest-quality reasoning and complex problem-solving
- Minimize Context: Send only necessary message history
- Batch Requests: Process multiple items in parallel when possible (see the sketch after this list)
- Cache Results: Cache frequent queries to reduce API calls
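A minimal shell sketch of parallel batching (the prompts file, model choice, and loop are illustrative):

# Fire one completion per line of prompts.txt in parallel, then wait for all to finish.
# Assumes each prompt line is JSON-safe (no unescaped quotes or backslashes).
while IFS= read -r prompt; do
  curl -sS -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
    -H 'Authorization: Bearer <token>' \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"gpt-oss-20b\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}" &
done < prompts.txt
wait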
Security
- Validate Input: Always validate user input before sending to AI
- Sanitize Output: Sanitize AI responses before displaying to users
- Monitor Usage: Track unusual patterns via logs
- Rotate Keys: Regularly rotate API keys and secrets
Error Handling
- Handle Provider Errors: Be prepared for provider-specific errors
- Implement Retries: Add exponential backoff for transient errors (see the sketch after this list)
- Check Token Limits: Validate against model token limits before requests
- Log Errors: Log all errors with interaction IDs for debugging
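A minimal retry sketch with exponential backoff, retrying only on 5xx responses (the attempt count and delays are arbitrary):

# Retry up to 5 times, doubling the delay after each transient (5xx) failure.
delay=1
for attempt in 1 2 3 4 5; do
  status=$(curl -sS -o response.json -w '%{http_code}' \
    -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
    -H 'Authorization: Bearer <token>' \
    -H 'Content-Type: application/json' \
    -d '{"model": "claude-sonnet-4", "messages": [{"role": "user", "content": "Explain XDR"}]}')
  if [ "$status" -lt 500 ]; then
    break                 # success, or a 4xx error that retrying will not fix
  fi
  sleep "$delay"
  delay=$((delay * 2))    # exponential backoff
done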