Chat Completions
Generate AI chat completions using any supported model. This endpoint handles inference requests and returns AI-generated responses.
Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/completions
Method: POST
Parameters
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be application/json |
Note: You must authenticate with either the Authorization header or the api-key/api-secret header pair.
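Both authentication styles reduce to a headers dictionary. A minimal Python sketch (the helper name and structure are illustrative, not part of the API):

```python
def build_headers(token=None, api_key=None, api_secret=None):
    """Build request headers using one of the two supported auth styles.

    Exactly one of `token` or the (`api_key`, `api_secret`) pair must be given.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif api_key and api_secret:
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
    else:
        raise ValueError("Provide a bearer token or an api-key/api-secret pair")
    return headers
```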
Request Body
{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": "What is XDR in cybersecurity?"
}
],
"max_completion_tokens": 1024,
"temperature": 1.0,
"reasoning_effort": "medium",
"service_tier": "auto",
"response_format": {
"type": "json_object",
"json_schema": {
"type": "object",
"properties": {
"answer": {
"type": "string"
}
}
}
}
}
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for inference (see Models) |
| messages | array | Yes | Message history for context (minimum 1) |
| max_completion_tokens | integer | No | Maximum tokens in response (defaults to 50% of model max, capped at model limit) |
| temperature | float | No | Sampling temperature (0.0 to 2.0; default 1.0; forced to 1.0 when reasoning is enabled) |
| reasoning_effort | string | No | Reasoning effort level (default: "auto", which disables reasoning) |
| service_tier | string | No | Service tier for priority routing (default: "auto") |
| response_format | object | No | Response format specification (json_schema must be an object, not a string) |
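The defaulting and validation rules in the table can be sketched in Python (the model limit here is a placeholder, not a real value):

```python
def resolve_max_tokens(requested, model_max):
    """Apply the documented default: 50% of the model maximum, capped at the limit."""
    if requested is None:
        return model_max // 2
    return min(requested, model_max)

def validate_temperature(temperature, reasoning_enabled=False):
    """Temperature must be in [0.0, 2.0]; it is forced to 1.0 when reasoning is enabled."""
    if reasoning_enabled:
        return 1.0
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    return temperature
```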
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role: user, assistant, system, tool, developer |
| content | string | Yes | Message text content |
| tool_call_id | string | No | Tool call identifier (for tool messages) |
| reasoning | string | No | Reasoning text (for assistant messages) |
Valid Message Roles
| Role | Description | Usage |
|---|---|---|
| user | User message | Input from end user |
| assistant | Assistant response | Previous AI responses |
| system | System instructions | Behavioral instructions for AI |
| tool | Tool/function response | Results from tool calls |
| developer | Developer message | Developer-level instructions |
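The 400-level validation described later (non-empty message array, valid roles) can be mirrored client-side before sending; a minimal sketch:

```python
VALID_ROLES = {"user", "assistant", "system", "tool", "developer"}

def validate_messages(messages):
    """Reject empty message arrays and unknown roles before making the request."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i} has invalid role: {msg.get('role')!r}")
```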
Reasoning Effort Values
| Value | Description | Token Budget (Claude) |
|---|---|---|
| auto | Default; disables reasoning | N/A |
| minimal | Very minimal reasoning | 20% of max tokens |
| low | Low reasoning effort | 25% of max tokens |
| medium | Moderate reasoning | 33% of max tokens |
| high | Extended reasoning | 50% of max tokens |
| disabled | Explicitly disabled | N/A |
Special Notes:
- For Claude models: Temperature is forced to 1.0 when reasoning is enabled
- For Groq qwen3-32b: Only accepts "none" or "default" for reasoning_effort
- Reasoning budget has minimum of 1024 tokens (Claude API requirement)
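Putting the percentages and the 1024-token floor together, the Claude reasoning budget can be sketched as:

```python
# Percentages from the Reasoning Effort Values table above.
EFFORT_BUDGET = {"minimal": 0.20, "low": 0.25, "medium": 0.33, "high": 0.50}

def reasoning_budget(effort, max_completion_tokens):
    """Token budget for reasoning; the Claude API enforces a 1024-token minimum."""
    if effort in (None, "auto", "disabled"):
        return 0
    fraction = EFFORT_BUDGET[effort]
    return max(1024, int(max_completion_tokens * fraction))
```

For example, with max_completion_tokens of 4000, "low" would yield 1000 tokens but is raised to the 1024 floor, while "high" yields 2000.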
Service Tier Values
| Value | Description |
|---|---|
| auto | Automatic tier selection (default) |
| default | Standard processing |
| flex | Flexible scheduling |
| priority | Priority processing |
| standard | Standard tier |
Request
To generate a chat completion, use a POST request:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": "What is XDR in cybersecurity?"
}
],
"max_completion_tokens": 1024
}'
Or using API key and secret:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'accept: application/json' \
-H 'api-key: your-api-key' \
-H 'api-secret: your-api-secret' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-oss-20b",
"messages": [
{
"role": "user",
"content": "Explain threat intelligence"
}
]
}'
Response
A successful response returns the AI-generated completion along with token usage statistics.
Success Response (200 OK)
{
"id": "msg_01AbCdEf123",
"object": "chat.completion",
"created": 1704067200,
"model": "claude-sonnet-4",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "XDR (Extended Detection and Response) is a unified security platform that...",
"reasoning": "The user is asking about a cybersecurity concept. I should provide a clear, comprehensive explanation..."
}
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 150,
"total_tokens": 162
}
}
Response Schema
| Field | Type | Description |
|---|---|---|
| id | string | Unique interaction identifier |
| object | string | Response type, always "chat.completion" |
| created | integer | Unix timestamp of creation |
| model | string | Model used for inference |
| choices | array | Generated response choices (typically 1) |
| choices[].index | integer | Choice index in array |
| choices[].finish_reason | string | How generation ended |
| choices[].message | object | Generated message |
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string | Generated response text |
| choices[].message.reasoning | string | Reasoning process (optional, when reasoning enabled) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in input messages |
| usage.completion_tokens | integer | Tokens in generated response |
| usage.total_tokens | integer | Sum of prompt and completion tokens |
Finish Reason Values
| Reason | Description |
|---|---|
| stop | Model completed response naturally |
| length | Response cut off due to max_completion_tokens limit |
| content_filter | Content filtered by provider safety systems |
| tool_calls | Model requested tool/function calls |
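A caller typically branches on finish_reason before using the content; a minimal handling sketch (the function name and truncation flag are illustrative):

```python
def handle_completion(response):
    """Return (content, truncated) for a completion, branching on finish_reason."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    if reason == "length":
        # Response was cut off; consider raising max_completion_tokens and retrying.
        return choice["message"]["content"], True
    if reason == "content_filter":
        raise RuntimeError("response blocked by provider safety systems")
    return choice["message"]["content"], False
```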
Business Logic
Request Processing
- Validation: Validates message array is not empty and all roles are valid
- Model Resolution: Looks up provider client based on model ID
- Parameter Normalization: Defaults reasoning_effort and service_tier to "auto" if invalid
- Inference: Calls the provider client's Message() method
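The normalization step can be sketched as a fall-back to "auto" for any unrecognized value:

```python
VALID_EFFORTS = {"auto", "minimal", "low", "medium", "high", "disabled"}
VALID_TIERS = {"auto", "default", "flex", "priority", "standard"}

def normalize(params):
    """Fall back to "auto" when reasoning_effort or service_tier is invalid."""
    params = dict(params)  # do not mutate the caller's dict
    if params.get("reasoning_effort") not in VALID_EFFORTS:
        params["reasoning_effort"] = "auto"
    if params.get("service_tier") not in VALID_TIERS:
        params["service_tier"] = "auto"
    return params
```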
Error Codes
| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid JSON, zero messages, invalid role, invalid json_schema |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Insufficient permissions |
| 500 | Internal Server Error | Provider error, model unavailable |
Examples
Example 1: Simple Question
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-3-5-sonnet-20241022",
"messages": [
{"role": "user", "content": "What is 2+2?"}
],
"max_completion_tokens": 100
}'
Response:
{
"id": "...",
"choices": [{
"message": {
"role": "assistant",
"content": "2 + 2 = 4"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 5,
"total_tokens": 13
}
}
Example 2: Multi-turn Conversation
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a cybersecurity expert"},
{"role": "user", "content": "What is a zero-day vulnerability?"},
{"role": "assistant", "content": "A zero-day vulnerability is..."},
{"role": "user", "content": "How can organizations protect against them?"}
],
"max_completion_tokens": 500
}'
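Multi-turn conversation is just accumulation: each exchange appends a user message and the assistant's reply to the messages array. A minimal sketch:

```python
def append_turn(history, user_text, assistant_reply):
    """Extend a conversation history with one user/assistant exchange."""
    history = list(history)  # copy so earlier snapshots stay usable
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_reply})
    return history
```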
Example 3: JSON Response Format
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-3-5-sonnet-20241022",
"messages": [
{
"role": "user",
"content": "Extract the name and severity from: Critical vulnerability in Apache Log4j"
}
],
"response_format": {
"type": "json_object",
"json_schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"severity": {"type": "string"}
}
}
}
}'
Response:
{
"choices": [{
"message": {
"content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}"
}
}]
}
Example 4: Extended Reasoning
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "o1-preview",
"messages": [
{
"role": "user",
"content": "Design a secure architecture for a microservices-based XDR platform"
}
],
"reasoning_effort": "high",
"max_completion_tokens": 4000
}'
Example 5: Vision (Image Understanding)
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What security issues do you see?"},
{"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
]
}
]
}'
Best Practices
Message Construction
- System Messages: Use system messages to set behavior and context
- Message History: Include relevant conversation history for context
- Clear Prompts: Be specific and clear in user messages
- Role Consistency: Maintain proper role alternation
Token Management
- Set Max Tokens: Always set max_completion_tokens to control costs
- Count First: Use the token counting endpoint for large requests
- Monitor Usage: Track token usage via the response usage field
- Truncate History: Remove old messages to stay within limits
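History truncation can be sketched as dropping the oldest non-system messages until an estimate fits. The chars/4 heuristic below is a rough approximation only; use the token counting endpoint for accurate numbers:

```python
def truncate_history(messages, max_prompt_tokens, chars_per_token=4):
    """Drop oldest non-system messages until the estimated prompt fits the budget."""
    def estimate(msgs):
        # Rough heuristic: ~4 chars per token, plus a small per-message overhead.
        return sum(len(m["content"]) // chars_per_token + 4 for m in msgs)

    messages = list(messages)
    while estimate(messages) > max_prompt_tokens and len(messages) > 1:
        # Preserve a leading system message if present.
        drop_at = 1 if messages[0]["role"] == "system" else 0
        if drop_at >= len(messages):
            break
        messages.pop(drop_at)
    return messages
```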
Error Handling
- Validate Before Send: Check message count and roles before sending
- Handle Provider Errors: Be prepared for provider-specific errors
- Implement Retries: Add exponential backoff for transient errors
- Log Interaction IDs: Save interaction IDs for debugging
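The retry advice above can be sketched as a small wrapper with exponential backoff and jitter; the exception types and delay constants are illustrative choices, not API requirements:

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a callable with exponential backoff and jitter on transient errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            sleep(delay)
```

Injecting `sleep` keeps the helper testable without real waits.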
Performance
- Choose Appropriate Models: Use faster models for simple tasks
- Minimize Context: Send only necessary message history
- Parallel Requests: Process multiple items concurrently when possible
- Cache Results: Cache frequent queries to reduce API calls
Security
- Validate Input: Sanitize user input before sending to AI
- Filter Output: Validate and sanitize AI responses
- Monitor Usage: Track unusual patterns via logs
- Rotate Keys: Regularly rotate API credentials
Cost Optimization
- Model Selection: Use smaller/faster models when quality permits
- Token Limits: Set appropriate max_completion_tokens
- Batch Processing: Group similar requests
- Response Caching: Cache responses for repeated queries
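Response caching for repeated queries can be keyed on the canonical request body; a tiny in-memory sketch (only sensible for deterministic, repeated prompts):

```python
import hashlib
import json

def cache_key(payload):
    """Stable key for a completion request: hash of the canonical JSON body."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class CompletionCache:
    """Tiny in-memory cache for identical completion requests."""
    def __init__(self):
        self._store = {}

    def get_or_call(self, payload, call):
        key = cache_key(payload)
        if key not in self._store:
            self._store[key] = call(payload)
        return self._store[key]
```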