Chat Completions

Generate AI chat completions using any supported model. This endpoint handles inference requests and returns AI-generated responses, and is compatible with the OpenAI Chat Completions API, including streaming.

Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/completions

Method: POST

Parameters

Headers

Header Type Required Description
Authorization string Optional* Bearer token for session authentication
api-key string Optional* API key for key-based authentication
api-secret string Optional* API secret for key-based authentication
Content-Type string Yes Must be application/json

Note: You must authenticate with either the Authorization header or the api-key/api-secret header pair; one of the two is required.
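
As a quick reference, here is a minimal Python sketch of the two header combinations (the credential values are placeholders):

# Option 1: session-token authentication
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
}

# Option 2: API key/secret authentication
headers = {
    "api-key": "your-api-key",
    "api-secret": "your-api-secret",
    "Content-Type": "application/json",
}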

Request Body

{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
  "max_completion_tokens": 1024,
  "temperature": 1.0,
  "stream": false
}

Request Parameters

Parameter Type Required Description
model string Yes Model ID to use for inference (see Models)
messages array Yes Message history for context (minimum 1)
max_completion_tokens integer No Maximum tokens in response (defaults to 50% of model max)
max_tokens integer No Alias for max_completion_tokens (OpenAI compatibility)
temperature float No Sampling temperature (0.0 to 2.0, default: 1.0)
top_p float No Nucleus sampling (0.0 to 1.0, default: 1.0)
n integer No Number of completions to generate (1-128, default: 1). Note: Claude only supports n=1
stop string/array No Stop sequences (max 4 sequences)
seed integer No Seed for deterministic sampling
frequency_penalty float No Frequency penalty (-2.0 to 2.0, default: 0)
presence_penalty float No Presence penalty (-2.0 to 2.0, default: 0)
logprobs boolean No Return log probabilities of output tokens
top_logprobs integer No Number of top logprobs per token (0-5, requires logprobs=true)
logit_bias object No Token bias map (token_id: -100 to 100)
user string No Unique user identifier for tracking
stream boolean No Enable SSE streaming (default: false)
stream_options object No Streaming configuration
tools array No Tool/function definitions (max 128)
tool_choice string/object No Tool selection strategy
parallel_tool_calls boolean No Allow parallel tool calls
reasoning_effort string No Reasoning effort level: low, medium, high (behavior when omitted varies by provider)
service_tier string No Service tier for priority routing (omit for default)
response_format object No Response format specification

Message Object

Field Type Required Description
role string Yes Message role: user, assistant, system, tool, developer
content string/array Yes Message content: a string or an array of content blocks (may be null on assistant messages that carry tool_calls)
name string No Participant name
tool_calls array No Tool calls generated by the model (for assistant messages)
tool_call_id string No Tool call identifier (for tool messages)
reasoning string No Reasoning text (for assistant messages)

Valid Message Roles

Role Description Usage
user User message Input from end user
assistant Assistant response Previous AI responses
system System instructions Behavioral instructions for AI
tool Tool/function response Results from tool calls
developer Developer message Developer-level instructions

Streaming (SSE)

Enable real-time streaming responses using Server-Sent Events (SSE). This is fully compatible with the OpenAI streaming format.

Enabling Streaming

Set stream: true in your request to receive incremental responses:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Write a haiku about cybersecurity"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Stream Options

Parameter Type Description
stream_options.include_usage boolean Include token usage in final chunk (default: false)

Response Format

Streaming responses are sent as Server-Sent Events. Each event contains a JSON object:

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"content":"Firewalls"},"finish_reason":null}]}

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"content":" stand"},"finish_reason":null}]}

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":20,"total_tokens":32}}

data: [DONE]

Chunk Structure

Field Description
object Always "chat.completion.chunk" for streaming
choices[].delta Incremental content (sent in place of message when streaming)
choices[].delta.role Present in first chunk only
choices[].delta.content Incremental text content
choices[].delta.tool_calls Incremental tool call data
choices[].finish_reason null until final content chunk, then stop, length, or tool_calls
usage Only present in final chunk when include_usage: true

Response Headers

When streaming is enabled, the server sets these headers:

Header Value
Content-Type text/event-stream
Cache-Control no-cache
Connection keep-alive
X-Accel-Buffering no

Stream Termination

The stream ends with data: [DONE] followed by two newlines. Always check for this marker to know when the stream is complete.
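
As an illustration, here is a minimal Python consumer for this event format using the requests library. It accumulates the delta.content fragments shown above and stops at the [DONE] marker; the URL and token are placeholders.

import json
import requests

resp = requests.post(
    "https://apis.threatwinds.com/api/ai/v1/chat/completions",
    headers={"Authorization": "Bearer <token>", "Content-Type": "application/json"},
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Write a haiku about cybersecurity"}],
        "stream": True,
        "stream_options": {"include_usage": True},
    },
    stream=True,  # keep the connection open and iterate over lines as they arrive
)

answer = []
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip the blank separator lines between events
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # explicit end-of-stream marker
    chunk = json.loads(payload)
    if chunk.get("usage"):
        print("usage:", chunk["usage"])  # present only in the final chunk
    for choice in chunk.get("choices", []):
        delta = choice.get("delta", {})
        if "content" in delta:
            answer.append(delta["content"])

print("".join(answer))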


Function Calling (Tools)

Enable the model to call functions/tools defined by your application. This allows the model to request external data or perform actions.

Tool Parameters

Parameter Type Description
tools array Tool definitions (max 128 tools)
tool_choice string/object Tool selection strategy
parallel_tool_calls boolean Allow calling multiple tools in parallel

Tool Definition

Each tool is defined with a type and function specification:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_threat_intel",
        "description": "Get threat intelligence for an IP address or domain",
        "parameters": {
          "type": "object",
          "properties": {
            "indicator": {
              "type": "string",
              "description": "IP address or domain to lookup"
            },
            "indicator_type": {
              "type": "string",
              "enum": ["ip", "domain"],
              "description": "Type of indicator"
            }
          },
          "required": ["indicator", "indicator_type"]
        }
      }
    }
  ]
}

Tool Choice Options

Value Description
"none" Disable tool calling for this request
"auto" Model decides whether to call tools (default)
"required" Model must call at least one tool
{"type": "function", "function": {"name": "..."}} Force a specific tool

Example: Tool Call Request

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What threat intel do you have on 8.8.8.8?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_threat_intel",
          "description": "Get threat intelligence for an IP or domain",
          "parameters": {
            "type": "object",
            "properties": {
              "indicator": {"type": "string"},
              "indicator_type": {"type": "string", "enum": ["ip", "domain"]}
            },
            "required": ["indicator", "indicator_type"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Response with Tool Calls

When the model decides to call a tool, the response includes tool_calls:

{
  "id": "msg_123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_threat_intel",
          "arguments": "{\"indicator\":\"8.8.8.8\",\"indicator_type\":\"ip\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 25, "total_tokens": 75}
}

Sending Tool Results

After executing the tool, send the result back with a tool role message:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What threat intel do you have on 8.8.8.8?"},
      {
        "role": "assistant",
        "tool_calls": [{
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_threat_intel",
            "arguments": "{\"indicator\":\"8.8.8.8\",\"indicator_type\":\"ip\"}"
          }
        }]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "{\"ip\":\"8.8.8.8\",\"owner\":\"Google\",\"reputation\":\"safe\",\"services\":[\"DNS\"]}"
      }
    ]
  }'

The model will then generate a natural language response based on the tool result.
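
Putting the round trip together, here is a sketch of the full loop in Python. The local get_threat_intel implementation is a hypothetical stub; in practice you would call your own backend.

import json
import requests

URL = "https://apis.threatwinds.com/api/ai/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_threat_intel",
        "description": "Get threat intelligence for an IP or domain",
        "parameters": {
            "type": "object",
            "properties": {
                "indicator": {"type": "string"},
                "indicator_type": {"type": "string", "enum": ["ip", "domain"]},
            },
            "required": ["indicator", "indicator_type"],
        },
    },
}]

def get_threat_intel(indicator, indicator_type):
    # Hypothetical stub; replace with a real threat-intel lookup.
    return {"ip": indicator, "owner": "Google", "reputation": "safe", "services": ["DNS"]}

messages = [{"role": "user", "content": "What threat intel do you have on 8.8.8.8?"}]
reply = requests.post(URL, headers=HEADERS, json={
    "model": "claude-sonnet-4", "messages": messages,
    "tools": TOOLS, "tool_choice": "auto",
}).json()

choice = reply["choices"][0]
if choice["finish_reason"] == "tool_calls":
    messages.append(choice["message"])  # echo the assistant turn, tool_calls included
    for call in choice["message"]["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(get_threat_intel(**args)),
        })
    reply = requests.post(URL, headers=HEADERS, json={
        "model": "claude-sonnet-4", "messages": messages,
    }).json()

print(reply["choices"][0]["message"]["content"])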


Multimodal Content

Messages can contain multiple content types including text, images, audio, and video.

Content Array Format

Instead of a simple string, content can be an array of content blocks:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What vulnerabilities do you see in this network diagram?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png", "detail": "high"}}
  ]
}

Content Block Types

Type Description Required Capability
text Plain text content chat
image_url Image input image
audio_url Audio input audio
video_url Video input video

Image Detail Levels

Detail Description
auto Automatically select resolution (default)
low Lower resolution, faster processing
high Higher resolution, more detailed analysis

Supported Media Formats

Type Formats
Images JPEG, PNG, GIF, WebP
Audio MP3, WAV, OGG, FLAC
Video MP4, WebM, MOV

Example: Vision Request

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Analyze this security dashboard for anomalies"},
          {"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png", "detail": "high"}}
        ]
      }
    ]
  }'

Note: Before sending multimodal content, use the Models endpoint to verify that the model supports the content type you're using.


Reasoning

Enable extended thinking/reasoning for complex problems that benefit from step-by-step analysis.

Reasoning Effort Values

Value Description Token Budget (Claude)
low Low reasoning effort 25% of max tokens
medium Moderate reasoning 33% of max tokens
high Extended reasoning 50% of max tokens

Note: Only models with the reasoning capability support this parameter.

Provider-Specific Behavior

The behavior when reasoning_effort is omitted differs by provider:

Provider Behavior When Omitted
Claude Reasoning is explicitly disabled
Groq Uses model’s default behavior (may include reasoning)
vLLM Uses model’s default behavior

Special Notes

  • Claude models: Temperature is forced to 1.0 when reasoning is enabled
  • Groq models: Some models may return reasoning chains even without explicit reasoning_effort parameter
  • Groq qwen3-32b: reasoning_effort value is always treated as “default” internally
  • Minimum budget: Claude API requires minimum 1024 reasoning tokens
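
To make the budget arithmetic concrete, here is a small sketch, assuming the percentages in the table above apply to max_completion_tokens (verify this against your account's behavior):

EFFORT_FRACTION = {"low": 0.25, "medium": 0.33, "high": 0.50}

def claude_reasoning_budget(max_completion_tokens: int, effort: str) -> int:
    budget = int(max_completion_tokens * EFFORT_FRACTION[effort])
    return max(budget, 1024)  # Claude API enforces a 1024-token minimum

print(claude_reasoning_budget(4000, "high"))  # 2000
print(claude_reasoning_budget(2000, "low"))   # 500, raised to the 1024 minimum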

Example: Extended Reasoning

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [
      {
        "role": "user",
        "content": "Design a secure architecture for a microservices-based XDR platform"
      }
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 4000
  }'

When reasoning is enabled, the response includes a reasoning field:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Here is the recommended architecture...",
      "reasoning": "First, I need to consider the key security requirements for an XDR platform..."
    }
  }]
}

Service Tier

Control request priority and routing:

Value Description
auto Automatic tier selection (default)
default Standard processing
flex Flexible scheduling (may have higher latency)
priority Priority processing

Note: Service tier support varies by provider and model. Omit the parameter to use the default tier.


Response Format

Control the structure of the model’s output:

JSON Object

Request JSON-formatted output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

Request output matching a specific schema:

{
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "threat_name": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "indicators": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["threat_name", "severity"]
    }
  }
}

Request

To generate a chat completion, use a POST request:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
  "max_completion_tokens": 1024
}'

Or using API key and secret:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'api-secret: your-api-secret' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gpt-oss-20b",
  "messages": [
    {
      "role": "user",
      "content": "Explain threat intelligence"
    }
  ]
}'

Response

A successful response will return the AI-generated completion with usage statistics.

Success Response (200 OK)

{
  "id": "msg_01AbCdEf123",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "XDR (Extended Detection and Response) is a unified security platform that..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 150,
    "total_tokens": 162
  }
}

Response Schema

Field Type Description
id string Unique interaction identifier
object string Response type: "chat.completion" or "chat.completion.chunk"
created integer Unix timestamp of creation
model string Model used for inference
system_fingerprint string Backend configuration identifier (optional)
service_tier string Service tier used for request (optional)
choices array Generated response choices
choices[].index integer Choice index in array
choices[].finish_reason string How generation ended
choices[].message object Generated message (non-streaming)
choices[].delta object Incremental message (streaming)
choices[].logprobs object Log probabilities (when requested)
usage object Token usage statistics
usage.prompt_tokens integer Tokens in input messages
usage.completion_tokens integer Tokens in generated response
usage.total_tokens integer Sum of prompt and completion tokens
usage.prompt_tokens_details object Breakdown of prompt tokens (optional)
usage.completion_tokens_details object Breakdown of completion tokens (optional)

Message Object (Response)

Field Type Description
role string Always “assistant”
content string Generated response text
tool_calls array Tool calls requested by model (optional)
reasoning string Reasoning process (when reasoning enabled)
refusal string Model’s refusal message (optional)

Finish Reason Values

Reason Description
stop Model completed response naturally
length Response cut off due to max_completion_tokens limit
content_filter Content filtered by provider safety systems
tool_calls Model requested tool/function calls
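
A sketch of dispatching on these values, where response is a parsed completion as shown above:

def handle(response):
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    if reason == "stop":
        return choice["message"]["content"]
    if reason == "length":
        # Truncated: retry with a larger max_completion_tokens or a shorter prompt.
        raise RuntimeError("response truncated at max_completion_tokens")
    if reason == "tool_calls":
        # Execute the requested tools and continue (see Function Calling above).
        return choice["message"]["tool_calls"]
    if reason == "content_filter":
        raise RuntimeError("response blocked by provider safety systems")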

OpenAI Compatibility

This endpoint is designed for compatibility with OpenAI’s Chat Completions API.
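
In practice this means existing OpenAI SDKs can usually be pointed at this endpoint by overriding the base URL. A sketch with recent versions of the official openai Python package; the assumption that the SDK's api_key maps to the bearer token should be verified against your deployment:

from openai import OpenAI

client = OpenAI(
    base_url="https://apis.threatwinds.com/api/ai/v1",  # SDK appends /chat/completions
    api_key="<token>",  # sent as "Authorization: Bearer <token>"
)

completion = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "What is XDR in cybersecurity?"}],
    max_completion_tokens=1024,
)
print(completion.choices[0].message.content)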

Provider-Specific Parameter Support

Not all parameters are supported by all providers:

Parameter Claude Groq vLLM
temperature Yes Yes Yes
top_p Yes* Yes Yes
n Only n=1 Yes Yes
stop Yes Yes Yes
seed No Yes Yes
frequency_penalty No Yes Yes
presence_penalty No Yes Yes
logprobs No Yes Yes
top_logprobs No Yes Yes
logit_bias No Yes Yes
tools Yes Yes Yes
parallel_tool_calls No Yes Yes
reasoning_effort Yes Yes** Yes***
response_format No Yes Yes
user No Yes Yes

* Claude ignores top_p when reasoning is enabled.
** Groq uses the model’s default reasoning behavior when the parameter is omitted; qwen3-32b always uses “default” internally.
*** vLLM uses the model’s default reasoning behavior when the parameter is omitted.

Ignored Parameters

The following parameters are accepted for SDK compatibility but are currently ignored:

Parameter Status
store Ignored
metadata Ignored
modalities Ignored
audio Ignored
prediction Ignored
web_search_options Ignored

Business Logic

Request Processing

  1. Validation: Validates parameters, message array, roles, and token limits
  2. Token Validation: Checks input tokens against model’s max_input_tokens limit
  3. Model Resolution: Looks up provider client based on model ID
  4. Parameter Normalization: Applies defaults for optional parameters
  5. Inference: Calls provider’s Message() or MessageStream() method

Token Limit Validation

Before sending to the provider, the API validates:

  • Input tokens do not exceed max_input_tokens
  • Estimated total (input + max_completion_tokens) does not exceed max_total_tokens

If limits are exceeded, a 400 error is returned with a token breakdown.
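
A rough client-side pre-check can catch obviously oversized prompts before the round trip. The sketch below uses the common ~4 characters/token heuristic, which is only an approximation; use the token counting endpoint for exact counts, and read the real limits from the Models endpoint:

def rough_token_estimate(messages) -> int:
    # ~4 characters per token is a crude English-text heuristic, not a tokenizer.
    chars = sum(len(m["content"]) for m in messages if isinstance(m.get("content"), str))
    return chars // 4

MAX_INPUT_TOKENS = 200_000  # placeholder; fetch the model's actual max_input_tokens

messages = [{"role": "user", "content": "Explain threat intelligence"}]
if rough_token_estimate(messages) > MAX_INPUT_TOKENS:
    raise ValueError("prompt likely exceeds the model's max_input_tokens")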


Error Codes

Status Code Description Possible Cause
200 OK Request successful
400 Bad Request Invalid JSON, empty messages, invalid role, parameter out of range, token limit exceeded
401 Unauthorized Missing or invalid authentication
403 Forbidden Insufficient permissions or model not allowed
500 Internal Server Error Provider error, model unavailable

Examples

Example 1: Simple Question

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "max_completion_tokens": 100
  }'

Response:

{
  "id": "msg_123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "2 + 2 = 4"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 5,
    "total_tokens": 13
  }
}

Example 2: Multi-turn Conversation

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a cybersecurity expert"},
      {"role": "user", "content": "What is a zero-day vulnerability?"},
      {"role": "assistant", "content": "A zero-day vulnerability is a software security flaw that is unknown to the vendor..."},
      {"role": "user", "content": "How can organizations protect against them?"}
    ],
    "max_completion_tokens": 500
  }'

Example 3: Streaming Response

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Explain the MITRE ATT&CK framework"}
    ],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Example 4: JSON Response Format

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": "Extract the name and severity from: Critical vulnerability in Apache Log4j"
      }
    ],
    "response_format": {
      "type": "json_object",
      "json_schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "severity": {"type": "string"}
        }
      }
    }
  }'

Response:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}"
    },
    "finish_reason": "stop"
  }]
}
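
Since the structured output arrives as a JSON string in message.content, parse it defensively; a sketch:

import json

response = {  # abbreviated version of the example response above
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}",
        },
        "finish_reason": "stop",
    }]
}

content = response["choices"][0]["message"]["content"]
try:
    finding = json.loads(content)
    print(finding["name"], finding["severity"])  # Apache Log4j Critical
except json.JSONDecodeError:
    print("model did not return valid JSON:", content)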

Example 5: Extended Reasoning

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [
      {
        "role": "user",
        "content": "Design a zero-trust network architecture for a financial institution"
      }
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 4000
  }'

Example 6: Vision (Image Analysis)

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What security issues do you see in this network topology?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/network.png", "detail": "high"}}
        ]
      }
    ]
  }'

Example 7: Function Calling

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Scan the domain threatwinds.com for vulnerabilities"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "vulnerability_scan",
          "description": "Perform a vulnerability scan on a domain",
          "parameters": {
            "type": "object",
            "properties": {
              "domain": {"type": "string", "description": "Domain to scan"},
              "scan_type": {"type": "string", "enum": ["quick", "full", "stealth"]}
            },
            "required": ["domain"]
          }
        }
      }
    ]
  }'

Best Practices

Message Construction

  1. System Messages: Use system messages to set behavior and context
  2. Message History: Include relevant conversation history for context
  3. Clear Prompts: Be specific and clear in user messages
  4. Role Consistency: Maintain proper role alternation

Token Management

  1. Set Max Tokens: Always set max_completion_tokens to control costs
  2. Count First: Use token counting endpoint for large requests
  3. Monitor Usage: Track token usage via response usage field
  4. Truncate History: Remove old messages to stay within limits
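
For the last point, a minimal truncation sketch that preserves system messages and keeps only the most recent turns (the cutoff is a placeholder; size it against the model's limits):

def truncate_history(messages, keep_last=10):
    # Keep behavioral instructions, drop the oldest user/assistant turns first.
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-keep_last:]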

Streaming

  1. Use for Long Responses: Enable streaming for better UX on long responses
  2. Handle Chunks: Process each chunk as it arrives
  3. Check for [DONE]: Always check for the [DONE] marker
  4. Include Usage: Set include_usage: true for token tracking

Error Handling

  1. Validate Before Send: Check message count and roles before sending
  2. Handle Provider Errors: Be prepared for provider-specific errors
  3. Implement Retries: Add exponential backoff for transient errors (see the sketch after this list)
  4. Log Interaction IDs: Save interaction IDs for debugging
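
A sketch of point 3, retrying transient failures (HTTP 5xx) with exponential backoff and jitter; attempt count and timeout are placeholders:

import random
import time
import requests

def post_with_retries(url, headers, body, attempts=5):
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=body, timeout=60)
        if resp.status_code < 500:
            return resp  # success, or a 4xx the caller must fix rather than retry
        time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter
    resp.raise_for_status()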

Performance

  1. Choose Appropriate Models: Use faster models for simple tasks
  2. Minimize Context: Send only necessary message history
  3. Parallel Requests: Process multiple items concurrently when possible
  4. Cache Results: Cache frequent queries to reduce API calls

Security

  1. Validate Input: Sanitize user input before sending to AI
  2. Filter Output: Validate and sanitize AI responses
  3. Monitor Usage: Track unusual patterns via logs
  4. Rotate Keys: Regularly rotate API credentials

Cost Optimization

  1. Model Selection: Use smaller/faster models when quality permits
  2. Token Limits: Set appropriate max_completion_tokens
  3. Batch Processing: Group similar requests
  4. Response Caching: Cache responses for repeated queries
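
For response caching, a minimal in-memory sketch keyed on the canonicalized request body. A production cache would add TTLs and skip non-deterministic requests (e.g., temperature > 0 without a seed):

import hashlib
import json

_cache = {}

def cache_key(body) -> str:
    # Canonicalize so key order does not change the hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def cached_completion(body, send):
    # send(body) performs the actual HTTP call and returns the parsed response.
    key = cache_key(body)
    if key not in _cache:
        _cache[key] = send(body)
    return _cache[key]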