Chat Completions

Generate AI chat completions using any supported model. This endpoint handles inference requests and returns AI-generated responses, with full OpenAI API compatibility including streaming.

Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/completions

Method: POST

Parameters

Headers

Header	Type	Required	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication
Content-Type	string	Yes	Must be `application/json`

Note: You must use either Authorization header OR API key/secret combination.

Request Body

{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
  "max_completion_tokens": 1024,
  "temperature": 1.0,
  "stream": false
}

Request Parameters

Parameter	Type	Required	Description
model	string	Yes	Model ID to use for inference (see Models)
messages	array	Yes	Message history for context (minimum 1)
max_completion_tokens	integer	No	Maximum tokens in response (defaults to 50% of model max)
max_tokens	integer	No	Alias for max_completion_tokens (OpenAI compatibility)
temperature	float	No	Sampling temperature (0.0 to 2.0, default: 1.0)
top_p	float	No	Nucleus sampling (0.0 to 1.0, default: 1.0)
n	integer	No	Number of completions to generate (1-128, default: 1). Note: Claude only supports n=1
stop	string/array	No	Stop sequences (max 4 sequences)
seed	integer	No	Seed for deterministic sampling
frequency_penalty	float	No	Frequency penalty (-2.0 to 2.0, default: 0)
presence_penalty	float	No	Presence penalty (-2.0 to 2.0, default: 0)
logprobs	boolean	No	Return log probabilities of output tokens
top_logprobs	integer	No	Number of top logprobs per token (0-5, requires logprobs=true)
logit_bias	object	No	Token bias map (token_id: -100 to 100)
user	string	No	Unique user identifier for tracking
stream	boolean	No	Enable SSE streaming (default: false)
stream_options	object	No	Streaming configuration
tools	array	No	Tool/function definitions (max 128)
tool_choice	string/object	No	Tool selection strategy
parallel_tool_calls	boolean	No	Allow parallel tool calls
reasoning_effort	string	No	Reasoning effort level: low, medium, high (behavior when omitted varies by provider)
service_tier	string	No	Service tier for priority routing (omit for default)
response_format	object	No	Response format specification

Message Object

Field	Type	Required	Description
role	string	Yes	Message role: user, assistant, system, tool, developer
content	string/array	Yes	Message content (string or array of content blocks)
name	string	No	Participant name
tool_calls	array	No	Tool calls generated by the model (for assistant messages)
tool_call_id	string	No	Tool call identifier (for tool messages)
reasoning	string	No	Reasoning text (for assistant messages)

Valid Message Roles

Role	Description	Usage
user	User message	Input from end user
assistant	Assistant response	Previous AI responses
system	System instructions	Behavioral instructions for AI
tool	Tool/function response	Results from tool calls
developer	Developer message	Developer-level instructions

Streaming (SSE)

Enable real-time streaming responses using Server-Sent Events (SSE). This is fully compatible with the OpenAI streaming format.

Enabling Streaming

Set stream: true in your request to receive incremental responses:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Write a haiku about cybersecurity"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Stream Options

Parameter	Type	Description
stream_options.include_usage	boolean	Include token usage in final chunk (default: false)

Response Format

Streaming responses are sent as Server-Sent Events. Each event contains a JSON object:

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"content":"Firewalls"},"finish_reason":null}]}

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"content":" stand"},"finish_reason":null}]}

data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":20,"total_tokens":32}}

data: [DONE]

Chunk Structure

Field	Description
object	Always `"chat.completion.chunk"` for streaming
choices[].delta	Incremental content (replaces `message`)
choices[].delta.role	Present in first chunk only
choices[].delta.content	Incremental text content
choices[].delta.tool_calls	Incremental tool call data
choices[].finish_reason	`null` until final content chunk, then `stop`, `length`, or `tool_calls`
usage	Only present in final chunk when `include_usage: true`

Response Headers

When streaming is enabled, the server sets these headers:

Header	Value
Content-Type	`text/event-stream`
Cache-Control	`no-cache`
Connection	`keep-alive`
X-Accel-Buffering	`no`

Stream Termination

The stream ends with data: [DONE] followed by two newlines. Always check for this marker to know when the stream is complete.

Function Calling (Tools)

Enable the model to call functions/tools defined by your application. This allows the model to request external data or perform actions.

Tool Parameters

Parameter	Type	Description
tools	array	Tool definitions (max 128 tools)
tool_choice	string/object	Tool selection strategy
parallel_tool_calls	boolean	Allow calling multiple tools in parallel

Tool Definition

Each tool is defined with a type and function specification:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_threat_intel",
        "description": "Get threat intelligence for an IP address or domain",
        "parameters": {
          "type": "object",
          "properties": {
            "indicator": {
              "type": "string",
              "description": "IP address or domain to lookup"
            },
            "indicator_type": {
              "type": "string",
              "enum": ["ip", "domain"],
              "description": "Type of indicator"
            }
          },
          "required": ["indicator", "indicator_type"]
        }
      }
    }
  ]
}

Tool Choice Options

Value	Description
`"none"`	Disable tool calling for this request
`"auto"`	Model decides whether to call tools (default)
`"required"`	Model must call at least one tool
`{"type": "function", "function": {"name": "..."}}`	Force a specific tool

Example: Tool Call Request

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What threat intel do you have on 8.8.8.8?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_threat_intel",
          "description": "Get threat intelligence for an IP or domain",
          "parameters": {
            "type": "object",
            "properties": {
              "indicator": {"type": "string"},
              "indicator_type": {"type": "string", "enum": ["ip", "domain"]}
            },
            "required": ["indicator", "indicator_type"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Response with Tool Calls

When the model decides to call a tool, the response includes tool_calls:

{
  "id": "msg_123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_threat_intel",
          "arguments": "{\"indicator\":\"8.8.8.8\",\"indicator_type\":\"ip\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 25, "total_tokens": 75}
}

Sending Tool Results

After executing the tool, send the result back with a tool role message:

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What threat intel do you have on 8.8.8.8?"},
      {
        "role": "assistant",
        "tool_calls": [{
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_threat_intel",
            "arguments": "{\"indicator\":\"8.8.8.8\",\"indicator_type\":\"ip\"}"
          }
        }]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "{\"ip\":\"8.8.8.8\",\"owner\":\"Google\",\"reputation\":\"safe\",\"services\":[\"DNS\"]}"
      }
    ]
  }'

The model will then generate a natural language response based on the tool result.

Multimodal Content

Messages can contain multiple content types including text, images, audio, and video.

Content Array Format

Instead of a simple string, content can be an array of content blocks:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What vulnerabilities do you see in this network diagram?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png", "detail": "high"}}
  ]
}

Content Block Types

Type	Description	Required Capability
`text`	Plain text content	chat
`image_url`	Image input	image
`audio_url`	Audio input	audio
`video_url`	Video input	video

Image Detail Levels

Detail	Description
`auto`	Automatically select resolution (default)
`low`	Lower resolution, faster processing
`high`	Higher resolution, more detailed analysis

Supported Media Formats

Type	Formats
Images	JPEG, PNG, GIF, WebP
Audio	MP3, WAV, OGG, FLAC
Video	MP4, WebM, MOV

Example: Vision Request

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Analyze this security dashboard for anomalies"},
          {"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png", "detail": "high"}}
        ]
      }
    ]
  }'

Note: Check model capabilities to ensure the model supports the content type you’re using. Use the Models endpoint to verify capabilities.

Reasoning

Enable extended thinking/reasoning for complex problems that benefit from step-by-step analysis.

Reasoning Effort Values

Value	Description	Token Budget (Claude)
low	Low reasoning effort	25% of max tokens
medium	Moderate reasoning	33% of max tokens
high	Extended reasoning	50% of max tokens

Note: Only models with the reasoning capability support this parameter.

Provider-Specific Behavior

The behavior when reasoning_effort is omitted differs by provider:

Provider	Behavior When Omitted
Claude	Reasoning is explicitly disabled
Groq	Uses model’s default behavior (may include reasoning)
vLLM	Uses model’s default behavior

Special Notes

Claude models: Temperature is forced to 1.0 when reasoning is enabled
Groq models: Some models may return reasoning chains even without explicit reasoning_effort parameter
Groq qwen3-32b: reasoning_effort value is always treated as “default” internally
Minimum budget: Claude API requires minimum 1024 reasoning tokens

Example: Extended Reasoning

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [
      {
        "role": "user",
        "content": "Design a secure architecture for a microservices-based XDR platform"
      }
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 4000
  }'

When reasoning is enabled, the response includes a reasoning field:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Here is the recommended architecture...",
      "reasoning": "First, I need to consider the key security requirements for an XDR platform..."
    }
  }]
}

Service Tier

Control request priority and routing:

Value	Description
auto	Automatic tier selection (default)
default	Standard processing
flex	Flexible scheduling (may have higher latency)
priority	Priority processing

Note: Service tier support varies by provider and model. Omit the parameter to use the default tier.

Response Format

Control the structure of the model’s output:

JSON Object

Request JSON-formatted output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

Request output matching a specific schema:

{
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "threat_name": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "indicators": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["threat_name", "severity"]
    }
  }
}

Request

To generate a chat completion, use a POST request:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
  "max_completion_tokens": 1024
}'

Or using API key and secret:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'api-secret: your-api-secret' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gpt-oss-20b",
  "messages": [
    {
      "role": "user",
      "content": "Explain threat intelligence"
    }
  ]
}'

Response

A successful response will return the AI-generated completion with usage statistics.

Success Response (200 OK)

{
  "id": "msg_01AbCdEf123",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "XDR (Extended Detection and Response) is a unified security platform that..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 150,
    "total_tokens": 162
  }
}

Response Schema

Field	Type	Description
id	string	Unique interaction identifier
object	string	Response type: `"chat.completion"` or `"chat.completion.chunk"`
created	integer	Unix timestamp of creation
model	string	Model used for inference
system_fingerprint	string	Backend configuration identifier (optional)
service_tier	string	Service tier used for request (optional)
choices	array	Generated response choices
choices[].index	integer	Choice index in array
choices[].finish_reason	string	How generation ended
choices[].message	object	Generated message (non-streaming)
choices[].delta	object	Incremental message (streaming)
choices[].logprobs	object	Log probabilities (when requested)
usage	object	Token usage statistics
usage.prompt_tokens	integer	Tokens in input messages
usage.completion_tokens	integer	Tokens in generated response
usage.total_tokens	integer	Sum of prompt and completion tokens
usage.prompt_tokens_details	object	Breakdown of prompt tokens (optional)
usage.completion_tokens_details	object	Breakdown of completion tokens (optional)

Message Object (Response)

Field	Type	Description
role	string	Always “assistant”
content	string	Generated response text
tool_calls	array	Tool calls requested by model (optional)
reasoning	string	Reasoning process (when reasoning enabled)
refusal	string	Model’s refusal message (optional)

Finish Reason Values

Reason	Description
stop	Model completed response naturally
length	Response cut off due to max_completion_tokens limit
content_filter	Content filtered by provider safety systems
tool_calls	Model requested tool/function calls

OpenAI Compatibility

This endpoint is designed for compatibility with OpenAI’s Chat Completions API.

Provider-Specific Parameter Support

Not all parameters are supported by all providers:

Parameter	Claude	Groq	vLLM
temperature	Yes	Yes	Yes
top_p	Yes*	Yes	Yes
n	Only n=1	Yes	Yes
stop	Yes	Yes	Yes
seed	No	Yes	Yes
frequency_penalty	No	Yes	Yes
presence_penalty	No	Yes	Yes
logprobs	No	Yes	Yes
top_logprobs	No	Yes	Yes
logit_bias	No	Yes	Yes
tools	Yes	Yes	Yes
parallel_tool_calls	No	Yes	Yes
reasoning_effort	Yes	Yes**	Yes***
response_format	No	Yes	Yes
user	No	Yes	Yes

* Claude ignores top_p when reasoning is enabled ** Groq uses model’s default reasoning behavior when parameter is omitted; qwen3-32b always uses “default” internally *** vLLM uses model’s default reasoning behavior when parameter is omitted

Ignored Parameters

The following parameters are accepted for SDK compatibility but are currently ignored:

Parameter	Status
store	Ignored
metadata	Ignored
modalities	Ignored
audio	Ignored
prediction	Ignored
web_search_options	Ignored

Business Logic

Request Processing

Validation: Validates parameters, message array, roles, and token limits
Token Validation: Checks input tokens against model’s max_input_tokens limit
Model Resolution: Looks up provider client based on model ID
Parameter Normalization: Applies defaults for optional parameters
Inference: Calls provider’s Message() or MessageStream() method

Token Limit Validation

Before sending to the provider, the API validates:

Input tokens do not exceed max_input_tokens
Estimated total (input + max_completion_tokens) does not exceed max_total_tokens

If limits are exceeded, a 400 error is returned with token breakdown.

Error Codes

Status Code	Description	Possible Cause
200	OK	Request successful
400	Bad Request	Invalid JSON, empty messages, invalid role, parameter out of range, token limit exceeded
401	Unauthorized	Missing or invalid authentication
403	Forbidden	Insufficient permissions or model not allowed
500	Internal Server Error	Provider error, model unavailable

Examples

Example 1: Simple Question

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "max_completion_tokens": 100
  }'

Response:

{
  "id": "msg_123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "2 + 2 = 4"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 5,
    "total_tokens": 13
  }
}

Example 2: Multi-turn Conversation

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a cybersecurity expert"},
      {"role": "user", "content": "What is a zero-day vulnerability?"},
      {"role": "assistant", "content": "A zero-day vulnerability is a software security flaw that is unknown to the vendor..."},
      {"role": "user", "content": "How can organizations protect against them?"}
    ],
    "max_completion_tokens": 500
  }'

Example 3: Streaming Response

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Explain the MITRE ATT&CK framework"}
    ],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Example 4: JSON Response Format

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": "Extract the name and severity from: Critical vulnerability in Apache Log4j"
      }
    ],
    "response_format": {
      "type": "json_object",
      "json_schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "severity": {"type": "string"}
        }
      }
    }
  }'

Response:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}"
    },
    "finish_reason": "stop"
  }]
}

Example 5: Extended Reasoning

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [
      {
        "role": "user",
        "content": "Design a zero-trust network architecture for a financial institution"
      }
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 4000
  }'

Example 6: Vision (Image Analysis)

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What security issues do you see in this network topology?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/network.png", "detail": "high"}}
        ]
      }
    ]
  }'

Example 7: Function Calling

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Scan the domain threatwinds.com for vulnerabilities"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "vulnerability_scan",
          "description": "Perform a vulnerability scan on a domain",
          "parameters": {
            "type": "object",
            "properties": {
              "domain": {"type": "string", "description": "Domain to scan"},
              "scan_type": {"type": "string", "enum": ["quick", "full", "stealth"]}
            },
            "required": ["domain"]
          }
        }
      }
    ]
  }'

Best Practices

Message Construction

System Messages: Use system messages to set behavior and context
Message History: Include relevant conversation history for context
Clear Prompts: Be specific and clear in user messages
Role Consistency: Maintain proper role alternation

Token Management

Set Max Tokens: Always set max_completion_tokens to control costs
Count First: Use token counting endpoint for large requests
Monitor Usage: Track token usage via response usage field
Truncate History: Remove old messages to stay within limits

Streaming

Use for Long Responses: Enable streaming for better UX on long responses
Handle Chunks: Process each chunk as it arrives
Check for [DONE]: Always check for the [DONE] marker
Include Usage: Set include_usage: true for token tracking

Error Handling

Validate Before Send: Check message count and roles before sending
Handle Provider Errors: Be prepared for provider-specific errors
Implement Retries: Add exponential backoff for transient errors
Log Interaction IDs: Save interaction IDs for debugging

Performance

Choose Appropriate Models: Use faster models for simple tasks
Minimize Context: Send only necessary message history
Parallel Requests: Process multiple items concurrently when possible
Cache Results: Cache frequent queries to reduce API calls

Security

Validate Input: Sanitize user input before sending to AI
Filter Output: Validate and sanitize AI responses
Monitor Usage: Track unusual patterns via logs
Rotate Keys: Regularly rotate API credentials

Cost Optimization

Model Selection: Use smaller/faster models when quality permits
Token Limits: Set appropriate max_completion_tokens
Batch Processing: Group similar requests
Response Caching: Cache responses for repeated queries