Chat Completions
Generate AI chat completions using any supported model. This endpoint handles inference requests and returns AI-generated responses, with full OpenAI API compatibility including streaming.
Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/completions
Method: POST
Parameters
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be application/json |
Note: You must provide either the Authorization header or the api-key/api-secret pair.
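Either authentication style reduces to a plain header map. A minimal Python sketch (the token and key values are placeholders; header names are taken from the table above):

```python
def session_headers(token: str) -> dict:
    # Session-based auth: Bearer token in the Authorization header
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

def api_key_headers(key: str, secret: str) -> dict:
    # Key-based auth: api-key / api-secret header pair (no Authorization header)
    return {
        "api-key": key,
        "api-secret": secret,
        "Content-Type": "application/json",
    }
```

Pass one of the two maps to your HTTP client; do not mix both styles in a single request.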
Request Body
{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
  "max_completion_tokens": 1024,
  "temperature": 1.0,
  "stream": false
}
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for inference (see Models) |
| messages | array | Yes | Message history for context (minimum 1) |
| max_completion_tokens | integer | No | Maximum tokens in response (defaults to 50% of model max) |
| max_tokens | integer | No | Alias for max_completion_tokens (OpenAI compatibility) |
| temperature | float | No | Sampling temperature (0.0 to 2.0, default: 1.0) |
| top_p | float | No | Nucleus sampling (0.0 to 1.0, default: 1.0) |
| n | integer | No | Number of completions to generate (1-128, default: 1). Note: Claude only supports n=1 |
| stop | string/array | No | Stop sequences (max 4 sequences) |
| seed | integer | No | Seed for deterministic sampling |
| frequency_penalty | float | No | Frequency penalty (-2.0 to 2.0, default: 0) |
| presence_penalty | float | No | Presence penalty (-2.0 to 2.0, default: 0) |
| logprobs | boolean | No | Return log probabilities of output tokens |
| top_logprobs | integer | No | Number of top logprobs per token (0-5, requires logprobs=true) |
| logit_bias | object | No | Token bias map (token_id: -100 to 100) |
| user | string | No | Unique user identifier for tracking |
| stream | boolean | No | Enable SSE streaming (default: false) |
| stream_options | object | No | Streaming configuration |
| tools | array | No | Tool/function definitions (max 128) |
| tool_choice | string/object | No | Tool selection strategy |
| parallel_tool_calls | boolean | No | Allow parallel tool calls |
| reasoning_effort | string | No | Reasoning effort level: low, medium, high (behavior when omitted varies by provider) |
| service_tier | string | No | Service tier for priority routing (omit for default) |
| response_format | object | No | Response format specification |
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role: user, assistant, system, tool, developer |
| content | string/array | Yes | Message content (string or array of content blocks) |
| name | string | No | Participant name |
| tool_calls | array | No | Tool calls generated by the model (for assistant messages) |
| tool_call_id | string | No | Tool call identifier (for tool messages) |
| reasoning | string | No | Reasoning text (for assistant messages) |
Valid Message Roles
| Role | Description | Usage |
|---|---|---|
| user | User message | Input from end user |
| assistant | Assistant response | Previous AI responses |
| system | System instructions | Behavioral instructions for AI |
| tool | Tool/function response | Results from tool calls |
| developer | Developer message | Developer-level instructions |
Streaming (SSE)
Enable real-time streaming responses using Server-Sent Events (SSE). This is fully compatible with the OpenAI streaming format.
Enabling Streaming
Set stream: true in your request to receive incremental responses:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [{"role": "user", "content": "Write a haiku about cybersecurity"}],
"stream": true,
"stream_options": {"include_usage": true}
}'
Stream Options
| Parameter | Type | Description |
|---|---|---|
| stream_options.include_usage | boolean | Include token usage in final chunk (default: false) |
Response Format
Streaming responses are sent as Server-Sent Events. Each event contains a JSON object:
data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"content":"Firewalls"},"finish_reason":null}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{"content":" stand"},"finish_reason":null}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1704067200,"model":"claude-sonnet-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":20,"total_tokens":32}}
data: [DONE]
Chunk Structure
| Field | Description |
|---|---|
| object | Always "chat.completion.chunk" for streaming |
| choices[].delta | Incremental content (sent in place of message) |
| choices[].delta.role | Present in first chunk only |
| choices[].delta.content | Incremental text content |
| choices[].delta.tool_calls | Incremental tool call data |
| choices[].finish_reason | null until final content chunk, then stop, length, or tool_calls |
| usage | Only present in final chunk when include_usage: true |
Response Headers
When streaming is enabled, the server sets these headers:
| Header | Value |
|---|---|
| Content-Type | text/event-stream |
| Cache-Control | no-cache |
| Connection | keep-alive |
| X-Accel-Buffering | no |
Stream Termination
The stream ends with data: [DONE] followed by two newlines. Always check for this marker to know when the stream is complete.
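Client-side, consuming the stream reduces to reading `data:` lines, decoding each JSON chunk, and stopping at the `[DONE]` marker. A minimal parser sketch (no HTTP client shown; it consumes any iterable of raw SSE lines, such as `iter_lines()` from your client library):

```python
import json

def consume_stream(lines):
    """Accumulate delta content from chat.completion.chunk events.

    Returns (full_text, usage), where usage is None unless the final
    chunk carried a usage object (include_usage: true).
    """
    text_parts, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # stream termination marker
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
    return "".join(text_parts), usage
```

The same loop handles the first role-only chunk and the final empty delta transparently, since both contribute no content.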
Function Calling (Tools)
Enable the model to call functions/tools defined by your application. This allows the model to request external data or perform actions.
Tool Parameters
| Parameter | Type | Description |
|---|---|---|
| tools | array | Tool definitions (max 128 tools) |
| tool_choice | string/object | Tool selection strategy |
| parallel_tool_calls | boolean | Allow calling multiple tools in parallel |
Tool Definition
Each tool is defined with a type and function specification:
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_threat_intel",
        "description": "Get threat intelligence for an IP address or domain",
        "parameters": {
          "type": "object",
          "properties": {
            "indicator": {
              "type": "string",
              "description": "IP address or domain to lookup"
            },
            "indicator_type": {
              "type": "string",
              "enum": ["ip", "domain"],
              "description": "Type of indicator"
            }
          },
          "required": ["indicator", "indicator_type"]
        }
      }
    }
  ]
}
Tool Choice Options
| Value | Description |
|---|---|
"none" | Disable tool calling for this request |
"auto" | Model decides whether to call tools (default) |
"required" | Model must call at least one tool |
{"type": "function", "function": {"name": "..."}} | Force a specific tool |
Example: Tool Call Request
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "What threat intel do you have on 8.8.8.8?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_threat_intel",
"description": "Get threat intelligence for an IP or domain",
"parameters": {
"type": "object",
"properties": {
"indicator": {"type": "string"},
"indicator_type": {"type": "string", "enum": ["ip", "domain"]}
},
"required": ["indicator", "indicator_type"]
}
}
}
],
"tool_choice": "auto"
}'
Response with Tool Calls
When the model decides to call a tool, the response includes tool_calls:
{
  "id": "msg_123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_threat_intel",
          "arguments": "{\"indicator\":\"8.8.8.8\",\"indicator_type\":\"ip\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 25, "total_tokens": 75}
}
Sending Tool Results
After executing the tool, send the result back with a tool role message:
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "What threat intel do you have on 8.8.8.8?"},
{
"role": "assistant",
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_threat_intel",
"arguments": "{\"indicator\":\"8.8.8.8\",\"indicator_type\":\"ip\"}"
}
}]
},
{
"role": "tool",
"tool_call_id": "call_abc123",
"content": "{\"ip\":\"8.8.8.8\",\"owner\":\"Google\",\"reputation\":\"safe\",\"services\":[\"DNS\"]}"
}
]
}'
The model will then generate a natural language response based on the tool result.
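The round trip above has a fixed shape: when finish_reason is tool_calls, run each requested function locally, append one tool message per call, and re-send the conversation. A sketch of that dispatch step (the registry mapping tool names to local callables is a placeholder for your own implementations; the HTTP call itself is omitted):

```python
import json

def handle_tool_calls(messages, response_message, registry):
    """Append the assistant turn plus one tool-result message per call."""
    messages.append(response_message)
    for call in response_message.get("tool_calls", []):
        fn = registry[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not an object
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the assistant's call id
            "content": json.dumps(result),
        })
    return messages
```

After this step, POST the extended messages array back to the endpoint to receive the natural language answer.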
Multimodal Content
Messages can contain multiple content types including text, images, audio, and video.
Content Array Format
Instead of a simple string, content can be an array of content blocks:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What vulnerabilities do you see in this network diagram?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png", "detail": "high"}}
  ]
}
Content Block Types
| Type | Description | Required Capability |
|---|---|---|
| text | Plain text content | chat |
| image_url | Image input | image |
| audio_url | Audio input | audio |
| video_url | Video input | video |
Image Detail Levels
| Detail | Description |
|---|---|
| auto | Automatically select resolution (default) |
| low | Lower resolution, faster processing |
| high | Higher resolution, more detailed analysis |
Supported Media Formats
| Type | Formats |
|---|---|
| Images | JPEG, PNG, GIF, WebP |
| Audio | MP3, WAV, OGG, FLAC |
| Video | MP4, WebM, MOV |
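A content array can be assembled programmatically. A small sketch that pairs a text prompt with media blocks and rejects unsupported block types (the type names come from the Content Block Types table above; this is client-side convenience, not server validation):

```python
ALLOWED_TYPES = {"text", "image_url", "audio_url", "video_url"}

def multimodal_message(text, media=()):
    """Build a user message whose content is an array of content blocks.

    media is a sequence of (block_type, url) pairs,
    e.g. ("image_url", "https://example.com/diagram.png").
    """
    blocks = [{"type": "text", "text": text}]
    for block_type, url in media:
        if block_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported content block type: {block_type}")
        # Each media block nests the URL under a key matching its type
        blocks.append({"type": block_type, block_type: {"url": url}})
    return {"role": "user", "content": blocks}
```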
Example: Vision Request
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "Analyze this security dashboard for anomalies"},
{"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png", "detail": "high"}}
]
}
]
}'
Note: Check model capabilities to ensure the model supports the content type you’re using. Use the Models endpoint to verify capabilities.
Reasoning
Enable extended thinking/reasoning for complex problems that benefit from step-by-step analysis.
Reasoning Effort Values
| Value | Description | Token Budget (Claude) |
|---|---|---|
| low | Low reasoning effort | 25% of max tokens |
| medium | Moderate reasoning | 33% of max tokens |
| high | Extended reasoning | 50% of max tokens |
Note: Only models with the reasoning capability support this parameter.
Provider-Specific Behavior
The behavior when reasoning_effort is omitted differs by provider:
| Provider | Behavior When Omitted |
|---|---|
| Claude | Reasoning is explicitly disabled |
| Groq | Uses model’s default behavior (may include reasoning) |
| vLLM | Uses model’s default behavior |
Special Notes
- Claude models: Temperature is forced to 1.0 when reasoning is enabled
- Groq models: Some models may return reasoning chains even without explicit reasoning_effort parameter
- Groq qwen3-32b: reasoning_effort value is always treated as “default” internally
- Minimum budget: Claude API requires minimum 1024 reasoning tokens
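From the budget table above, a client can estimate how many of its max_completion_tokens Claude will reserve for reasoning at each effort level. A sketch, assuming the 33% share means one third exactly and applying the 1024-token Claude minimum noted above:

```python
# Effort level -> (numerator, denominator) share of max_completion_tokens
REASONING_SHARE = {"low": (1, 4), "medium": (1, 3), "high": (1, 2)}
CLAUDE_MIN_REASONING_TOKENS = 1024

def claude_reasoning_budget(effort, max_completion_tokens):
    """Estimated reasoning-token budget for a given effort level (Claude)."""
    num, den = REASONING_SHARE[effort]
    budget = max_completion_tokens * num // den
    # Claude enforces a floor of 1024 reasoning tokens
    return max(budget, CLAUDE_MIN_REASONING_TOKENS)
```

Note that with a small max_completion_tokens the floor dominates, so low effort can still consume 1024 tokens of the budget.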
Example: Extended Reasoning
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-opus-4",
"messages": [
{
"role": "user",
"content": "Design a secure architecture for a microservices-based XDR platform"
}
],
"reasoning_effort": "high",
"max_completion_tokens": 4000
}'
When reasoning is enabled, the response includes a reasoning field:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Here is the recommended architecture...",
      "reasoning": "First, I need to consider the key security requirements for an XDR platform..."
    }
  }]
}
Service Tier
Control request priority and routing:
| Value | Description |
|---|---|
| auto | Automatic tier selection (default) |
| default | Standard processing |
| flex | Flexible scheduling (may have higher latency) |
| priority | Priority processing |
Note: Service tier support varies by provider and model. Omit the parameter to use the default tier.
Response Format
Control the structure of the model’s output:
JSON Object
Request JSON-formatted output:
{
  "response_format": {
    "type": "json_object"
  }
}
JSON Schema
Request output matching a specific schema:
{
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "threat_name": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "indicators": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["threat_name", "severity"]
    }
  }
}
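Whatever the schema, the structured output still arrives as a JSON string in message.content, so the client must decode it and verify the fields it relies on. A minimal sketch (the required-field check mirrors the schema's required list; it is a client-side convenience, not server validation):

```python
import json

def parse_structured_content(content, required=()):
    """Decode a JSON-formatted content string and verify required keys."""
    data = json.loads(content)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"response missing required fields: {missing}")
    return data
```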
Request
To generate a chat completion, use a POST request:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": "What is XDR in cybersecurity?"
}
],
"max_completion_tokens": 1024
}'
Or using API key and secret:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'accept: application/json' \
-H 'api-key: your-api-key' \
-H 'api-secret: your-api-secret' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-oss-20b",
"messages": [
{
"role": "user",
"content": "Explain threat intelligence"
}
]
}'
Response
A successful response will return the AI-generated completion with usage statistics.
Success Response (200 OK)
{
  "id": "msg_01AbCdEf123",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "XDR (Extended Detection and Response) is a unified security platform that..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 150,
    "total_tokens": 162
  }
}
Response Schema
| Field | Type | Description |
|---|---|---|
| id | string | Unique interaction identifier |
| object | string | Response type: "chat.completion" or "chat.completion.chunk" |
| created | integer | Unix timestamp of creation |
| model | string | Model used for inference |
| system_fingerprint | string | Backend configuration identifier (optional) |
| service_tier | string | Service tier used for request (optional) |
| choices | array | Generated response choices |
| choices[].index | integer | Choice index in array |
| choices[].finish_reason | string | How generation ended |
| choices[].message | object | Generated message (non-streaming) |
| choices[].delta | object | Incremental message (streaming) |
| choices[].logprobs | object | Log probabilities (when requested) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in input messages |
| usage.completion_tokens | integer | Tokens in generated response |
| usage.total_tokens | integer | Sum of prompt and completion tokens |
| usage.prompt_tokens_details | object | Breakdown of prompt tokens (optional) |
| usage.completion_tokens_details | object | Breakdown of completion tokens (optional) |
Message Object (Response)
| Field | Type | Description |
|---|---|---|
| role | string | Always “assistant” |
| content | string | Generated response text |
| tool_calls | array | Tool calls requested by model (optional) |
| reasoning | string | Reasoning process (when reasoning enabled) |
| refusal | string | Model’s refusal message (optional) |
Finish Reason Values
| Reason | Description |
|---|---|
| stop | Model completed response naturally |
| length | Response cut off due to max_completion_tokens limit |
| content_filter | Content filtered by provider safety systems |
| tool_calls | Model requested tool/function calls |
OpenAI Compatibility
This endpoint is designed for compatibility with OpenAI’s Chat Completions API.
Provider-Specific Parameter Support
Not all parameters are supported by all providers:
| Parameter | Claude | Groq | vLLM |
|---|---|---|---|
| temperature | Yes | Yes | Yes |
| top_p | Yes* | Yes | Yes |
| n | Only n=1 | Yes | Yes |
| stop | Yes | Yes | Yes |
| seed | No | Yes | Yes |
| frequency_penalty | No | Yes | Yes |
| presence_penalty | No | Yes | Yes |
| logprobs | No | Yes | Yes |
| top_logprobs | No | Yes | Yes |
| logit_bias | No | Yes | Yes |
| tools | Yes | Yes | Yes |
| parallel_tool_calls | No | Yes | Yes |
| reasoning_effort | Yes | Yes** | Yes*** |
| response_format | No | Yes | Yes |
| user | No | Yes | Yes |
* Claude ignores top_p when reasoning is enabled
** Groq uses the model's default reasoning behavior when the parameter is omitted; qwen3-32b always uses "default" internally
*** vLLM uses the model's default reasoning behavior when the parameter is omitted
Ignored Parameters
The following parameters are accepted for SDK compatibility but are currently ignored:
| Parameter | Status |
|---|---|
| store | Ignored |
| metadata | Ignored |
| modalities | Ignored |
| audio | Ignored |
| prediction | Ignored |
| web_search_options | Ignored |
Business Logic
Request Processing
- Validation: Validates parameters, message array, roles, and token limits
- Token Validation: Checks input tokens against the model's max_input_tokens limit
- Model Resolution: Looks up the provider client based on the model ID
- Parameter Normalization: Applies defaults for optional parameters
- Inference: Calls the provider's Message() or MessageStream() method
Token Limit Validation
Before sending to the provider, the API validates:
- Input tokens do not exceed max_input_tokens
- Estimated total (input + max_completion_tokens) does not exceed max_total_tokens
If limits are exceeded, a 400 error is returned with token breakdown.
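The same check can be approximated client-side before sending, avoiding a guaranteed 400. A sketch, assuming you already have an input token count (for example from a token counting endpoint) and the model's limits:

```python
def check_token_limits(input_tokens, max_completion_tokens,
                       max_input_tokens, max_total_tokens):
    """Mirror the server's pre-flight validation; returns a list of problems."""
    problems = []
    if input_tokens > max_input_tokens:
        problems.append(
            f"input tokens {input_tokens} exceed max_input_tokens {max_input_tokens}")
    if input_tokens + max_completion_tokens > max_total_tokens:
        problems.append(
            f"estimated total {input_tokens + max_completion_tokens} "
            f"exceeds max_total_tokens {max_total_tokens}")
    return problems  # empty list means the request should pass validation
```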
Error Codes
| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid JSON, empty messages, invalid role, parameter out of range, token limit exceeded |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Insufficient permissions or model not allowed |
| 500 | Internal Server Error | Provider error, model unavailable |
Examples
Example 1: Simple Question
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "What is 2+2?"}
],
"max_completion_tokens": 100
}'
Response:
{
  "id": "msg_123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "2 + 2 = 4"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 5,
    "total_tokens": 13
  }
}
Example 2: Multi-turn Conversation
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "system", "content": "You are a cybersecurity expert"},
{"role": "user", "content": "What is a zero-day vulnerability?"},
{"role": "assistant", "content": "A zero-day vulnerability is a software security flaw that is unknown to the vendor..."},
{"role": "user", "content": "How can organizations protect against them?"}
],
"max_completion_tokens": 500
}'
Example 3: Streaming Response
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "Explain the MITRE ATT&CK framework"}
],
"stream": true,
"stream_options": {"include_usage": true}
}'
Example 4: JSON Response Format
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": "Extract the name and severity from: Critical vulnerability in Apache Log4j"
}
],
"response_format": {
"type": "json_object",
"json_schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"severity": {"type": "string"}
}
}
}
}'
Response:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}"
    },
    "finish_reason": "stop"
  }]
}
Example 5: Extended Reasoning
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-opus-4",
"messages": [
{
"role": "user",
"content": "Design a zero-trust network architecture for a financial institution"
}
],
"reasoning_effort": "high",
"max_completion_tokens": 4000
}'
Example 6: Vision (Image Analysis)
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What security issues do you see in this network topology?"},
{"type": "image_url", "image_url": {"url": "https://example.com/network.png", "detail": "high"}}
]
}
]
}'
Example 7: Function Calling
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "Scan the domain threatwinds.com for vulnerabilities"}
],
"tools": [
{
"type": "function",
"function": {
"name": "vulnerability_scan",
"description": "Perform a vulnerability scan on a domain",
"parameters": {
"type": "object",
"properties": {
"domain": {"type": "string", "description": "Domain to scan"},
"scan_type": {"type": "string", "enum": ["quick", "full", "stealth"]}
},
"required": ["domain"]
}
}
}
]
}'
Best Practices
Message Construction
- System Messages: Use system messages to set behavior and context
- Message History: Include relevant conversation history for context
- Clear Prompts: Be specific and clear in user messages
- Role Consistency: Maintain proper role alternation
Token Management
- Set Max Tokens: Always set max_completion_tokens to control costs
- Count First: Use the token counting endpoint for large requests
- Monitor Usage: Track token usage via the response usage field
- Truncate History: Remove old messages to stay within limits
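History truncation can be as simple as keeping system messages plus the most recent turns; a sketch (the keep count is arbitrary; a token-based cutoff driven by the response usage field is more precise):

```python
def truncate_history(messages, keep_last=6):
    """Keep all system messages plus the last keep_last non-system turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

When truncating, take care not to split an assistant tool_calls turn from its matching tool result messages.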
Streaming
- Use for Long Responses: Enable streaming for better UX on long responses
- Handle Chunks: Process each chunk as it arrives
- Check for [DONE]: Always check for the [DONE] marker
- Include Usage: Set include_usage: true for token tracking
Error Handling
- Validate Before Send: Check message count and roles before sending
- Handle Provider Errors: Be prepared for provider-specific errors
- Implement Retries: Add exponential backoff for transient errors
- Log Interaction IDs: Save interaction IDs for debugging
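Exponential backoff is worth sketching concretely: retry only transient (5xx) failures with a capped number of attempts and a doubling delay. The send callable and its (status, body) convention below are placeholders for your own HTTP layer:

```python
import time

def with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-5xx result or attempts run out.

    send() must return (status_code, body). 4xx errors are not retried,
    since the request itself is invalid and will fail again.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body
```

Injecting sleep makes the backoff schedule testable without real delays; in production, add jitter to avoid synchronized retries.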
Performance
- Choose Appropriate Models: Use faster models for simple tasks
- Minimize Context: Send only necessary message history
- Parallel Requests: Process multiple items concurrently when possible
- Cache Results: Cache frequent queries to reduce API calls
Security
- Validate Input: Sanitize user input before sending to AI
- Filter Output: Validate and sanitize AI responses
- Monitor Usage: Track unusual patterns via logs
- Rotate Keys: Regularly rotate API credentials
Cost Optimization
- Model Selection: Use smaller/faster models when quality permits
- Token Limits: Set appropriate max_completion_tokens
- Batch Processing: Group similar requests
- Response Caching: Cache responses for repeated queries