Chat Completions
Generate AI chat completions using any supported model. This endpoint handles inference requests and returns AI-generated responses, with full OpenAI API compatibility including streaming.
Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/completions
Method: POST
Parameters
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be application/json |
Note: You must authenticate with either the Authorization header or the api-key/api-secret pair, not both.
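Either authentication mode can be wrapped in a small helper that builds the header set for a request. This is an illustrative sketch only; `build_headers` is a hypothetical function, not part of any SDK:

```python
def build_headers(token=None, api_key=None, api_secret=None):
    """Build request headers for either authentication mode."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Session-based authentication
        headers["Authorization"] = f"Bearer {token}"
    elif api_key and api_secret:
        # Key-based authentication
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
    else:
        raise ValueError("Provide either a bearer token or an api-key/api-secret pair")
    return headers
```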
Request Body
{
"model": "gemini-3-pro",
"messages": [
{
"role": "user",
"content": "What is XDR in cybersecurity?"
}
],
"max_completion_tokens": 1024,
"temperature": 1.0,
"stream": false
}
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for inference (see Models) |
| messages | array | Yes | Message history for context (minimum 1) |
| max_completion_tokens | integer | No | Maximum tokens in response (defaults to 50% of model max) |
| max_tokens | integer | No | Alias for max_completion_tokens (OpenAI compatibility) |
| temperature | float | No | Sampling temperature (0.0 to 2.0, default: 1.0) |
| top_p | float | No | Nucleus sampling (0.0 to 1.0, default: 1.0) |
| top_k | integer | No | Number of top tokens to consider (-1 for all tokens). Limits the token pool before sampling |
| min_p | float | No | Minimum probability relative to the top token (0.0 to 1.0, 0 disables). Dynamic cutoff that adapts to each token’s probability distribution |
| n | integer | No | Number of completions to generate (1-128, default: 1). Some providers may restrict this value. |
| stop | string/array | No | Stop sequences (max 4 sequences) |
| seed | integer | No | Seed for deterministic sampling |
| frequency_penalty | float | No | Frequency penalty (-2.0 to 2.0, default: 0). Penalizes tokens based on their frequency in the generated text |
| presence_penalty | float | No | Presence penalty (-2.0 to 2.0, default: 0). Penalizes tokens based on whether they appear in the generated text |
| repetition_penalty | float | No | Repetition penalty (must be > 0, default: 1.0). Penalizes tokens that appear in the prompt and generated text. Values > 1.0 discourage repetition, values < 1.0 encourage it |
| logprobs | boolean | No | Return log probabilities of output tokens |
| top_logprobs | integer | No | Number of top logprobs per token (0-5, requires logprobs=true) |
| logit_bias | object | No | Token bias map (token_id: -100 to 100) |
| user | string | No | Unique user identifier for tracking |
| stream | boolean | No | Enable SSE streaming (default: false) |
| stream_options | object | No | Streaming configuration |
| tools | array | No | Tool/function definitions (max 128) |
| tool_choice | string/object | No | Tool selection strategy |
| parallel_tool_calls | boolean | No | Allow parallel tool calls |
| reasoning_effort | string | No | Reasoning effort level: low, medium, high (behavior when omitted varies by provider) |
| service_tier | string | No | Service tier for priority routing (omit for default) |
| response_format | object | No | Response format specification |
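As a sanity check before sending a request, the ranges in the table above can be enforced client-side. This is a minimal sketch; `validate_params` is a hypothetical helper, not part of the API:

```python
def validate_params(payload):
    """Check a request body against the ranges in the parameter table."""
    if not payload.get("model"):
        raise ValueError("model is required")
    if not payload.get("messages"):
        raise ValueError("messages must contain at least one entry")
    t = payload.get("temperature", 1.0)
    if not 0.0 <= t <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    p = payload.get("top_p", 1.0)
    if not 0.0 <= p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    n = payload.get("n", 1)
    if not 1 <= n <= 128:
        raise ValueError("n must be between 1 and 128")
    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop allows at most 4 sequences")
    return payload
```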
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role: user, assistant, system, tool, developer |
| content | string/array | Yes | Message content (string or array of content blocks) |
| name | string | No | Participant name |
| tool_calls | array | No | Tool calls generated by the model (for assistant messages) |
| tool_call_id | string | No | Tool call identifier (for tool messages) |
| reasoning | string | No | Reasoning text (for assistant messages) |
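A messages array with consistent role alternation can be assembled with a small helper. The `make_conversation` function below is a hypothetical convenience, shown only to illustrate the message object fields:

```python
def make_conversation(system, turns):
    """Build a messages array from a system prompt and (user, assistant) turn pairs.

    Pass None as the assistant reply for the final, unanswered user turn.
    """
    messages = [{"role": "system", "content": system}]
    for user, assistant in turns:
        messages.append({"role": "user", "content": user})
        if assistant is not None:
            messages.append({"role": "assistant", "content": assistant})
    return messages
```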
Response Format
Control the structure of the model's output with the response_format parameter:
JSON Object
Request JSON-formatted output:
{
"response_format": {
"type": "json_object"
}
}
JSON Schema
Request output matching a specific schema. Follows the OpenAI Structured Outputs spec — type is "json_schema" and the schema lives under json_schema.schema. The name and strict fields are required by upstream:
{
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "threat_record",
"strict": true,
"schema": {
"type": "object",
"properties": {
"threat_name": {"type": "string"},
"severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
"indicators": {"type": "array", "items": {"type": "string"}}
},
"required": ["threat_name", "severity"],
"additionalProperties": false
}
}
}
}
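The json_schema envelope above can also be generated programmatically. The `structured_output` helper below is hypothetical; it simply wraps any JSON Schema in the required name/strict/schema fields:

```python
def structured_output(name, schema, strict=True):
    """Wrap a JSON Schema in the response_format envelope expected upstream."""
    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": name,      # required by upstream
                "strict": strict,  # required by upstream
                "schema": schema,
            },
        }
    }
```

Merge the returned dict into the request body alongside model and messages.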
Request
To generate a chat completion, use a POST request:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-pro",
"messages": [
{
"role": "user",
"content": "What is XDR in cybersecurity?"
}
],
"max_completion_tokens": 1024
}'
Or using API key and secret:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'accept: application/json' \
-H 'api-key: your-api-key' \
-H 'api-secret: your-api-secret' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-oss-20b",
"messages": [
{
"role": "user",
"content": "Explain threat intelligence"
}
]
}'
Response
A successful response will return the AI-generated completion with usage statistics.
Success Response (200 OK)
{
"id": "msg_01AbCdEf123",
"object": "chat.completion",
"created": 1704067200,
"model": "gemini-3-pro",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "XDR (Extended Detection and Response) is a unified security platform that..."
}
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 150,
"total_tokens": 162
}
}
Response Schema
| Field | Type | Description |
|---|---|---|
| id | string | Unique interaction identifier |
| object | string | Response type: "chat.completion" or "chat.completion.chunk" |
| created | integer | Unix timestamp of creation |
| model | string | Model used for inference |
| system_fingerprint | string | Backend configuration identifier (optional) |
| service_tier | string | Service tier used for request (optional) |
| choices | array | Generated response choices |
| choices[].index | integer | Choice index in array |
| choices[].finish_reason | string | How generation ended |
| choices[].message | object | Generated message (non-streaming) |
| choices[].delta | object | Incremental message (streaming) |
| choices[].logprobs | object | Log probabilities (when requested) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in input messages |
| usage.completion_tokens | integer | Tokens in generated response |
| usage.total_tokens | integer | Sum of prompt and completion tokens |
| usage.prompt_tokens_details | object | Breakdown of prompt tokens (optional) |
| usage.prompt_tokens_details.audio_tokens | integer | Audio input tokens (for audio models) |
| usage.prompt_tokens_details.cached_tokens | integer | Cached prefix tokens reused from previous requests |
| usage.completion_tokens_details | object | Breakdown of completion tokens (optional) |
| usage.completion_tokens_details.reasoning_tokens | integer | Tokens used for reasoning/thinking |
| usage.completion_tokens_details.accepted_prediction_tokens | integer | Predicted-output (draft) tokens accepted by the model |
| usage.completion_tokens_details.rejected_prediction_tokens | integer | Predicted-output (draft) tokens rejected by the model |
Message Object (Response)
| Field | Type | Description |
|---|---|---|
| role | string | Always “assistant” |
| content | string | Generated response text |
| tool_calls | array | Tool calls requested by model (optional) |
| reasoning | string | Reasoning process (when reasoning enabled) |
| refusal | string | Model’s refusal message (optional) |
Finish Reason Values
| Reason | Description |
|---|---|
| stop | Model completed response naturally |
| length | Response cut off due to max_completion_tokens limit |
| content_filter | Content filtered by provider safety systems |
| tool_calls | Model requested tool/function calls |
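A client typically branches on finish_reason after each completion. The `handle_choice` sketch below is hypothetical and assumes the response shape shown above; it maps each documented value to an action:

```python
def handle_choice(choice):
    """Route a completed choice by its finish_reason."""
    reason = choice["finish_reason"]
    if reason == "stop":
        return choice["message"]["content"]  # normal completion
    if reason == "length":
        raise RuntimeError("response truncated; raise max_completion_tokens")
    if reason == "content_filter":
        raise RuntimeError("response blocked by provider safety systems")
    if reason == "tool_calls":
        return choice["message"]["tool_calls"]  # caller must execute the tools
    raise RuntimeError(f"unexpected finish_reason: {reason}")
```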
OpenAI Compatibility
This endpoint is designed for compatibility with OpenAI’s Chat Completions API.
Provider-Specific Parameter Support
Not all providers support every parameter. Unsupported parameters are silently ignored unless noted otherwise.
Core Parameters
These parameters are used across multiple chat providers, though support varies by provider. See the columns for specific compatibility.
| Parameter | Description | ThreatWinds | OpenAI | Gemini |
|---|---|---|---|---|
| model | Model ID to use | Yes | Yes | Yes |
| messages | Message history array | Yes | Yes | Yes |
| temperature | Sampling temperature (0.0-2.0, default: 1.0) | Yes | Yes | Yes |
| top_p | Nucleus sampling (0.0-1.0, default: 1.0) | Yes | Yes | Yes |
| top_k | Top-k sampling (-1 for all tokens) | Yes | No | Yes |
| min_p | Minimum probability vs top token (0.0-1.0) | Yes | No | No |
| n | Number of completions (1-128) | Yes | Yes | No |
| stop | Stop sequences (max 4) | Yes | Yes | Yes |
| seed | Random seed for reproducibility | Yes | Yes | No |
| frequency_penalty | Penalize frequent tokens (-2.0 to 2.0) | Yes | Yes | No |
| presence_penalty | float | Penalize tokens already present in the output (-2.0 to 2.0) | Yes | Yes | No |
| repetition_penalty | float | Penalize tokens appearing in prompt and output (> 0) | Yes | No | No |
| user | User identifier for tracking | Yes | Yes | No |
Token Limits
| Parameter | Description | ThreatWinds | OpenAI | Gemini |
|---|---|---|---|---|
| max_completion_tokens | Maximum output tokens | Yes | Yes | Yes |
| max_tokens | Alias for max_completion_tokens (compatibility) | Yes | Yes | Yes |
Advanced Features
| Parameter | Description | ThreatWinds | OpenAI | Gemini |
|---|---|---|---|---|
| logprobs | Return token log probabilities | Yes | Yes | No |
| top_logprobs | Number of top logprobs per token (0-5) | Yes | Yes | No |
| logit_bias | Token bias map (token_id: -100 to 100) | Yes | Yes | No |
| tools | Tool/function definitions (max 128) | Yes | Yes | Yes |
| tool_choice | Tool selection strategy | Yes | Yes | Yes |
| parallel_tool_calls | Allow calling multiple tools simultaneously | Yes | Yes | No |
| response_format | Force JSON or structured output format | Yes | Yes | No |
| stream | Enable Server-Sent Events streaming | Yes | Yes | Yes |
| stream_options | Streaming configuration (include_usage) | Yes | Yes | Yes |
* Embedding and audio models (under the threatwinds provider) use their dedicated endpoints (/embeddings, /audio/*) and do not support /chat/completions.
‡ Chat models only; embeddings and audio models have different parameter sets.
Error Codes
| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid JSON, empty messages, invalid role, parameter out of range, token limit exceeded |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Insufficient permissions or model not allowed |
| 500 | Internal Server Error | Provider error, model unavailable |
Examples
Example 1: Simple Question
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-5",
"messages": [
{"role": "user", "content": "What is 2+2?"}
],
"max_completion_tokens": 100
}'
Response:
{
"id": "msg_123",
"object": "chat.completion",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "2 + 2 = 4"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 5,
"total_tokens": 13
}
}
Example 2: Multi-turn Conversation
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-5",
"messages": [
{"role": "system", "content": "You are a cybersecurity expert"},
{"role": "user", "content": "What is a zero-day vulnerability?"},
{"role": "assistant", "content": "A zero-day vulnerability is a software security flaw that is unknown to the vendor..."},
{"role": "user", "content": "How can organizations protect against them?"}
],
"max_completion_tokens": 500
}'
Example 3: Streaming Response
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-flash",
"messages": [
{"role": "user", "content": "Explain the MITRE ATT&CK framework"}
],
"stream": true,
"stream_options": {"include_usage": true}
}'
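With stream: true, the response arrives as Server-Sent Events: each event is a data: line carrying a chat.completion.chunk JSON payload, and the stream ends with a data: [DONE] marker. A minimal parser, assuming that framing, might look like:

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed chat.completion.chunk payloads from an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        yield json.loads(data)
```

Concatenating choices[].delta.content across the yielded chunks reconstructs the full reply.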
Example 4: JSON Response Format
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-5",
"messages": [
{
"role": "user",
"content": "What is XDR in cybersecurity?"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "threat_record",
"strict": true,
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"severity": {"type": "string"}
},
"required": ["name", "severity"],
"additionalProperties": false
}
}
}
}'
Response:
{
"choices": [{
"message": {
"role": "assistant",
"content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}"
},
"finish_reason": "stop"
}]
}
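Note that the structured output arrives as a JSON string inside message.content, so it still needs to be parsed. A small sketch using the example response above:

```python
import json

# Shape of the example response shown above
reply = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}"
        },
        "finish_reason": "stop"
    }]
}

# The content field is a JSON-encoded string, not a nested object
record = json.loads(reply["choices"][0]["message"]["content"])
```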
Example 5: Extended Reasoning
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-pro",
"messages": [
{
"role": "user",
"content": "Design a zero-trust network architecture for a financial institution"
}
],
"reasoning_effort": "high",
"max_completion_tokens": 4000
}'
Example 6: Vision (Image Analysis)
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-pro",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What security issues do you see in this network topology?"},
{"type": "image_url", "image_url": {"url": "https://example.com/network.png", "detail": "high"}}
]
}
]
}'
Example 7: Function Calling
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "gemini-3-pro",
"messages": [
{"role": "user", "content": "Scan the domain threatwinds.com for vulnerabilities"}
],
"tools": [
{
"type": "function",
"function": {
"name": "vulnerability_scan",
"description": "Perform a vulnerability scan on a domain",
"parameters": {
"type": "object",
"properties": {
"domain": {"type": "string", "description": "Domain to scan"},
"scan_type": {"type": "string", "enum": ["quick", "full", "stealth"]}
},
"required": ["domain"]
}
}
}
]
}'
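When the model answers with finish_reason "tool_calls", the client executes each requested call and feeds the results back as tool messages in the next request. The `dispatch_tool_calls` sketch below is hypothetical and assumes the tool-call shape of the OpenAI-compatible response (function arguments arrive as a JSON string):

```python
import json

def dispatch_tool_calls(message, registry):
    """Execute each requested tool call and build the follow-up tool messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        handler = registry[fn["name"]]          # look up the local implementation
        output = handler(**json.loads(fn["arguments"]))
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],          # ties the result to the request
            "content": json.dumps(output),
        })
    return results
```

The returned messages are appended to the conversation and sent back to /chat/completions for the final answer.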
Best Practices
Message Construction
- System Messages: Use system messages to set behavior and context
- Message History: Include relevant conversation history for context
- Clear Prompts: Be specific and clear in user messages
- Role Consistency: Maintain proper role alternation
Token Management
- Set Max Tokens: Always set max_completion_tokens to control costs
- Count First: Use token counting endpoint for large requests
- Monitor Usage: Track token usage via the response usage field
- Truncate History: Remove old messages to stay within limits
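A rough history-truncation pass can be sketched as follows. The four-characters-per-token estimate is only a heuristic (the token counting endpoint gives exact figures), `truncate_history` is a hypothetical helper, and string content is assumed:

```python
def truncate_history(messages, budget, estimate=lambda m: len(m["content"]) // 4):
    """Drop the oldest non-system messages until the estimated total fits the budget.

    Assumes string content; the ~4-chars-per-token estimate is a rough heuristic.
    """
    system = [m for m in messages if m["role"] == "system"]   # always keep system prompts
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(map(estimate, system + rest)) > budget:
        rest.pop(0)  # drop the oldest turn first
    return system + rest
```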
Streaming
- Use for Long Responses: Enable streaming for better UX on long responses
- Handle Chunks: Process each chunk as it arrives
- Check for [DONE]: Always check for the [DONE] marker
- Include Usage: Set include_usage: true for token tracking
Error Handling
- Validate Before Send: Check message count and roles before sending
- Handle Provider Errors: Be prepared for provider-specific errors
- Implement Retries: Add exponential backoff for transient errors
- Log Interaction IDs: Save interaction IDs for debugging
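Exponential backoff for transient 500s can be precomputed as a delay schedule. This full-jitter variant is one common choice, not something the API mandates:

```python
import random

def backoff_delays(retries, base=0.5, cap=30.0):
    """Compute jittered exponential backoff delays for transient 5xx errors."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))          # 0.5s, 1s, 2s, ... capped
        delays.append(delay * random.uniform(0.5, 1.0))  # full-jitter to avoid thundering herd
    return delays
```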
Performance
- Choose Appropriate Models: Use faster models for simple tasks
- Minimize Context: Send only necessary message history
- Parallel Requests: Process multiple items concurrently when possible
- Cache Results: Cache frequent queries to reduce API calls
Security
- Validate Input: Sanitize user input before sending to AI
- Filter Output: Validate and sanitize AI responses
- Monitor Usage: Track unusual patterns via logs
- Rotate Keys: Regularly rotate API credentials
Cost Optimization
- Model Selection: Use smaller/faster models when quality permits
- Token Limits: Set appropriate max_completion_tokens
- Batch Processing: Group similar requests
- Response Caching: Cache responses for repeated queries