Chat Completions

Generate AI chat completions using any supported model. This endpoint handles inference requests and returns AI-generated responses, with full OpenAI API compatibility including streaming.

Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/completions

Method: POST

Parameters

Headers

Header Type Required Description
Authorization string Optional* Bearer token for session authentication
api-key string Optional* API key for key-based authentication
api-secret string Optional* API secret for key-based authentication
Content-Type string Yes Must be application/json

Note: You must authenticate with either the Authorization header or the api-key/api-secret header pair; one of the two is required.
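
The two authentication schemes can be wrapped in a small helper. This is an illustrative sketch (the function name build_headers is not part of the API; only the header names come from this document):

```python
def build_headers(token=None, api_key=None, api_secret=None):
    """Return request headers for session (Bearer) or key-based auth."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif api_key and api_secret:
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
    else:
        raise ValueError("Provide a bearer token or an api-key/api-secret pair")
    return headers
```

Pass the resulting dict as the headers of your POST to the endpoint above.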

Request Body

{
  "model": "gemini-3-pro",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
  "max_completion_tokens": 1024,
  "temperature": 1.0,
  "stream": false
}

Request Parameters

Parameter Type Required Description
model string Yes Model ID to use for inference (see Models)
messages array Yes Message history for context (minimum 1)
max_completion_tokens integer No Maximum tokens in response (defaults to 50% of model max)
max_tokens integer No Alias for max_completion_tokens (OpenAI compatibility)
temperature float No Sampling temperature (0.0 to 2.0, default: 1.0)
top_p float No Nucleus sampling (0.0 to 1.0, default: 1.0)
top_k integer No Number of top tokens to consider (-1 for all tokens). Limits the token pool before sampling
min_p float No Minimum probability relative to the top token (0.0 to 1.0, 0 disables). Dynamic cutoff that adapts to each token’s probability distribution
n integer No Number of completions to generate (1-128, default: 1). Some providers may restrict this value.
stop string/array No Stop sequences (max 4 sequences)
seed integer No Seed for deterministic sampling
frequency_penalty float No Frequency penalty (-2.0 to 2.0, default: 0). Penalizes tokens based on their frequency in the generated text
presence_penalty float No Presence penalty (-2.0 to 2.0, default: 0). Penalizes tokens based on whether they appear in the generated text
repetition_penalty float No Repetition penalty (must be > 0, default: 1.0). Penalizes tokens that appear in the prompt and generated text. Values > 1.0 discourage repetition, values < 1.0 encourage it
logprobs boolean No Return log probabilities of output tokens
top_logprobs integer No Number of top logprobs per token (0-5, requires logprobs=true)
logit_bias object No Token bias map (token_id: -100 to 100)
user string No Unique user identifier for tracking
stream boolean No Enable SSE streaming (default: false)
stream_options object No Streaming configuration
tools array No Tool/function definitions (max 128)
tool_choice string/object No Tool selection strategy
parallel_tool_calls boolean No Allow parallel tool calls
reasoning_effort string No Reasoning effort level: low, medium, high (behavior when omitted varies by provider)
service_tier string No Service tier for priority routing (omit for default)
response_format object No Response format specification

Message Object

Field Type Required Description
role string Yes Message role: user, assistant, system, tool, developer
content string/array Yes Message content (string or array of content blocks)
name string No Participant name
tool_calls array No Tool calls generated by the model (for assistant messages)
tool_call_id string No Tool call identifier (for tool messages)
reasoning string No Reasoning text (for assistant messages)

Response Format

Control the structure of the model’s output:

JSON Object

Request JSON-formatted output:

{
  "response_format": {
    "type": "json_object"
  }
}

JSON Schema

Request output matching a specific schema. Follows the OpenAI Structured Outputs spec — type is "json_schema" and the schema lives under json_schema.schema. The name and strict fields are required by upstream:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "threat_record",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "threat_name": {"type": "string"},
          "severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
          "indicators": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["threat_name", "severity"],
        "additionalProperties": false
      }
    }
  }
}

Request

To generate a chat completion, use a POST request:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gemini-3-pro",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
  "max_completion_tokens": 1024
}'

Or using API key and secret:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'api-secret: your-api-secret' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gpt-oss-20b",
  "messages": [
    {
      "role": "user",
      "content": "Explain threat intelligence"
    }
  ]
}'

Response

A successful response will return the AI-generated completion with usage statistics.

Success Response (200 OK)

{
  "id": "msg_01AbCdEf123",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "gemini-3-pro",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "XDR (Extended Detection and Response) is a unified security platform that..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 150,
    "total_tokens": 162
  }
}

Response Schema

Field Type Description
id string Unique interaction identifier
object string Response type: "chat.completion" or "chat.completion.chunk"
created integer Unix timestamp of creation
model string Model used for inference
system_fingerprint string Backend configuration identifier (optional)
service_tier string Service tier used for request (optional)
choices array Generated response choices
choices[].index integer Choice index in array
choices[].finish_reason string How generation ended
choices[].message object Generated message (non-streaming)
choices[].delta object Incremental message (streaming)
choices[].logprobs object Log probabilities (when requested)
usage object Token usage statistics
usage.prompt_tokens integer Tokens in input messages
usage.completion_tokens integer Tokens in generated response
usage.total_tokens integer Sum of prompt and completion tokens
usage.prompt_tokens_details object Breakdown of prompt tokens (optional)
usage.prompt_tokens_details.audio_tokens integer Audio input tokens (for audio models)
usage.prompt_tokens_details.cached_tokens integer Cached prefix tokens reused from previous requests
usage.completion_tokens_details object Breakdown of completion tokens (optional)
usage.completion_tokens_details.reasoning_tokens integer Tokens used for reasoning/thinking
usage.completion_tokens_details.accepted_prediction_tokens integer Predicted-output tokens accepted by the model
usage.completion_tokens_details.rejected_prediction_tokens integer Predicted-output tokens rejected by the model
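
For logging or billing it is convenient to flatten the usage block into one line. A minimal sketch (summarize_usage is an illustrative name; the field names are the ones documented above):

```python
def summarize_usage(usage):
    """Build a short log line from a response's usage block."""
    completion_details = usage.get("completion_tokens_details") or {}
    prompt_details = usage.get("prompt_tokens_details") or {}
    reasoning = completion_details.get("reasoning_tokens", 0)
    cached = prompt_details.get("cached_tokens", 0)
    return (f"prompt={usage['prompt_tokens']} "
            f"completion={usage['completion_tokens']} "
            f"(reasoning={reasoning}) cached={cached} "
            f"total={usage['total_tokens']}")
```

Note that the optional detail objects may be absent entirely, so the sketch defaults them to zero.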

Message Object (Response)

Field Type Description
role string Always “assistant”
content string Generated response text
tool_calls array Tool calls requested by model (optional)
reasoning string Reasoning process (when reasoning enabled)
refusal string Model’s refusal message (optional)

Finish Reason Values

Reason Description
stop Model completed response naturally
length Response cut off due to max_completion_tokens limit
content_filter Content filtered by provider safety systems
tool_calls Model requested tool/function calls

OpenAI Compatibility

This endpoint is designed for compatibility with OpenAI’s Chat Completions API.

Provider-Specific Parameter Support

Not all providers support every parameter; unsupported parameters are silently ignored unless noted otherwise.

Core Parameters

These parameters are used across multiple chat providers, though support varies by provider. See the columns for specific compatibility.

Parameter Description ThreatWinds OpenAI Gemini
model Model ID to use Yes Yes Yes
messages Message history array Yes Yes Yes
temperature Sampling temperature (0.0-2.0, default: 1.0) Yes Yes Yes
top_p Nucleus sampling (0.0-1.0, default: 1.0) Yes Yes Yes
top_k Top-k sampling (-1 for all tokens) Yes No Yes
min_p Minimum probability vs top token (0.0-1.0) Yes No No
n Number of completions (1-128) Yes Yes No
stop Stop sequences (max 4) Yes Yes Yes
seed Random seed for reproducibility Yes Yes No
frequency_penalty Penalize frequent tokens (-2.0 to 2.0) Yes Yes No
presence_penalty Penalize tokens already present in the text (-2.0 to 2.0) Yes Yes No
repetition_penalty Penalize tokens appearing in prompt Yes No No
user User identifier for tracking Yes Yes No

Token Limits

Parameter Description ThreatWinds OpenAI Gemini
max_completion_tokens Maximum output tokens Yes Yes Yes
max_tokens Alias for max_completion_tokens (compatibility) Yes Yes Yes

Advanced Features

Parameter Description ThreatWinds OpenAI Gemini
logprobs Return token log probabilities Yes Yes No
top_logprobs Number of top logprobs per token (0-5) Yes Yes No
logit_bias Token bias map (token_id: -100 to 100) Yes Yes No
tools Tool/function definitions (max 128) Yes Yes Yes
tool_choice Tool selection strategy Yes Yes Yes
parallel_tool_calls Allow calling multiple tools simultaneously Yes Yes No
response_format Force JSON or structured output format Yes Yes No
stream Enable Server-Sent Events streaming Yes Yes Yes
stream_options Streaming configuration (include_usage) Yes Yes Yes

* Embedding and audio models (under the threatwinds provider) use their dedicated endpoints (/embeddings, /audio/*) and do not support /chat/completions.
‡ Chat models only; embeddings and audio models have different parameter sets.


Error Codes

Status Code Description Possible Cause
200 OK Request successful
400 Bad Request Invalid JSON, empty messages, invalid role, parameter out of range, token limit exceeded
401 Unauthorized Missing or invalid authentication
403 Forbidden Insufficient permissions or model not allowed
500 Internal Server Error Provider error, model unavailable

Examples

Example 1: Simple Question

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5",
    "messages": [
      {"role": "user", "content": "What is 2+2?"}
    ],
    "max_completion_tokens": 100
  }'

Response:

{
  "id": "msg_123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "2 + 2 = 4"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 5,
    "total_tokens": 13
  }
}

Example 2: Multi-turn Conversation

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5",
    "messages": [
      {"role": "system", "content": "You are a cybersecurity expert"},
      {"role": "user", "content": "What is a zero-day vulnerability?"},
      {"role": "assistant", "content": "A zero-day vulnerability is a software security flaw that is unknown to the vendor..."},
      {"role": "user", "content": "How can organizations protect against them?"}
    ],
    "max_completion_tokens": 500
  }'

Example 3: Streaming Response

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-flash",
    "messages": [
      {"role": "user", "content": "Explain the MITRE ATT&CK framework"}
    ],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
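
With stream: true the endpoint returns Server-Sent Events, each a data: line carrying a chat.completion.chunk, terminated by a data: [DONE] marker. A minimal client-side sketch of chunk assembly (function names are illustrative; the data:/[DONE] framing and the choices[].delta shape are as documented here):

```python
import json

def parse_sse_lines(lines):
    """Yield parsed chunk objects from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

def collect_text(lines):
    """Concatenate incremental delta.content fields into the full response."""
    parts = []
    for chunk in parse_sse_lines(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)
```

When stream_options includes include_usage, the final chunk before [DONE] carries a usage object with empty choices, which the sketch above skips naturally.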

Example 4: JSON Response Format

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "gpt-5",
  "messages": [
    {
      "role": "user",
      "content": "What is XDR in cybersecurity?"
    }
  ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "threat_record",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "severity": {"type": "string"}
          },
          "required": ["name", "severity"],
          "additionalProperties": false
        }
      }
    }
  }'

Response:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "{\"name\": \"Apache Log4j\", \"severity\": \"Critical\"}"
    },
    "finish_reason": "stop"
  }]
}
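
As the example shows, structured output arrives as a JSON string inside message.content, so the client still has to decode it. A hedged sketch of that decoding step (parse_structured_content is an illustrative helper; the required-key check mirrors the schema in Example 4):

```python
import json

def parse_structured_content(response):
    """Extract and decode the JSON string in choices[0].message.content."""
    content = response["choices"][0]["message"]["content"]
    record = json.loads(content)
    # With strict json_schema mode the required keys should always be present,
    # but a defensive check catches provider quirks.
    for key in ("name", "severity"):
        if key not in record:
            raise ValueError(f"missing required field: {key}")
    return record
```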

Example 5: Extended Reasoning

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-pro",
    "messages": [
      {
        "role": "user",
        "content": "Design a zero-trust network architecture for a financial institution"
      }
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 4000
  }'

Example 6: Vision (Image Analysis)

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-pro",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What security issues do you see in this network topology?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/network.png", "detail": "high"}}
        ]
      }
    ]
  }'

Example 7: Function Calling

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/completions' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3-pro",
    "messages": [
      {"role": "user", "content": "Scan the domain threatwinds.com for vulnerabilities"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "vulnerability_scan",
          "description": "Perform a vulnerability scan on a domain",
          "parameters": {
            "type": "object",
            "properties": {
              "domain": {"type": "string", "description": "Domain to scan"},
              "scan_type": {"type": "string", "enum": ["quick", "full", "stealth"]}
            },
            "required": ["domain"]
          }
        }
      }
    ]
  }'
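
When the model answers with finish_reason "tool_calls", the client runs each requested function and sends the results back as role "tool" messages referencing tool_call_id. A sketch of that dispatch loop, assuming a registry of local handlers (dispatch_tool_calls and the registry are illustrative; the tool_calls/tool message shapes follow the tables above):

```python
import json

def dispatch_tool_calls(message, registry):
    """Run each requested tool and return the follow-up 'tool' messages."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        handler = registry[fn["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(fn["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(handler(**args)),
        })
    return results
```

Append these messages (after the assistant message containing tool_calls) and POST again so the model can incorporate the tool results.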

Best Practices

Message Construction

  1. System Messages: Use system messages to set behavior and context
  2. Message History: Include relevant conversation history for context
  3. Clear Prompts: Be specific and clear in user messages
  4. Role Consistency: Maintain proper role alternation

Token Management

  1. Set Max Tokens: Always set max_completion_tokens to control costs
  2. Count First: Use token counting endpoint for large requests
  3. Monitor Usage: Track token usage via response usage field
  4. Truncate History: Remove old messages to stay within limits
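
Point 4 above can be sketched as a simple truncation that preserves a leading system message while keeping only the most recent turns (truncate_history is an illustrative helper, counting messages rather than tokens for simplicity):

```python
def truncate_history(messages, max_messages):
    """Keep the system message (if first) plus the most recent turns."""
    if messages and messages[0]["role"] == "system":
        system, rest = messages[:1], messages[1:]
    else:
        system, rest = [], messages
    budget = max(max_messages - len(system), 0)
    tail = rest[-budget:] if budget else []
    return system + tail
```

A token-aware version would sum per-message token counts (e.g. via a token counting endpoint) against the model's context limit instead of counting messages.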

Streaming

  1. Use for Long Responses: Enable streaming for better UX on long responses
  2. Handle Chunks: Process each chunk as it arrives
  3. Check for [DONE]: Always check for the [DONE] marker
  4. Include Usage: Set include_usage: true for token tracking

Error Handling

  1. Validate Before Send: Check message count and roles before sending
  2. Handle Provider Errors: Be prepared for provider-specific errors
  3. Implement Retries: Add exponential backoff for transient errors
  4. Log Interaction IDs: Save interaction IDs for debugging
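
Point 3 above, exponential backoff for transient errors, might look like this. A hedged sketch: with_retries is an illustrative wrapper around any callable that returns (status, body), and which status codes are retryable is your policy decision (here only 500, matching the error table above):

```python
import random
import time

def with_retries(send, max_attempts=4, base_delay=0.5,
                 retryable=(500,), sleep=time.sleep):
    """Call send() until success or attempts are exhausted, backing off exponentially."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status not in retryable or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with status {status}")
        # Exponential backoff with jitter: ~0.5s, ~1s, ~2s, ...
        sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("unreachable")
```

Do not retry 400/401/403; those indicate request or credential problems that a retry cannot fix.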

Performance

  1. Choose Appropriate Models: Use faster models for simple tasks
  2. Minimize Context: Send only necessary message history
  3. Parallel Requests: Process multiple items concurrently when possible
  4. Cache Results: Cache frequent queries to reduce API calls

Security

  1. Validate Input: Sanitize user input before sending to AI
  2. Filter Output: Validate and sanitize AI responses
  3. Monitor Usage: Track unusual patterns via logs
  4. Rotate Keys: Regularly rotate API credentials

Cost Optimization

  1. Model Selection: Use smaller/faster models when quality permits
  2. Token Limits: Set appropriate max_completion_tokens
  3. Batch Processing: Group similar requests
  4. Response Caching: Cache responses for repeated queries
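
Point 4 above can be sketched with a cache keyed on a canonical hash of the request body, so that equivalent requests hit the cache regardless of key order (cache_key and ResponseCache are illustrative names; only cache requests that are effectively deterministic, e.g. temperature 0 with a fixed seed):

```python
import hashlib
import json

def cache_key(request_body):
    """Stable key for a chat request: SHA-256 of the canonical JSON."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class ResponseCache:
    """Tiny in-memory response cache; swap for Redis etc. in production."""
    def __init__(self):
        self._store = {}

    def get(self, request_body):
        return self._store.get(cache_key(request_body))

    def put(self, request_body, response):
        self._store[cache_key(request_body)] = response
```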