Token Counting

Count the number of tokens in messages before making an inference request. This is useful for cost estimation and ensuring messages fit within model limits.

Important: Token counting is only supported for Claude models. Groq models will return a 400 error when attempting to count tokens.

Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/count

Method: POST

Parameters

Headers

| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be application/json |

Note: You must authenticate with either the Authorization header or the api-key/api-secret header pair.
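Either authentication scheme can be expressed as a small helper that builds the request headers (a sketch; the header names follow the table above, and the function name is illustrative):

```python
def auth_headers(token=None, api_key=None, api_secret=None):
    """Build authentication headers for a token-counting request.

    Pass either a bearer token OR an api_key/api_secret pair.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif api_key and api_secret:
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
    else:
        raise ValueError("provide a bearer token or an api-key/api-secret pair")
    return headers
```
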

Request Body

{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is Extended Detection and Response (XDR)?"
    }
  ]
}

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for token counting |
| messages | array | Yes | Messages to count tokens for (minimum 1) |

Message Object

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role: user, assistant, system, tool, developer |
| content | string | Yes | Message text content |
| tool_call_id | string | No | Tool call identifier (for tool messages) |
| reasoning | string | No | Reasoning text (for assistant messages) |

Request

To count tokens in messages, use a POST request:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is Extended Detection and Response (XDR)?"
    }
  ]
}'

Or using API key and secret:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'api-secret: your-api-secret' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "claude-opus-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a cybersecurity expert"
    },
    {
      "role": "user",
      "content": "Explain threat intelligence"
    }
  ]
}'

Response

A successful response will return the token count for the input messages.

Success Response (200 OK)

{
  "input_tokens": 42
}

Response Schema

| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Number of tokens in the input messages |

Business Logic

Token Counting Process

  1. Validation: Validates message array is not empty and all roles are valid
  2. Model Resolution: Looks up provider client based on model ID
  3. Token Counting: Calls provider client’s CountTokens() method
  4. Result: Returns token count from provider

Model-Specific Tokenization

Token counts are calculated using the specific tokenizer for each model:

  • Claude models (claude-sonnet-4, claude-opus-4): Use Anthropic’s tokenizer ✅ Supported
  • Groq models (gpt-oss-20b, gpt-oss-120b, qwen3-32b, llama4-maverick, llama4-scout): ❌ Not Supported - Returns 400 error

Note: Only Claude models support token counting. Attempting to count tokens for Groq models will result in a 400 Bad Request error.
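Because unsupported models fail with a 400, a client can guard before calling the endpoint. A minimal sketch, assuming the supported set is exactly the two Claude model IDs documented above:

```python
# Models documented as supporting /chat/count (per the list above).
CLAUDE_MODELS = {"claude-sonnet-4", "claude-opus-4"}

def supports_token_counting(model: str) -> bool:
    """Return True if the model is documented to support token counting."""
    return model in CLAUDE_MODELS
```

Checking this client-side avoids a guaranteed 400 round-trip for Groq models.
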

Error Codes

| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Token count successful |
| 400 | Bad Request | Invalid JSON, zero messages, invalid message role, unsupported model (Groq) |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Insufficient permissions |
| 500 | Internal Server Error | Provider error, model unavailable |
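The table above implies a retry policy: 4xx errors (such as the 400 returned for Groq models) are permanent and should not be retried, while 5xx errors are transient. A one-line sketch of that decision:

```python
def should_retry(status_code: int) -> bool:
    """Retry only transient server-side failures.

    A 400 for an unsupported (Groq) model will never succeed on retry.
    """
    return status_code >= 500
```
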

Examples

Example 1: Count Simple Message

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Response:

{
  "input_tokens": 7
}

Example 2: Count Multi-turn Conversation

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "What is machine learning?"},
      {"role": "assistant", "content": "Machine learning is a subset of AI..."},
      {"role": "user", "content": "Can you give me examples?"}
    ]
  }'

Response:

{
  "input_tokens": 52
}

Example 3: Count Long Context

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg content "$(cat large-document.txt)" '{
    model: "claude-sonnet-4",
    messages: [{role: "user", content: $content}]
  }')"

Response:

{
  "input_tokens": 15482
}

Use Cases

Cost Estimation

Calculate approximate cost before making requests:

# 1. Count tokens
TOKEN_COUNT=$(curl -s -X POST \
  'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [...]
  }' | jq '.input_tokens')

# 2. Calculate cost (example: $3 per million input tokens)
COST=$(echo "scale=6; $TOKEN_COUNT * 3 / 1000000" | bc)
echo "Estimated input cost: \$$COST"

# 3. Add estimated output cost
# Assume 500 output tokens at $15 per million
OUTPUT_COST=$(echo "scale=6; 500 * 15 / 1000000" | bc)
TOTAL_COST=$(echo "scale=6; $COST + $OUTPUT_COST" | bc)
echo "Total estimated cost: \$$TOTAL_COST"

Validate Against Model Limits

Check if messages fit within model limits:

# 1. Count tokens
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [...]
  }' > token_count.json

# 2. Get model limits
curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models/claude-opus-4' \
  -H 'Authorization: Bearer <token>' > model_details.json

# 3. Compare
INPUT_TOKENS=$(jq '.input_tokens' token_count.json)
MAX_INPUT=$(jq '.limits.max_input_tokens' model_details.json)

if [ "$INPUT_TOKENS" -gt "$MAX_INPUT" ]; then
  echo "Error: Input ($INPUT_TOKENS) exceeds model limit ($MAX_INPUT)"
  exit 1
fi

Dynamic Context Truncation

Automatically truncate conversation history to fit limits:

import requests

def count_tokens(model, messages, token):
    response = requests.post(
        'https://apis.threatwinds.com/api/ai/v1/chat/count',
        headers={'Authorization': f'Bearer {token}'},
        json={'model': model, 'messages': messages}
    )
    return response.json()['input_tokens']

def get_model_limit(model, token):
    response = requests.get(
        f'https://apis.threatwinds.com/api/ai/v1/models/{model}',
        headers={'Authorization': f'Bearer {token}'}
    )
    return response.json()['limits']['max_input_tokens']

def truncate_to_fit(model, messages, token, max_output_tokens=4096):
    max_input = get_model_limit(model, token)
    budget = max_input - max_output_tokens - 100  # Safety margin

    while count_tokens(model, messages, token) > budget:
        # Remove oldest message (keep system message)
        if len(messages) > 2 and messages[0]['role'] == 'system':
            messages.pop(1)  # Remove second message
        elif len(messages) > 1:
            messages.pop(0)  # Remove first message
        else:
            break

    return messages

Usage Tracking

Track token usage over time:

import requests
from datetime import datetime

def log_token_usage(model, messages, token):
    count_response = requests.post(
        'https://apis.threatwinds.com/api/ai/v1/chat/count',
        headers={'Authorization': f'Bearer {token}'},
        json={'model': model, 'messages': messages}
    )

    input_tokens = count_response.json()['input_tokens']

    # Log to database or file
    log_entry = {
        'timestamp': datetime.now().isoformat(),
        'model': model,
        'input_tokens': input_tokens
    }

    # Make actual request
    response = requests.post(
        'https://apis.threatwinds.com/api/ai/v1/chat/completions',
        headers={'Authorization': f'Bearer {token}'},
        json={'model': model, 'messages': messages}
    )

    # Log output tokens
    usage = response.json()['usage']
    log_entry['completion_tokens'] = usage['completion_tokens']
    log_entry['total_tokens'] = usage['total_tokens']

    return response.json(), log_entry

Best Practices

When to Count Tokens

  1. Before Large Requests: Always count tokens for large documents
  2. Cost-Sensitive Applications: Count before requests in production
  3. User-Facing Applications: Show estimated costs to users
  4. Dynamic Content: Count when message length varies significantly
  5. Claude Models Only: Remember that token counting is only available for Claude models (claude-sonnet-4, claude-opus-4)

Optimization Tips

  1. Cache Token Counts: Cache counts for static content
  2. Batch Counting: Count multiple message sets in parallel
  3. Approximate for UI: Use rough estimates (4 chars ≈ 1 token) for UI
  4. Count Once: Count tokens once, then track changes
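The rough heuristic from tip 3 (about 4 characters per token) can be sketched as a pure helper for UI estimates. This is an approximation only; authoritative counts come from the endpoint:

```python
def approx_tokens(text: str) -> int:
    """Rough client-side estimate: ~4 characters per token.

    Suitable for UI hints only; never use for billing.
    """
    return max(1, len(text) // 4)

def approx_message_tokens(messages) -> int:
    # Sum the per-message estimates over all message contents.
    return sum(approx_tokens(m["content"]) for m in messages)
```
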

Token Budgeting

# Example token budget allocation
MAX_TOTAL_TOKENS = 200000  # example: claude-sonnet-4 context window
SYSTEM_MESSAGE_BUDGET = 500
USER_QUERY_BUDGET = 2000
CONVERSATION_HISTORY_BUDGET = 20000
OUTPUT_BUDGET = 4096
SAFETY_MARGIN = 500

# Calculate remaining budget for context
CONTEXT_BUDGET = (
    MAX_TOTAL_TOKENS
    - SYSTEM_MESSAGE_BUDGET
    - USER_QUERY_BUDGET
    - OUTPUT_BUDGET
    - SAFETY_MARGIN
)

print(f"Context budget: {CONTEXT_BUDGET} tokens")

Error Handling

import time

import requests

def safe_count_tokens(model, messages, token, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                'https://apis.threatwinds.com/api/ai/v1/chat/count',
                headers={'Authorization': f'Bearer {token}'},
                json={'model': model, 'messages': messages},
                timeout=10
            )
            response.raise_for_status()
            return response.json()['input_tokens']
        except requests.exceptions.RequestException as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff

    return None

Token Counting vs. Actual Usage

Important: Token counts from this endpoint are estimates. Actual token usage in completions may differ slightly due to:

  1. Internal Formatting: Providers may add formatting tokens
  2. Special Tokens: System tokens, delimiters, etc.
  3. Message Structure: How messages are encoded internally
  4. Tool/Function Calls: Additional tokens for tool calling

Always use the usage field from completion responses for accurate billing and tracking.
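To quantify how far the pre-flight estimate drifts from billed usage, you can compare it against the usage field afterward. A sketch that derives billed input tokens as total_tokens minus completion_tokens, using the usage fields shown in the logging example above:

```python
def estimate_drift(estimated_input: int, usage: dict) -> float:
    """Relative drift between the /chat/count estimate and billed input tokens.

    Billed input is derived as total_tokens - completion_tokens.
    Positive drift means the estimate undercounted.
    """
    actual_input = usage["total_tokens"] - usage["completion_tokens"]
    if actual_input == 0:
        return 0.0
    return (actual_input - estimated_input) / actual_input
```
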

Token Pricing Reference

Approximate pricing for common models (as of 2025):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Token Counting |
|---|---|---|---|
| Claude Sonnet 4 | $3.00 | $15.00 | ✅ Supported |
| Claude Opus 4 | $15.00 | $75.00 | ✅ Supported |
| GPT OSS 20B (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| GPT OSS 120B (Groq) | $0.15 | $0.15 | ❌ Not Supported |
| Qwen 3 32B (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| LLaMA 4 Maverick (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| LLaMA 4 Scout (Groq) | $0.10 | $0.10 | ❌ Not Supported |

Note: Pricing varies by provider and may change. Check provider documentation for current rates. Token counting is only available for Claude models.