Token Counting

Count the number of tokens in messages before making an inference request. This is useful for cost estimation and ensuring messages fit within model limits.

Important: Token counting is only supported for Claude models. Groq models will return a 400 error when attempting to count tokens.

Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/count

Method: POST

Parameters

Headers

| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be application/json |

Note: You must authenticate with either the Authorization header or the api-key/api-secret header pair.
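Either authentication scheme can be expressed as a small helper that builds the request headers (a sketch; the header names follow the table above, and the function name is illustrative):

```python
def auth_headers(token=None, api_key=None, api_secret=None):
    """Build authentication headers for a token-counting request.

    Pass either a bearer token OR an api_key/api_secret pair.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif api_key and api_secret:
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
    else:
        raise ValueError("provide a bearer token or an api-key/api-secret pair")
    return headers
```
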

Request Body

{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is Extended Detection and Response (XDR)?"
    }
  ]
}

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for token counting |
| messages | array | Yes | Messages to count tokens for (minimum 1) |

Message Object

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role: user, assistant, system, tool, developer |
| content | string | Yes | Message text content |
| tool_call_id | string | No | Tool call identifier (for tool messages) |
| reasoning | string | No | Reasoning text (for assistant messages) |

Request

To count tokens in messages, use a POST request:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": "What is Extended Detection and Response (XDR)?"
    }
  ]
}'

Or using API key and secret:

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'api-secret: your-api-secret' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "claude-opus-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a cybersecurity expert"
    },
    {
      "role": "user",
      "content": "Explain threat intelligence"
    }
  ]
}'

Response

A successful response will return the token count for the input messages.

Success Response (200 OK)

{
  "input_tokens": 42
}

Response Schema

| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Number of tokens in the input messages |

Business Logic

Token Counting Process

  1. Validation: Validates message array is not empty and all roles are valid
  2. Model Resolution: Looks up provider client based on model ID
  3. Token Counting: Calls provider client’s CountTokens() method
  4. Result: Returns token count from provider

Model-Specific Tokenization

Token counts are calculated using the specific tokenizer for each model:

  • Claude models (claude-sonnet-4, claude-opus-4): Use Anthropic’s tokenizer ✅ Supported
  • Groq models (gpt-oss-20b, gpt-oss-120b, qwen3-32b, llama4-maverick, llama4-scout): ❌ Not Supported - Returns 400 error

Note: Only Claude models support token counting. Attempting to count tokens for Groq models will result in a 400 Bad Request error.
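Because unsupported models fail with a 400, a client can guard before calling the endpoint. A minimal sketch, assuming the supported set is exactly the two Claude model IDs documented above:

```python
# Models documented as supporting /chat/count (per the list above).
CLAUDE_MODELS = {"claude-sonnet-4", "claude-opus-4"}

def supports_token_counting(model: str) -> bool:
    """Return True if the model is documented to support token counting."""
    return model in CLAUDE_MODELS
```

Checking this client-side avoids a guaranteed 400 round-trip for Groq models.
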

Error Codes

| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Token count successful |
| 400 | Bad Request | Invalid JSON, zero messages, invalid message role, unsupported model (Groq) |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Insufficient permissions |
| 500 | Internal Server Error | Provider error, model unavailable |
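The table above implies a retry policy: 4xx errors (such as the 400 returned for Groq models) are permanent and should not be retried, while 5xx errors are transient. A one-line sketch of that decision:

```python
def should_retry(status_code: int) -> bool:
    """Retry only transient server-side failures.

    A 400 for an unsupported (Groq) model will never succeed on retry.
    """
    return status_code >= 500
```
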

Examples

Example 1: Count Simple Message

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Response:

{
  "input_tokens": 7
}

Example 2: Count Multi-turn Conversation

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "What is machine learning?"},
      {"role": "assistant", "content": "Machine learning is a subset of AI..."},
      {"role": "user", "content": "Can you give me examples?"}
    ]
  }'

Response:

{
  "input_tokens": 52
}

Example 3: Count Long Context

curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg content "$(cat large-document.txt)" '{
    model: "claude-sonnet-4",
    messages: [{role: "user", content: $content}]
  }')"

Response:

{
  "input_tokens": 15482
}

Use Cases

Cost Estimation

Calculate approximate cost before making requests:

# 1. Count tokens
TOKEN_COUNT=$(curl -s -X POST \
  'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [...]
  }' | jq '.input_tokens')

# 2. Calculate cost (example: $3 per million input tokens)
COST=$(echo "scale=6; $TOKEN_COUNT * 3 / 1000000" | bc)
echo "Estimated input cost: \$$COST"

# 3. Add estimated output cost
# Assume 500 output tokens at $15 per million
OUTPUT_COST=$(echo "scale=6; 500 * 15 / 1000000" | bc)
TOTAL_COST=$(echo "scale=6; $COST + $OUTPUT_COST" | bc)
echo "Total estimated cost: \$$TOTAL_COST"

Validate Against Model Limits

Check if messages fit within model limits:

# 1. Count tokens
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-opus-4",
    "messages": [...]
  }' > token_count.json

# 2. Get model limits
curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models/claude-opus-4' \
  -H 'Authorization: Bearer <token>' > model_details.json

# 3. Compare
INPUT_TOKENS=$(jq '.input_tokens' token_count.json)
MAX_INPUT=$(jq '.limits.max_input_tokens' model_details.json)

if [ "$INPUT_TOKENS" -gt "$MAX_INPUT" ]; then
  echo "Error: Input ($INPUT_TOKENS) exceeds model limit ($MAX_INPUT)"
  exit 1
fi

Dynamic Context Truncation

Automatically truncate conversation history to fit limits:

import requests

def count_tokens(model, messages, token):
    response = requests.post(
        'https://apis.threatwinds.com/api/ai/v1/chat/count',
        headers={'Authorization': f'Bearer {token}'},
        json={'model': model, 'messages': messages}
    )
    return response.json()['input_tokens']

def get_model_limit(model, token):
    response = requests.get(
        f'https://apis.threatwinds.com/api/ai/v1/models/{model}',
        headers={'Authorization': f'Bearer {token}'}
    )
    return response.json()['limits']['max_input_tokens']

def truncate_to_fit(model, messages, token, max_output_tokens=4096):
    max_input = get_model_limit(model, token)
    budget = max_input - max_output_tokens - 100  # Safety margin

    while count_tokens(model, messages, token) > budget:
        # Remove oldest message (keep system message)
        if len(messages) > 2 and messages[0]['role'] == 'system':
            messages.pop(1)  # Remove second message
        elif len(messages) > 1:
            messages.pop(0)  # Remove first message
        else:
            break

    return messages

Usage Tracking

Track token usage over time:

import requests
from datetime import datetime

def log_token_usage(model, messages, token):
    count_response = requests.post(
        'https://apis.threatwinds.com/api/ai/v1/chat/count',
        headers={'Authorization': f'Bearer {token}'},
        json={'model': model, 'messages': messages}
    )

    input_tokens = count_response.json()['input_tokens']

    # Log to database or file
    log_entry = {
        'timestamp': datetime.now().isoformat(),
        'model': model,
        'input_tokens': input_tokens
    }

    # Make actual request
    response = requests.post(
        'https://apis.threatwinds.com/api/ai/v1/chat/completions',
        headers={'Authorization': f'Bearer {token}'},
        json={'model': model, 'messages': messages}
    )

    # Log output tokens
    usage = response.json()['usage']
    log_entry['completion_tokens'] = usage['completion_tokens']
    log_entry['total_tokens'] = usage['total_tokens']

    return response.json(), log_entry

Best Practices

When to Count Tokens

  1. Before Large Requests: Always count tokens for large documents
  2. Cost-Sensitive Applications: Count before requests in production
  3. User-Facing Applications: Show estimated costs to users
  4. Dynamic Content: Count when message length varies significantly
  5. Claude Models Only: Remember that token counting is only available for Claude models (claude-sonnet-4, claude-opus-4)

Optimization Tips

  1. Cache Token Counts: Cache counts for static content
  2. Batch Counting: Count multiple message sets in parallel
  3. Approximate for UI: Use rough estimates (4 chars ≈ 1 token) for UI
  4. Count Once: Count tokens once, then track changes
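The rough heuristic from tip 3 (about 4 characters per token) can be sketched as a pure helper for UI estimates. This is an approximation only; authoritative counts come from the endpoint:

```python
def approx_tokens(text: str) -> int:
    """Rough client-side estimate: ~4 characters per token.

    Suitable for UI hints only; never use for billing.
    """
    return max(1, len(text) // 4)

def approx_message_tokens(messages) -> int:
    # Sum the per-message estimates over all message contents.
    return sum(approx_tokens(m["content"]) for m in messages)
```
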

Token Budgeting

# Example token budget allocation
MAX_TOTAL_TOKENS = 200000  # example: claude-sonnet-4 context window
SYSTEM_MESSAGE_BUDGET = 500
USER_QUERY_BUDGET = 2000
CONVERSATION_HISTORY_BUDGET = 20000
OUTPUT_BUDGET = 4096
SAFETY_MARGIN = 500

# Calculate remaining budget for context
CONTEXT_BUDGET = (
    MAX_TOTAL_TOKENS
    - SYSTEM_MESSAGE_BUDGET
    - USER_QUERY_BUDGET
    - OUTPUT_BUDGET
    - SAFETY_MARGIN
)

print(f"Context budget: {CONTEXT_BUDGET} tokens")

Error Handling

import time

import requests

def safe_count_tokens(model, messages, token, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                'https://apis.threatwinds.com/api/ai/v1/chat/count',
                headers={'Authorization': f'Bearer {token}'},
                json={'model': model, 'messages': messages},
                timeout=10
            )
            response.raise_for_status()
            return response.json()['input_tokens']
        except requests.exceptions.RequestException as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff

    return None

Token Counting vs. Actual Usage

Important: Token counts from this endpoint are estimates. Actual token usage in completions may differ slightly due to:

  1. Internal Formatting: Providers may add formatting tokens
  2. Special Tokens: System tokens, delimiters, etc.
  3. Message Structure: How messages are encoded internally
  4. Tool/Function Calls: Additional tokens for tool calling

Always use the usage field from completion responses for accurate billing and tracking.
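To quantify how far the pre-flight estimate drifts from billed usage, you can compare it against the usage field afterward. A sketch that derives billed input tokens as total_tokens minus completion_tokens, using the usage fields shown in the logging example above:

```python
def estimate_drift(estimated_input: int, usage: dict) -> float:
    """Relative drift between the /chat/count estimate and billed input tokens.

    Billed input is derived as total_tokens - completion_tokens.
    Positive drift means the estimate undercounted.
    """
    actual_input = usage["total_tokens"] - usage["completion_tokens"]
    if actual_input == 0:
        return 0.0
    return (actual_input - estimated_input) / actual_input
```
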

Token Pricing Reference

Approximate pricing for common models (as of 2025):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Token Counting |
|---|---|---|---|
| Claude Sonnet 4 | $3.00 | $15.00 | ✅ Supported |
| Claude Opus 4 | $15.00 | $75.00 | ✅ Supported |
| GPT OSS 20B (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| GPT OSS 120B (Groq) | $0.15 | $0.15 | ❌ Not Supported |
| Qwen 3 32B (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| LLaMA 4 Maverick (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| LLaMA 4 Scout (Groq) | $0.10 | $0.10 | ❌ Not Supported |

Note: Pricing varies by provider and may change. Check provider documentation for current rates. Token counting is only available for Claude models.