Token Counting
Count the number of tokens in messages before making an inference request. This is useful for cost estimation and ensuring messages fit within model limits.
Important: Token counting is only supported for Claude models. Groq models will return a 400 error when attempting to count tokens.
Endpoint: https://apis.threatwinds.com/api/ai/v1/chat/count
Method: POST
Parameters
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | Must be application/json |
Note: You must use either the Authorization header or the api-key/api-secret header pair; the asterisks above mark this either/or requirement.
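For programmatic clients, a small helper can assemble whichever header set matches the credentials at hand. A minimal sketch (the helper name and structure are illustrative, not part of the API):
def build_auth_headers(bearer_token=None, api_key=None, api_secret=None):
    # Prefer the session bearer token; otherwise fall back to the key/secret pair
    headers = {'Content-Type': 'application/json'}
    if bearer_token:
        headers['Authorization'] = f'Bearer {bearer_token}'
    elif api_key and api_secret:
        headers['api-key'] = api_key
        headers['api-secret'] = api_secret
    else:
        raise ValueError('Provide a bearer token or an api-key/api-secret pair')
    return headers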
Request Body
{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": "What is Extended Detection and Response (XDR)?"
}
]
}
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for token counting |
| messages | array | Yes | Messages to count tokens for (minimum 1) |
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role: user, assistant, system, tool, developer |
| content | string | Yes | Message text content |
| tool_call_id | string | No | Tool call identifier (for tool messages) |
| reasoning | string | No | Reasoning text (for assistant messages) |
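For illustration, a conversation that includes a tool exchange might be counted like this (the tool_call_id value is hypothetical):
messages = [
    {'role': 'system', 'content': 'You are a cybersecurity expert'},
    {'role': 'user', 'content': 'Look up the latest XDR advisories'},
    {'role': 'assistant', 'content': 'Let me query the advisory feed.'},
    # Tool results reference the originating call via tool_call_id
    {'role': 'tool', 'tool_call_id': 'call_123', 'content': '{"advisories": 3}'},
]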
Request
To count tokens in messages, use a POST request:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": "What is Extended Detection and Response (XDR)?"
}
]
}'
Or using API key and secret:
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'accept: application/json' \
-H 'api-key: your-api-key' \
-H 'api-secret: your-api-secret' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-opus-4",
"messages": [
{
"role": "system",
"content": "You are a cybersecurity expert"
},
{
"role": "user",
"content": "Explain threat intelligence"
}
]
}'
Response
A successful response will return the token count for the input messages.
Success Response (200 OK)
{
"input_tokens": 42
}
Response Schema
| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Number of tokens in the input messages |
Business Logic
Token Counting Process
- Validation: Validates that the message array is not empty and all roles are valid
- Model Resolution: Looks up the provider client based on the model ID
- Token Counting: Calls the provider client's CountTokens() method
- Result: Returns the token count from the provider
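Clients can mirror the first step locally to fail fast before spending a network round trip. A minimal sketch, using the roles from the Message Object table above:
VALID_ROLES = {'user', 'assistant', 'system', 'tool', 'developer'}
def validate_messages(messages):
    # Mirror the server's basic checks: non-empty array, known roles, string content
    if not messages:
        raise ValueError('messages must contain at least one entry')
    for i, msg in enumerate(messages):
        if msg.get('role') not in VALID_ROLES:
            raise ValueError(f'message {i}: invalid role {msg.get("role")!r}')
        if not isinstance(msg.get('content'), str):
            raise ValueError(f'message {i}: content must be a string')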
Model-Specific Tokenization
Token counts are calculated using the specific tokenizer for each model:
- Claude models (claude-sonnet-4, claude-opus-4): Use Anthropic’s tokenizer ✅ Supported
- Groq models (gpt-oss-20b, gpt-oss-120b, qwen3-32b, llama4-maverick, llama4-scout): ❌ Not Supported - Returns 400 error
Note: Only Claude models support token counting. Attempting to count tokens for Groq models will result in a 400 Bad Request error.
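Because of this, a client can guard the call up front instead of handling the 400 after the fact. A minimal sketch; the supported list below is taken from this section and may change as models are added:
COUNTABLE_MODELS = {'claude-sonnet-4', 'claude-opus-4'}
def supports_token_counting(model):
    # Only Claude models expose token counting on this endpoint
    return model in COUNTABLE_MODELS
if not supports_token_counting('qwen3-32b'):
    print('Token counting unavailable for this model; use a rough estimate instead')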
Error Codes
| Status Code | Description | Possible Cause |
|---|---|---|
| 200 | OK | Token count successful |
| 400 | Bad Request | Invalid JSON, zero messages, invalid message role, unsupported model (Groq) |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Insufficient permissions |
| 500 | Internal Server Error | Provider error, model unavailable |
Examples
Example 1: Count Simple Message
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'
Response:
{
"input_tokens": 7
}
Example 2: Count Multi-turn Conversation
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-opus-4",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is machine learning?"},
{"role": "assistant", "content": "Machine learning is a subset of AI..."},
{"role": "user", "content": "Can you give me examples?"}
]
}'
Response:
{
"input_tokens": 52
}
Example 3: Count Long Context
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [
{
"role": "user",
"content": "'"$(cat large-document.txt)"'"
}
]
}'
Response:
{
"input_tokens": 15482
}
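Note that the shell substitution above only works if large-document.txt contains no characters that need JSON escaping (quotes, backslashes, raw newlines). For arbitrary files it is safer to build the payload programmatically and let a JSON encoder handle the escaping; a minimal sketch:
import requests
# Read the document and let the JSON encoder escape it safely
with open('large-document.txt') as f:
    document = f.read()
response = requests.post(
    'https://apis.threatwinds.com/api/ai/v1/chat/count',
    headers={'Authorization': 'Bearer <token>'},
    json={'model': 'claude-sonnet-4',
          'messages': [{'role': 'user', 'content': document}]},
)
print(response.json()['input_tokens'])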
Use Cases
Cost Estimation
Calculate approximate cost before making requests:
# 1. Count tokens
TOKEN_COUNT=$(curl -s -X POST \
'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-sonnet-4",
"messages": [...]
}' | jq '.input_tokens')
# 2. Calculate cost (example: $3 per million input tokens)
COST=$(echo "scale=6; $TOKEN_COUNT * 3 / 1000000" | bc)
echo "Estimated input cost: \$$COST"
# 3. Add estimated output cost
# Assume 500 output tokens at $15 per million
OUTPUT_COST=$(echo "scale=6; 500 * 15 / 1000000" | bc)
TOTAL_COST=$(echo "scale=6; $COST + $OUTPUT_COST" | bc)
echo "Total estimated cost: \$$TOTAL_COST"
Validate Against Model Limits
Check if messages fit within model limits:
# 1. Count tokens
curl -X POST 'https://apis.threatwinds.com/api/ai/v1/chat/count' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{
"model": "claude-opus-4",
"messages": [...]
}' > token_count.json
# 2. Get model limits
curl -X GET 'https://apis.threatwinds.com/api/ai/v1/models/claude-opus-4' \
-H 'Authorization: Bearer <token>' > model_details.json
# 3. Compare
INPUT_TOKENS=$(jq '.input_tokens' token_count.json)
MAX_INPUT=$(jq '.limits.max_input_tokens' model_details.json)
if [ "$INPUT_TOKENS" -gt "$MAX_INPUT" ]; then
echo "Error: Input ($INPUT_TOKENS) exceeds model limit ($MAX_INPUT)"
exit 1
fi
Dynamic Context Truncation
Automatically truncate conversation history to fit limits:
import requests
def count_tokens(model, messages, token):
response = requests.post(
'https://apis.threatwinds.com/api/ai/v1/chat/count',
headers={'Authorization': f'Bearer {token}'},
json={'model': model, 'messages': messages}
)
return response.json()['input_tokens']
def get_model_limit(model, token):
response = requests.get(
f'https://apis.threatwinds.com/api/ai/v1/models/{model}',
headers={'Authorization': f'Bearer {token}'}
)
return response.json()['limits']['max_input_tokens']
def truncate_to_fit(model, messages, token, max_output_tokens=4096):
max_input = get_model_limit(model, token)
budget = max_input - max_output_tokens - 100 # Safety margin
while count_tokens(model, messages, token) > budget:
# Remove oldest message (keep system message)
if len(messages) > 2 and messages[0]['role'] == 'system':
messages.pop(1) # Remove second message
elif len(messages) > 1:
messages.pop(0) # Remove first message
else:
break
return messages
Monitor Token Usage Trends
Track token usage over time:
import requests
from datetime import datetime
def log_token_usage(model, messages, token):
count_response = requests.post(
'https://apis.threatwinds.com/api/ai/v1/chat/count',
headers={'Authorization': f'Bearer {token}'},
json={'model': model, 'messages': messages}
)
input_tokens = count_response.json()['input_tokens']
# Log to database or file
log_entry = {
'timestamp': datetime.now().isoformat(),
'model': model,
'input_tokens': input_tokens
}
# Make actual request
response = requests.post(
'https://apis.threatwinds.com/api/ai/v1/chat/completions',
headers={'Authorization': f'Bearer {token}'},
json={'model': model, 'messages': messages}
)
# Log output tokens
usage = response.json()['usage']
log_entry['completion_tokens'] = usage['completion_tokens']
log_entry['total_tokens'] = usage['total_tokens']
return response.json(), log_entry
Best Practices
When to Count Tokens
- Before Large Requests: Always count tokens for large documents
- Cost-Sensitive Applications: Count before requests in production
- User-Facing Applications: Show estimated costs to users
- Dynamic Content: Count when message length varies significantly
- Claude Models Only: Remember that token counting is only available for Claude models (claude-sonnet-4, claude-opus-4)
Optimization Tips
- Cache Token Counts: Cache counts for static content (see the sketch after this list)
- Batch Counting: Count multiple message sets in parallel
- Approximate for UI: Use rough estimates (4 chars ≈ 1 token) for UI
- Count Once: Count tokens once, then track changes
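The caching and approximation tips can be combined in one small helper: use cached exact counts where possible and fall back to the rough 4-characters-per-token estimate for UI display. A minimal sketch, reusing the count_tokens function from the Dynamic Context Truncation example:
_token_cache = {}
def cached_count(model, messages, token):
    # Key the cache on the exact message content so repeats skip the network call
    key = (model, tuple((m['role'], m['content']) for m in messages))
    if key not in _token_cache:
        _token_cache[key] = count_tokens(model, messages, token)
    return _token_cache[key]
def rough_estimate(messages):
    # UI-grade approximation only: ~4 characters per token
    return sum(len(m['content']) for m in messages) // 4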
Token Budgeting
# Example token budget allocation for a hypothetical 128,000-token context window
MAX_TOTAL_TOKENS = 128000
SYSTEM_MESSAGE_BUDGET = 500
USER_QUERY_BUDGET = 2000
OUTPUT_BUDGET = 4096
SAFETY_MARGIN = 500
# Whatever remains after the fixed budgets is available for conversation history
CONTEXT_BUDGET = (
    MAX_TOTAL_TOKENS
    - SYSTEM_MESSAGE_BUDGET
    - USER_QUERY_BUDGET
    - OUTPUT_BUDGET
    - SAFETY_MARGIN
)
print(f"Context budget: {CONTEXT_BUDGET} tokens")
Error Handling
import time
import requests
def safe_count_tokens(model, messages, token, max_retries=3):
for attempt in range(max_retries):
try:
response = requests.post(
'https://apis.threatwinds.com/api/ai/v1/chat/count',
headers={'Authorization': f'Bearer {token}'},
json={'model': model, 'messages': messages},
timeout=10
)
response.raise_for_status()
return response.json()['input_tokens']
except requests.exceptions.RequestException as e:
if attempt == max_retries - 1:
raise
time.sleep(2 ** attempt) # Exponential backoff
return None
Token Counting vs. Actual Usage
Important: Token counts from this endpoint are estimates. Actual token usage in completions may differ slightly due to:
- Internal Formatting: Providers may add formatting tokens
- Special Tokens: System tokens, delimiters, etc.
- Message Structure: How messages are encoded internally
- Tool/Function Calls: Additional tokens for tool calling
Always use the usage field from completion responses for accurate billing and tracking.
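One way to keep estimates honest is to log both figures and watch the drift. A minimal sketch, reusing count_tokens from earlier; the prompt_tokens field is assumed to be present in the completion usage object alongside completion_tokens and total_tokens:
import requests
def compare_estimate_to_actual(model, messages, token):
    estimated = count_tokens(model, messages, token)
    response = requests.post(
        'https://apis.threatwinds.com/api/ai/v1/chat/completions',
        headers={'Authorization': f'Bearer {token}'},
        json={'model': model, 'messages': messages},
    )
    # prompt_tokens assumed from the usage object; adjust to your response schema
    actual = response.json()['usage']['prompt_tokens']
    print(f'estimated={estimated} actual={actual} drift={actual - estimated}')
    return response.json()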
Token Pricing Reference
Approximate pricing for common models (as of 2025):
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Token Counting |
|---|---|---|---|
| Claude Sonnet 4 | $3.00 | $15.00 | ✅ Supported |
| Claude Opus 4 | $15.00 | $75.00 | ✅ Supported |
| GPT OSS 20B (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| GPT OSS 120B (Groq) | $0.15 | $0.15 | ❌ Not Supported |
| Qwen 3 32B (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| LLaMA 4 Maverick (Groq) | $0.10 | $0.10 | ❌ Not Supported |
| LLaMA 4 Scout (Groq) | $0.10 | $0.10 | ❌ Not Supported |
Note: Pricing varies by provider and may change. Check provider documentation for current rates. Token counting is only available for Claude models.
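If it helps, the table can be folded into a small estimator; the figures below are the approximate 2025 prices from the table and should be refreshed against provider documentation before use:
# Approximate (input, output) USD prices per million tokens, from the table above
PRICES = {
    'claude-sonnet-4': (3.00, 15.00),
    'claude-opus-4': (15.00, 75.00),
}
def estimate_cost(model, input_tokens, expected_output_tokens):
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + expected_output_tokens * output_price) / 1_000_000
print(f"${estimate_cost('claude-sonnet-4', 15482, 500):.4f}")  # -> $0.0539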