Admin Endpoints
Note: These endpoints are restricted to users with the ai_admin role.
Overview
The AI API admin endpoints provide operational visibility and control over the inference serving fleet. They expose pod health, server pool status, session affinity metrics, model registry state, and demand telemetry, and they provide actions to clear session affinity, restart pods, and refresh registry mappings.
All admin endpoints require the ai_admin gateway role. They are not rate-limited.
Authentication
Authentication follows the same pattern as all other AI API endpoints:
| Authentication Method | Description |
|---|---|
| Bearer Token | Session-based authentication using Authorization: Bearer <token> header |
| API Key | API key authentication using api-key and api-secret headers |
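The following is a minimal Python sketch of building request headers for either method. The helper name `auth_headers` is hypothetical. It assumes a request should carry exactly one method, so it rejects calls that mix a bearer token with key credentials or that pass only half of a key pair.

```python
from typing import Dict, Optional


def auth_headers(
    token: Optional[str] = None,
    api_key: Optional[str] = None,
    api_secret: Optional[str] = None,
) -> Dict[str, str]:
    """Build admin-endpoint headers for exactly one authentication method."""
    use_bearer = token is not None
    use_key = api_key is not None or api_secret is not None
    if use_bearer == use_key:
        # Neither method or both methods were supplied.
        raise ValueError("pass either a bearer token or an api-key/api-secret pair")
    headers = {"accept": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {token}"
    else:
        if not (api_key and api_secret):
            raise ValueError("api-key and api-secret must be sent together")
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
    return headers
```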
Endpoints
Pod Management
List Pods
Returns a paginated list of all inference pods with health status, current request load, and cost information.
Endpoint: GET /api/ai/v1/admin/pods
Method: GET
Parameters
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
page | int | 1 | Page number |
limit | int | 10 | Items per page (maximum 100) |
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Note: Authenticate with either the Authorization header or the api-key and api-secret headers together.
Request
curl -X 'GET' \
'https://apis.threatwinds.com/api/ai/v1/admin/pods?page=1&limit=20' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
Response
Success Response (200 OK)
{
"pods": [
{
"id": "abc123",
"name": "aiapi-dev-vllm-qwen3-6-27b-fp8-1",
"desired_status": "RUNNING",
"kind": "vllm-qwen3-6-27b-fp8",
"models": ["threatwinds-qwen3-6-27b"],
"healthy": true,
"inflight": 3,
"last_use_ms": 1719000000000,
"idle_seconds": 45,
"last_status_change": "2026-06-25T10:00:00Z",
"cost_per_hr": 0.44,
"data_center_id": "us-central1",
"location": "Iowa",
"gpu_type_id": "a100-80gb"
}
],
"page": 1,
"limit": 20,
"total": 5
}
| Field | Type | Description |
|---|---|---|
id | string | Unique pod identifier |
name | string | Human-readable pod name |
desired_status | string | Desired lifecycle state (RUNNING, STOPPED, etc.) |
kind | string | Pod kind / model configuration |
models | string[] | Model IDs routed to this pod |
healthy | bool/null | Current health status; null if the pod is not in any server pool
inflight | int | Active requests currently being processed |
last_use_ms | int | Epoch milliseconds of last activity |
idle_seconds | int | Seconds since last activity |
last_status_change | string | ISO 8601 timestamp of last status change |
cost_per_hr | float | Hourly cost in USD |
data_center_id | string | Data center identifier |
location | string | Geographic location |
gpu_type_id | string | GPU type identifier |
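List Pods, List Servers, and Get Registry all return `page`, `limit`, and `total` alongside their items. Below is a minimal sketch of walking every page, assuming `total` counts all items across pages. `fetch_page` is a hypothetical callable you supply: it performs the GET with the given `page` and `limit` and returns the decoded JSON.

```python
import math
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: fetch_page(page, limit) performs the GET with those
# query parameters and returns the decoded JSON body.
FetchPage = Callable[[int, int], Dict[str, Any]]


def iter_items(fetch_page: FetchPage, key: str, limit: int = 100) -> Iterator[Any]:
    """Yield every element of body[key] across all pages."""
    page = 1
    while True:
        body = fetch_page(page, limit)
        items = body[key]
        yield from items
        # Use the limit the server echoes back in case it differs from the request.
        per_page = body.get("limit") or limit
        last_page = math.ceil(body["total"] / per_page)
        if not items or page >= last_page:
            return
        page += 1
```

For example, `iter_items(fetch_pods, "pods")` yields every pod, and the same helper works with the `"servers"` and `"urls"` keys.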
Get Pod Detail
Returns detailed information about a specific inference pod, including health, request load, and lifecycle metadata.
Endpoint: GET /api/ai/v1/admin/pods/{id}
Method: GET
Parameters
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
id | string | Yes | Pod ID |
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Request
curl -X 'GET' \
'https://apis.threatwinds.com/api/ai/v1/admin/pods/abc123' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
Response
Success Response (200 OK)
{
"id": "abc123",
"name": "aiapi-dev-vllm-qwen3-6-27b-fp8-1",
"desired_status": "RUNNING",
"kind": "vllm-qwen3-6-27b-fp8",
"models": ["threatwinds-qwen3-6-27b"],
"healthy": true,
"inflight": 3,
"last_use_ms": 1719000000000,
"idle_seconds": 45,
"last_started_at": "2026-06-25T08:00:00Z",
"last_status_change": "2026-06-25T10:00:00Z",
"cost_per_hr": 0.44,
"data_center_id": "us-central1",
"location": "Iowa",
"gpu_type_id": "a100-80gb"
}
The response includes every field returned by List Pods, plus:
| Field | Type | Description |
|---|---|---|
last_started_at | string | ISO 8601 timestamp of last container start
Error Responses
| Status | Description | Response |
|---|---|---|
404 | Pod not found | {"error": "pod not found"} |
502 | External API error | {"error": "runpod API error", "detail": "..."} |
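`last_started_at` records the last container start, while `last_status_change` records the last lifecycle transition. This small sketch derives the current container's uptime from the detail response. It assumes timestamps are always UTC with a trailing `Z` and whole seconds, as in the example response; the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_utc(ts: str) -> datetime:
    """Parse timestamps like "2026-06-25T08:00:00Z" as timezone-aware UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def uptime_seconds(detail: Dict[str, Any], now: Optional[datetime] = None) -> float:
    """Seconds elapsed since the pod's container last started."""
    now = now or datetime.now(timezone.utc)
    return (now - parse_utc(detail["last_started_at"])).total_seconds()
```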
Restart Pod
Initiates a graceful restart cycle for a pod (stop, then resume). This is a fire-and-forget operation: the call returns without waiting for the pod to come back up, and the pod's status updates asynchronously.
Endpoint: POST /api/ai/v1/admin/pods/{id}/restart
Method: POST
Parameters
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
id | string | Yes | Pod ID |
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Request
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/admin/pods/abc123/restart' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
Response
Success Response (200 OK)
{
"status": "initiated",
"pod_id": "abc123"
}
| Field | Type | Description |
|---|---|---|
status | string | "initiated" when the restart has been accepted; "failed" in the 502 error responses
pod_id | string | ID of the pod being restarted
failed_at | string (optional) | Present only when status is "failed"; the phase that failed, "stop" or "resume"
Error Responses
| Status | Description | Response |
|---|---|---|
404 | Pod not found | {"error": "pod not found"} |
422 | Invalid pod ID | {"error": "invalid pod id"} |
502 | Stop failed | {"status": "failed", "failed_at": "stop", "detail": "..."} |
502 | Resume failed | {"status": "failed", "failed_at": "resume", "detail": "..."} |
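Because the restart is fire-and-forget, a client that needs to know when the pod is serving again has to poll Get Pod Detail. The sketch below assumes that `last_started_at` changes once the container resumes and that `healthy` becomes `true` once the pod rejoins a server pool. `wait_for_restart` and the `fetch_detail` callable are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: fetch_detail(pod_id) returns the decoded body of
# GET /api/ai/v1/admin/pods/{id}.
FetchDetail = Callable[[str], Dict[str, Any]]


def wait_for_restart(
    fetch_detail: FetchDetail,
    pod_id: str,
    previous_started_at: Optional[str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll pod detail until the container has restarted and reports healthy."""
    waited = 0.0
    while True:
        detail = fetch_detail(pod_id)
        restarted = detail.get("last_started_at") != previous_started_at
        if restarted and detail.get("healthy") is True:
            return detail
        if waited >= timeout_s:
            raise TimeoutError(f"pod {pod_id} did not come back within {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Record `last_started_at` from Get Pod Detail before calling restart, then pass it as `previous_started_at`.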
Server Pool Health
List Servers
Returns a paginated list of all inference servers across all providers with health status and consecutive success/failure counts.
Endpoint: GET /api/ai/v1/admin/servers
Method: GET
Parameters
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
page | int | 1 | Page number |
limit | int | 10 | Items per page (maximum 100) |
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Request
curl -X 'GET' \
'https://apis.threatwinds.com/api/ai/v1/admin/servers' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
Response
Success Response (200 OK)
{
"servers": [
{
"provider": "vllm",
"url": "https://abc123.proxy.runpod.net/v1",
"healthy": true,
"consecutive_failures": 0,
"consecutive_successes": 150,
"last_check_ms": 1719000000000,
"last_failure_ms": 0,
"models": ["threatwinds-qwen3-6-27b"]
}
],
"page": 1,
"limit": 10,
"total": 42
}
| Field | Type | Description |
|---|---|---|
provider | string | Provider name ("vllm", "sglang", or "speaches") |
url | string | Server endpoint URL |
healthy | bool | Current health status |
consecutive_failures | int | Number of consecutive failed health checks |
consecutive_successes | int | Number of consecutive successful health checks |
last_check_ms | int | Epoch milliseconds of last health check |
last_failure_ms | int | Epoch milliseconds of last failure (0 if none) |
models | string[] | Model IDs routed to this server |
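This sketch triages the response by grouping unhealthy servers per provider, worst first by `consecutive_failures`. The helper name and the `min_failures` threshold are illustrative and not part of the API.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def unhealthy_by_provider(
    servers: Iterable[Dict[str, Any]], min_failures: int = 1
) -> Dict[str, List[str]]:
    """Map provider -> URLs of unhealthy servers, most consecutive failures first."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for server in servers:
        if not server["healthy"] and server["consecutive_failures"] >= min_failures:
            grouped[server["provider"]].append(server)
    return {
        provider: [
            item["url"]
            for item in sorted(items, key=lambda item: -item["consecutive_failures"])
        ]
        for provider, items in grouped.items()
    }
```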
Session Affinity
Get Affinity Metrics
Returns session affinity hit, miss, and fallback metrics per provider. Session affinity routes a user's requests to the same pod that served that user's previous request; these metrics show how often that routing succeeds.
Endpoint: GET /api/ai/v1/admin/affinity
Method: GET
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Request
curl -X 'GET' \
'https://apis.threatwinds.com/api/ai/v1/admin/affinity' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
Response
Success Response (200 OK)
{
"providers": [
{
"name": "vllm",
"session_hit_count": 15200,
"session_miss_count": 3100,
"fallback_count": 2,
"ttl_minutes": 30
},
{
"name": "sglang",
"session_hit_count": 8400,
"session_miss_count": 1900,
"fallback_count": 0,
"ttl_minutes": 30
}
]
}
| Field | Type | Description |
|---|---|---|
name | string | Provider name |
session_hit_count | int | Number of requests routed to the pod already assigned to the user's session
session_miss_count | int | Number of requests that required a new pod assignment
fallback_count | int | Number of fallback routings |
ttl_minutes | int | Session affinity time-to-live in minutes |
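One way to read these counters is as a hit rate, hits / (hits + misses). In this minimal sketch, leaving `fallback_count` out of the denominator is an assumption, since the relationship between fallbacks and misses is not documented here; `hit_rates` is a hypothetical helper.

```python
from typing import Any, Dict


def hit_rates(metrics: Dict[str, Any]) -> Dict[str, float]:
    """Session-affinity hit rate per provider: hits / (hits + misses)."""
    rates = {}
    for provider in metrics["providers"]:
        lookups = provider["session_hit_count"] + provider["session_miss_count"]
        rates[provider["name"]] = provider["session_hit_count"] / lookups if lookups else 0.0
    return rates
```

Applied to the example response, this gives roughly 0.831 for vllm and 0.816 for sglang.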
Clear Affinity
Clears session affinity keys for a user, optionally scoped to a specific model, so that the user's next request receives a new pod assignment instead of reusing the previous one.
Endpoint: POST /api/ai/v1/admin/affinity/clear
Method: POST
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | application/json |
Request
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/admin/affinity/clear' \
-H 'accept: application/json' \
-H 'content-type: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{"user_id": "usr_abc123", "model_id": null}'
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
user_id | string | Yes | User identifier |
model_id | string/null | No | Model to scope the clear to. If null or omitted, clears all models for the user. |
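This sketch builds the request with Python's standard library. It omits `model_id` entirely for an unscoped clear, which the table above treats the same as sending `null`. `clear_affinity_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Dict, Optional

BASE_URL = "https://apis.threatwinds.com/api/ai/v1/admin"


def clear_affinity_request(
    headers: Dict[str, str], user_id: str, model_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /affinity/clear request."""
    body = {"user_id": user_id}
    if model_id is not None:
        body["model_id"] = model_id  # omitted: clear all models for the user
    return urllib.request.Request(
        f"{BASE_URL}/affinity/clear",
        data=json.dumps(body).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen()` to send it, with `headers` carrying your chosen authentication method.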
Response
Success Response (200 OK)
Clear all models for a user:
{
"user_id": "usr_abc123",
"cleared": 3
}
Clear a specific model:
{
"user_id": "usr_abc123",
"model_id": "threatwinds-qwen3-6-27b",
"cleared": 1
}
| Field | Type | Description |
|---|---|---|
user_id | string | The user whose affinity was cleared |
model_id | string (optional) | The model that was cleared (if scoped) |
cleared | int | Number of affinity entries removed |
Error Responses
| Status | Description | Response |
|---|---|---|
400 | Bad request | {"error": "invalid request body", "detail": "..."} |
503 | Service unavailable | {"error": "valkey unavailable"} |
Model Registry
Get Registry
Returns the backend server URLs registered for a specific model. Because a single model may route to thousands of pod URLs at scale, results are paginated.
Endpoint: GET /api/ai/v1/admin/registry/{modelId}
Method: GET
Parameters
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
modelId | string | Yes | Model ID |
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
page | int | 1 | Page number |
limit | int | 10 | Items per page (maximum 100) |
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Request
curl -X 'GET' \
'https://apis.threatwinds.com/api/ai/v1/admin/registry/threatwinds-qwen3-6-27b?page=1&limit=50' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
Response
Success Response (200 OK)
{
"model_id": "threatwinds-qwen3-6-27b",
"urls": [
"https://pod-1.proxy.runpod.net/v1",
"https://pod-2.proxy.runpod.net/v1"
],
"page": 1,
"limit": 50,
"total": 115
}
| Field | Type | Description |
|---|---|---|
model_id | string | The model ID queried |
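urls | string[] | Backend server URLs registered for the model
page | int | Current page number
limit | int | Items per page
total | int | Total number of registered URLs across all pages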
Empty Response (200 OK)
If the model has no registered servers:
{
"model_id": "threatwinds-qwen3-6-27b",
"urls": [],
"page": 1,
"limit": 10,
"total": 0
}
Error Responses
| Status | Description | Response |
|---|---|---|
503 | Service unavailable | {"error": "valkey error", "detail": "..."} |
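At the maximum page size of 100, listing every URL for a model takes ceil(total / 100) requests; the 115-URL example needs 3 pages at `limit=50`. This sketch builds each page URL. The client-side range check on `page` and `limit` is an assumption, since this page does not say whether the server clamps or rejects out-of-range values; the helper names are hypothetical.

```python
import math
from urllib.parse import quote, urlencode

BASE_URL = "https://apis.threatwinds.com/api/ai/v1/admin"
MAX_LIMIT = 100  # documented maximum items per page


def registry_url(model_id: str, page: int = 1, limit: int = 10) -> str:
    """URL for one page of GET /registry/{modelId}."""
    if page < 1 or not 1 <= limit <= MAX_LIMIT:
        raise ValueError("page must be >= 1 and limit must be between 1 and 100")
    query = urlencode({"page": page, "limit": limit})
    return f"{BASE_URL}/registry/{quote(model_id, safe='')}?{query}"


def page_count(total: int, limit: int) -> int:
    """Number of pages needed to list `total` URLs at `limit` per page."""
    return math.ceil(total / limit)
```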
Refresh Registry
Triggers a rebuild of server registry mappings for all providers. This reads the current server-to-model mappings and updates the in-memory routing tables without touching the persistent registry.
Endpoint: POST /api/ai/v1/admin/registry/refresh
Method: POST
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Request
curl -X 'POST' \
'https://apis.threatwinds.com/api/ai/v1/admin/registry/refresh' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
Response
Success Response (200 OK)
{
"status": "triggered"
}
| Field | Type | Description |
|---|---|---|
status | string | Always "triggered" on success |
Error Responses
| Status | Description | Response |
|---|---|---|
500 | Internal error | {"error": "...", "detail": "..."} |
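To illustrate what rebuilding routing tables from server-to-model mappings means, this sketch inverts List Servers output (each server with its `models`) into a model-to-URLs table shaped like the Get Registry `urls` list. It is a conceptual illustration only, not the service's implementation; `build_routing_table` is hypothetical and applies no health filtering.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


def build_routing_table(servers: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Invert server -> models into model -> sorted, de-duplicated server URLs."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for server in servers:
        for model_id in server.get("models", []):
            table[model_id].add(server["url"])
    return {model_id: sorted(urls) for model_id, urls in table.items()}
```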
Demand Telemetry
Get Pod Demand
Returns demand telemetry for a specific pod, including current inflight request count, time since last use, and per-model breakdown with 5-minute request rates.
Endpoint: GET /api/ai/v1/admin/demand/{podUrl}
Method: GET
Parameters
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
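podUrl | string | Yes | Full pod URL, percent-encoded as a single path segment (for example, https://pod-1.proxy.runpod.net/v1 becomes https%3A%2F%2Fpod-1.proxy.runpod.net%2Fv1)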
Headers
| Header | Type | Required* | Description |
|---|---|---|---|
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
Request
curl -X 'GET' \
'https://apis.threatwinds.com/api/ai/v1/admin/demand/https%3A%2F%2Fpod-1.proxy.runpod.net%2Fv1' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <token>'
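The `{podUrl}` segment must carry the entire pod URL, including its slashes, so every reserved character has to be percent-encoded. This sketch uses Python's `urllib.parse.quote`; `demand_url` is a hypothetical helper name.

```python
from urllib.parse import quote

BASE_URL = "https://apis.threatwinds.com/api/ai/v1/admin"


def demand_url(pod_url: str) -> str:
    """Embed a full pod URL as the single {podUrl} path segment."""
    # safe="" makes quote() encode "/" as well as ":"; the default safe="/"
    # would leave the slashes in place and split the path into extra segments.
    return f"{BASE_URL}/demand/{quote(pod_url, safe='')}"
```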
Response
Success Response (200 OK)
{
"url": "https://pod-1.proxy.runpod.net/v1",
"inflight": 3,
"last_use_ms": 1719000000000,
"idle_seconds": 12,
"models": [
{
"model_id": "threatwinds-qwen3-6-27b",
"inflight": 2,
"rate_5m": 120
},
{
"model_id": "threatwinds-qwen3-6-35b-a3b",
"inflight": 1,
"rate_5m": 45
}
]
}
| Field | Type | Description |
|---|---|---|
url | string | The pod URL queried |
inflight | int | Current number of active requests |
last_use_ms | int | Epoch milliseconds of last request |
idle_seconds | int | Seconds since last request |
models | object[] | Per-model breakdown (see below) |
Per-model fields:
| Field | Type | Description |
|---|---|---|
model_id | string | Model identifier |
inflight | int | Active requests for this model |
rate_5m | int | Request count over the last 5 minutes |
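Because `rate_5m` is a count over a 300-second window, dividing it by 300 gives an average requests-per-second figure. This sketch also computes each model's share of the pod's recent traffic; the helper name is hypothetical.

```python
from typing import Any, Dict

WINDOW_S = 5 * 60  # rate_5m counts requests over the last 5 minutes


def model_traffic(demand: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Per-model average requests/second and share of the pod's 5-minute traffic."""
    total = sum(m["rate_5m"] for m in demand["models"])
    return {
        m["model_id"]: {
            "rps": m["rate_5m"] / WINDOW_S,
            "share": m["rate_5m"] / total if total else 0.0,
        }
        for m in demand["models"]
    }
```

For the example response, threatwinds-qwen3-6-27b averages 0.4 requests/s (about 73% of the pod's traffic) and threatwinds-qwen3-6-35b-a3b averages 0.15 requests/s.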
Error Responses
| Status | Description | Response |
|---|---|---|
503 | Service unavailable | {"error": "valkey unavailable"} |
Error Handling
All admin endpoints return errors in a consistent format with standard HTTP status codes.
Error Response Format
{
"error": "Human-readable error message",
"detail": "Additional context (optional)"
}
Error Headers
All error responses include the following custom headers:
| Header | Description |
|---|---|
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique identifier for error tracking and support |
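This sketch turns an error response into a Python exception that keeps the `x-error-id` for support requests. It assumes the body follows the format above, but falls back to the `x-error` header when `error` is missing (as in the Restart Pod failure bodies). `AdminAPIError` and `raise_for_error` are hypothetical names, and `headers` may be any dict-like object with `.items()`.

```python
import json
from typing import Mapping, Optional


class AdminAPIError(Exception):
    """Raised for non-2xx admin responses; keeps x-error-id for support."""

    def __init__(self, status: int, message: str, detail: Optional[str], error_id: Optional[str]):
        super().__init__(f"{status}: {message}" + (f" ({detail})" if detail else ""))
        self.status = status
        self.message = message
        self.detail = detail
        self.error_id = error_id


def raise_for_error(status: int, headers: Mapping[str, str], body: bytes) -> None:
    """Convert an error response into AdminAPIError; do nothing for 2xx."""
    if 200 <= status < 300:
        return
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    try:
        payload = json.loads(body) if body else {}
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    message = payload.get("error") or lowered.get("x-error") or f"HTTP {status}"
    raise AdminAPIError(status, message, payload.get("detail"), lowered.get("x-error-id"))
```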
Status Codes
| Status | Source | When |
|---|---|---|
200 | Success | Request completed successfully |
400 | Client | Invalid request body or query parameters |
401 | Auth | Missing or invalid authentication credentials |
403 | Auth | Insufficient permissions (missing ai_admin role) |
404 | Pod Management | Pod not found |
422 | Pod Management | Invalid pod ID format |
500 | Internal | Server-side error |
502 | External | External API error (e.g., pod stop/resume failure) |
503 | Service | Backend service unavailable (e.g., registry store down) |