Admin Endpoints

Note: These endpoints are restricted to users with the ai_admin role.

Overview

The AI API admin endpoints provide operational visibility and control over the inference serving fleet. They expose pod health, server pool status, session affinity metrics, model registry state, and demand telemetry — plus actions to clear affinity, restart pods, and refresh registry mappings.

All admin endpoints require the ai_admin gateway role. They are not rate-limited.

Authentication

Authentication follows the same pattern as all other AI API endpoints:

| Authentication Method | Description |
| --- | --- |
| Bearer Token | Session-based authentication using the Authorization: Bearer <token> header |
| API Key | API key authentication using the api-key and api-secret headers |
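
The two authentication methods can be captured in a small client-side helper. The following is a minimal sketch, not part of the API itself: the function name `build_auth_headers` is hypothetical, and rejecting requests that mix both methods is an assumption made here for clarity (the server's behavior in that case is not documented).

```python
from typing import Optional


def build_auth_headers(
    token: Optional[str] = None,
    api_key: Optional[str] = None,
    api_secret: Optional[str] = None,
) -> dict:
    """Build headers for exactly one of the two documented auth methods."""
    uses_token = token is not None
    uses_key = api_key is not None or api_secret is not None
    # Assumption: exactly one method should be used per request.
    if uses_token == uses_key:
        raise ValueError("provide either a bearer token or an api-key/api-secret pair")
    if uses_key and (api_key is None or api_secret is None):
        raise ValueError("api-key and api-secret must be sent together")

    headers = {"accept": "application/json"}
    if uses_token:
        headers["Authorization"] = f"Bearer {token}"
    else:
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
    return headers
```

The returned dictionary can be passed to any HTTP client; the later sketches in this document assume headers built this way.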

Endpoints

Pod Management

List Pods

Returns a paginated list of all inference pods with health status, current request load, and cost information.

Endpoint: GET /api/ai/v1/admin/pods

Method: GET

Parameters

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page | int | 1 | Page number |
| limit | int | 10 | Items per page (maximum 100) |

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Note: Headers marked Optional* are alternatives. Send either the Authorization header or both the api-key and api-secret headers.

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/pods?page=1&limit=20' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "pods": [
    {
      "id": "abc123",
      "name": "aiapi-dev-vllm-qwen3-6-27b-fp8-1",
      "desired_status": "RUNNING",
      "kind": "vllm-qwen3-6-27b-fp8",
      "models": ["threatwinds-qwen3-6-27b"],
      "healthy": true,
      "inflight": 3,
      "last_use_ms": 1719000000000,
      "idle_seconds": 45,
      "last_status_change": "2026-06-25T10:00:00Z",
      "cost_per_hr": 0.44,
      "data_center_id": "us-central1",
      "location": "Iowa",
      "gpu_type_id": "a100-80gb"
    }
  ],
  "page": 1,
  "limit": 20,
  "total": 5
}

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique pod identifier |
| name | string | Human-readable pod name |
| desired_status | string | Desired lifecycle state (RUNNING, STOPPED, etc.) |
| kind | string | Pod kind / model configuration |
| models | string[] | Model IDs routed to this pod |
| healthy | bool/null | Current health status; null if the pod is not in any server pool |
| inflight | int | Active requests currently being processed |
| last_use_ms | int | Epoch milliseconds of last activity |
| idle_seconds | int | Seconds since last activity |
| last_status_change | string | ISO 8601 timestamp of last status change |
| cost_per_hr | float | Hourly cost in USD |
| data_center_id | string | Data center identifier |
| location | string | Geographic location |
| gpu_type_id | string | GPU type identifier |
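
As an illustration of how these fields might be combined, the sketch below summarizes the "pods" array from one response page. The function name `summarize_pods` and the 600-second idle threshold are hypothetical choices, and summing `cost_per_hr` only for pods whose `desired_status` is RUNNING is an assumption about what is useful to report, not a statement about billing.

```python
def summarize_pods(pods, idle_after_s=600):
    """Summarize the "pods" array of a List Pods response page."""
    return {
        # Assumption: only pods meant to be running are counted toward cost.
        "running_cost_per_hr": sum(
            p["cost_per_hr"] for p in pods if p["desired_status"] == "RUNNING"
        ),
        # healthy is tri-state: True, False, or None (not in any server pool).
        "unhealthy": [p["name"] for p in pods if p["healthy"] is False],
        "not_pooled": [p["name"] for p in pods if p["healthy"] is None],
        # Idle here means no inflight requests and no activity for a while.
        "idle": [
            p["name"]
            for p in pods
            if p["inflight"] == 0 and p["idle_seconds"] >= idle_after_s
        ],
    }
```

Checking `healthy is False` rather than `not p["healthy"]` keeps pods with a null health status out of the unhealthy list.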

Get Pod Detail

Returns detailed information about a specific inference pod, including health, request load, and lifecycle metadata.

Endpoint: GET /api/ai/v1/admin/pods/{id}

Method: GET

Parameters

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Pod ID |

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/pods/abc123' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "id": "abc123",
  "name": "aiapi-dev-vllm-qwen3-6-27b-fp8-1",
  "desired_status": "RUNNING",
  "kind": "vllm-qwen3-6-27b-fp8",
  "models": ["threatwinds-qwen3-6-27b"],
  "healthy": true,
  "inflight": 3,
  "last_use_ms": 1719000000000,
  "idle_seconds": 45,
  "last_started_at": "2026-06-25T08:00:00Z",
  "last_status_change": "2026-06-25T10:00:00Z",
  "cost_per_hr": 0.44,
  "data_center_id": "us-central1",
  "location": "Iowa",
  "gpu_type_id": "a100-80gb"
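}

The response contains the same fields as each entry in the List Pods response, plus:

| Field | Type | Description |
| --- | --- | --- |
| last_started_at | string | ISO 8601 timestamp of last container start |

Error Responses

| Status | Description | Response |
| --- | --- | --- |
| 404 | Pod not found | {"error": "pod not found"} |
| 502 | External API error | {"error": "runpod API error", "detail": "..."} |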
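
The pod detail mixes two time formats: epoch milliseconds (`last_use_ms`) and ISO 8601 strings (`last_started_at`, `last_status_change`). A minimal sketch of normalizing both to timezone-aware datetimes follows; the helper names are hypothetical, and the timestamps are assumed to be UTC as the trailing "Z" in the examples suggests.

```python
from datetime import datetime, timezone


def ms_to_utc(epoch_ms: int) -> datetime:
    """Convert an epoch-milliseconds field such as last_use_ms to UTC."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def parse_iso_utc(value: str) -> datetime:
    """Parse an ISO 8601 field such as last_started_at."""
    # Replace the "Z" suffix so this also works on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_since_start(pod: dict, now: datetime) -> float:
    """Seconds elapsed between the pod's last container start and `now`."""
    return (now - parse_iso_utc(pod["last_started_at"])).total_seconds()
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the calculation easy to test.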

Restart Pod

Initiates a graceful restart cycle for a pod (stop then resume). This is a fire-and-forget operation — it returns immediately and the pod status updates asynchronously.

Endpoint: POST /api/ai/v1/admin/pods/{id}/restart

Method: POST

Parameters

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Pod ID |

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Request

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/admin/pods/abc123/restart' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "status": "initiated",
  "pod_id": "abc123"
}

| Field | Type | Description |
| --- | --- | --- |
| status | string | Operation status ("initiated" or "failed") |
| pod_id | string | ID of the pod targeted by the restart |
| failed_at | string (optional) | If status is "failed", indicates which phase failed: "stop" or "resume" |

Error Responses

| Status | Description | Response |
| --- | --- | --- |
| 404 | Pod not found | {"error": "pod not found"} |
| 422 | Invalid pod ID | {"error": "invalid pod id"} |
| 502 | Stop failed | {"status": "failed", "failed_at": "stop", "detail": "..."} |
| 502 | Resume failed | {"status": "failed", "failed_at": "resume", "detail": "..."} |
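
Because the restart endpoint returns two different body shapes (a status object for 200 and 502, an error object for 404 and 422), a caller may want to fold them into one message. The sketch below is one way to do that; `describe_restart` is a hypothetical helper, and it assumes the HTTP client has already decoded the JSON body into a dictionary.

```python
def describe_restart(status_code: int, body: dict) -> str:
    """Turn a Restart Pod response into a one-line, human-readable summary."""
    if status_code == 200 and body.get("status") == "initiated":
        return f"restart initiated for pod {body.get('pod_id')}"
    if body.get("status") == "failed":
        # failed_at names the phase that failed: "stop" or "resume".
        phase = body.get("failed_at", "unknown phase")
        return f"restart failed during {phase}: {body.get('detail', '')}"
    # 404 and 422 use the generic {"error": ...} shape.
    return f"HTTP {status_code}: {body.get('error', 'unknown error')}"
```

Since the cycle is stop then resume, a failure reported at "resume" presumably means the stop step already ran, so checking the pod's desired_status afterwards may be worthwhile.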

Server Pool Health

List Servers

Returns a paginated list of all inference servers across all providers with health status and consecutive success/failure counts.

Endpoint: GET /api/ai/v1/admin/servers

Method: GET

Parameters

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page | int | 1 | Page number |
| limit | int | 10 | Items per page (maximum 100) |

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/servers' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "servers": [
    {
      "provider": "vllm",
      "url": "https://abc123.proxy.runpod.net/v1",
      "healthy": true,
      "consecutive_failures": 0,
      "consecutive_successes": 150,
      "last_check_ms": 1719000000000,
      "last_failure_ms": 0,
      "models": ["threatwinds-qwen3-6-27b"]
    }
  ],
  "page": 1,
  "limit": 10,
  "total": 42
}

| Field | Type | Description |
| --- | --- | --- |
| provider | string | Provider name ("vllm", "sglang", or "speaches") |
| url | string | Server endpoint URL |
| healthy | bool | Current health status |
| consecutive_failures | int | Number of consecutive failed health checks |
| consecutive_successes | int | Number of consecutive successful health checks |
| last_check_ms | int | Epoch milliseconds of last health check |
| last_failure_ms | int | Epoch milliseconds of last failure (0 if none) |
| models | string[] | Model IDs routed to this server |
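
A common use of this listing is to surface the servers that are failing health checks. The following sketch filters the "servers" array and orders it by failure streak; `unhealthy_servers` and `models_at_risk` are hypothetical helper names, and the ordering is an arbitrary choice made here.

```python
def unhealthy_servers(servers):
    """Return unhealthy servers, longest failure streak first."""
    failing = [s for s in servers if not s["healthy"]]
    return sorted(failing, key=lambda s: s["consecutive_failures"], reverse=True)


def models_at_risk(servers):
    """Return model IDs for which no listed server is currently healthy."""
    served = {m for s in servers for m in s["models"]}
    healthy = {m for s in servers if s["healthy"] for m in s["models"]}
    return sorted(served - healthy)
```

Both helpers work on whatever servers are passed in, so collect every page first when the fleet spans more than one page.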

Session Affinity

Get Affinity Metrics

Returns session affinity hit, miss, and fallback metrics per provider. These metrics show how often a request is routed to the same pod as a previous request from the same user.

Endpoint: GET /api/ai/v1/admin/affinity

Method: GET

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/affinity' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "providers": [
    {
      "name": "vllm",
      "session_hit_count": 15200,
      "session_miss_count": 3100,
      "fallback_count": 2,
      "ttl_minutes": 30
    },
    {
      "name": "sglang",
      "session_hit_count": 8400,
      "session_miss_count": 1900,
      "fallback_count": 0,
      "ttl_minutes": 30
    }
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| name | string | Provider name |
| session_hit_count | int | Number of requests routed to an existing pod |
| session_miss_count | int | Number of requests requiring new pod assignment |
| fallback_count | int | Number of fallback routings |
| ttl_minutes | int | Session affinity time-to-live in minutes |
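
The raw counters are easier to compare across providers as a hit rate. The sketch below computes hits divided by hits plus misses; `affinity_hit_rate` is a hypothetical helper, and leaving `fallback_count` out of the denominator is an assumption, since the relationship between fallbacks and the other two counters is not specified here.

```python
from typing import Optional


def affinity_hit_rate(provider: dict) -> Optional[float]:
    """Fraction of affinity lookups that hit, or None if there were none."""
    hits = provider["session_hit_count"]
    misses = provider["session_miss_count"]
    total = hits + misses  # assumption: fallbacks are not counted separately
    return hits / total if total else None


def hit_rates(response: dict) -> dict:
    """Map provider name to hit rate for a Get Affinity Metrics response."""
    return {p["name"]: affinity_hit_rate(p) for p in response["providers"]}
```

With the sample response above, both providers come out at a little over 80 percent.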

Clear Affinity

Clears session affinity keys for a user, optionally scoped to a specific model. This forces the next request to be routed to a fresh pod.

Endpoint: POST /api/ai/v1/admin/affinity/clear

Method: POST

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |
| Content-Type | string | Yes | application/json |

Request

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/admin/affinity/clear' \
  -H 'accept: application/json' \
  -H 'content-type: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{"user_id": "usr_abc123", "model_id": null}'

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| user_id | string | Yes | User identifier |
| model_id | string/null | No | Model to scope the clear to. If null or omitted, clears all models for the user. |

Response

Success Response (200 OK)

Clear all models for a user:

{
  "user_id": "usr_abc123",
  "cleared": 3
}

Clear a specific model:

{
  "user_id": "usr_abc123",
  "model_id": "threatwinds-qwen3-6-27b",
  "cleared": 1
}

| Field | Type | Description |
| --- | --- | --- |
| user_id | string | The user whose affinity was cleared |
| model_id | string (optional) | The model that was cleared (if scoped) |
| cleared | int | Number of affinity entries removed |

Error Responses

| Status | Description | Response |
| --- | --- | --- |
| 400 | Bad request | {"error": "invalid request body", "detail": "..."} |
| 503 | Service unavailable | {"error": "valkey unavailable"} |
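
The request body has one required and one optional field, which a small builder can enforce before the request is sent. This is a sketch under stated assumptions: `clear_affinity_body` is a hypothetical helper, and omitting `model_id` instead of sending null relies on the documented equivalence of the two.

```python
import json
from typing import Optional


def clear_affinity_body(user_id: str, model_id: Optional[str] = None) -> str:
    """Serialize the JSON body for POST /api/ai/v1/admin/affinity/clear."""
    if not user_id:
        raise ValueError("user_id is required")
    payload = {"user_id": user_id}
    # Omitting model_id clears all models for the user, same as sending null.
    if model_id is not None:
        payload["model_id"] = model_id
    return json.dumps(payload)
```

Validating `user_id` locally avoids a round trip that would otherwise end in a 400 response.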

Model Registry

Get Registry

Returns the list of backend server URLs registered for a specific model. At scale, a single model may route to thousands of pod URLs.

Endpoint: GET /api/ai/v1/admin/registry/{modelId}

Method: GET

Parameters

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| modelId | string | Yes | Model ID |

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page | int | 1 | Page number |
| limit | int | 10 | Items per page (maximum 100) |

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/registry/threatwinds-qwen3-6-27b?page=1&limit=50' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "model_id": "threatwinds-qwen3-6-27b",
  "urls": [
    "https://pod-1.proxy.runpod.net/v1",
    "https://pod-2.proxy.runpod.net/v1"
  ],
  "page": 1,
  "limit": 50,
  "total": 115
}

| Field | Type | Description |
| --- | --- | --- |
| model_id | string | The model ID queried |
| urls | string[] | Backend server URLs registered for the model |

Empty Response (200 OK)

If the model has no registered servers:

{
  "model_id": "threatwinds-qwen3-6-27b",
  "urls": [],
  "page": 1,
  "limit": 10,
  "total": 0
}

Error Responses

| Status | Description | Response |
| --- | --- | --- |
| 503 | Service unavailable | {"error": "valkey error", "detail": "..."} |
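
Since a model can map to far more URLs than fit on one page, callers usually need to walk the pages. The generator below is a minimal sketch of that loop. It assumes `total` is the item count across all pages, and it takes a hypothetical `fetch_page(page, limit)` callable that performs the HTTP request and returns the decoded JSON body, so the paging logic stays independent of any HTTP library.

```python
from typing import Callable, Iterator


def iter_items(
    fetch_page: Callable[[int, int], dict], key: str, limit: int = 100
) -> Iterator:
    """Yield every item under `key` from a paginated admin endpoint."""
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, limit)
        items = body.get(key, [])
        yield from items
        seen += len(items)
        # Stop on an empty page or once `total` items have been seen.
        if not items or seen >= body.get("total", 0):
            break
        page += 1
```

The same generator applies to the other paginated endpoints by changing the key, for example "pods" for List Pods or "servers" for List Servers.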

Refresh Registry

Triggers a rebuild of server registry mappings for all providers. This reads the current server-to-model mappings and updates the in-memory routing tables without touching the persistent registry.

Endpoint: POST /api/ai/v1/admin/registry/refresh

Method: POST

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Request

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/admin/registry/refresh' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "status": "triggered"
}

| Field | Type | Description |
| --- | --- | --- |
| status | string | Always "triggered" on success |

Error Responses

| Status | Description | Response |
| --- | --- | --- |
| 500 | Internal error | {"error": "...", "detail": "..."} |

Demand Telemetry

Get Pod Demand

Returns demand telemetry for a specific pod, including current inflight request count, time since last use, and per-model breakdown with 5-minute request rates.

Endpoint: GET /api/ai/v1/admin/demand/{podUrl}

Method: GET

Parameters

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| podUrl | string | Yes | Pod URL (URL-encoded) |

Headers

| Header | Type | Required* | Description |
| --- | --- | --- | --- |
| Authorization | string | Optional* | Bearer token for session authentication |
| api-key | string | Optional* | API key for key-based authentication |
| api-secret | string | Optional* | API secret for key-based authentication |

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/demand/https%3A%2F%2Fpod-1.proxy.runpod.net%2Fv1' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)
{
  "url": "https://pod-1.proxy.runpod.net/v1",
  "inflight": 3,
  "last_use_ms": 1719000000000,
  "idle_seconds": 12,
  "models": [
    {
      "model_id": "threatwinds-qwen3-6-27b",
      "inflight": 2,
      "rate_5m": 120
    },
    {
      "model_id": "threatwinds-qwen3-6-35b-a3b",
      "inflight": 1,
      "rate_5m": 45
    }
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| url | string | The pod URL queried |
| inflight | int | Current number of active requests |
| last_use_ms | int | Epoch milliseconds of last request |
| idle_seconds | int | Seconds since last request |
| models | object[] | Per-model breakdown (see below) |

Per-model fields:

| Field | Type | Description |
| --- | --- | --- |
| model_id | string | Model identifier |
| inflight | int | Active requests for this model |
| rate_5m | int | Request count over the last 5 minutes |

Error Responses

| Status | Description | Response |
| --- | --- | --- |
| 503 | Service unavailable | {"error": "valkey unavailable"} |
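
The pod URL must be percent-encoded in full, including the slashes, before it is placed in the path. The sketch below builds the request URL with the standard library and converts `rate_5m` into an average per-second rate. `demand_url` and `requests_per_second` are hypothetical helper names; the base URL is taken from the curl examples in this document.

```python
from urllib.parse import quote

BASE_URL = "https://apis.threatwinds.com/api/ai/v1/admin"


def demand_url(pod_url: str) -> str:
    """Build the Get Pod Demand URL with the pod URL fully percent-encoded."""
    # safe="" makes quote() encode "/" and ":" too, which it would otherwise
    # leave alone or only partly encode.
    return f"{BASE_URL}/demand/{quote(pod_url, safe='')}"


def requests_per_second(model: dict) -> float:
    """Average request rate over the 5-minute (300-second) window."""
    return model["rate_5m"] / 300
```

For the sample pod, `demand_url("https://pod-1.proxy.runpod.net/v1")` produces the same URL as the curl request above.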

Error Handling

All admin endpoints return errors in a consistent format with standard HTTP status codes.

Error Response Format

{
  "error": "Human-readable error message",
  "detail": "Additional context (optional)"
}

Error Headers

All error responses include the following custom headers:

| Header | Description |
| --- | --- |
| x-error | Human-readable error message describing what went wrong |
| x-error-id | Unique identifier for error tracking and support |

Status Codes

| Status | Source | When |
| --- | --- | --- |
| 200 | Success | Request completed successfully |
| 400 | Client | Invalid request body or query parameters |
| 401 | Auth | Missing or invalid authentication credentials |
| 403 | Auth | Insufficient permissions (missing ai_admin role) |
| 404 | Pod Management | Pod not found |
| 422 | Pod Management | Invalid pod ID format |
| 500 | Internal | Server-side error |
| 502 | External | External API error (e.g., pod stop/resume failure) |
| 503 | Service | Backend service unavailable (e.g., registry store down) |
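
A client can combine the error body, the error headers, and the status code into one record for logging or support tickets. The following is a sketch only: `summarize_error` is a hypothetical helper, and treating 502 and 503 as retryable is an assumption based on their descriptions above, not documented behavior.

```python
# Assumption: upstream and backend-store failures may be transient.
RETRYABLE_STATUSES = {502, 503}


def summarize_error(status: int, headers: dict, body: dict) -> dict:
    """Collect the documented error fields and headers into one record."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "status": status,
        # Prefer the body's message and fall back to the x-error header.
        "message": body.get("error") or lowered.get("x-error", ""),
        "detail": body.get("detail"),
        "error_id": lowered.get("x-error-id"),
        "retryable": status in RETRYABLE_STATUSES,
    }
```

Including the `x-error-id` value in logs makes it possible to quote it when contacting support.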