Admin Endpoints

Note: These endpoints are restricted to users with the ai_admin role.

Overview

The AI API admin endpoints provide operational visibility and control over the inference serving fleet. They expose pod health, server pool status, session affinity metrics, model registry state, and demand telemetry, along with actions to clear affinity, restart pods, and refresh registry mappings.

All admin endpoints require the ai_admin gateway role. They are not rate-limited.

Authentication

Authentication follows the same pattern as all other AI API endpoints:

Authentication Method	Description
Bearer Token	Session-based authentication using `Authorization: Bearer <token>` header
API Key	API key authentication using `api-key` and `api-secret` headers
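
As an illustration of the two methods above, the following Python sketch builds the request headers for either one. The function name `build_auth_headers` is hypothetical, and rejecting a request that mixes both methods is an assumption of this sketch rather than documented server behavior.

```python
from typing import Dict, Optional


def build_auth_headers(
    token: Optional[str] = None,
    api_key: Optional[str] = None,
    api_secret: Optional[str] = None,
) -> Dict[str, str]:
    """Return headers for exactly one of the two documented auth methods."""
    headers = {"accept": "application/json"}
    if token is not None:
        # Assumption: mixing both methods is treated as a caller mistake.
        if api_key is not None or api_secret is not None:
            raise ValueError("use a bearer token or an API key pair, not both")
        headers["Authorization"] = f"Bearer {token}"
        return headers
    if api_key is not None and api_secret is not None:
        headers["api-key"] = api_key
        headers["api-secret"] = api_secret
        return headers
    raise ValueError("provide either token, or both api_key and api_secret")
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.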

Endpoints

Pod Management

List Pods

Returns a paginated list of all inference pods with health status, current request load, and cost information.

Endpoint: GET /api/ai/v1/admin/pods

Method: GET

Parameters

Query Parameters

Parameter	Type	Default	Description
`page`	int	`1`	Page number
`limit`	int	`10`	Items per page (maximum `100`)

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Note: You must provide either the `Authorization` header or the `api-key`/`api-secret` pair.

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/pods?page=1&limit=20' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "pods": [
    {
      "id": "abc123",
      "name": "aiapi-dev-vllm-qwen3-6-27b-fp8-1",
      "desired_status": "RUNNING",
      "kind": "vllm-qwen3-6-27b-fp8",
      "models": ["threatwinds-qwen3-6-27b"],
      "healthy": true,
      "inflight": 3,
      "last_use_ms": 1719000000000,
      "idle_seconds": 45,
      "last_status_change": "2026-06-25T10:00:00Z",
      "cost_per_hr": 0.44,
      "data_center_id": "us-central1",
      "location": "Iowa",
      "gpu_type_id": "a100-80gb"
    }
  ],
  "page": 1,
  "limit": 20,
  "total": 5
}

Field	Type	Description
`id`	string	Unique pod identifier
`name`	string	Human-readable pod name
`desired_status`	string	Desired lifecycle state (`RUNNING`, `STOPPED`, etc.)
`kind`	string	Pod kind / model configuration
`models`	string[]	Model IDs routed to this pod
`healthy`	bool/null	Current health status; `null` if the pod is not in any server pool
`inflight`	int	Active requests currently being processed
`last_use_ms`	int	Epoch milliseconds of last activity
`idle_seconds`	int	Seconds since last activity
`last_status_change`	string	ISO 8601 timestamp of last status change
`cost_per_hr`	float	Hourly cost in USD
`data_center_id`	string	Data center identifier
`location`	string	Geographic location
`gpu_type_id`	string	GPU type identifier
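
Because the list is paginated, collecting every pod takes one request per page. The sketch below shows one way to walk the pages using the `page`, `limit`, and `total` values from the response. `iter_pods` and the `fetch_page` callable are hypothetical names; `fetch_page(page, limit)` is assumed to perform the GET request and return the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterator


def iter_pods(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    limit: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every pod by requesting successive pages until `total` is reached."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, limit)
        pods = body.get("pods", [])
        for pod in pods:
            yield pod
        seen += len(pods)
        # Stop on an empty page as well, in case `total` changes mid-walk.
        if not pods or seen >= body.get("total", 0):
            break
        page += 1
```

Keeping the HTTP call behind `fetch_page` lets the same loop work with any client library and makes it easy to exercise with canned responses.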

Get Pod Detail

Returns detailed information about a specific inference pod, including health, request load, and lifecycle metadata.

Endpoint: GET /api/ai/v1/admin/pods/{id}

Method: GET

Parameters

Path Parameters

Parameter	Type	Required	Description
`id`	string	Yes	Pod ID

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/pods/abc123' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "id": "abc123",
  "name": "aiapi-dev-vllm-qwen3-6-27b-fp8-1",
  "desired_status": "RUNNING",
  "kind": "vllm-qwen3-6-27b-fp8",
  "models": ["threatwinds-qwen3-6-27b"],
  "healthy": true,
  "inflight": 3,
  "last_use_ms": 1719000000000,
  "idle_seconds": 45,
  "last_started_at": "2026-06-25T08:00:00Z",
  "last_status_change": "2026-06-25T10:00:00Z",
  "cost_per_hr": 0.44,
  "data_center_id": "us-central1",
  "location": "Iowa",
  "gpu_type_id": "a100-80gb"
}

The detail response includes every field returned by List Pods, plus:

Field	Type	Description
`last_started_at`	string	ISO 8601 timestamp of last container start

Error Responses

Status	Description	Response
`404`	Pod not found	`{"error": "pod not found"}`
`502`	External API error	`{"error": "runpod API error", "detail": "..."}`

Restart Pod

Initiates a graceful restart cycle for a pod (stop, then resume). This is a fire-and-forget operation: the call returns as soon as the restart has been initiated, and the pod status updates asynchronously.

Endpoint: POST /api/ai/v1/admin/pods/{id}/restart

Method: POST

Parameters

Path Parameters

Parameter	Type	Required	Description
`id`	string	Yes	Pod ID

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Request

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/admin/pods/abc123/restart' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "status": "initiated",
  "pod_id": "abc123"
}

Field	Type	Description
`status`	string	Operation status (`"initiated"` or `"failed"`)
`pod_id`	string	The pod whose restart was requested
`failed_at`	string (optional)	If status is `"failed"`, indicates which phase failed: `"stop"` or `"resume"`

Error Responses

Status	Description	Response
`404`	Pod not found	`{"error": "pod not found"}`
`422`	Invalid pod ID	`{"error": "invalid pod id"}`
`502`	Stop failed	`{"status": "failed", "failed_at": "stop", "detail": "..."}`
`502`	Resume failed	`{"status": "failed", "failed_at": "resume", "detail": "..."}`

Server Pool Health

List Servers

Returns a paginated list of all inference servers across all providers with health status and consecutive success/failure counts.

Endpoint: GET /api/ai/v1/admin/servers

Method: GET

Parameters

Query Parameters

Parameter	Type	Default	Description
`page`	int	`1`	Page number
`limit`	int	`10`	Items per page (maximum `100`)

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/servers' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "servers": [
    {
      "provider": "vllm",
      "url": "https://abc123.proxy.runpod.net/v1",
      "healthy": true,
      "consecutive_failures": 0,
      "consecutive_successes": 150,
      "last_check_ms": 1719000000000,
      "last_failure_ms": 0,
      "models": ["threatwinds-qwen3-6-27b"]
    }
  ],
  "page": 1,
  "limit": 10,
  "total": 42
}

Field	Type	Description
`provider`	string	Provider name (`"vllm"`, `"sglang"`, or `"speaches"`)
`url`	string	Server endpoint URL
`healthy`	bool	Current health status
`consecutive_failures`	int	Number of consecutive failed health checks
`consecutive_successes`	int	Number of consecutive successful health checks
`last_check_ms`	int	Epoch milliseconds of last health check
`last_failure_ms`	int	Epoch milliseconds of last failure (0 if none)
`models`	string[]	Model IDs routed to this server
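
As a small example of working with this response, the sketch below counts healthy and unhealthy servers per provider. `summarize_server_health` is a hypothetical helper, and it summarizes only the page it is given, so a full-fleet view would need every page.

```python
from collections import defaultdict
from typing import Any, Dict


def summarize_server_health(body: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Count healthy and unhealthy servers per provider for one response page."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"healthy": 0, "unhealthy": 0}
    )
    for server in body.get("servers", []):
        key = "healthy" if server.get("healthy") else "unhealthy"
        summary[server["provider"]][key] += 1
    return dict(summary)
```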

Session Affinity

Get Affinity Metrics

Returns session affinity hit, miss, and fallback metrics per provider. Session affinity routes a user's requests to the pod that served that user's previous request; these metrics show how often that routing succeeds.

Endpoint: GET /api/ai/v1/admin/affinity

Method: GET

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/affinity' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "providers": [
    {
      "name": "vllm",
      "session_hit_count": 15200,
      "session_miss_count": 3100,
      "fallback_count": 2,
      "ttl_minutes": 30
    },
    {
      "name": "sglang",
      "session_hit_count": 8400,
      "session_miss_count": 1900,
      "fallback_count": 0,
      "ttl_minutes": 30
    }
  ]
}

Field	Type	Description
`name`	string	Provider name
`session_hit_count`	int	Number of requests routed to an existing pod
`session_miss_count`	int	Number of requests requiring new pod assignment
`fallback_count`	int	Number of fallback routings
`ttl_minutes`	int	Session affinity time-to-live in minutes
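
A common derived figure is the hit rate per provider. The sketch below computes it as hits divided by hits plus misses. That denominator is an assumption of this example: the documentation does not state whether fallbacks are counted separately from misses. `affinity_hit_rates` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def affinity_hit_rates(body: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Return hits / (hits + misses) per provider, or None when there is no traffic."""
    rates: Dict[str, Optional[float]] = {}
    for provider in body.get("providers", []):
        hits = provider["session_hit_count"]
        misses = provider["session_miss_count"]
        total = hits + misses
        # Avoid dividing by zero for providers that have served no requests.
        rates[provider["name"]] = hits / total if total else None
    return rates
```

With the sample response above, the `vllm` rate is 15200 / 18300, or roughly 83%.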

Clear Affinity

Clears session affinity keys for a user, optionally scoped to a specific model. This forces the next request to be routed to a fresh pod.

Endpoint: POST /api/ai/v1/admin/affinity/clear

Method: POST

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication
Content-Type	string	Yes	`application/json`

Request

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/admin/affinity/clear' \
  -H 'accept: application/json' \
  -H 'content-type: application/json' \
  -H 'Authorization: Bearer <token>' \
  -d '{"user_id": "usr_abc123", "model_id": null}'

Request Body

Field	Type	Required	Description
`user_id`	string	Yes	User identifier
`model_id`	string/null	No	Model to scope the clear to. If `null` or omitted, clears all models for the user.
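
The request body can be assembled as in the following sketch, which omits `model_id` when no model scope is wanted (the table above treats omitted and `null` the same way). `clear_affinity_body` is a hypothetical helper name.

```python
import json
from typing import Optional


def clear_affinity_body(user_id: str, model_id: Optional[str] = None) -> str:
    """Build the JSON body for Clear Affinity; model_id is left out when unscoped."""
    if not user_id:
        raise ValueError("user_id is required")
    payload = {"user_id": user_id}
    if model_id is not None:
        payload["model_id"] = model_id
    return json.dumps(payload)
```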

Response

Success Response (200 OK)

Clear all models for a user:

{
  "user_id": "usr_abc123",
  "cleared": 3
}

Clear a specific model:

{
  "user_id": "usr_abc123",
  "model_id": "threatwinds-qwen3-6-27b",
  "cleared": 1
}

Field	Type	Description
`user_id`	string	The user whose affinity was cleared
`model_id`	string (optional)	The model that was cleared (if scoped)
`cleared`	int	Number of affinity entries removed

Error Responses

Status	Description	Response
`400`	Bad request	`{"error": "invalid request body", "detail": "..."}`
`503`	Service unavailable	`{"error": "valkey unavailable"}`

Model Registry

Get Registry

Returns the list of backend server URLs registered for a specific model. At scale, a single model may route to thousands of pod URLs.

Endpoint: GET /api/ai/v1/admin/registry/{modelId}

Method: GET

Parameters

Path Parameters

Parameter	Type	Required	Description
`modelId`	string	Yes	Model ID

Query Parameters

Parameter	Type	Default	Description
`page`	int	`1`	Page number
`limit`	int	`10`	Items per page (maximum `100`)

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/registry/threatwinds-qwen3-6-27b?page=1&limit=50' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "model_id": "threatwinds-qwen3-6-27b",
  "urls": [
    "https://pod-1.proxy.runpod.net/v1",
    "https://pod-2.proxy.runpod.net/v1"
  ],
  "page": 1,
  "limit": 50,
  "total": 115
}

Field	Type	Description
`model_id`	string	The model ID queried
`urls`	string[]	Backend server URLs registered for the model

Empty Response (200 OK)

If the model has no registered servers:

{
  "model_id": "threatwinds-qwen3-6-27b",
  "urls": [],
  "page": 1,
  "limit": 10,
  "total": 0
}

Error Responses

Status	Description	Response
`503`	Service unavailable	`{"error": "valkey error", "detail": "..."}`
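
When a model has many registered URLs, it helps to know up front how many pages a full read will take. The sketch below derives that from `total` and `limit` using ceiling division. `page_count` is a hypothetical helper name.

```python
def page_count(total: int, limit: int) -> int:
    """Return how many pages are needed to read `total` items at `limit` per page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if total < 0:
        raise ValueError("total must not be negative")
    # Integer ceiling division avoids floating-point rounding.
    return (total + limit - 1) // limit
```

For the sample response above (`total` of 115, `limit` of 50), a full read takes three requests.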

Refresh Registry

Triggers a rebuild of server registry mappings for all providers. This reads the current server-to-model mappings and updates the in-memory routing tables without touching the persistent registry.

Endpoint: POST /api/ai/v1/admin/registry/refresh

Method: POST

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Request

curl -X 'POST' \
  'https://apis.threatwinds.com/api/ai/v1/admin/registry/refresh' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "status": "triggered"
}

Field	Type	Description
`status`	string	Always `"triggered"` on success

Error Responses

Status	Description	Response
`500`	Internal error	`{"error": "...", "detail": "..."}`

Demand Telemetry

Get Pod Demand

Returns demand telemetry for a specific pod, including current inflight request count, time since last use, and per-model breakdown with 5-minute request rates.

Endpoint: GET /api/ai/v1/admin/demand/{podUrl}

Method: GET

Parameters

Path Parameters

Parameter	Type	Required	Description
`podUrl`	string	Yes	Pod URL (URL-encoded)
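
Because the pod URL is itself a URL, its `:` and `/` characters must be percent-encoded before it is placed in the path. A minimal sketch using the Python standard library follows; `demand_url` is a hypothetical helper name, and the base URL is taken from the request examples in this document.

```python
from urllib.parse import quote

BASE_URL = "https://apis.threatwinds.com/api/ai/v1/admin"


def demand_url(pod_url: str) -> str:
    """Return the Get Pod Demand URL with the pod URL percent-encoded as one segment."""
    # safe="" makes quote() encode "/" too, which it leaves alone by default.
    return f"{BASE_URL}/demand/{quote(pod_url, safe='')}"
```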

Headers

Header	Type	Required*	Description
Authorization	string	Optional*	Bearer token for session authentication
api-key	string	Optional*	API key for key-based authentication
api-secret	string	Optional*	API secret for key-based authentication

Request

curl -X 'GET' \
  'https://apis.threatwinds.com/api/ai/v1/admin/demand/https%3A%2F%2Fpod-1.proxy.runpod.net%2Fv1' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <token>'

Response

Success Response (200 OK)

{
  "url": "https://pod-1.proxy.runpod.net/v1",
  "inflight": 3,
  "last_use_ms": 1719000000000,
  "idle_seconds": 12,
  "models": [
    {
      "model_id": "threatwinds-qwen3-6-27b",
      "inflight": 2,
      "rate_5m": 120
    },
    {
      "model_id": "threatwinds-qwen3-6-35b-a3b",
      "inflight": 1,
      "rate_5m": 45
    }
  ]
}

Field	Type	Description
`url`	string	The pod URL queried
`inflight`	int	Current number of active requests
`last_use_ms`	int	Epoch milliseconds of last request
`idle_seconds`	int	Seconds since last request
`models`	object[]	Per-model breakdown (see below)

Per-model fields:

Field	Type	Description
`model_id`	string	Model identifier
`inflight`	int	Active requests for this model
`rate_5m`	int	Request count over the last 5 minutes
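
Since `rate_5m` is a count over a five-minute window, dividing by 300 gives an average requests-per-second figure for each model. The sketch below applies that conversion; `per_second_rates` is a hypothetical helper name.

```python
from typing import Any, Dict

WINDOW_SECONDS = 5 * 60  # rate_5m covers the last five minutes


def per_second_rates(body: Dict[str, Any]) -> Dict[str, float]:
    """Convert each model's 5-minute request count to average requests per second."""
    return {
        model["model_id"]: model["rate_5m"] / WINDOW_SECONDS
        for model in body.get("models", [])
    }
```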

Error Responses

Status	Description	Response
`503`	Service unavailable	`{"error": "valkey unavailable"}`

Error Handling

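All admin endpoints return errors with standard HTTP status codes and, in most cases, the JSON body shown below. The exception is Restart Pod, whose `502` responses carry `status` and `failed_at` fields instead of `error`.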

Error Response Format

{
  "error": "Human-readable error message",
  "detail": "Additional context (optional)"
}

Error Headers

All error responses include the following custom headers:

Header	Description
x-error	Human-readable error message describing what went wrong
x-error-id	Unique identifier for error tracking and support

Status Codes

Status	Source	When
`200`	Success	Request completed successfully
`400`	Client	Invalid request body or query parameters
`401`	Auth	Missing or invalid authentication credentials
`403`	Auth	Insufficient permissions (missing `ai_admin` role)
`404`	Pod Management	Pod not found
`422`	Pod Management	Invalid pod ID format
`500`	Internal	Server-side error
`502`	External	External API error (e.g., pod stop/resume failure)
`503`	Service	Backend service unavailable (e.g., registry store down)
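
To tie the body format and the error headers together, the following sketch collects them into a single record that can be logged or passed to support. `AdminError` and `parse_admin_error` are hypothetical names, and falling back to the `x-error` header when the body has no `error` field is a choice made for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass
class AdminError:
    status: int
    message: str
    detail: Optional[str]
    error_id: Optional[str]


def parse_admin_error(
    status: int, headers: Mapping[str, str], body: Dict[str, Any]
) -> AdminError:
    """Combine the JSON error body with the x-error and x-error-id headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    message = body.get("error") or lowered.get("x-error") or "unknown error"
    return AdminError(
        status=status,
        message=message,
        detail=body.get("detail"),
        error_id=lowered.get("x-error-id"),
    )
```

Keeping `error_id` alongside the message makes it easier to quote the identifier when reporting a problem.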