Admin API Reference
SMG provides administrative endpoints for managing tokenizers, workers, cache, and cluster operations.
Related documentation: For health checks, worker status, and monitoring endpoints, see Gateway Extensions.
Tokenizer Management
Register, inspect, and remove the tokenizers SMG uses for text processing.
Authentication required: These endpoints require admin authentication via an API key or control plane credentials.
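Add Tokenizer
Adds a new tokenizer from a local path or Hugging Face model ID. Loading is asynchronous: the call returns 202 Accepted with a tokenizer ID and a pending status, which you can track with Get Tokenizer Status.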
Request Body:
```json
{
  "name": "llama3-tokenizer",
  "source": "meta-llama/Meta-Llama-3-8B",
  "chat_template_path": "/path/to/template.jinja"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique tokenizer identifier |
| source | string | Yes | Hugging Face model ID or local path |
| chat_template_path | string | No | Path to a custom Jinja2 chat template |
Response: 202 Accepted
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "message": "Tokenizer loading initiated"
}
```
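This reference does not show the request path, so the sketch below only builds the Add Tokenizer body and reads the tokenizer ID back out of the 202 response. The helper names `build_add_tokenizer_body` and `tokenizer_id_from_accepted` are hypothetical. They are not part of SMG.

```python
import json
from typing import Optional


def build_add_tokenizer_body(
    name: str, source: str, chat_template_path: Optional[str] = None
) -> str:
    """Serialize an Add Tokenizer request body (hypothetical helper).

    name and source are required. chat_template_path is optional, so it
    is left out of the JSON entirely when not supplied.
    """
    if not name or not source:
        raise ValueError("name and source are required")
    body = {"name": name, "source": source}
    if chat_template_path is not None:
        body["chat_template_path"] = chat_template_path
    return json.dumps(body)


def tokenizer_id_from_accepted(response_text: str) -> str:
    """Return the tokenizer ID from a 202 Accepted response body."""
    return json.loads(response_text)["id"]
```

Keep the returned ID. Get Tokenizer, Get Tokenizer Status, and Remove Tokenizer all refer to a specific tokenizer.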
List Tokenizers
Returns all registered tokenizers.
Response: 200 OK
```json
{
  "tokenizers": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "llama3-tokenizer",
      "source": "meta-llama/Meta-Llama-3-8B",
      "vocab_size": 128256
    }
  ]
}
```
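Tokenizers are registered by `name` but identified by `id`. A client that only knows the name can resolve the ID from the List Tokenizers response. Below is a minimal sketch using a hypothetical helper, `find_tokenizer_id`, that parses the response JSON shown above.

```python
import json
from typing import Optional


def find_tokenizer_id(list_response: str, name: str) -> Optional[str]:
    """Return the ID of the tokenizer called `name`, or None if absent.

    `list_response` is the JSON text returned by List Tokenizers.
    """
    for tok in json.loads(list_response).get("tokenizers", []):
        if tok.get("name") == name:
            return tok.get("id")
    return None
```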
Get Tokenizer
Returns details for a specific tokenizer.
Response: 200 OK
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "llama3-tokenizer",
  "source": "meta-llama/Meta-Llama-3-8B",
  "vocab_size": 128256
}
```
Response: 404 Not Found (no tokenizer with the given ID exists)
Get Tokenizer Status
Returns the loading status of a tokenizer.
Response: 200 OK
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "message": "Tokenizer loaded successfully",
  "vocab_size": 128256
}
```
| Status | Description |
|---|---|
| pending | Tokenizer loading queued |
| processing | Tokenizer currently loading |
| completed | Tokenizer ready for use |
| failed | Loading failed; see message for details |
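Add Tokenizer returns before loading finishes, so a client has to poll until the status reaches `completed` or `failed`. Below is a minimal sketch of that loop. The status endpoint's path is not given here, so the caller passes in `fetch_status`, a function that performs the request and returns the parsed JSON. The helper name, the 1-second interval, and the 300-second timeout are illustrative assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_tokenizer(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll a tokenizer's status until it finishes (hypothetical helper).

    Returns the final payload on 'completed'. Raises RuntimeError on
    'failed' and TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        status = payload.get("status")
        if status == "completed":
            return payload
        if status == "failed":
            raise RuntimeError(f"tokenizer load failed: {payload.get('message')}")
        if status not in ("pending", "processing"):
            raise ValueError(f"unexpected status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError("tokenizer did not finish loading in time")
        sleep(poll_interval)
```

The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting.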
Remove Tokenizer
Removes a tokenizer.
Response: 200 OK
Worker Management¶
Manage backend inference workers.
Tip: For listing workers and viewing metrics, see Gateway Extensions.
Create Worker¶
Registers a new backend worker.
Request Body:
```json
{
  "name": "gpu-worker-1",
  "url": "http://gpu1:8000",
  "model_name": "llama3-70b",
  "api_key": "worker-secret-key"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Worker identifier |
| url | string | Yes | Worker base URL |
| model_name | string | No | Model served by the worker |
| api_key | string | No | API key for authenticating with the worker |
Response: 201 Created
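Below is a minimal sketch of building a Create Worker body with a basic URL sanity check before sending. Requiring an absolute `http`/`https` URL is an assumption based on the example above; this reference does not state it. `build_create_worker_body` is a hypothetical helper.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def build_create_worker_body(
    name: str,
    url: str,
    model_name: Optional[str] = None,
    api_key: Optional[str] = None,
) -> str:
    """Serialize a Create Worker request body (hypothetical helper).

    Assumption: the worker URL must be an absolute http(s) URL.
    Optional fields are omitted when not supplied.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"worker url must be an absolute http(s) URL: {url!r}")
    body = {"name": name, "url": url}
    if model_name is not None:
        body["model_name"] = model_name
    if api_key is not None:
        body["api_key"] = api_key
    return json.dumps(body)
```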
Update Worker¶
Updates worker configuration.
Request Body:
Response: 200 OK
Delete Worker¶
Removes a worker from the pool.
Response: 200 OK
Cache Management¶
Manage the routing cache and load information.
Flush Cache¶
Flushes the KV cache on all workers.
Response: 200 OK
Get Loads¶
Returns the current load on each worker.
Response: 200 OK
```json
{
  "loads": [
    {
      "worker_id": "worker-1",
      "url": "http://gpu1:8000",
      "active_requests": 5,
      "queue_depth": 2,
      "cache_utilization": 0.75
    }
  ]
}
```
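Below is a minimal sketch of consuming the Get Loads response on the client side, for example to spot the least busy worker on a dashboard. The scoring rule (active plus queued requests, ties broken by lower `cache_utilization`) is an illustrative assumption. It is not SMG's routing policy. `least_loaded_worker` is a hypothetical helper.

```python
import json


def least_loaded_worker(loads_response: str) -> dict:
    """Return the load entry with the fewest active plus queued requests.

    Ties are broken by lower cache_utilization. This scoring is an
    illustrative choice, not the gateway's own routing logic.
    """
    loads = json.loads(loads_response)["loads"]
    if not loads:
        raise ValueError("no workers reported")
    return min(
        loads,
        key=lambda w: (w["active_requests"] + w["queue_depth"], w["cache_utilization"]),
    )
```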
Model Information¶
Query model and server information.
List Models¶
Returns available models (proxied to workers).
Response: 200 OK
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3-70b",
      "object": "model",
      "created": 1700000000,
      "owned_by": "meta"
    }
  ]
}
```
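The List Models response uses an OpenAI-style list envelope (`object: "list"` with a `data` array). A client can extract the served model IDs, for example to check that a model is available before sending requests. Below is a minimal sketch; `served_model_ids` is a hypothetical helper.

```python
import json
from typing import List


def served_model_ids(models_response: str) -> List[str]:
    """Return the IDs of all entries of type 'model' in a List Models response."""
    payload = json.loads(models_response)
    if payload.get("object") != "list":
        raise ValueError("expected an object of type 'list'")
    return [m["id"] for m in payload.get("data", []) if m.get("object") == "model"]
```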
Get Model Info¶
Returns detailed model information (proxied to workers).
Response: 200 OK
Get Server Info¶
Returns server information (proxied to workers).
Response: 200 OK
WASM Module Management¶
Manage WebAssembly plugins.
Add WASM Module¶
Uploads and registers a WASM module.
Request: multipart/form-data body containing the WASM binary
Response: 201 Created
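Below is a minimal sketch of building the multipart upload with only the standard library. It first checks the 4-byte `\0asm` magic that begins every WebAssembly binary module. The form field name `file` is a hypothetical default, since this reference does not name the field SMG expects. Verify it before use. `encode_wasm_upload` is also a hypothetical helper.

```python
import uuid
from typing import Tuple

WASM_MAGIC = b"\x00asm"  # every WebAssembly binary module starts with these 4 bytes


def encode_wasm_upload(
    filename: str, wasm_bytes: bytes, field_name: str = "file"
) -> Tuple[bytes, str]:
    """Build a multipart/form-data body carrying one WASM binary.

    Returns (body, content_type_header). `field_name` is an assumed
    default; the real form field name is not documented here.
    """
    if not wasm_bytes.startswith(WASM_MAGIC):
        raise ValueError("not a WebAssembly binary (missing \\0asm magic)")
    # Pick a boundary that does not occur inside the binary payload.
    boundary = uuid.uuid4().hex
    while boundary.encode("ascii") in wasm_bytes:
        boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/wasm\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + wasm_bytes + tail, f"multipart/form-data; boundary={boundary}"
```

Send the returned body with the returned value as the `Content-Type` header.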
List WASM Modules¶
Returns all registered WASM modules.
Response: 200 OK
```json
{
  "modules": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "custom-filter",
      "status": "loaded"
    }
  ]
}
```
Remove WASM Module¶
Removes a WASM module.
Response: 200 OK
Error Responses
All endpoints return errors in a consistent format, using the following HTTP status codes and error types:
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | bad_request | Invalid request format or parameters |
| 401 | unauthorized | Missing or invalid authentication |
| 403 | forbidden | Insufficient permissions |
| 404 | not_found | Resource not found |
| 409 | conflict | Resource already exists |
| 503 | service_unavailable | No healthy workers available |
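A client can map these status codes to error types and decide whether a retry makes sense. Below is a minimal sketch. Treating only 503 as retryable is a client-side policy assumption: the 4xx codes describe a problem with the request itself, so resending it unchanged would fail again. The helper names are hypothetical.

```python
from typing import Dict

# HTTP status -> error type, copied from the Error Responses table.
ERROR_TYPES: Dict[int, str] = {
    400: "bad_request",
    401: "unauthorized",
    403: "forbidden",
    404: "not_found",
    409: "conflict",
    503: "service_unavailable",
}


def classify_error(status: int) -> str:
    """Return the documented error type, or 'unknown' for unlisted codes."""
    return ERROR_TYPES.get(status, "unknown")


def is_retryable(status: int) -> bool:
    """Treat only 503 (no healthy workers) as transient.

    This is an illustrative client policy, not documented SMG behavior.
    """
    return status == 503
```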
Authentication
Admin endpoints require authentication via one of:
- API Key: Pass via the `Authorization: Bearer <api-key>` header.
- Control Plane Key: For cluster management operations.
Public endpoints (health checks, model info) do not require authentication.
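Below is a minimal sketch of attaching the admin API key with the standard library's `urllib.request`. It builds the request without sending it. The URL in the usage note is a placeholder, because this reference does not list endpoint paths. `admin_request` is a hypothetical helper.

```python
import urllib.request
from typing import Optional


def admin_request(
    url: str, api_key: str, method: str = "GET", body: Optional[bytes] = None
) -> urllib.request.Request:
    """Build (but do not send) a request carrying the admin API key.

    Adds `Authorization: Bearer <api_key>`. A JSON Content-Type is set
    only when a body is supplied.
    """
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if body is not None:
        req.add_header("Content-Type", "application/json")
    return req
```

To send it, pass the result to `urllib.request.urlopen(...)`, for example with `admin_request(base_url + path, api_key)` where `base_url` and `path` are your gateway address and the endpoint you are calling.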