OpenAI-Compatible API Reference¶
SMG provides a fully OpenAI-compatible API, allowing you to use existing OpenAI client libraries with your self-hosted inference workers.
Base URL¶
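All endpoints are served relative to the router's address. With the default port used throughout the examples below, the base URL is:

```
http://localhost:30000/v1
```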
Authentication¶
SMG supports optional API key authentication:
curl http://localhost:30000/v1/chat/completions \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '...'
Enable authentication with --api-key:
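The exact launch invocation depends on your deployment; a sketch, assuming the router binary is invoked as `smg` (substitute your actual command):

```shell
# Hypothetical launch command; only the --api-key flag is from this reference
smg --api-key your-api-key
```

Once enabled, requests without a matching `Authorization: Bearer` header receive a 401 response.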
Endpoints¶
Chat Completions¶
Create a chat completion.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| messages | array | Yes | Array of message objects |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Sampling temperature (0-2) |
| top_p | number | No | Nucleus sampling parameter |
| n | integer | No | Number of completions to generate |
| stream | boolean | No | Enable streaming responses |
| stop | string/array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| user | string | No | End-user identifier |
Message Object¶
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | system, user, or assistant |
| content | string | Yes | Message content |
Example Request¶
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"max_tokens": 100,
"temperature": 0.7
}'
Response¶
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1705312345,
"model": "meta-llama/Llama-3.1-8B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 8,
"total_tokens": 33
}
}
Streaming Response¶
With "stream": true, responses are sent as Server-Sent Events:
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true
}'
Response:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
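When consuming the stream without a client SDK, each `data:` line must be parsed and the `delta` fragments concatenated. A minimal stdlib-only sketch (the helper name `parse_sse_line` is ours, not part of any SDK):

```python
import json

def parse_sse_line(line: str):
    """Extract the delta content from one Server-Sent Events line.

    Returns the content fragment, or None for the [DONE] sentinel,
    non-data lines, and chunks with an empty delta (e.g. the final
    chunk carrying only finish_reason).
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content")

# The chunk lines from the streaming example above
lines = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"}}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]
text = "".join(c for c in (parse_sse_line(l) for l in lines) if c)
print(text)  # Hello!
```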
Completions¶
Create a text completion (legacy API).
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier |
| prompt | string/array | Yes | Text prompt(s) |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Sampling temperature (0-2) |
| top_p | number | No | Nucleus sampling parameter |
| n | integer | No | Number of completions |
| stream | boolean | No | Enable streaming |
| stop | string/array | No | Stop sequences |
| echo | boolean | No | Echo prompt in response |
Example Request¶
curl http://localhost:30000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"prompt": "The quick brown fox",
"max_tokens": 50
}'
Response¶
{
"id": "cmpl-abc123",
"object": "text_completion",
"created": 1705312345,
"model": "meta-llama/Llama-3.1-8B-Instruct",
"choices": [
{
"text": " jumps over the lazy dog.",
"index": 0,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 4,
"completion_tokens": 7,
"total_tokens": 11
}
}
List Models¶
List available models.
Example Request¶
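Following the standard OpenAI models endpoint layout, a simple GET request:

```shell
curl http://localhost:30000/v1/models
```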
Response¶
{
"object": "list",
"data": [
{
"id": "meta-llama/Llama-3.1-8B-Instruct",
"object": "model",
"created": 1705312345,
"owned_by": "organization"
}
]
}
Retrieve Model¶
Get details about a specific model.
Example Request¶
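Following the standard OpenAI layout, the model ID is appended to the models path:

```shell
curl http://localhost:30000/v1/models/meta-llama/Llama-3.1-8B-Instruct
```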
Response¶
{
"id": "meta-llama/Llama-3.1-8B-Instruct",
"object": "model",
"created": 1705312345,
"owned_by": "organization"
}
Error Responses¶
Error Format¶
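Errors follow the OpenAI error envelope: a single error object carrying a human-readable message, a machine-readable type, and a code:

```
{
  "error": {
    "message": "...",
    "type": "...",
    "code": "..."
  }
}
```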
Error Codes¶
| HTTP Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request |
| 401 | authentication_error | Invalid or missing API key |
| 404 | not_found_error | Model or endpoint not found |
| 408 | timeout_error | Request timed out in queue |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | internal_error | Server error |
| 503 | service_unavailable | No healthy workers |
Example Error Response¶
{
"error": {
"message": "Rate limit exceeded. Please retry later.",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
}
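Clients that call the API over raw HTTP can map this envelope to an exception. A minimal sketch; the `APIError` class and `raise_for_error` helper are ours, not part of SMG or any SDK:

```python
import json

class APIError(Exception):
    """Hypothetical client-side exception wrapping the error envelope."""
    def __init__(self, message, error_type, code):
        super().__init__(message)
        self.type = error_type
        self.code = code

def raise_for_error(body: str):
    """Raise APIError if the response body carries an error object."""
    err = json.loads(body).get("error")
    if err is not None:
        raise APIError(err.get("message"), err.get("type"), err.get("code"))
```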
Client Libraries¶
Python (OpenAI SDK)¶
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:30000/v1",
api_key="your-api-key" # or "not-needed" if auth disabled
)
response = client.chat.completions.create(
model="meta-llama/Llama-3.1-8B-Instruct",
messages=[
{"role": "user", "content": "Hello!"}
]
)
print(response.choices[0].message.content)
JavaScript/TypeScript¶
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:30000/v1',
apiKey: 'your-api-key'
});
const response = await client.chat.completions.create({
model: 'meta-llama/Llama-3.1-8B-Instruct',
messages: [
{ role: 'user', content: 'Hello!' }
]
});
console.log(response.choices[0].message.content);
cURL¶
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Request Headers¶
| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | Must be application/json |
| Authorization | Conditional | Bearer {api-key} if auth enabled |
| X-Request-ID | No | Custom request ID for tracing |
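For example, attaching a custom request ID so the request can be correlated in traces (the ID value here is arbitrary):

```shell
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-trace-001" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```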
Rate Limiting¶
When rate limited, responses include:
| Header | Description |
|---|---|
| Retry-After | Seconds to wait before retrying |
| X-RateLimit-Limit | Request limit |
| X-RateLimit-Remaining | Remaining requests |
| X-RateLimit-Reset | Unix timestamp when limit resets |
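A client can honor Retry-After when present and otherwise fall back to exponential backoff. A hedged sketch; the retryable status set follows the error-code table above, and the 30-second cap is our assumption, not an SMG default:

```python
# Statuses from the error-code table that are safe to retry
RETRYABLE_STATUSES = {408, 429, 500, 503}

def retry_delay(status, headers, attempt):
    """Seconds to wait before retry number `attempt` (0-based),
    or None if the status should not be retried.

    Prefers the server's Retry-After header; otherwise uses
    exponential backoff capped at 30 seconds (an assumed cap).
    """
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    return min(2.0 ** attempt, 30.0)
```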