Responses API Reference¶
The Responses API provides an OpenAI-compatible interface for agentic workflows with built-in support for multi-turn conversations, tool execution, and MCP (Model Context Protocol) integration.
Overview¶
Purpose vs Chat Completions API¶
The Responses API differs from the Chat Completions API in several key ways:
| Feature | Chat Completions | Responses API |
|---|---|---|
| Conversation State | Stateless | Server-managed state |
| Tool Execution | Client-side | Server-side with MCP support |
| Multi-turn | Manual | Automatic with previous_response_id |
| Persistence | None | Built-in response/conversation storage |
| Agentic Workflows | Manual orchestration | Built-in tool loop execution |
Agentic Workflow Concepts¶
The Responses API enables agentic workflows where the model can:
- Reason about tasks using optional reasoning parameters
- Plan tool usage with automatic tool selection
- Execute tools via MCP servers or function calling
- Iterate through multiple tool calls in a single request
- Persist conversation history for multi-session workflows
Base URL¶
Create Response¶
Create a new response with optional tool execution and conversation management.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model identifier |
input | string or array | Yes | Input text or array of input items |
instructions | string | No | System instructions for the model |
max_output_tokens | integer | No | Maximum tokens to generate |
max_tool_calls | integer | No | Maximum number of tool calls per request |
temperature | number | No | Sampling temperature (0-2), default: 1.0 |
top_p | number | No | Nucleus sampling parameter (0-1), default: 1.0 |
stream | boolean | No | Enable streaming responses |
store | boolean | No | Store response for later retrieval, default: true |
tools | array | No | Available tools (function, mcp, web_search_preview, code_interpreter) |
tool_choice | string/object | No | Tool selection behavior: auto, none, required, or specific tool |
parallel_tool_calls | boolean | No | Allow parallel tool execution, default: true |
previous_response_id | string | No | Continue from a previous response |
conversation | string | No | Conversation ID (mutually exclusive with previous_response_id) |
reasoning | object | No | Reasoning configuration |
text | object | No | Text format for structured outputs |
metadata | object | No | Custom metadata (max 16 properties) |
user | string | No | End-user identifier |
background | boolean | No | Run request in background (not with streaming) |
Input Formats¶
Simple text input:
Structured input items:
{
"input": [
{
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "Hello!"}]
}
]
}
Tool Configuration¶
Function tools:
{
"tools": [
{
"type": "function",
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
]
}
MCP tools:
{
"tools": [
{
"type": "mcp",
"server_url": "http://localhost:8080/mcp",
"server_label": "my-mcp-server",
"server_description": "My MCP server for data access",
"require_approval": "never",
"allowed_tools": ["query_database", "search_files"]
}
]
}
Reasoning Configuration¶
Effort levels: minimal, low, medium, high
Text Format (Structured Outputs)¶
{
"text": {
"format": {
"type": "json_schema",
"name": "user_info",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
}
},
"strict": true
}
}
}
Example Request¶
curl http://localhost:30000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"input": "Search for the latest news about AI",
"instructions": "Be concise and factual",
"max_output_tokens": 500,
"temperature": 0.7,
"tools": [
{
"type": "mcp",
"server_url": "http://localhost:8080/mcp",
"server_label": "search"
}
],
"tool_choice": "auto"
}'
Response¶
{
"id": "resp_abc123def456",
"object": "response",
"created_at": 1705312345,
"status": "completed",
"model": "meta-llama/Llama-3.1-8B-Instruct",
"output": [
{
"type": "mcp_list_tools",
"id": "mcp_list_001",
"server_label": "search",
"tools": [
{
"name": "web_search",
"description": "Search the web",
"input_schema": {"type": "object", "properties": {"query": {"type": "string"}}}
}
]
},
{
"type": "mcp_call",
"id": "mcp_call_001",
"status": "completed",
"name": "web_search",
"arguments": "{\"query\": \"latest AI news\"}",
"output": "{\"results\": [...]}",
"server_label": "search"
},
{
"type": "message",
"id": "msg_001",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Based on my search, here are the latest AI developments..."
}
],
"status": "completed"
}
],
"usage": {
"input_tokens": 50,
"output_tokens": 150,
"total_tokens": 200
},
"tools": [
{
"type": "mcp",
"server_label": "search",
"server_url": "http://localhost:8080/mcp"
}
],
"tool_choice": "auto",
"parallel_tool_calls": true,
"store": true
}
Streaming Response¶
With "stream": true, responses are sent as Server-Sent Events:
curl http://localhost:30000/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"input": "Hello!",
"stream": true
}'
Event sequence:
event: response.created
data: {"type": "response.created", "sequence_number": 0, "response": {...}}
event: response.in_progress
data: {"type": "response.in_progress", "sequence_number": 1, "response": {...}}
event: response.output_item.added
data: {"type": "response.output_item.added", "sequence_number": 2, "output_index": 0, "item": {...}}
event: response.content_part.added
data: {"type": "response.content_part.added", "sequence_number": 3, "output_index": 0, "content_index": 0, "part": {...}}
event: response.output_text.delta
data: {"type": "response.output_text.delta", "sequence_number": 4, "output_index": 0, "content_index": 0, "delta": "Hello"}
event: response.output_text.done
data: {"type": "response.output_text.done", "sequence_number": 5, "output_index": 0, "content_index": 0, "text": "Hello! How can I help you?"}
event: response.output_item.done
data: {"type": "response.output_item.done", "sequence_number": 6, "output_index": 0, "item": {...}}
event: response.completed
data: {"type": "response.completed", "sequence_number": 7, "response": {...}}
data: [DONE]
MCP-specific streaming events:
event: response.mcp_list_tools.in_progress
data: {"type": "response.mcp_list_tools.in_progress", "output_index": 0, "item_id": "mcp_list_001"}
event: response.mcp_list_tools.completed
data: {"type": "response.mcp_list_tools.completed", "output_index": 0, "item_id": "mcp_list_001"}
event: response.mcp_call.in_progress
data: {"type": "response.mcp_call.in_progress", "output_index": 1, "item_id": "mcp_call_001"}
event: response.mcp_call_arguments.delta
data: {"type": "response.mcp_call_arguments.delta", "output_index": 1, "item_id": "mcp_call_001", "delta": "{\"query\": \"..."}
event: response.mcp_call_arguments.done
data: {"type": "response.mcp_call_arguments.done", "output_index": 1, "item_id": "mcp_call_001", "arguments": "{\"query\": \"...\"}"}
event: response.output_item.done
data: {"type": "response.output_item.done", "output_index": 1, "item": {"type": "mcp_call", "output": "...", ...}}
Get Response¶
Retrieve a previously stored response by ID.
Path Parameters¶
| Parameter | Type | Description |
|---|---|---|
response_id | string | The response ID (e.g., resp_abc123) |
Query Parameters¶
| Parameter | Type | Description |
|---|---|---|
include | array | Additional fields to include |
Example Request¶
Response¶
Returns the full response object as shown in the Create Response section.
Delete Response¶
Delete a stored response.
Path Parameters¶
| Parameter | Type | Description |
|---|---|---|
response_id | string | The response ID to delete |
Example Request¶
Response¶
List Response Input Items¶
List the input items that were sent with a response.
Path Parameters¶
| Parameter | Type | Description |
|---|---|---|
response_id | string | The response ID |
Example Request¶
Response¶
{
"object": "list",
"data": [
{
"id": "msg_input_001",
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "Hello!"}]
}
],
"first_id": "msg_input_001",
"last_id": "msg_input_001",
"has_more": false
}
Conversation Management¶
Conversations provide persistent storage for multi-turn interactions, enabling chat history to be maintained across multiple requests.
Create Conversation¶
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
metadata | object | No | Custom metadata (max 16 properties) |
Example Request¶
curl http://localhost:30000/v1/conversations \
-H "Content-Type: application/json" \
-d '{
"metadata": {
"project": "customer-support",
"user_id": "user_123"
}
}'
Response¶
{
"id": "conv_abc123def456",
"object": "conversation",
"created_at": 1705312345,
"metadata": {
"project": "customer-support",
"user_id": "user_123"
}
}
Get Conversation¶
Example Request¶
Response¶
{
"id": "conv_abc123def456",
"object": "conversation",
"created_at": 1705312345,
"metadata": {
"project": "customer-support"
}
}
Update Conversation¶
Update conversation metadata. Uses merge semantics - set a key to null to delete it.
Request Body¶
| Field | Type | Description |
|---|---|---|
metadata | object | Metadata to merge (null values delete keys) |
Example Request¶
curl http://localhost:30000/v1/conversations/conv_abc123def456 \
-H "Content-Type: application/json" \
-d '{
"metadata": {
"status": "resolved",
"project": null
}
}'
Response¶
Returns the updated conversation object.
Delete Conversation¶
Example Request¶
Response¶
List Conversation Items¶
Query Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
limit | integer | 100 | Maximum items to return |
order | string | desc | Sort order: asc or desc |
after | string | - | Cursor for pagination |
Example Request¶
Response¶
{
"object": "list",
"data": [
{
"id": "item_001",
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "Hello"}],
"status": "completed",
"created_at": 1705312345
},
{
"id": "item_002",
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": "Hi there!"}],
"status": "completed",
"created_at": 1705312346
}
],
"first_id": "item_001",
"last_id": "item_002",
"has_more": false
}
Create Conversation Items¶
Add items to a conversation. Maximum 20 items per request.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
items | array | Yes | Array of items to add (max 20) |
Supported Item Types¶
message- User or assistant messagesreasoning- Model reasoning contentmcp_list_tools- MCP tool listingmcp_call- MCP tool invocationitem_reference- Reference to an existing itemfunction_call- Function tool callfunction_call_output- Function call result
Example Request¶
curl http://localhost:30000/v1/conversations/conv_abc123/items \
-H "Content-Type: application/json" \
-d '{
"items": [
{
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "What is 2+2?"}]
},
{
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": "2+2 equals 4."}]
}
]
}'
Response¶
{
"object": "list",
"data": [
{
"id": "item_003",
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "What is 2+2?"}],
"status": "completed"
},
{
"id": "item_004",
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": "2+2 equals 4."}],
"status": "completed"
}
],
"first_id": "item_003",
"last_id": "item_004",
"has_more": false
}
Get Conversation Item¶
Example Request¶
Response¶
Returns the item object.
Delete Conversation Item¶
Remove an item from a conversation. This performs a soft delete - the item may still exist if referenced by other conversations.
Example Request¶
Response¶
Returns the updated conversation object.
Examples¶
Simple Agentic Workflow¶
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:30000/v1",
api_key="your-api-key"
)
# Create a response with MCP tools
response = client.responses.create(
model="meta-llama/Llama-3.1-8B-Instruct",
input="Search for the weather in San Francisco and summarize it",
tools=[
{
"type": "mcp",
"server_url": "http://localhost:8080/mcp",
"server_label": "weather-service"
}
],
tool_choice="auto"
)
# The response includes tool calls and final answer
for output in response.output:
if output.type == "mcp_call":
print(f"Tool called: {output.name}")
print(f"Result: {output.output}")
elif output.type == "message":
for content in output.content:
if content.type == "output_text":
print(f"Answer: {content.text}")
Multi-turn Conversation with Tools¶
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:30000/v1",
api_key="your-api-key"
)
# Create a conversation
conversation = client.conversations.create(
metadata={"session": "support-123"}
)
# First turn
response1 = client.responses.create(
model="meta-llama/Llama-3.1-8B-Instruct",
input="I need help with my order #12345",
conversation=conversation.id,
tools=[
{
"type": "mcp",
"server_url": "http://localhost:8080/mcp",
"server_label": "order-service"
}
]
)
print(f"First response: {response1.id}")
# Second turn - continues the conversation
response2 = client.responses.create(
model="meta-llama/Llama-3.1-8B-Instruct",
input="Can you also check if there are any discounts available?",
conversation=conversation.id,
tools=[
{
"type": "mcp",
"server_url": "http://localhost:8080/mcp",
"server_label": "order-service"
}
]
)
print(f"Second response: {response2.id}")
# List conversation history
items = client.conversations.items.list(conversation.id)
for item in items.data:
print(f"{item.role}: {item.content}")
Streaming Response Handling¶
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:30000/v1",
api_key="your-api-key"
)
# Stream a response
with client.responses.create(
model="meta-llama/Llama-3.1-8B-Instruct",
input="Explain quantum computing",
stream=True
) as stream:
for event in stream:
if event.type == "response.output_text.delta":
print(event.delta, end="", flush=True)
elif event.type == "response.mcp_call.in_progress":
print(f"\n[Calling tool: {event.item_id}]")
elif event.type == "response.completed":
print(f"\n\nTokens used: {event.response.usage.total_tokens}")
Using Previous Response ID¶
# Alternative to conversations - chain responses directly
response1 = client.responses.create(
model="meta-llama/Llama-3.1-8B-Instruct",
input="What are the main programming paradigms?",
store=True
)
# Continue from previous response
response2 = client.responses.create(
model="meta-llama/Llama-3.1-8B-Instruct",
input="Can you elaborate on functional programming?",
previous_response_id=response1.id,
store=True
)
Error Responses¶
Error Format¶
{
"error": {
"message": "Error description",
"type": "error_type",
"param": "field_name",
"code": "error_code"
}
}
Common Errors¶
| HTTP Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request or validation failure |
| 401 | authentication_error | Invalid or missing API key |
| 404 | not_found_error | Response, conversation, or item not found |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | internal_error | Server error |
| 503 | service_unavailable | No healthy workers available |
Validation Errors¶
{
"error": {
"message": "Invalid 'conversation': 'invalid-id'. Expected an ID that begins with 'conv_'.",
"type": "invalid_request_error",
"param": "conversation",
"code": "invalid_conversation_id"
}
}
{
"error": {
"message": "Mutually exclusive parameters. Ensure you are only providing one of: 'previous_response_id' or 'conversation'.",
"type": "invalid_request_error",
"code": "mutually_exclusive_parameters"
}
}
SGLang Extensions¶
The Responses API includes additional sampling parameters specific to SGLang:
| Field | Type | Default | Description |
|---|---|---|---|
top_k | integer | -1 | Top-k sampling (-1 = disabled) |
min_p | number | 0.0 | Min-p sampling threshold |
repetition_penalty | number | 1.0 | Repetition penalty (1.0 = disabled) |
frequency_penalty | number | - | OpenAI-compatible frequency penalty |
presence_penalty | number | - | OpenAI-compatible presence penalty |
stop | string/array | - | Stop sequences |
Example: