Launch First
Production-oriented docs for launching, tuning, and operating low-latency OpenAI-compatible serving. Start with concrete commands, then tune the exact knobs that affect memory, scheduling, parallelism, and kernels.
tokenspeed serve openai/gpt-oss-20b \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 1

The server exposes an OpenAI-compatible API under /v1.
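As a quick smoke test once the server is up, a standard OpenAI-style chat completion request can be sent with curl. This is a sketch assuming the default host and port from the launch command above; the prompt and sampling parameters are illustrative, and the request schema follows the OpenAI Chat Completions API:

```shell
# Send a chat completion request to the locally running server.
# The /v1/chat/completions path and JSON schema follow the OpenAI API convention.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32
      }'
```

Because the API is OpenAI-compatible, any OpenAI client SDK can also be pointed at the server by overriding its base URL to http://localhost:8000/v1.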
Large MoE deployments tend to converge on the same configuration decisions:
See Model Recipes for concrete examples and Server Parameters for the parameter reference.