# Getting Started
This guide brings up a TokenSpeed development environment and verifies that the runtime can start.
## Prerequisites

- NVIDIA GPU host
- Docker with GPU support
- Enough shared memory for model serving
- Access to the model checkpoints you plan to serve
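As a quick sanity check before starting the container, the free shared memory on the host can be inspected from Python. This is an illustrative helper, not part of TokenSpeed; the 32 GiB figure matches the `--shm-size 32g` used in the run command below.

```python
import os

def shm_free_gib(path="/dev/shm"):
    """Free space in GiB on the given mount (shared memory by default)."""
    st = os.statvfs(path)
    return (st.f_bavail * st.f_frsize) / 2**30

# On a typical Linux host, you want >= 32 GiB to match --shm-size 32g.
if os.path.exists("/dev/shm"):
    print(f"/dev/shm free: {shm_free_gib():.1f} GiB")
```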
## Start a Runner Container
```bash
docker pull lightseekorg/tokenspeed-runner:latest
docker run -itd \
  --shm-size 32g \
  --gpus all \
  -v /raid/cache:/home/runner/.cache \
  --ipc=host \
  --network=host \
  --pid=host \
  --privileged \
  --name tokenspeed \
  lightseekorg/tokenspeed-runner:latest \
  /bin/bash
```

Inside the container:
```bash
git clone https://github.com/lightseekorg/tokenspeed.git
cd tokenspeed
```

## Install Packages
Install the Python runtime:
```bash
export PIP_BREAK_SYSTEM_PACKAGES=1
pip install -e "./python" --no-build-isolation
```

Install the kernel package. Its Python package metadata installs the CUDA kernel dependencies automatically.
```bash
pip install -e tokenspeed-kernel/python/ --no-build-isolation
```

Install the scheduler package:
```bash
pip install -e tokenspeed-scheduler/
```

## Verify
```bash
tokenspeed env
tokenspeed serve --help
```

## Launch
```bash
tokenspeed serve openai/gpt-oss-20b \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 1
```

For model-specific examples, continue with Model Recipes.
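Once the server from the launch command above is running, a first request can be sketched with only the standard library. This assumes the runtime exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the port passed to `--port`, which this guide does not confirm; adjust the URL to whatever API your TokenSpeed version serves.

```python
import json
from urllib import request

def build_chat_request(model, prompt, base_url="http://localhost:8000"):
    """Build the URL and payload for an OpenAI-style chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    # Assumed endpoint path; verify against your server's API docs.
    return f"{base_url}/v1/chat/completions", payload

def send(url, payload):
    """POST the payload and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

url, payload = build_chat_request("openai/gpt-oss-20b", "Say hello.")
# send(url, payload)  # uncomment once the server is up
```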