TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads
TokenSpeed is a speed-of-light LLM inference engine designed from first principles for agentic workloads. It combines a compiler-backed modeling mechanism for parallelism, a high-performance scheduler, a restriction that keeps KV resource reuse safe, a pluggable, layered kernel system that supports heterogeneous accelerators, and SMG integration.
2026/05/06
