TorchSpec: Speculative Decoding Training at Scale
We’re excited to announce TorchSpec, a torch-native framework for scalable speculative decoding training. TorchSpec streams hidden states directly from inference engines to training workers via Mooncake, eliminating the need to materialize massive tensors on disk or co-locate training with the target model. This design enables fully disaggregated pipelines where inference and training scale independently.
2026/03/17
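The streaming design described above can be illustrated with a minimal producer/consumer sketch. Everything here is hypothetical: the names, the in-memory `queue.Queue` channel (standing in for the Mooncake transfer path), and the toy "hidden states" are illustrative only, not the real TorchSpec or Mooncake API.

```python
import queue
import threading

# Conceptual sketch of a disaggregated pipeline: an inference-side producer
# streams hidden states to a training-side consumer through a bounded
# in-memory channel, so no tensor is ever materialized on disk.
channel = queue.Queue(maxsize=8)  # bounded queue provides backpressure
SENTINEL = None                   # marks the end of the stream

def inference_worker(num_batches):
    # Each "hidden state" here is a tiny stand-in for an activation
    # vector captured from the target model's forward pass.
    for step in range(num_batches):
        hidden = [float(step + i) for i in range(4)]
        channel.put(hidden)       # stream directly; never touch disk
    channel.put(SENTINEL)         # signal completion to the trainer

def training_worker(consumed):
    while True:
        hidden = channel.get()
        if hidden is SENTINEL:
            break
        # Stand-in for a draft-model training step on the received batch.
        consumed.append(sum(hidden) / len(hidden))

received = []
producer = threading.Thread(target=inference_worker, args=(3,))
consumer = threading.Thread(target=training_worker, args=(received,))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(received)  # → [1.5, 2.5, 3.5]
```

Because the two workers share only the channel, each side can be scaled or placed independently, which is the essence of the disaggregated design the post describes.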
