Shepherd Model Gateway
The high-performance inference gateway for production LLM deployments
Route, balance, and orchestrate traffic across your LLM fleet with enterprise-grade reliability.
Why Shepherd Model Gateway?
Shepherd Model Gateway (SMG) sits between your applications and LLM workers, providing a unified control and data plane for managing inference at scale. Whether you're running a single model or orchestrating hundreds of workers across multiple clusters, SMG gives you the tools to do it reliably.
Full OpenAI Server Mode
With gRPC workers, SMG becomes a complete OpenAI-compatible server — handling tokenization, chat templates, tool calling, MCP, reasoning loops, and detokenization at the gateway level.
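Because the gateway speaks the OpenAI chat-completions protocol in this mode, any standard OpenAI-style request should work against it. The sketch below builds such a request with only the Python standard library; the gateway address and model name are placeholders, not values documented on this page.

```python
import json
import urllib.request

# Placeholder address for a locally running SMG instance (assumption).
SMG_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model: str, user_message: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        # Token-level streaming is handled at the gateway in gRPC mode.
        "stream": True,
    }

def send(payload: dict) -> bytes:
    """POST the payload to the gateway (requires a running SMG instance)."""
    req = urllib.request.Request(
        SMG_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

payload = build_chat_request("my-model", "Hello!")
```

Since tokenization, chat templating, and tool parsing happen gateway-side here, the workers behind SMG never see this JSON, only raw inference requests.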
High Performance
Native Rust implementation with gateway-side tokenization caching, token-level streaming, and sub-millisecond routing. Built for throughput at scale.
Enterprise Reliability
Circuit breakers, automatic retries with exponential backoff, rate limiting, and health monitoring. Your inference stack stays up.
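Exponential backoff means each failed attempt multiplies the wait before the next retry, up to a cap. The snippet below is a minimal illustration of that schedule, not SMG's actual retry code; its base, factor, and cap values are arbitrary assumptions, since the real defaults are not documented on this page.

```python
def backoff_delays(base: float = 0.1, factor: float = 2.0,
                   cap: float = 5.0, attempts: int = 6) -> list[float]:
    """Delay before retry n: base * factor**n seconds, capped at `cap`."""
    return [min(base * factor ** n, cap) for n in range(attempts)]

# backoff_delays() -> [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
```

The cap keeps a long outage from producing unbounded waits; production implementations usually also add random jitter so retries from many clients don't synchronize.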
Full Observability
40+ Prometheus metrics, OpenTelemetry distributed tracing, and structured logging. Know exactly what's happening.
How It Works
gRPC Mode
Gateway = Full Server
SMG handles everything: tokenization, chat templates, tool parsing, MCP loops, detokenization, and prefill/decode (PD) routing. Workers run raw inference on SGLang, vLLM, or TensorRT-LLM.
HTTP Mode
Gateway = Smart Proxy
SMG handles routing, load balancing, and failover. Workers run full OpenAI-compatible servers (SGLang, vLLM, TensorRT-LLM). Supports PD disaggregation.
External Mode
Gateway = Unified Router
Route to OpenAI, Claude, Gemini through one endpoint. Mix self-hosted and cloud models seamlessly.
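Conceptually, a unified router maps the requested model name to the right upstream, so clients switch between cloud and self-hosted models just by changing the `model` field. The sketch below illustrates that idea with a prefix-based routing table; the prefixes, endpoints, and default-worker address are illustrative assumptions, not SMG configuration.

```python
# Illustrative routing table: model-name prefix -> upstream endpoint.
ROUTES = {
    "gpt-": "https://api.openai.com/v1",
    "claude-": "https://api.anthropic.com/v1",
    "gemini-": "https://generativelanguage.googleapis.com/v1beta",
}
# Anything else falls through to self-hosted workers (placeholder address).
DEFAULT_UPSTREAM = "http://localhost:8000/v1"

def resolve_upstream(model: str) -> str:
    """Pick the upstream whose prefix matches the requested model name."""
    for prefix, upstream in ROUTES.items():
        if model.startswith(prefix):
            return upstream
    return DEFAULT_UPSTREAM
```

With a scheme like this, the same client code and endpoint serve both cloud providers and local workers; only the model name changes per request.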
Choose Your Path
New to SMG?
Start here to understand what SMG does and get it running in minutes.
Learn the Concepts
Understand SMG's architecture, routing strategies, and reliability features.
Ops Setup
Continue onboarding with monitoring, logging, and TLS guides.
API Reference
Complete reference for the OpenAI-compatible API and SMG extensions.