
Concepts

This section explains the core concepts behind Shepherd Model Gateway. Understanding these concepts helps you design, deploy, and operate SMG effectively.

What is SMG?

Shepherd Model Gateway is an inference gateway that adapts to your deployment:

| With gRPC Workers | With HTTP Workers | With External APIs |
|---|---|---|
| Full OpenAI server | Intelligent proxy | Unified routing |
| Tokenization + caching | Load balancing | Model discovery |
| Tool parsing + MCP | PD disaggregation | Provider abstraction |
| Reasoning loops | Health-aware failover | API translation |

Unlike generic load balancers, SMG understands LLM patterns: prefix caching, token streaming, and KV cache affinity.


Core Components

[Figure: Core Components Architecture]

| Layer | Purpose |
|---|---|
| API Layer | Inference, utility, and admin endpoints |
| Router Manager | Selects path based on worker type |
| Service Discovery | Health monitoring, worker registration |
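
As a rough illustration of how a router manager might select a path based on worker type, the sketch below maps each worker kind to a processing path. The names `WorkerType`, `Worker`, `select_path`, and the path labels are hypothetical and chosen for this example only; they are not SMG's actual types or API.

```python
from dataclasses import dataclass
from enum import Enum


class WorkerType(Enum):
    """Hypothetical worker kinds, mirroring the three deployment modes."""
    GRPC = "grpc"
    HTTP = "http"
    EXTERNAL = "external"


@dataclass(frozen=True)
class Worker:
    url: str
    kind: WorkerType


def select_path(worker: Worker) -> str:
    """Return an illustrative path label for the given worker."""
    paths = {
        # Gateway acts as the full server: tokenization, tool parsing, streaming.
        WorkerType.GRPC: "full-server",
        # Gateway forwards the HTTP request and balances load across workers.
        WorkerType.HTTP: "proxy",
        # Gateway translates the request for an external provider's API.
        WorkerType.EXTERNAL: "translate",
    }
    return paths[worker.kind]
```

For example, `select_path(Worker("grpc://worker-1", WorkerType.GRPC))` returns `"full-server"` in this sketch.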

Key Concepts

Architecture

How SMG's control plane and data plane work together.

Learn about Architecture →

Load Balancing

Routing strategies from simple random selection to cache-aware algorithms.

Learn about Routing →
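
To make the difference between the two ends of that range concrete, here is a minimal sketch contrasting random selection with a simple cache-aware policy. It assumes the gateway keeps a record of token sequences recently sent to each worker; the function names, the `recent` structure, and the `min_match` parameter are assumptions for illustration, not SMG's actual algorithm.

```python
import random
from typing import Dict, List, Optional, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_random(workers: List[str], rng: random.Random) -> str:
    """Simplest strategy: pick any worker uniformly at random."""
    return rng.choice(workers)


def pick_cache_aware(
    tokens: Sequence[int],
    recent: Dict[str, List[List[int]]],
    workers: List[str],
    rng: random.Random,
    min_match: int = 1,
) -> str:
    """Prefer the worker whose recent requests share the longest token prefix.

    A long shared prefix suggests the worker may already hold a matching
    prefix cache. Falls back to random selection when nothing matches.
    """
    best_worker: Optional[str] = None
    best_len = 0
    for worker in workers:
        for seen in recent.get(worker, []):
            n = shared_prefix_len(tokens, seen)
            if n > best_len:
                best_worker, best_len = worker, n
    if best_worker is None or best_len < min_match:
        return pick_random(workers, rng)
    return best_worker
```

A production policy would also need to weigh current load against cache affinity, so that one worker holding a popular prefix is not overloaded; this sketch leaves that out.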

Performance

Tokenizer caching and optimization strategies for high-throughput deployments.

Learn about Performance →
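
The idea behind tokenizer caching can be sketched as a small least-recently-used cache placed in front of a tokenizer, so that repeated inputs (such as a shared system prompt) are tokenized only once. The `TokenizerCache` class and the `toy_tokenize` stand-in below are hypothetical and do not reflect SMG's internal implementation.

```python
from collections import OrderedDict
from typing import Callable, List


class TokenizerCache:
    """LRU cache in front of a tokenizer function (illustrative only)."""

    def __init__(self, tokenize: Callable[[str], List[int]], capacity: int = 1024):
        self._tokenize = tokenize
        self._capacity = capacity
        self._entries: "OrderedDict[str, List[int]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def encode(self, text: str) -> List[int]:
        if text in self._entries:
            # Mark as most recently used and skip tokenization.
            self._entries.move_to_end(text)
            self.hits += 1
            return self._entries[text]
        self.misses += 1
        tokens = self._tokenize(text)
        self._entries[text] = tokens
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return tokens


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer: one integer (the word length) per word."""
    return [len(word) for word in text.split()]
```

The hit and miss counters are included because cache effectiveness is usually something you want to observe before tuning capacity.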

Extensibility

WASM plugins and MCP integration for custom middleware and external tools.

Learn about Extensibility →

Reliability

Circuit breakers, retries, and rate limiting for resilient deployments.

Learn about Reliability →
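
For readers unfamiliar with the circuit breaker pattern, the following is a minimal generic sketch: after a number of consecutive failures the breaker opens and rejects requests, then allows a trial request once a cooldown has passed. The class, its parameter names, and the default values are assumptions for illustration and are not SMG's configuration or implementation.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Generic closed / open / half-open circuit breaker (illustrative)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after:
            # Cooldown elapsed: allow a trial request.
            return "half-open"
        return "open"

    def allow_request(self) -> bool:
        return self.state() != "open"

    def record_success(self) -> None:
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open after a failed trial) and restart the cooldown.
            self._opened_at = self._clock()
```

The clock is injectable so the behavior can be tested without waiting for real time to pass.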


Concept Categories

Architecture

Understand how SMG is structured internally:

Routing

Learn how SMG selects workers for requests:

Performance

Optimize SMG for high-throughput deployments:

Extensibility

Extend SMG with custom logic and external tools:

Reliability

Understand how SMG handles failures:


Design Principles

SMG is built on several core principles:

  1. Transparency: SMG should be invisible to well-behaved applications. The same requests that work against a single worker should work through SMG.

  2. Performance: Routing decisions should complete in microseconds, so that SMG does not become the bottleneck.

  3. Reliability: Individual worker failures don't cause application failures. SMG routes around problems automatically.

  4. Observability: You can always understand what SMG is doing through metrics, traces, and logs.

  5. Simplicity: Common cases are simple. Advanced features are available but not required.
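
The reliability principle above, routing around individual worker failures, can be sketched as a simple failover loop that tries the next worker when one is unreachable. The `call_with_failover` helper, the `AllWorkersFailed` exception, and the choice of `ConnectionError` as the retryable error are assumptions for this example; SMG's actual retry behavior may differ.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


class AllWorkersFailed(Exception):
    """Raised when every candidate worker failed (hypothetical name)."""


def call_with_failover(workers: Iterable[str], send: Callable[[str], T]) -> T:
    """Try each worker in order and return the first successful result.

    Only connection-level errors trigger failover here; other exceptions
    propagate to the caller unchanged.
    """
    errors: List[Tuple[str, Exception]] = []
    for worker in workers:
        try:
            return send(worker)
        except ConnectionError as exc:
            # Remember the failure and move on to the next worker.
            errors.append((worker, exc))
    raise AllWorkersFailed(errors)
```

From the application's point of view, a single worker failure is absorbed by the loop; an error surfaces only when no worker can serve the request.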