Configuration Reference
Complete configuration reference for tuning SMG behavior.
Configuration Methods
SMG can be configured through:
- Command-line arguments (highest priority)
- Environment variables
- Default values (lowest priority)
Worker Configuration
Host
Network interface to bind to.
Option: --host
Environment: -
Default: 0.0.0.0
Value | Description
127.0.0.1 | Localhost only
0.0.0.0 | All IPv4 interfaces
:: | All IPv6 interfaces
::1 | IPv6 localhost
Port
Port for the main API server.
Option: --port
Environment: -
Default: 30000
Worker URLs
List of worker URLs to route requests to.
Option: --worker-urls
Environment: -
Default: Empty
Format: Space-separated URLs
Examples:
--worker-urls http://worker1:8000 http://worker2:8000
--worker-urls http://[::1]:8000 http://192.168.1.1:8000 # IPv6 and IPv4
--worker-urls grpc://worker1:50051 # gRPC mode
Routing Policy Configuration
Load Balancing Policy
Controls how requests are distributed across workers.
Option: --policy
Environment: -
Default: cache_aware
Values: random, round_robin, cache_aware, power_of_two, prefix_hash, consistent_hashing, bucket, manual
Policy Comparison:
Policy | Use Case | KV Cache | Load Balance
random | Simple deployments | Poor | Fair
round_robin | Uniform workloads | Poor | Good
power_of_two | Variable workloads | Poor | Excellent
cache_aware | LLM inference | Excellent | Good
prefix_hash | Consistent routing by prefix | Good | Good
consistent_hashing | Session affinity via hash ring | Good | Good
bucket | Load balancing with bucket boundaries | Poor | Excellent
manual | Sticky sessions with LRU eviction | Good | Manual
Recommendation: Use cache_aware for LLM workloads to maximize KV cache hit rates.
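To illustrate why power_of_two scores well on load balance in the comparison above, here is a minimal sketch of the general "power of two choices" technique: sample two workers at random and keep the less loaded one. This is not SMG's implementation; the function name `pick_power_of_two` and the `loads` mapping are hypothetical, and how SMG actually measures load is not specified here.

```python
import random

def pick_power_of_two(loads, rng=random):
    """Sample two distinct workers and return the one with the lower load.

    `loads` maps worker URL -> current load (hypothetical metric).
    """
    workers = list(loads)
    if not workers:
        raise ValueError("no workers available")
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)  # two distinct candidates
    return a if loads[a] <= loads[b] else b

# Illustrative loads; the URLs reuse the example hosts from this page.
loads = {
    "http://worker1:8000": 12,
    "http://worker2:8000": 3,
    "http://worker3:8000": 7,
}
choice = pick_power_of_two(loads, random.Random(0))
```

Because each candidate is compared against a second one, the most loaded worker in this example can never be selected, which is what keeps the distribution even without scanning every worker.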
Cache-Aware Policy Options
Option | Description | Default
--cache-threshold | Cache threshold (0.0-1.0) for cache-aware routing | 0.3
--balance-abs-threshold | Absolute threshold for load balancing trigger | 64
--balance-rel-threshold | Relative threshold for load balancing trigger | 1.5
--eviction-interval | Interval in seconds between cache eviction operations | 120
--max-tree-size | Maximum size of the approximation tree | 67108864
--block-size | KV cache block size for event-driven cache-aware routing | 16
Prefix Hash Policy Options
Option | Description | Default
--prefix-token-count | Number of prefix tokens to use for hashing | 256
--prefix-hash-load-factor | Load factor threshold for rebalancing | 1.25
Manual Policy Options
Option | Description | Default
--max-idle-secs | Maximum idle time before eviction | 14400 (4 hours)
--assignment-mode | Mode for new routing key assignment | random
Assignment Modes:
- random - Assign to a random worker
- min_load - Assign to the worker with the fewest active requests
- min_group - Assign to the worker with the fewest routing keys
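The three assignment modes can be sketched as a single selection function. This is an illustrative reading of the descriptions above, not SMG's code; `assign_worker`, `active_requests`, and `routing_keys` are hypothetical names.

```python
import random

def assign_worker(mode, workers, active_requests, routing_keys, rng=random):
    """Choose a worker for a new routing key according to the assignment mode.

    active_requests / routing_keys map worker -> count (hypothetical bookkeeping).
    """
    if mode == "random":
        return rng.choice(workers)
    if mode == "min_load":
        # Fewest in-flight requests wins.
        return min(workers, key=lambda w: active_requests.get(w, 0))
    if mode == "min_group":
        # Fewest routing keys already pinned to the worker wins.
        return min(workers, key=lambda w: routing_keys.get(w, 0))
    raise ValueError(f"unknown assignment mode: {mode}")

workers = ["w1", "w2"]
active = {"w1": 5, "w2": 1}
keys = {"w1": 2, "w2": 9}
```

With these sample counts, min_load and min_group pick different workers, which shows that the two modes optimize for different things: current traffic versus the number of sticky sessions.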
Advanced Routing Options
Option | Description | Default
--dp-aware | Enable data parallelism aware scheduling | false
--enable-igw | Enable IGW (Inference Gateway) mode for multi-model support | false
--dp-minimum-tokens-scheduler | Enable minimum tokens scheduler for data parallel group | false
--load-monitor-interval | Interval in seconds between load monitor checks for PowerOfTwo routing | 10
PD Disaggregation Configuration
Prefill-Decode disaggregated mode separates prefill and decode operations across different workers.
Enable PD Mode
Option: --pd-disaggregation
Environment: -
Default: false
Prefill Servers
Option: --prefill
Format: URL [BOOTSTRAP_PORT]
Multiple: Yes (specify multiple times)
Examples:
--prefill http://prefill1:30001 9001 \
--prefill http://prefill2:30002 9002 \
--prefill http://prefill3:30003 none # No bootstrap port
Decode Servers
Option: --decode
Format: URL
Multiple: Yes (specify multiple times)
Example:
--decode http://decode1:30003 \
--decode http://decode2:30004
PD-Specific Policies
Option | Description | Default
--prefill-policy | Specific policy for prefill nodes | Uses main --policy
--decode-policy | Specific policy for decode nodes | Uses main --policy
Worker Startup Configuration
Option | Description | Default
--worker-startup-timeout-secs | Timeout for worker startup and registration | 1800 (30 min)
--worker-startup-check-interval | Interval between worker startup checks | 30
Service Discovery (Kubernetes)
Enable Service Discovery
Option: --service-discovery
Environment: -
Default: false
Note: Enabling service discovery automatically enables IGW mode.
Label Selector
Option: --selector
Format: key=value (space-separated for multiple)
Example:
--selector app=sglang-worker tier=inference
Namespace
Option: --service-discovery-namespace
Environment: -
Default: All namespaces
Worker Port
Option: --service-discovery-port
Environment: -
Default: 80
PD Service Discovery Selectors
Option | Description
--prefill-selector | Label selector for prefill server pods
--decode-selector | Label selector for decode server pods
HA Mesh Router Discovery
Option | Description
--router-selector | Label selector for router pod discovery in HA mesh mode (format: key=value)
Per-Worker Model ID Override
Option | Description
--model-id-from | Override each worker's model_id from pod metadata. Accepted values: namespace, label:<key>, or annotation:<key>.
Tokenizer Configuration
Model Path
Option: --model-path
Environment: -
Default: None
Description: HuggingFace model ID or local path for loading the tokenizer
Tokenizer Path
Option: --tokenizer-path
Environment: -
Default: None
Description: Explicit tokenizer path (overrides the tokenizer from --model-path)
Chat Template
Option: --chat-template
Environment: -
Default: None
Description: Path to chat template file
Disable Tokenizer Autoload
Option: --disable-tokenizer-autoload
Environment: -
Default: false
Description: Disable automatic tokenizer loading at startup and during worker registration. Useful when tokenizers are loaded on-demand via the API.
Tokenizer Cache (L0 - Exact Match)
Option | Description | Default
--tokenizer-cache-enable-l0 | Enable L0 exact match cache | false
--tokenizer-cache-l0-max-entries | Maximum entries in L0 cache | 10000
Tokenizer Cache (L1 - Prefix Matching)
Option | Description | Default
--tokenizer-cache-enable-l1 | Enable L1 prefix matching cache | false
--tokenizer-cache-l1-max-memory | Maximum memory for L1 cache (bytes) | 52428800 (50MB)
Parser Configuration
Reasoning Parser
Option: --reasoning-parser
Environment: -
Default: None
Values: deepseek-r1, qwen3, etc.
Description: Parser for reasoning models with thinking tokens
Tool Call Parser
Option: --tool-call-parser
Environment: -
Default: None
Values: json, qwen, etc.
Description: Parser for tool-call/function-calling interactions
MCP Configuration
MCP Config Path
Option: --mcp-config-path
Environment: -
Default: None
Description: Path to MCP (Model Context Protocol) server configuration file
Backend Configuration
Backend Runtime
Option: --backend
Environment: -
Default: None (auto-detected)
Values: sglang, vllm, trtllm, openai, anthropic, gemini
History Backend
Option: --history-backend
Environment: -
Default: memory
Values: memory, none, oracle, postgres, redis
Storage Configuration
Oracle Database
Option | Environment | Description
--oracle-wallet-path | ATP_WALLET_PATH | Path to Oracle ATP wallet directory
--oracle-tns-alias | ATP_TNS_ALIAS | Oracle TNS alias from tnsnames.ora
--oracle-dsn | ATP_DSN | Oracle connection descriptor/DSN
--oracle-user | ATP_USER | Oracle database username
--oracle-password | ATP_PASSWORD | Oracle database password
--oracle-external-auth | ATP_EXTERNAL_AUTH | Enable Oracle external authentication (default: false)
--oracle-pool-min | ATP_POOL_MIN | Minimum connection pool size (default: 1)
--oracle-pool-max | ATP_POOL_MAX | Maximum connection pool size (default: 16)
--oracle-pool-timeout-secs | ATP_POOL_TIMEOUT_SECS | Pool timeout in seconds (default: 30)
PostgreSQL Database
Option | Environment | Description | Default
--postgres-db-url | POSTGRES_DB_URL | PostgreSQL connection URL | -
--postgres-pool-max-size | POSTGRES_POOL_MAX | Maximum pool size | 16
Redis Database
Option | Environment | Description | Default
--redis-url | REDIS_URL | Redis connection URL | -
--redis-pool-max-size | REDIS_POOL_MAX | Maximum pool size | 16
--redis-retention-days | REDIS_RETENTION_DAYS | Data retention in days (-1 for persistent) | 30
WASM Configuration
Enable WebAssembly
Option: --enable-wasm
Environment: -
Default: false
Description: Enable WebAssembly support
Storage Hook WASM Component
Option: --storage-hook-wasm-path
Environment: -
Default: None
Description: Path to a WASM component implementing storage hooks. When set, wraps all storage backends with hook-based interceptors.
Schema Config File
Option: --schema-config
Environment: -
Default: None
Description: Path to a YAML schema config file for storage table/column remapping.
WebRTC Configuration
Option | Description | Default
--webrtc-bind-addr | Bind address for WebRTC UDP sockets (client-facing ICE candidate IP). Set to 127.0.0.1 for local development on the same machine. | 0.0.0.0 (auto-detect via routing table)
--webrtc-stun-server | STUN server for ICE candidate gathering (host:port). Set to your own STUN server for enterprise deployments that restrict outbound traffic to external STUN servers. | stun.l.google.com:19302
Mesh Server Configuration
High-availability mesh networking for multi-router coordination.
Option | Description | Default
--enable-mesh | Enable mesh server for HA multi-router coordination. Requires at least two SMG instances. | false
--mesh-server-name | Name for this mesh node. If not set, a random name is generated (e.g., Mesh_a1b2). | Auto-generated
--mesh-host | Bind address for the mesh server. | 0.0.0.0
--mesh-advertise-host | Routable address advertised to other mesh peers. Required when --mesh-host is an unspecified bind address such as 0.0.0.0. | --mesh-host
--mesh-port | Port for the mesh server. | 39527
--mesh-peer-urls | Peer mesh node addresses to join (format: host:port). Used for initial cluster formation. | (none)
Example:
smg \
--enable-mesh \
--mesh-server-name router-1 \
--mesh-advertise-host 192.168.1.10 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.10:39527
Request Handling Configuration
Request Timeout
Option: --request-timeout-secs
Environment: -
Default: 1800 (30 minutes)
Description: Maximum time for request processing
Shutdown Grace Period
Option: --shutdown-grace-period-secs
Environment: -
Default: 180 (3 minutes)
Description: Time to wait for in-flight requests during shutdown
Maximum Payload Size
Option: --max-payload-size
Environment: -
Default: 536870912 (512MB)
Description: Maximum request payload size in bytes
CORS Configuration
Option: --cors-allowed-origins
Environment: -
Default: Empty
Format: Space-separated URLs
Example:
--cors-allowed-origins http://localhost:3000 https://example.com
Request ID Headers
Option: --request-id-headers
Environment: -
Default: None (uses common defaults)
Description: Custom HTTP headers to check for request IDs
Example:
--request-id-headers x-request-id x-trace-id x-correlation-id
Storage Context Headers
Option: --storage-context-headers
Environment: -
Default: Empty
Format: Space-separated header=context_key entries
Description: Maps request headers into the storage hook request context
Example:
--storage-context-headers x-tenant-id=tenant_id x-user-id=user_id
This lets storage hooks read values such as tenant_id and user_id from the request context without hard-coding specific headers in the gateway.
Only map headers that are injected or sanitized by a trusted upstream. Client-supplied headers can otherwise spoof storage hook request context values.
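The header-to-context mapping described above can be pictured with a small sketch. It assumes each entry is split on the first `=` and that header names are matched case-insensitively (as HTTP header names are); the helper names `parse_header_mappings` and `build_context` are hypothetical and do not describe SMG internals.

```python
def parse_header_mappings(entries):
    """Turn ['x-tenant-id=tenant_id', ...] into {'x-tenant-id': 'tenant_id', ...}."""
    mapping = {}
    for entry in entries:
        header, sep, context_key = entry.partition("=")
        if not sep or not header or not context_key:
            raise ValueError(f"expected header=context_key, got {entry!r}")
        mapping[header.lower()] = context_key
    return mapping

def build_context(request_headers, mapping):
    """Copy only the mapped headers into a storage hook context dict."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {key: lowered[header] for header, key in mapping.items() if header in lowered}

mapping = parse_header_mappings(["x-tenant-id=tenant_id", "x-user-id=user_id"])
context = build_context({"X-Tenant-Id": "acme", "Accept": "*/*"}, mapping)
```

Unmapped headers such as `Accept` never reach the context, and a mapped header that is absent from the request is simply omitted.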
Rate Limiting Configuration
Concurrent Request Limit
Option: --max-concurrent-requests
Environment: -
Default: -1 (unlimited)
Range: -1 or 1+
Sizing Guide:
max_concurrent_requests = num_workers * requests_per_worker_capacity
Worker GPU Memory | Suggested per Worker
16GB | 4-8
40GB | 8-16
80GB | 16-32
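As a worked example of the sizing guide, the sketch below multiplies the worker count by the suggested per-worker range from the table. The function name `suggest_max_concurrent_requests` is hypothetical, and the result is only a starting range to tune from.

```python
# Suggested concurrent requests per worker, keyed by GPU memory in GB (from the table above).
SUGGESTED_PER_WORKER = {16: (4, 8), 40: (8, 16), 80: (16, 32)}

def suggest_max_concurrent_requests(num_workers, gpu_memory_gb):
    """Return a (low, high) range for --max-concurrent-requests."""
    low, high = SUGGESTED_PER_WORKER[gpu_memory_gb]
    return num_workers * low, num_workers * high

# Four workers with 80GB GPUs each.
low, high = suggest_max_concurrent_requests(4, 80)
print(f"--max-concurrent-requests between {low} and {high}")
```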
Queue Configuration
Option | Description | Default
--queue-size | Maximum requests waiting when rate limit reached | 100
--queue-timeout-secs | Maximum time a request can wait in queue (seconds) | 60
Token Bucket Rate Limiting
Option: --rate-limit-tokens-per-second
Environment: -
Default: Same as max-concurrent-requests
Description: Token bucket refill rate
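For readers unfamiliar with the term, a token bucket admits a request only when a token is available and refills tokens at a fixed rate. The following is a generic sketch of that technique, not SMG's limiter; the class name `TokenBucket`, the `capacity` parameter, and the injectable `clock` are assumptions made for illustration.

```python
import time

class TokenBucket:
    """Generic token bucket: `capacity` bounds bursts, `refill_per_second` bounds the sustained rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def try_acquire(self, n=1.0):
        """Refill based on elapsed time, then take n tokens if available."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# Deterministic demo with a fake clock: burst of 2, refill of 1 token per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]
fake_now[0] = 1.0  # one second later, one token has been refilled
after_refill = [bucket.try_acquire() for _ in range(2)]
```

In this sketch a rejected request would be the point at which a gateway queues the request or returns an error; how SMG combines the bucket with --queue-size is not covered here.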
Retry Configuration
Retry Options
Option | Description | Default
--retry-max-retries | Maximum retry attempts | 5
--retry-initial-backoff-ms | Initial backoff delay (ms) | 50
--retry-max-backoff-ms | Maximum backoff delay (ms) | 30000
--retry-backoff-multiplier | Exponential backoff multiplier | 1.5
--retry-jitter-factor | Jitter factor (0.0-1.0) | 0.2
--disable-retries | Disable automatic retries | false
Backoff Formula:
delay = min(initial_backoff * multiplier^attempt, max_backoff) * (1 + random(0, jitter_factor))
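The formula can be evaluated directly. The sketch below plugs in the documented defaults and assumes that `attempt` starts at 0 for the first retry, which the formula itself does not state; `backoff_delay_ms` is a hypothetical helper name.

```python
import random

def backoff_delay_ms(attempt, initial_backoff_ms=50, multiplier=1.5,
                     max_backoff_ms=30000, jitter_factor=0.2, rng=random):
    """delay = min(initial * multiplier^attempt, max) * (1 + random(0, jitter))."""
    base = min(initial_backoff_ms * multiplier ** attempt, max_backoff_ms)
    return base * (1 + rng.uniform(0, jitter_factor))

# Print the jitter-free schedule for the default five retries (attempt assumed 0-based).
for attempt in range(5):
    print(attempt, backoff_delay_ms(attempt, jitter_factor=0.0))
```

With the defaults, the exponential term only reaches the 30000 ms cap after many attempts, so for five retries the cap mostly matters when the multiplier or initial backoff is raised.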
Circuit Breaker Configuration
Option | Description | Default
--cb-failure-threshold | Failures before circuit opens | 10
--cb-success-threshold | Successes needed to close in half-open state | 3
--cb-timeout-duration-secs | Time before attempting recovery (seconds) | 60
--cb-window-duration-secs | Sliding window for tracking failures (seconds) | 120
--disable-circuit-breaker | Disable circuit breaker | false
Circuit Breaker States:
- Closed: Normal operation, tracking failures
- Open: All requests fail fast, circuit tripped
- Half-Open: Testing if service recovered
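The three states and the thresholds that move between them can be sketched as a small state machine. This is a simplified illustration, not SMG's implementation: it counts consecutive failures instead of using the sliding window from --cb-window-duration-secs, and the class and method names are hypothetical.

```python
import time

class CircuitBreaker:
    """Simplified closed -> open -> half_open -> closed state machine."""

    def __init__(self, failure_threshold=10, success_threshold=3,
                 timeout_secs=60, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = None

    def allow_request(self):
        """Fail fast while open; move to half_open once the timeout has elapsed."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.timeout_secs:
                self.state = "half_open"
                self.successes = 0
                return True
            return False
        return True

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        elif self.state == "closed":
            self.failures = 0  # simplification: only consecutive failures count

    def record_failure(self):
        if self.state == "half_open":
            self._trip()  # a failed probe reopens the circuit
        elif self.state == "closed":
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0

# Deterministic walk-through with a fake clock and small thresholds.
fake_now = [0.0]
cb = CircuitBreaker(failure_threshold=2, success_threshold=1,
                    timeout_secs=10, clock=lambda: fake_now[0])
cb.record_failure()
cb.record_failure()          # second failure trips the circuit
blocked = cb.allow_request() # still within the timeout
fake_now[0] = 10.0
probe = cb.allow_request()   # timeout elapsed, a probe request is let through
cb.record_success()          # probe succeeded, circuit closes
```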
Health Check Configuration
Option | Description | Default
--health-failure-threshold | Failures before marking unhealthy | 3
--health-success-threshold | Successes before marking healthy | 2
--health-check-timeout-secs | Timeout for health check requests (seconds) | 5
--health-check-interval-secs | Interval between health checks (seconds) | 60
--health-check-endpoint | Health check endpoint path | /health
--disable-health-check | Disable all health checks | false
--remove-unhealthy-workers | Remove workers from the registry when marked unhealthy by health checks. Useful for ephemeral worker pools where failed workers should be deregistered. | false
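One way to read the failure and success thresholds is as streak counters: a worker flips to unhealthy after a run of failed checks and back to healthy after a run of passing ones. The sketch below assumes the thresholds apply to consecutive results, which the table does not state; `WorkerHealth` and `observe` are hypothetical names.

```python
class WorkerHealth:
    """Track health transitions from consecutive check results (assumed semantics)."""

    def __init__(self, failure_threshold=3, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self.fail_streak = 0
        self.ok_streak = 0

    def observe(self, check_passed):
        """Record one health check result and return the current health state."""
        if check_passed:
            self.ok_streak += 1
            self.fail_streak = 0
            if not self.healthy and self.ok_streak >= self.success_threshold:
                self.healthy = True
        else:
            self.fail_streak += 1
            self.ok_streak = 0
            if self.healthy and self.fail_streak >= self.failure_threshold:
                self.healthy = False
        return self.healthy

# Three failed checks followed by two passing ones, using the default thresholds.
worker = WorkerHealth()
timeline = [worker.observe(result) for result in [False, False, False, True, True]]
```

Under this reading, and with the default 60 second interval, a worker that stops responding would be marked unhealthy only after roughly three check intervals, which is worth keeping in mind when lowering --health-check-interval-secs for latency-sensitive deployments.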
Prometheus Metrics Configuration
Metrics Server
Option | Description | Default
--prometheus-port | Port for Prometheus metrics endpoint | 29000
--prometheus-host | Host for Prometheus metrics server | 0.0.0.0
--prometheus-duration-buckets | Custom histogram buckets | Default buckets
Example:
--prometheus-duration-buckets 0.001 0.005 0.01 0.025 0.05 0.1 0.25 0.5 1.0 2.5 5.0 10.0
OpenTelemetry Configuration
Enable Tracing
Option: --enable-trace
Environment: -
Default: false
OTLP Endpoint
Option: --otlp-traces-endpoint
Environment: -
Default: localhost:4317
Format: host:port
Example:
smg --enable-trace --otlp-traces-endpoint jaeger:4317
TLS/mTLS Security Configuration
Server TLS
For HTTPS on the gateway:
Option | Description
--tls-cert-path | Path to server certificate (PEM format)
--tls-key-path | Path to server private key (PEM format)
Client mTLS
For secure communication to workers (Python bindings):
Option | Description
--client-cert-path | Path to client certificate
--client-key-path | Path to client private key
--ca-cert-paths | Path(s) to CA certificate(s)
Control Plane Authentication
API Key (Worker Authorization)
Option: --api-key
Environment: -
Default: None
Description: API key for worker authorization (useful with dp-aware scheduling)
Control Plane API Keys
Option: --control-plane-api-keys
Environment: CONTROL_PLANE_API_KEYS
Format: id:name:role:key
Multiple: Yes
Example:
--control-plane-api-keys 'key1:Admin:admin:secret123' 'key2:ReadOnly:user:secret456'
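The id:name:role:key format can be unpacked as shown in this sketch. It assumes the first three fields never contain a colon and that any further colons belong to the key; that splitting rule, along with the names `ControlPlaneKey` and `parse_control_plane_key`, is an illustration rather than SMG's documented parsing behavior.

```python
from typing import NamedTuple

class ControlPlaneKey(NamedTuple):
    id: str
    name: str
    role: str
    key: str

def parse_control_plane_key(entry):
    """Split 'id:name:role:key' into its four fields (extra colons stay in the key)."""
    parts = entry.split(":", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected id:name:role:key, got {entry!r}")
    return ControlPlaneKey(*parts)

# The two entries from the example above.
entries = ["key1:Admin:admin:secret123", "key2:ReadOnly:user:secret456"]
parsed = [parse_control_plane_key(e) for e in entries]
roles = {k.id: k.role for k in parsed}
```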
JWT/OIDC Authentication
Option | Environment | Description
--jwt-issuer | JWT_ISSUER | OIDC issuer URL
--jwt-audience | JWT_AUDIENCE | Expected audience claim
--jwt-jwks-uri | JWT_JWKS_URI | Explicit JWKS URI (auto-discovered if not set)
--jwt-role-claim | - | JWT claim containing role (default: roles)
--jwt-role-mapping | - | Role mapping from IDP to gateway role
JWT Role Mapping Example:
--jwt-role-mapping 'Gateway.Admin=admin' 'Gateway.User=user'
Audit Logging
Option: --disable-audit-logging
Environment: -
Default: false (audit logging enabled)
Logging Configuration
Log Level
Option: --log-level
Environment: RUST_LOG
Default: info
Values: debug, info, warn, error
Per-Module Logging:
RUST_LOG=smg=debug,hyper=warn smg ...
Log Directory
Option: --log-dir
Environment: -
Default: None (console only)
Description: Directory to store log files
JSON Logs
Option: --log-json
Environment: -
Default: false
Description: Output logs as JSON (structured). Defaults to human-readable text logs.
Configuration Examples
Minimal Configuration
smg --worker-urls http://localhost:8000
High-Throughput Configuration
smg \
--worker-urls http://w1:8000 http://w2:8000 http://w3:8000 http://w4:8000 \
--policy cache_aware \
--max-concurrent-requests 200 \
--queue-size 400 \
--queue-timeout-secs 60 \
--retry-max-retries 3
Low-Latency Configuration
smg \
--worker-urls http://w1:8000 http://w2:8000 \
--policy power_of_two \
--max-concurrent-requests 50 \
--queue-size 25 \
--queue-timeout-secs 5 \
--health-check-interval-secs 5 \
--request-timeout-secs 30
PD Disaggregated Mode
smg \
--pd-disaggregation \
--prefill http://prefill1:30001 9001 \
--prefill http://prefill2:30002 9002 \
--decode http://decode1:30003 \
--decode http://decode2:30004 \
--prefill-policy cache_aware \
--decode-policy round_robin
Kubernetes Service Discovery
smg \
--service-discovery \
--selector app=sglang-worker \
--service-discovery-namespace inference \
--service-discovery-port 8000 \
--policy cache_aware
High-Availability Mesh
# Router 1
smg \
--enable-mesh \
--mesh-server-name router-1 \
--mesh-advertise-host 192.168.1.10 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.11:39527 \
--worker-urls http://worker1:8000
# Router 2
smg \
--enable-mesh \
--mesh-server-name router-2 \
--mesh-advertise-host 192.168.1.11 \
--mesh-port 39527 \
--mesh-peer-urls 192.168.1.10:39527 \
--worker-urls http://worker2:8000
Secure Production Configuration
smg \
--service-discovery \
--selector app=sglang-worker \
--service-discovery-namespace inference \
--policy cache_aware \
--max-concurrent-requests 100 \
--tls-cert-path /etc/certs/server.crt \
--tls-key-path /etc/certs/server.key \
--jwt-issuer https://login.microsoftonline.com/tenant/v2.0 \
--jwt-audience api://smg-gateway \
--jwt-role-mapping 'Gateway.Admin=admin' 'Gateway.User=user' \
--enable-trace \
--otlp-traces-endpoint jaeger:4317 \
--host 0.0.0.0 \
--port 443
With Tokenizer and Parsers
smg \
--worker-urls http://localhost:8000 \
--model-path meta-llama/Llama-3-8B-Instruct \
--tokenizer-cache-enable-l0 \
--tokenizer-cache-l0-max-entries 50000 \
--reasoning-parser deepseek-r1 \
--tool-call-parser json
With Database Backend
# PostgreSQL
smg \
--worker-urls http://localhost:8000 \
--history-backend postgres \
--postgres-db-url "postgres://user:pass@localhost:5432/smg" \
--postgres-pool-max-size 32
# Redis
smg \
--worker-urls http://localhost:8000 \
--history-backend redis \
--redis-url "redis://localhost:6379" \
--redis-pool-max-size 32 \
--redis-retention-days 7
Environment Variable Reference
Environment Variable | CLI Option | Description
RUST_LOG | --log-level | Log level
ATP_WALLET_PATH | --oracle-wallet-path | Oracle wallet path
ATP_TNS_ALIAS | --oracle-tns-alias | Oracle TNS alias
ATP_DSN | --oracle-dsn | Oracle DSN
ATP_USER | --oracle-user | Oracle username
ATP_PASSWORD | --oracle-password | Oracle password
ATP_EXTERNAL_AUTH | --oracle-external-auth | Enable Oracle external authentication
ATP_POOL_MIN | --oracle-pool-min | Oracle min pool size
ATP_POOL_MAX | --oracle-pool-max | Oracle max pool size
ATP_POOL_TIMEOUT_SECS | --oracle-pool-timeout-secs | Oracle pool timeout
POSTGRES_DB_URL | --postgres-db-url | PostgreSQL URL
POSTGRES_POOL_MAX | --postgres-pool-max-size | PostgreSQL max pool
REDIS_URL | --redis-url | Redis URL
REDIS_POOL_MAX | --redis-pool-max-size | Redis max pool
REDIS_RETENTION_DAYS | --redis-retention-days | Redis retention
JWT_ISSUER | --jwt-issuer | JWT issuer URL
JWT_AUDIENCE | --jwt-audience | JWT audience
JWT_JWKS_URI | --jwt-jwks-uri | JWKS URI
CONTROL_PLANE_API_KEYS | --control-plane-api-keys | Control plane API keys