Service Discovery

SMG automatically discovers and registers workers in Kubernetes environments, eliminating manual worker URL management and enabling dynamic scaling.

Overview

Native Kubernetes

Watch pods matching label selectors with automatic registration and removal.

Dynamic Scaling

Workers are automatically added and removed as pods scale up or down.

Label Selectors

Target specific workers using Kubernetes label selectors.

PD Support

Separate discovery for prefill and decode workers in disaggregated deployments.

How It Works

[Figure: Service Discovery Architecture]

Discovery Flow

1. Watch Pods: SMG creates a Kubernetes watcher for pods matching the configured label selector.
2. Filter Events: Only pods matching the selector (regular or PD mode) are processed.
3. Handle Events: Pod creation triggers an AddWorker job; pod deletion triggers a RemoveWorker job.
4. Register Workers: Workers are added to the registry, and health checks start immediately.
5. Track State: A HashSet of discovered pods prevents duplicate registrations.
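The five steps above can be modeled in a few lines. The sketch below is a minimal Python illustration under stated assumptions, not SMG's Rust implementation: `PodEvent`, the `"added"`/`"deleted"` event kinds, and the `jobs` list (standing in for the AddWorker/RemoveWorker job queue) are all hypothetical. It shows why the set of tracked pods matters: a Kubernetes watch can deliver the same pod more than once (for example after a re-list), and the set turns the repeat into a no-op.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class PodEvent:
    """Hypothetical, simplified pod watch event."""
    kind: str                            # "added" or "deleted" (assumed event kinds)
    name: str
    ip: str
    labels: Tuple[Tuple[str, str], ...]  # (key, value) pairs, hashable


@dataclass
class DiscoveryHandler:
    selector: Dict[str, str]
    port: int = 80                                              # like --service-discovery-port
    tracked: Set[Tuple[str, str]] = field(default_factory=set)  # the "HashSet" of discovered pods
    jobs: List[Tuple[str, str]] = field(default_factory=list)   # stand-in for the job queue

    def handle(self, event: PodEvent) -> None:
        # Filter Events: every selector pair must appear in the pod's labels.
        if not self.selector.items() <= dict(event.labels).items():
            return
        url = f"http://{event.ip}:{self.port}"
        key = (event.name, url)
        if event.kind == "added" and key not in self.tracked:
            self.tracked.add(key)
            self.jobs.append(("AddWorker", url))
        elif event.kind == "deleted" and key in self.tracked:
            self.tracked.discard(key)
            self.jobs.append(("RemoveWorker", url))


handler = DiscoveryHandler(selector={"app": "sglang-worker"}, port=8000)
labels = (("app", "sglang-worker"),)
handler.handle(PodEvent("added", "sglang-worker-0", "10.0.0.5", labels))
handler.handle(PodEvent("added", "sglang-worker-0", "10.0.0.5", labels))       # repeat: ignored
handler.handle(PodEvent("added", "other-0", "10.0.0.9", (("app", "other"),)))  # filtered out
handler.handle(PodEvent("deleted", "sglang-worker-0", "10.0.0.5", labels))
print(handler.jobs)
# [('AddWorker', 'http://10.0.0.5:8000'), ('RemoveWorker', 'http://10.0.0.5:8000')]
```

Keying the set on pod name plus worker URL is this sketch's own choice; it means a pod recreated with a new IP is treated as a new worker.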

Configuration

Basic Setup

smg \
  --service-discovery \
  --selector app=sglang-worker \
  --service-discovery-namespace inference \
  --service-discovery-port 8000

Parameters

Parameter	Default	Description
`--service-discovery`	`false`	Enable Kubernetes service discovery
`--selector`	-	Label selector for worker pods (required)
`--service-discovery-namespace`	(all namespaces)	Kubernetes namespace to watch
`--service-discovery-port`	`80`	Port to use for worker connections
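The default that most often bites is the port: discovered workers are dialed on `--service-discovery-port`, not on whatever port the container exposes. A minimal sketch, assuming (as the log output under Monitoring suggests) that SMG builds each worker URL from the pod IP and that port; `DiscoveryConfig` is a hypothetical name, not an SMG type:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DiscoveryConfig:
    """Hypothetical mirror of the flags above; not SMG's internal types."""
    selector: Dict[str, str]          # --selector (required)
    namespace: Optional[str] = None   # --service-discovery-namespace; None = all namespaces
    port: int = 80                    # --service-discovery-port

    def worker_url(self, pod_ip: str) -> str:
        # Assumed: workers are reached at <pod IP>:<port>, as the log output suggests.
        return f"http://{pod_ip}:{self.port}"


default_cfg = DiscoveryConfig(selector={"app": "sglang-worker"})
print(default_cfg.worker_url("10.0.0.5"))  # http://10.0.0.5:80 (wrong if the worker listens on 8000)

cfg = DiscoveryConfig(selector={"app": "sglang-worker"}, namespace="inference", port=8000)
print(cfg.worker_url("10.0.0.5"))          # http://10.0.0.5:8000
```

If your workers listen on 8000 and you leave the port at its default of 80, SMG will try to reach them on port 80 and the connections will fail.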

Label Selectors

SMG uses Kubernetes label selectors to identify worker pods.

Simple Selector

Match pods with a single label:

smg --service-discovery --selector app=vllm

Matches pods with label app=vllm.

Multiple Labels

Match pods that carry several labels by passing multiple key=value pairs:

smg --service-discovery --selector app=sglang environment=production

Matches pods with both app=sglang AND environment=production.
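Selector matching is a subset test: every key=value pair in the selector must appear on the pod, and extra labels on the pod are ignored. The sketch below handles only the equality form shown above; `parse_selector` and `matches` are illustrative helpers, not part of SMG.

```python
from typing import Dict, List


def parse_selector(pairs: List[str]) -> Dict[str, str]:
    """Turn ["app=sglang", "environment=production"] into {"app": "sglang", ...}."""
    selector: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        selector[key] = value
    return selector


def matches(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """Logical AND: each selector pair must be present with the same value."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


sel = parse_selector(["app=sglang", "environment=production"])
print(matches(sel, {"app": "sglang", "environment": "production", "tier": "gpu"}))  # True
print(matches(sel, {"app": "sglang", "environment": "staging"}))                    # False
print(matches(sel, {"app": "sglang"}))                                              # False
```

The last pod fails because a missing label never satisfies a selector pair.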

PD Disaggregation Discovery

For prefill-decode disaggregated deployments, use separate selectors for each worker type.

Configuration

smg \
  --service-discovery \
  --pd-disaggregation \
  --prefill-selector app=sglang role=prefill \
  --decode-selector app=sglang role=decode \
  --service-discovery-namespace inference \
  --service-discovery-port 8000

Parameters

Parameter	Description
`--prefill-selector`	Label selector for prefill workers
`--decode-selector`	Label selector for decode workers
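In PD mode each pod is checked against both selectors. Because the two example selectors share `app=sglang`, the `role` label is what decides the worker type. A minimal sketch with a hypothetical `classify` helper; how SMG resolves a pod that matches both selectors is not documented here, so this sketch raises an error instead of guessing:

```python
from typing import Dict, Optional

Labels = Dict[str, str]


def classify(pod_labels: Labels, prefill_selector: Labels, decode_selector: Labels) -> Optional[str]:
    """Return "prefill", "decode", or None (pod ignored). Hypothetical helper."""
    def matches(selector: Labels) -> bool:
        # Subset test on items views: every selector pair must appear in the pod's labels.
        return selector.items() <= pod_labels.items()

    is_prefill = matches(prefill_selector)
    is_decode = matches(decode_selector)
    if is_prefill and is_decode:
        # Overlapping selectors are ambiguous; this sketch refuses to guess.
        raise ValueError(f"pod matches both selectors: {pod_labels}")
    if is_prefill:
        return "prefill"
    if is_decode:
        return "decode"
    return None


prefill = {"app": "sglang", "role": "prefill"}
decode = {"app": "sglang", "role": "decode"}
print(classify({"app": "sglang", "role": "prefill"}, prefill, decode))  # prefill
print(classify({"app": "sglang", "role": "decode"}, prefill, decode))   # decode
print(classify({"app": "sglang"}, prefill, decode))                     # None
```

Keeping the two selectors disjoint (here, by `role`) avoids the ambiguous case entirely.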

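Worker Labels

Label each pod so that it matches exactly one of the two selectors; the shared app=sglang label alone does not tell prefill and decode workers apart: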

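# Prefill worker
apiVersion: v1
kind: Pod
metadata:
  name: sglang-prefill-0
  namespace: inference
  labels:
    app: sglang
    role: prefill
spec:
  containers:
    - name: sglang
      image: lmsysorg/sglang:latest
      command: ["python3", "-m", "sglang.launch_server"]
      # SGLang binds to 127.0.0.1 by default; 0.0.0.0 makes it reachable on the pod IP
      args: ["--model-path", "meta-llama/Llama-3.1-8B-Instruct",
             "--disaggregation-mode", "prefill",
             "--host", "0.0.0.0", "--port", "8000"]

---
# Decode worker
apiVersion: v1
kind: Pod
metadata:
  name: sglang-decode-0
  namespace: inference
  labels:
    app: sglang
    role: decode
spec:
  containers:
    - name: sglang
      image: lmsysorg/sglang:latest
      command: ["python3", "-m", "sglang.launch_server"]
      args: ["--model-path", "meta-llama/Llama-3.1-8B-Instruct",
             "--disaggregation-mode", "decode",
             "--host", "0.0.0.0", "--port", "8000"]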

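Required RBAC

SMG's ServiceAccount needs get, list, and watch permissions on pods in the namespace it watches. The Role, RoleBinding, and ServiceAccount below grant them in the inference namespace.

Role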

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: smg-discovery
  namespace: inference
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]

RoleBinding

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: smg-discovery
  namespace: inference
subjects:
  - kind: ServiceAccount
    name: smg
    namespace: inference
roleRef:
  kind: Role
  name: smg-discovery
  apiGroup: rbac.authorization.k8s.io

ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: smg
  namespace: inference

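Cross-Namespace Discovery

If you omit --service-discovery-namespace, SMG watches pods in all namespaces, which requires cluster-wide permissions. Use a ClusterRole and ClusterRoleBinding instead of the namespaced Role and RoleBinding: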

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: smg-discovery
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: smg-discovery
subjects:
  - kind: ServiceAccount
    name: smg
    namespace: inference
roleRef:
  kind: ClusterRole
  name: smg-discovery
  apiGroup: rbac.authorization.k8s.io

Complete Deployment Example

SMG Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: smg
  namespace: inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: smg
  template:
    metadata:
      labels:
        app: smg
    spec:
      serviceAccountName: smg
      containers:
        - name: smg
          image: ghcr.io/lightseekorg/smg:latest
          args:
            - --service-discovery
            - --selector=app=sglang-worker
            - --service-discovery-namespace=inference
            - --service-discovery-port=8000
            - --policy=cache_aware
          ports:
            - containerPort: 8000
              name: http

Engine images

For all-in-one deployments where each pod runs both gateway and engine, use an engine image tag (e.g., ghcr.io/lightseekorg/smg:{smg_version}-{engine}-{engine_version}). See Getting Started for available tags.

Worker StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sglang-worker
  namespace: inference
spec:
  serviceName: sglang-worker
  replicas: 3
  selector:
    matchLabels:
      app: sglang-worker
  template:
    metadata:
      labels:
        app: sglang-worker
    spec:
      containers:
        - name: sglang
          image: lmsysorg/sglang:latest
          command: ["python3", "-m", "sglang.launch_server"]
          args:
            - --model-path=meta-llama/Llama-3.1-8B-Instruct
            - --host=0.0.0.0   # listen on the pod IP, not only on loopback
            - --port=8000      # must match --service-discovery-port
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1

Worker Lifecycle

Registration Flow

1. Pod Created: Kubernetes creates a new worker pod.
2. Watch Event: SMG receives the pod creation event.
3. Capability Query: SMG queries the worker's /model_info endpoint, falling back to the deprecated /get_model_info if the new path returns 404.
4. Registration: The worker is added to the registry.
5. Health Check: Background health checks begin.
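The capability query's fallback rule is narrow: only a 404 on `/model_info` should trigger a retry on `/get_model_info`, while other errors (for example a 5xx response) are real failures. A minimal Python sketch using only the standard library; `query_model_info`, the injectable `fetch` parameter, and the fake `legacy_worker` response body are assumptions made for illustration:

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict

JsonFetcher = Callable[[str], Dict[str, Any]]


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the JSON body; raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=5.0) as resp:
        return json.load(resp)


def query_model_info(worker_url: str, fetch: JsonFetcher = fetch_json) -> Dict[str, Any]:
    """Ask /model_info first; on 404 only, retry the deprecated /get_model_info."""
    try:
        return fetch(f"{worker_url}/model_info")
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise  # other status codes are real failures, not a missing route
        return fetch(f"{worker_url}/get_model_info")


def legacy_worker(url: str) -> Dict[str, Any]:
    """Fake worker that only serves the deprecated path (response body is made up)."""
    if url.endswith("/get_model_info"):
        return {"model_path": "meta-llama/Llama-3.1-8B-Instruct"}
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)


print(query_model_info("http://10.0.0.5:8000", fetch=legacy_worker))
# {'model_path': 'meta-llama/Llama-3.1-8B-Instruct'}
```

Injecting `fetch` keeps the fallback logic testable without a live worker.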

Removal Flow

1. Pod Terminating: Kubernetes begins pod termination.
2. Watch Event: SMG receives the pod deletion event.
3. Drain: SMG stops sending new requests to the worker.
4. Removal: The worker is removed from the registry.

Worker States

State	Description	Receives Traffic
Pending	Just registered, not yet proven healthy locally	No
Ready	Locally verified and passing health checks	Yes
NotReady	Previously `Ready`, now failing readiness checks; not removed unless configured	No
Failed	Sustained liveness failure; removed when `--remove-unhealthy-workers` is set	No
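The table reduces to two rules: only `Ready` workers receive traffic, and a `Failed` worker is removed only when `--remove-unhealthy-workers` is set. A minimal sketch of those two rules; `WorkerState`, `routable`, and `should_remove` are hypothetical names, and the option that removes `NotReady` workers is not named on this page, so it is left out:

```python
from enum import Enum
from typing import Dict, List


class WorkerState(Enum):
    PENDING = "Pending"
    READY = "Ready"
    NOT_READY = "NotReady"
    FAILED = "Failed"


def routable(workers: Dict[str, WorkerState]) -> List[str]:
    """Per the table, only Ready workers receive traffic."""
    return sorted(url for url, state in workers.items() if state is WorkerState.READY)


def should_remove(state: WorkerState, remove_unhealthy_workers: bool) -> bool:
    """Models only the documented Failed rule (--remove-unhealthy-workers)."""
    return state is WorkerState.FAILED and remove_unhealthy_workers


workers = {
    "http://10.0.0.5:8000": WorkerState.READY,
    "http://10.0.0.6:8000": WorkerState.PENDING,
    "http://10.0.0.7:8000": WorkerState.NOT_READY,
    "http://10.0.0.8:8000": WorkerState.FAILED,
}
print(routable(workers))  # ['http://10.0.0.5:8000']
print([u for u, s in workers.items() if should_remove(s, remove_unhealthy_workers=True)])
# ['http://10.0.0.8:8000']
```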

Monitoring

Metrics

Metric	Description
`smg_discovery_workers_discovered`	Workers known via discovery
`smg_discovery_registrations_total`	Worker registration events
`smg_discovery_deregistrations_total`	Worker deregistration events
`smg_discovery_sync_duration_seconds`	Duration of each periodic reconciliation cycle

Logs

# Enable discovery debug logging
RUST_LOG=smg::discovery=debug smg --service-discovery ...

Example log output:

[INFO] Watching pods in namespace 'inference' with selector 'app=sglang-worker'
[INFO] Discovered new pod: sglang-worker-0 (10.0.0.5:8000)
[INFO] Registered worker: http://10.0.0.5:8000
[INFO] Discovered new pod: sglang-worker-1 (10.0.0.6:8000)
[INFO] Registered worker: http://10.0.0.6:8000

Troubleshooting

Symptom	Cause	Solution
No workers discovered	Selector does not match pod labels	Compare the selector with `kubectl get pods --show-labels`
Forbidden (RBAC) errors in logs	Missing permissions	Apply the Role and RoleBinding (ClusterRole and ClusterRoleBinding when watching all namespaces)
Workers not ready	Health check failing	Check the worker's health endpoint
Stale workers	Watch disconnected	Check Kubernetes API connectivity
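For the first row, it helps to see exactly which selector pair a pod fails. A small hypothetical helper, assuming you copy the pod's labels from `kubectl get pods --show-labels` into a dict by hand:

```python
from typing import Dict, List


def explain_mismatch(selector: Dict[str, str], labels: Dict[str, str]) -> List[str]:
    """Return one message per failing selector pair; an empty list means the pod matches."""
    problems: List[str] = []
    for key, want in selector.items():
        if key not in labels:
            problems.append(f"missing label {key!r} (selector wants {want!r})")
        elif labels[key] != want:
            problems.append(f"label {key!r} is {labels[key]!r}, selector wants {want!r}")
    return problems


# Labels as shown by: kubectl get pods -n inference --show-labels
pod_labels = {"app": "sglang", "role": "decode"}
for reason in explain_mismatch({"app": "sglang-worker"}, pod_labels):
    print(reason)
# label 'app' is 'sglang', selector wants 'sglang-worker'
```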

Verify Discovery

# Check discovered workers via admin API
curl http://smg:30000/workers | jq

# Check pod labels match selector
kubectl get pods -n inference -l app=sglang-worker

# Verify RBAC
kubectl auth can-i watch pods -n inference --as=system:serviceaccount:inference:smg

What's Next?

PD Disaggregation

Learn about prefill-decode separation.

Load Balancing

Configure routing policies for discovered workers.

Health Checks

Configure health monitoring for workers.