
High Availability

SMG supports high-availability cluster deployments using mesh networking for fault tolerance, scalability, and zero-downtime updates.


Overview

Fault Tolerance

Continue serving requests when individual router nodes fail. Automatic failover with zero manual intervention.

Scalability

Distribute load across multiple router instances. Add nodes without downtime.

State Synchronization

Share worker states, policy configurations, and rate limits across the cluster in real-time.

Zero Downtime Updates

Perform rolling updates without service interruption. Graceful shutdown with request draining.


Mesh Architecture

[Figure: SMG Mesh Architecture]

Gossip Protocol

SWIM-based protocol for membership and failure detection.

  • 1-second heartbeat interval
  • Automatic peer discovery
  • Failure detection in seconds

Cluster Coordination

Node coordination for cluster operations.

  • Membership tracking
  • Node status management
  • Graceful shutdown coordination

CRDT Stores

Conflict-free Replicated Data Types for eventual consistency.

  • No coordination locks
  • Partition tolerant
  • Automatic conflict resolution

State Replication

Real-time synchronization of all cluster state.

  • Worker registry
  • Rate limit counters
  • Cache-aware routing trees

Configuration

Quick Start

Enable mesh networking with minimal flags:

# Start first node
smg --enable-mesh --mesh-host 0.0.0.0 --mesh-advertise-host 10.0.0.1 --mesh-port 39527

# Start second node, joining the first
smg --enable-mesh --mesh-host 0.0.0.0 --mesh-advertise-host 10.0.0.2 --mesh-port 39528 --mesh-peer-urls 10.0.0.1:39527

Command Line Options

| Flag | Default | Description |
| --- | --- | --- |
| --enable-mesh | false | Enable mesh networking for HA deployments |
| --mesh-server-name | Mesh_<4 random chars> | Unique identifier for this node in the cluster |
| --mesh-host | 0.0.0.0 | Bind address for mesh communication |
| --mesh-advertise-host | value of --mesh-host | Routable address advertised to mesh peers |
| --mesh-port | 39527 | Port for mesh gRPC communication |
| --mesh-peer-urls | (none) | Initial peer URLs for cluster bootstrap; only the first entry is used as the bootstrap peer |
| --router-selector | (none) | Label selector for Kubernetes pod discovery (e.g. app=smg tier=router) |

--mesh-advertise-host is required when --mesh-host is an unspecified address

If --mesh-host is set to an unspecified bind address (for example 0.0.0.0), the gateway refuses to start unless --mesh-advertise-host is set to a routable node IP. This prevents other peers from trying to dial an unroutable address.
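
The startup rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not SMG's actual validation code; the function names check_mesh_flags and is_unspecified are hypothetical.

```python
import ipaddress
from typing import Optional

def is_unspecified(host: str) -> bool:
    """True for unspecified bind addresses such as 0.0.0.0 or ::."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (for example a hostname), so not "unspecified".
        return False

def check_mesh_flags(mesh_host: str = "0.0.0.0",
                     advertise_host: Optional[str] = None) -> str:
    """Return the address peers would dial, or raise if it is unroutable."""
    if advertise_host is None:
        # Documented default: --mesh-advertise-host falls back to --mesh-host.
        advertise_host = mesh_host
    if is_unspecified(advertise_host):
        raise ValueError(
            "--mesh-advertise-host must be a routable address "
            "when --mesh-host is %s" % mesh_host
        )
    return advertise_host
```

With these assumptions, check_mesh_flags("0.0.0.0", "10.0.0.1") returns "10.0.0.1", while check_mesh_flags() with no advertise host raises.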

Python Entrypoint

--enable-mesh is also available in the Python entrypoint used by the Docker image. When --mesh-host is left at 0.0.0.0, set --mesh-advertise-host to a routable address such as the pod IP:

smg launch --enable-mesh --mesh-host 0.0.0.0 --mesh-advertise-host 10.0.0.11 --mesh-port 39527

Basic Configuration

Node 1 (Bootstrap)

smg --enable-mesh \
    --mesh-server-name node1 \
    --mesh-host 0.0.0.0 \
    --mesh-advertise-host 10.0.0.11 \
    --mesh-port 39527 \
    --host 0.0.0.0 \
    --port 8000

Node 2 (Join)

smg --enable-mesh \
    --mesh-server-name node2 \
    --mesh-host 0.0.0.0 \
    --mesh-advertise-host 10.0.0.12 \
    --mesh-port 39527 \
    --mesh-peer-urls "10.0.0.11:39527" \
    --host 0.0.0.0 \
    --port 8000

Node 3 (Join)

smg --enable-mesh \
    --mesh-server-name node3 \
    --mesh-host 0.0.0.0 \
    --mesh-advertise-host 10.0.0.13 \
    --mesh-port 39527 \
    --mesh-peer-urls 10.0.0.11:39527 \
    --host 0.0.0.0 \
    --port 8000

Only the first peer bootstraps membership

The node connects to the first peer only; all other peers are learned via gossip after that initial connection. Pass exactly one reachable peer for cluster bootstrap, because additional values on --mesh-peer-urls are currently ignored.
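
As a rough illustration of the note above, the following Python sketch picks the bootstrap peer from a list of host:port entries. The helper name bootstrap_peer is hypothetical and the parsing is an assumption based on the host:port form used in the examples on this page.

```python
from typing import List, Optional, Tuple

def bootstrap_peer(peer_urls: List[str]) -> Optional[Tuple[str, int]]:
    """Return (host, port) for the first entry; later entries are ignored."""
    if not peer_urls:
        return None  # no peers given: this node starts a new cluster
    host, sep, port = peer_urls[0].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % peer_urls[0])
    return host, int(port)
```

Passing ["10.0.0.11:39527", "10.0.0.12:39527"] yields only the first peer, which mirrors why a single reachable entry is enough.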


Gossip Protocol

State Synchronization

SMG uses a SWIM-based gossip protocol for cluster membership and state propagation:

  1. Ping/Ping-Req: Each node periodically pings random peers to check health
  2. State Sync: Healthy nodes exchange state information during pings
  3. Failure Detection: Unreachable nodes are marked as suspected, then down
  4. Broadcast: Status changes are broadcast to all cluster members
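
The failure-detection steps above can be modeled as a small state machine. The Python sketch below is a simplified illustration that follows the status descriptions on this page (a failed ping leads to SUSPECTED, a failed ping-req leads to DOWN); the function names are hypothetical, and the real protocol also involves timers and broadcasts that are omitted here.

```python
from enum import Enum

class NodeStatus(Enum):
    INIT = 0
    ALIVE = 1
    SUSPECTED = 2
    DOWN = 3
    LEAVING = 4

def on_ping_result(status: NodeStatus, ok: bool) -> NodeStatus:
    """Apply the outcome of a direct ping to a peer's status."""
    if status is NodeStatus.LEAVING:
        return status  # a leaving node is not re-evaluated (assumption)
    if ok:
        return NodeStatus.ALIVE
    if status in (NodeStatus.INIT, NodeStatus.ALIVE):
        return NodeStatus.SUSPECTED  # failed ping: suspect, do not drop yet
    return status

def on_ping_req_result(status: NodeStatus, ok: bool) -> NodeStatus:
    """Apply the outcome of an indirect probe (ping-req) to a suspected peer."""
    if status is not NodeStatus.SUSPECTED:
        return status
    return NodeStatus.ALIVE if ok else NodeStatus.DOWN
```

A peer that fails a direct ping and then a ping-req moves ALIVE, SUSPECTED, DOWN; a successful ping-req returns it to ALIVE.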

Node Status States

| Status | Description |
| --- | --- |
| INIT | Node is starting up |
| ALIVE | Node is healthy and reachable |
| SUSPECTED | Node may be unreachable (failed ping) |
| DOWN | Node confirmed unreachable (failed ping-req) |
| LEAVING | Node is gracefully shutting down |

Failure Detection Timing

| Phase | Duration | Action |
| --- | --- | --- |
| Ping | 1s interval | Direct probe to peer |
| Down | After missed pings | Remove from active cluster |

State Synchronization

Synchronized State Types

Worker Registry

All nodes share worker discovery and health status.

  • Worker URLs and metadata
  • Health check results
  • Circuit breaker states

Rate Limits

Cluster-wide rate limiting coordination.

  • Token bucket state
  • Request counters
  • Quota synchronization

Routing Trees

Cache-aware routing state shared across nodes.

  • Radix tree operations
  • Prefix match data
  • LRU eviction coordination

Policy State

Routing policy configuration and state.

  • Policy parameters
  • Load balancing weights
  • Session affinity mappings

Cache-Aware State Sync

Cache-aware routing policy state is synchronized across mesh nodes. Once the state has converged, every router in the cluster makes the same KV cache routing decision for a given prefix, which reduces redundant cache misses and allows prefix reuse regardless of which router handles the request.
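
To see why shared prefix state matters, consider the Python sketch below. It is not SMG's routing code: it replaces the radix tree with a plain longest-common-prefix scan, and the function names and the shape of the cached mapping (worker URL to lists of token IDs) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_worker(cached: Dict[str, List[List[int]]],
                request: Sequence[int]) -> Tuple[Optional[str], int]:
    """Pick the worker whose cached prefixes overlap the request the most."""
    best_worker, best_len = None, -1
    for worker in sorted(cached):  # sorted for a deterministic tie-break
        longest = max(
            (common_prefix_len(p, request) for p in cached[worker]),
            default=0,
        )
        if longest > best_len:
            best_worker, best_len = worker, longest
    return best_worker, best_len
```

If two routers hold the same cached mapping, both return the same worker for the same request, which is the consistency property the synchronization aims for.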

Cluster introspection endpoints

Use the HA management API under /ha/* (listed below) to inspect cluster membership, synchronized worker state, and policy state.

CRDT Implementation

SMG uses several CRDT types for conflict-free synchronization:

| CRDT Type | Used For | Merge Strategy |
| --- | --- | --- |
| G-Counter | Request counts | Sum of all increments |
| PN-Counter | Token buckets | Sum of increments minus sum of decrements |
| LWW-Register | Worker state | Last-writer-wins by timestamp |
| OR-Set | Worker sets | Union with tombstones |
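
Two of the merge strategies above can be sketched in Python as follows. These are textbook versions of a G-Counter and an LWW-Register, written for illustration only; the class and function names are hypothetical and SMG's internal types may differ.

```python
from typing import Dict, Tuple

class GCounter:
    """Grow-only counter: one slot per node, merged with element-wise max."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Taking the max per node makes merge idempotent and order-independent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        # The counter's value is the sum of every node's increments.
        return sum(self.counts.values())

# An LWW register entry is modeled as (timestamp, node_id, value).
Entry = Tuple[int, str, str]

def lww_merge(a: Entry, b: Entry) -> Entry:
    """Keep the entry with the later timestamp; node_id breaks ties."""
    return max(a, b, key=lambda e: (e[0], e[1]))
```

Because the LWW merge compares timestamps, it only behaves as intended when node clocks are reasonably synchronized.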

Deployment Patterns

Three-Node Cluster (Minimum HA)

Characteristics

  • Tolerates 1 node failure
  • Quorum of 2 for leader election
  • Recommended for most deployments

Configuration

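# All nodes: point at one reachable peer for bootstrap.
# --mesh-host defaults to 0.0.0.0, so --mesh-advertise-host must be set.
# NODE_IP is a placeholder for this node's routable IP.
smg --enable-mesh \
    --mesh-advertise-host "$NODE_IP" \
    --mesh-peer-urls node1:39527 \
    --worker-urls http://worker1:8000 http://worker2:8000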

Five-Node Cluster (Higher Availability)

Characteristics

  • Tolerates 2 node failures
  • Quorum of 3 for leader election
  • Suitable for critical workloads

Configuration

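# All nodes: point at one reachable peer for bootstrap.
# --mesh-host defaults to 0.0.0.0, so --mesh-advertise-host must be set.
# NODE_IP is a placeholder for this node's routable IP.
smg --enable-mesh \
    --mesh-advertise-host "$NODE_IP" \
    --mesh-peer-urls node1:39527 \
    --worker-urls http://worker1:8000 http://worker2:8000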

Kubernetes Deployment

StatefulSet Configuration

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: smg
spec:
  serviceName: smg-mesh
  replicas: 3
  selector:
    matchLabels:
      app: smg
  template:
    metadata:
      labels:
        app: smg
    spec:
      containers:
      - name: smg
        image: ghcr.io/lightseekorg/smg:latest
        args:
        - --enable-mesh
        - --mesh-server-name=$(POD_NAME)
        - --mesh-host=0.0.0.0
        - --mesh-advertise-host=$(POD_IP)
        - --mesh-port=39527
        - --mesh-peer-urls=smg-0.smg-mesh:39527
        - --worker-urls=$(WORKER_URLS)
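        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        # POD_IP is referenced by --mesh-advertise-host above.
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        # WORKER_URLS must also be defined here for --worker-urls;
        # its value is deployment-specific.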
        ports:
        - containerPort: 8000
          name: http
        - containerPort: 39527
          name: mesh

Engine images

For all-in-one deployments where each pod runs both gateway and engine, use an engine image tag (e.g., ghcr.io/lightseekorg/smg:{smg_version}-{engine}-{engine_version}). See Getting Started for available tags.

Headless Service

apiVersion: v1
kind: Service
metadata:
  name: smg-mesh
spec:
  clusterIP: None
  selector:
    app: smg
  ports:
  - port: 39527
    name: mesh

Kubernetes Pod Discovery

Use --router-selector to enable automatic pod discovery via the Kubernetes API. SMG will find and join other router pods matching the given label selector, removing the need for static --mesh-peer-urls:

smg --enable-mesh --service-discovery --router-selector app=smg tier=router

Label Selectors

The --router-selector flag accepts space-separated key=value pairs that map to Kubernetes label selectors. All matching pods with an exposed mesh port are automatically added as peers.
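
For illustration, the Python sketch below turns the space-separated key=value pairs into the comma-separated equality form that Kubernetes label selectors use. The function names are hypothetical and this is not SMG's parser; it only shows how the flag values map onto a selector.

```python
from typing import Dict, List

def parse_router_selector(pairs: List[str]) -> Dict[str, str]:
    """Parse ["app=smg", "tier=router"] into {"app": "smg", "tier": "router"}."""
    labels: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % pair)
        labels[key] = value
    return labels

def to_label_selector(labels: Dict[str, str]) -> str:
    """Render labels as a Kubernetes equality-based selector string."""
    return ",".join("%s=%s" % (k, v) for k, v in sorted(labels.items()))
```

A pod must match every pair to be selected, so app=smg tier=router corresponds to the selector app=smg,tier=router.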


HA Management API

Health Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /ha/health | GET | Node health status |
| /ha/status | GET | Cluster status information |
| /ha/workers | GET | Worker states across cluster |
| /ha/policies | GET | Policy states across cluster |
| /ha/shutdown | POST | Graceful shutdown trigger |

Cluster Status Response

{
  "node_name": "node1",
  "node_count": 3,
  "nodes": [
    {"name": "node1", "address": "node1:39527", "status": "1", "version": 1},
    {"name": "node2", "address": "node2:39527", "status": "1", "version": 1},
    {"name": "node3", "address": "node3:39527", "status": "1", "version": 1}
  ],
  "stores": {
    "membership_count": 3,
    "worker_count": 0,
    "policy_count": 0,
    "app_count": 0
  }
}

The status field is emitted as the stringified discriminant of the NodeStatus prost enum: "0" INIT, "1" ALIVE, "2" SUSPECTED, "3" DOWN, "4" LEAVING.
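
Because the status values are numeric strings, a client usually wants to translate them before displaying or alerting on them. The Python sketch below decodes a /ha/status response body using the mapping described above; the function names are hypothetical, and only the name and status fields of each node are assumed.

```python
import json
from typing import Dict, List

# Discriminant-to-name mapping for the status field, as documented above.
NODE_STATUS = {
    "0": "INIT",
    "1": "ALIVE",
    "2": "SUSPECTED",
    "3": "DOWN",
    "4": "LEAVING",
}

def node_states(status_body: str) -> Dict[str, str]:
    """Map each node name to its decoded status name."""
    doc = json.loads(status_body)
    return {
        node["name"]: NODE_STATUS.get(node["status"], "UNKNOWN")
        for node in doc["nodes"]
    }

def not_alive(status_body: str) -> List[str]:
    """Names of nodes whose status is anything other than ALIVE."""
    states = node_states(status_body)
    return sorted(name for name, state in states.items() if state != "ALIVE")
```

The response body can come from any HTTP client, for example the output of curl against /ha/status.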


Monitoring

Mesh Metrics

| Metric | Description |
| --- | --- |
| router_mesh_peer_connections | Number of active peer connections |
| router_mesh_peer_reconnects_total | Total number of peer reconnections |
| router_mesh_batches_total | Total state update batches sent/received |
| router_mesh_bytes_total | Total bytes transmitted in mesh |
| router_mesh_convergence_ms | State convergence time across the mesh |
| router_mesh_snapshot_trigger_total | Total number of snapshot triggers |

Alerting Rules

groups:
- name: smg-mesh
  rules:
  # router_mesh_peer_connections is a per-peer gauge (0/1, labeled by "peer").
  # count(router_mesh_peer_connections == 1) gives the number of active peer links.
  # Adjust the threshold to match your expected peer count (e.g. N-1 for an N-node cluster).
  - alert: SMGClusterDegraded
    expr: count(router_mesh_peer_connections == 1) < 2
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "SMG cluster has fewer than expected peer connections"

  - alert: SMGNodeDown
    expr: router_mesh_peer_connections == 0
    for: 30s
    labels:
      severity: critical
    annotations:
      summary: "SMG mesh node {{ $labels.peer }} is down"

Best Practices

Odd Node Counts

Use 3, 5, or 7 nodes to avoid split-brain scenarios during network partitions.
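
The arithmetic behind this recommendation is short enough to check directly. The Python sketch below assumes a simple majority quorum, which matches the quorum sizes quoted for three-node and five-node clusters; the function names are illustrative.

```python
def quorum(nodes: int) -> int:
    """Smallest majority of a cluster with the given node count."""
    return nodes // 2 + 1

def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority remains."""
    return nodes - quorum(nodes)

for n in (3, 4, 5, 6, 7):
    print(n, quorum(n), tolerated_failures(n))
```

A four-node cluster tolerates one failure, the same as three nodes, and six nodes tolerate two, the same as five. An even node count adds cost without adding fault tolerance, and it can split into two equal halves with no majority on either side.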

Availability Zones

Distribute nodes across availability zones for resilience against zone failures.

Network Latency

Keep mesh nodes in the same region (< 10ms RTT) for optimal state sync performance.

Monitoring

Monitor count(router_mesh_peer_connections == 1) to track active peer links and alert when the count drops below your expected threshold.


Troubleshooting

Common Issues

| Symptom | Cause | Solution |
| --- | --- | --- |
| Node stuck in INIT | Cannot reach peers | Check firewall rules for mesh port |
| Frequent leader elections | Network instability | Increase gossip timeouts |
| State inconsistency | Clock skew | Synchronize NTP across nodes |
| High sync latency | Large state | Increase sync interval |

Debug Logging

RUST_LOG=smg::mesh=debug smg --enable-mesh ...

Verify Cluster Health

# Check cluster status
curl http://node1:8000/ha/status | jq

# Check individual node health
curl http://node1:8000/ha/health | jq

# Check worker states
curl http://node1:8000/ha/workers | jq

# Check policy states
curl http://node1:8000/ha/policies | jq

What's Next?

Graceful Shutdown

Allow in-flight requests to complete during shutdown.


Circuit Breakers

Isolate failing workers to prevent cascade failures.


Metrics Reference

Complete list of mesh networking metrics.
