High Availability¶
SMG supports high-availability cluster deployments using mesh networking for fault tolerance, scalability, and zero-downtime updates.
Overview¶
Fault Tolerance¶
Continue serving requests when individual router nodes fail. Automatic failover with zero manual intervention.
Scalability¶
Distribute load across multiple router instances. Add nodes without downtime.
State Synchronization¶
Share worker states, policy configurations, and rate limits across the cluster in real time.
Zero Downtime Updates¶
Perform rolling updates without service interruption. Graceful shutdown with request draining.
Mesh Architecture¶
Gossip Protocol¶
SWIM-based protocol for membership and failure detection.
- 1-second heartbeat interval
- Automatic peer discovery
- Failure detection in seconds
Cluster Coordination¶
Node coordination for cluster operations.
- Membership tracking
- Node status management
- Graceful shutdown coordination
CRDT Stores¶
Conflict-free Replicated Data Types for eventual consistency.
- No coordination locks
- Partition tolerant
- Automatic conflict resolution
State Replication¶
Real-time synchronization of all cluster state.
- Worker registry
- Rate limit counters
- Cache-aware routing trees
Configuration¶
Command Line Options¶
| Flag | Default | Description |
|---|---|---|
| --enable-mesh | false | Enable mesh networking for HA deployments |
| --mesh-server-name | (auto) | Unique identifier for this node in the cluster |
| --mesh-host | 0.0.0.0 | Host address for mesh communication |
| --mesh-port | 39527 | Port for mesh gRPC communication |
| --mesh-peer-urls | (none) | Initial peer URLs for cluster bootstrap |
Basic Configuration¶
Node 1 (Bootstrap)
Node 2 (Join)
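A basic two-node bring-up can be sketched as shell commands. The binary name `smg` and the hostnames `node1`/`node2` are assumptions for illustration; the flags come from the table above.

```bash
# Node 1 (bootstrap): starts the mesh with no initial peers.
# Binary name `smg` is assumed for illustration.
smg --enable-mesh \
    --mesh-server-name node1 \
    --mesh-host 0.0.0.0 \
    --mesh-port 39527

# Node 2 (join): points --mesh-peer-urls at the bootstrap node.
smg --enable-mesh \
    --mesh-server-name node2 \
    --mesh-host 0.0.0.0 \
    --mesh-port 39527 \
    --mesh-peer-urls node1:39527
```

Once node 2 reaches node 1, peer discovery propagates the full membership list, so later nodes can list any subset of existing members in `--mesh-peer-urls`.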
Environment Variables¶
```bash
export SMG_ENABLE_MESH=true
export SMG_MESH_SERVER_NAME=node1
export SMG_MESH_HOST=0.0.0.0
export SMG_MESH_PORT=39527
export SMG_MESH_PEER_URLS="node1:39527,node2:39527"
```
Gossip Protocol¶
State Synchronization¶
SMG uses a SWIM-based gossip protocol for cluster membership and state propagation:
- Ping/Ping-Req: Each node periodically pings random peers to check health
- State Sync: Healthy nodes exchange state information during pings
- Failure Detection: Unreachable nodes are marked as suspected, then down
- Broadcast: Status changes are broadcast to all cluster members
Node Status States¶
| Status | Description |
|---|---|
| INIT | Node is starting up |
| ALIVE | Node is healthy and reachable |
| SUSPECTED | Node may be unreachable (failed ping) |
| DOWN | Node confirmed unreachable (failed ping-req) |
| LEAVING | Node is gracefully shutting down |
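The transitions between these states follow the ping/ping-req sequence described above: a missed direct ping moves a node from ALIVE to SUSPECTED, and a failed indirect ping-req confirms it as DOWN. A minimal illustrative model (not SMG's actual implementation; the event names are assumptions):

```bash
# Sketch of the status transitions: state + event -> next state.
next_status() {
  case "$1:$2" in
    ALIVE:ping_failed)        echo SUSPECTED ;;  # missed direct probe
    SUSPECTED:pingreq_failed) echo DOWN ;;       # indirect probe also failed
    *:ping_ok)                echo ALIVE ;;      # any successful probe revives
    *)                        echo "$1" ;;       # otherwise, state is unchanged
  esac
}

next_status ALIVE ping_failed         # SUSPECTED
next_status SUSPECTED pingreq_failed  # DOWN
```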
Failure Detection Timing¶
| Phase | Duration | Action |
|---|---|---|
| Ping | 1s interval | Direct probe to peer |
| Down | After missed pings | Remove from active cluster |
State Synchronization¶
Synchronized State Types¶
Worker Registry¶
All nodes share worker discovery and health status.
- Worker URLs and metadata
- Health check results
- Circuit breaker states
Rate Limits¶
Cluster-wide rate limiting coordination.
- Token bucket state
- Request counters
- Quota synchronization
Routing Trees¶
Cache-aware routing state shared across nodes.
- Radix tree operations
- Prefix match data
- LRU eviction coordination
Policy State¶
Routing policy configuration and state.
- Policy parameters
- Load balancing weights
- Session affinity mappings
CRDT Implementation¶
SMG uses several CRDT types for conflict-free synchronization:
| CRDT Type | Used For | Merge Strategy |
|---|---|---|
| G-Counter | Request counts | Sum of all increments |
| PN-Counter | Token buckets | Increment sum minus decrement sum |
| LWW-Register | Worker state | Last-writer-wins by timestamp |
| OR-Set | Worker sets | Union with tombstones |
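The merge strategies above can be illustrated with the simplest case, the G-Counter: each replica keeps one slot per node, merge takes the per-slot maximum, and the counter value is the sum of the slots. A minimal sketch in plain shell arithmetic (node names and counts are illustrative):

```bash
# Two replicas' views of the per-node increment slots (illustrative values).
a_node1=3; a_node2=1   # replica A's view
b_node1=2; b_node2=4   # replica B's view

# Merge = element-wise maximum of the two views; this is commutative,
# associative, and idempotent, so replicas converge in any merge order.
m_node1=$(( a_node1 > b_node1 ? a_node1 : b_node1 ))
m_node2=$(( a_node2 > b_node2 ? a_node2 : b_node2 ))

# Counter value = sum of the merged slots.
total=$(( m_node1 + m_node2 ))
echo "merged G-Counter value: $total"   # max(3,2) + max(1,4) = 7
```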
Deployment Patterns¶
Three-Node Cluster (Minimum HA)¶
Characteristics
- Tolerates 1 node failure
- Quorum of 2 for leader election
- Recommended for most deployments
Five-Node Cluster (Higher Availability)¶
Characteristics
- Tolerates 2 node failures
- Quorum of 3 for leader election
- Suitable for critical workloads
Kubernetes Deployment¶
StatefulSet Configuration¶
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: smg
spec:
  serviceName: smg-mesh
  replicas: 3
  selector:
    matchLabels:
      app: smg
  template:
    metadata:
      labels:
        app: smg
    spec:
      containers:
        - name: smg
          image: ghcr.io/lightseekorg/smg:latest
          args:
            - --enable-mesh
            - --mesh-server-name=$(POD_NAME)
            - --mesh-host=0.0.0.0
            - --mesh-port=39527
            - --mesh-peer-urls=smg-0.smg-mesh:39527,smg-1.smg-mesh:39527,smg-2.smg-mesh:39527
            - --worker-urls=$(WORKER_URLS)
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - containerPort: 8000
              name: http
            - containerPort: 39527
              name: mesh
```
Headless Service¶
```yaml
apiVersion: v1
kind: Service
metadata:
  name: smg-mesh
spec:
  clusterIP: None
  selector:
    app: smg
  ports:
    - port: 39527
      name: mesh
```
HA Management API¶
Health Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /ha/health | GET | Node health status |
| /ha/status | GET | Cluster status information |
| /ha/workers | GET | Worker states across cluster |
| /ha/policies | GET | Policy states across cluster |
| /ha/shutdown | POST | Graceful shutdown trigger |
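For example, a rolling-restart script might drain a node before stopping its process; the hostname and port below are placeholders for your deployment.

```bash
# Ask node1 to drain in-flight requests and leave the mesh gracefully
# before the process is stopped.
curl -X POST http://node1:8000/ha/shutdown
```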
Cluster Status Response¶
```json
{
  "node_name": "node1",
  "node_count": 3,
  "nodes": [
    {"name": "node1", "status": "ALIVE", "address": "node1:39527"},
    {"name": "node2", "status": "ALIVE", "address": "node2:39527"},
    {"name": "node3", "status": "ALIVE", "address": "node3:39527"}
  ],
  "stores": {
    "workers": {"entry_count": 5, "last_sync": "2024-01-15T10:30:00Z"},
    "policies": {"entry_count": 2, "last_sync": "2024-01-15T10:30:00Z"}
  }
}
```
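A health-check script can consume this response directly. A minimal sketch that counts ALIVE members without depending on jq (the embedded JSON mirrors the example response above):

```bash
# Saved /ha/status response (normally fetched with curl; inlined here).
status='{"node_count": 3, "nodes": [{"name": "node1", "status": "ALIVE"}, {"name": "node2", "status": "ALIVE"}, {"name": "node3", "status": "DOWN"}]}'

# Count occurrences of "status": "ALIVE" in the response.
alive=$(printf '%s' "$status" | grep -o '"status": "ALIVE"' | wc -l)
alive=$((alive))   # normalize any whitespace padding from wc

echo "alive nodes: $alive"   # alive nodes: 2
```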
Monitoring¶
Mesh Metrics¶
| Metric | Description |
|---|---|
| smg_mesh_peers_total | Number of connected peers |
| smg_mesh_peer_status | Status of each peer (1=alive, 0=down) |
| smg_mesh_sync_operations_total | State sync operations by type |
| smg_mesh_sync_latency_seconds | State sync latency histogram |
| smg_mesh_leader_elections_total | Leader election events |
| smg_mesh_gossip_messages_total | Gossip messages sent/received |
Alerting Rules¶
```yaml
groups:
  - name: smg-mesh
    rules:
      - alert: SMGClusterDegraded
        expr: smg_mesh_peers_total < 2
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "SMG cluster has fewer than 3 nodes"
      - alert: SMGNodeDown
        expr: smg_mesh_peer_status == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "SMG mesh node {{ $labels.peer }} is down"
```
Best Practices¶
Odd Node Counts¶
Use 3, 5, or 7 nodes to avoid split-brain scenarios during network partitions.
Availability Zones¶
Distribute nodes across availability zones for resilience against zone failures.
Network Latency¶
Keep mesh nodes in the same region (< 10ms RTT) for optimal state sync performance.
Monitoring¶
Monitor smg_mesh_peers_total and alert when cluster size drops below threshold.
Troubleshooting¶
Common Issues¶
| Symptom | Cause | Solution |
|---|---|---|
| Node stuck in INIT | Cannot reach peers | Check firewall rules for mesh port |
| Frequent leader elections | Network instability | Increase gossip timeouts |
| State inconsistency | Clock skew | Synchronize NTP across nodes |
| High sync latency | Large state | Increase sync interval |
Debug Logging¶
Verify Cluster Health¶
```bash
# Check cluster status
curl http://node1:8000/ha/status | jq

# Check individual node health
curl http://node1:8000/ha/health | jq

# Check worker states
curl http://node1:8000/ha/workers | jq

# Check policy states
curl http://node1:8000/ha/policies | jq
```