© 2025 ESSA MAMDANI

7 min read
Backend & APIs

WebSocket Scaling: From Hundreds to Millions — A Developer's Guide

WebSockets have revolutionized real-time web communication, enabling everything from live chat applications to financial trading platforms. But as your user base grows from hundreds to thousands, and eventually millions, the challenges of scaling WebSocket infrastructure become increasingly complex. This comprehensive guide explores proven strategies for scaling WebSocket connections at every stage of growth.

Understanding WebSocket Architecture

Unlike traditional HTTP requests that follow a request-response pattern, WebSockets establish a persistent, bidirectional connection between client and server. This persistent nature is both their greatest strength and their primary scaling challenge.

The Connection Problem

Every WebSocket connection consumes server resources:

  • Memory: Each connection requires memory for buffers, session state, and connection metadata
  • File Descriptors: Operating systems limit the number of open connections per process
  • CPU: Handling message routing, heartbeat pings, and protocol overhead

A single server can typically handle 10,000–50,000 concurrent WebSocket connections, depending on available resources and message patterns. Beyond that, you need a scaling strategy.
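As a back-of-envelope check, your ceiling is whichever limit bites first: the memory budget or the file-descriptor cap. A minimal sketch of that estimate (the per-connection byte figure is an illustrative assumption, not a measurement):

```javascript
// Rough capacity estimate: concurrent connections are bounded by whichever
// limit is hit first — the memory budget or the OS file-descriptor cap.
// perConnBytes (~70 KB) is an illustrative assumption; measure your own.
function maxConnections({ memoryBytes, perConnBytes, fdLimit }) {
  const byMemory = Math.floor(memoryBytes / perConnBytes);
  return Math.min(byMemory, fdLimit);
}

// Example: 8 GB of RAM, ~70 KB per connection, 65535 file descriptors
console.log(maxConnections({
  memoryBytes: 8 * 1024 ** 3,
  perConnBytes: 70 * 1024,
  fdLimit: 65535
})); // here the fd limit, not memory, is the bottleneck
```

Running this kind of estimate against your real per-connection memory usage tells you which limit to raise first.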

Phase 1: Single Server (0–10K Users)

At this stage, a well-optimized single server can handle your load. Focus on:

Server Selection

  • Node.js: Excellent ecosystem with Socket.IO, but single-threaded nature limits raw connection count
  • Go: Goroutines make concurrent connection handling efficient; can manage 100K+ connections per instance
  • Rust: Maximum performance with frameworks like Actix or Tokio; ideal for high-frequency messaging
  • Elixir/Erlang: The BEAM VM was built for massive concurrency; WhatsApp famously handles millions per node

Optimization Techniques

```javascript
// Node.js example: optimizing Socket.IO
const io = require('socket.io')(server, {
  transports: ['websocket'],   // Skip polling fallback for better performance
  perMessageDeflate: false,    // Disable compression for lower latency
  maxHttpBufferSize: 1e6       // Limit message size to prevent abuse
});

// Use Redis adapter when ready to scale horizontally
const { createAdapter } = require('@socket.io/redis-adapter');
io.adapter(createAdapter(pubClient, subClient));
```

Connection Limits

Monitor these system limits:

```bash
# Check current limits
ulimit -n  # File descriptors (default often 1024)

# Increase for production
ulimit -n 65535

# Persist in /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
```

Phase 2: Horizontal Scaling (10K–100K Users)

When a single server reaches capacity, you need multiple instances. This introduces the load balancing challenge.

Sticky Sessions vs. Stateless Architecture

Sticky Sessions (Session Affinity)

  • Routes users to the same server throughout their session
  • Simple to implement but creates uneven load distribution
  • Problematic when servers fail or need maintenance

```nginx
# Nginx sticky sessions example
upstream websocket_backend {
    ip_hash;  # Route by client IP
    server ws1.example.com:3000;
    server ws2.example.com:3000;
    server ws3.example.com:3000;
}
```

Stateless Architecture (Recommended)

  • Any server can handle any user's messages
  • Requires shared state storage (Redis, PostgreSQL)
  • Enables true elastic scaling
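To make "any server can handle any user" concrete, here is a minimal sketch of session state kept behind a tiny get/set interface: in production `store` would be a Redis client, while a Map-backed stub (shown below) stands in for local testing. Class and method names are hypothetical.

```javascript
// Stateless pattern sketch: per-user session state lives in a shared store,
// not in server memory, so any instance can pick up any user's connection.
// `store` is anything with async get/set — e.g. a Redis client.
class SessionStore {
  constructor(store) {
    this.store = store;
  }
  async save(userId, session) {
    await this.store.set(`session:${userId}`, JSON.stringify(session));
  }
  async load(userId) {
    const raw = await this.store.get(`session:${userId}`);
    return raw ? JSON.parse(raw) : null;
  }
}

// Map-backed stub with the same get/set shape, for local testing
const memoryStore = {
  data: new Map(),
  async set(k, v) { this.data.set(k, v); },
  async get(k) { return this.data.get(k); }
};
```

Because the interface is just get/set on string keys, swapping the stub for Redis later is a one-line change.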

The Redis Pub/Sub Pattern

Redis becomes critical for horizontal scaling:

```javascript
// Server A receives a message and broadcasts it to all servers.
// Note: a Redis client in subscriber mode cannot issue other commands,
// so use separate clients for publishing and subscribing.
io.on('connection', (socket) => {
  socket.on('chat-message', async (data) => {
    // 1. Persist to database
    await db.messages.create(data);

    // 2. Publish to Redis channel
    await pubClient.publish('chat-messages', JSON.stringify({
      room: data.roomId,
      message: data
    }));
  });
});

// All servers subscribe to Redis
subClient.subscribe('chat-messages');
subClient.on('message', (channel, payload) => {
  const { room, message: msg } = JSON.parse(payload);
  // Broadcast to connected clients in this room
  io.to(room).emit('new-message', msg);
});
```

Load Balancer Configuration

Modern load balancers support WebSocket proxying:

```nginx
# Nginx WebSocket configuration
location /socket.io/ {
    proxy_pass http://websocket_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    # Timeouts for long-lived connections
    proxy_read_timeout 86400;
    proxy_send_timeout 86400;
}
```

Phase 3: Geographic Distribution (100K–1M Users)

As your user base spreads globally, latency becomes a critical concern. A user in Tokyo shouldn't connect to a server in Virginia.

Multi-Region Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  US-East    │◄───►│   Redis     │◄───►│  EU-West    │
│  (Virginia) │     │   Cluster   │     │  (Ireland)  │
└──────┬──────┘     └─────────────┘     └──────┬──────┘
       │                                        │
       ▼                                        ▼
┌─────────────┐                         ┌─────────────┐
│  US-West    │◄───────────────────────►│  APAC       │
│  (Oregon)   │      Redis CRDT         │  (Singapore)│
└─────────────┘                         └─────────────┘

Geo-DNS Routing

Route users to the nearest data center:

```yaml
# AWS Route53 or Cloudflare Load Balancing
- name: ws-us-east.example.com
  region: us-east-1
  health_check: enabled

- name: ws-eu-west.example.com
  region: eu-west-1
  health_check: enabled

- name: ws-apac.example.com
  region: ap-southeast-1
  health_check: enabled
```
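Geo-DNS does the heavy lifting, but a client-side latency probe is a common complement: measure round-trip time to each regional endpoint and connect to the fastest. A sketch of the selection step (endpoint names match the config above; how the latencies are measured, e.g. a timed HTTPS ping per endpoint, is left out and the values are assumed given):

```javascript
// Given measured round-trip times (ms) per regional endpoint, pick the
// fastest. The latency numbers here are illustrative assumptions.
function nearestEndpoint(latencies) {
  return Object.entries(latencies).reduce((best, current) =>
    current[1] < best[1] ? current : best
  )[0];
}

const measured = {
  'ws-us-east.example.com': 180,
  'ws-eu-west.example.com': 35,
  'ws-apac.example.com': 240
};
console.log(nearestEndpoint(measured)); // the EU endpoint wins here
```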

Cross-Region Message Routing

When users in different regions need to communicate:

```javascript
// Cross-region message routing with Redis Cluster
class GlobalMessageRouter {
  constructor(redisCluster) {
    this.redis = redisCluster;
    this.localRegion = process.env.REGION; // 'us-east', 'eu-west', etc.
  }

  async routeMessage(message) {
    // getUserRegion() would look up the recipient's home region (e.g. from
    // a Redis hash); deliverLocally() hands the message to the local
    // Socket.IO instance. Both are app-specific and omitted here.
    const targetRegion = await this.getUserRegion(message.recipientId);

    if (targetRegion === this.localRegion) {
      // Local delivery
      this.deliverLocally(message);
    } else {
      // Cross-region via Redis Streams
      await this.redis.xadd(
        `messages:${targetRegion}`,
        '*', // Auto-generate ID
        'payload', JSON.stringify(message)
      );
    }
  }
}
```

Phase 4: Massive Scale (1M+ Users)

At millions of concurrent connections, every optimization matters.

Connection Offloading

Edge WebSockets: Use CDN edge nodes to handle the WebSocket handshake and initial connection:

  • Cloudflare Durable Objects
  • AWS API Gateway WebSocket API
  • Fastly Fanout

These services maintain the connection while your backend processes messages:

```javascript
// AWS API Gateway WebSocket example (Lambda handler)
exports.handler = async (event) => {
  // The connection ID lives in the request context, not the event root
  const { connectionId } = event.requestContext;
  const message = JSON.parse(event.body);

  // Process message
  await processMessage(message);

  // Send response via the API Gateway Management API
  await apigateway.postToConnection({
    ConnectionId: connectionId,
    Data: JSON.stringify({ status: 'processed' })
  }).promise();

  return { statusCode: 200 };
};
```

Message Batching

Reduce overhead by batching messages:

```javascript
class MessageBatcher {
  constructor(io, batchSize = 100, flushInterval = 50) {
    this.io = io;
    this.batch = [];
    this.batchSize = batchSize;
    this.flushInterval = flushInterval;

    setInterval(() => this.flush(), flushInterval);
  }

  add(message) {
    this.batch.push(message);
    if (this.batch.length >= this.batchSize) {
      this.flush();
    }
  }

  flush() {
    if (this.batch.length === 0) return;

    // Send batched messages
    this.io.emit('batch-update', this.batch);
    this.batch = [];
  }
}
```

Selective Broadcasting

Not every user needs every message:

```javascript
// Instead of broadcasting to all
io.emit('update', data); // ❌ Expensive at scale

// Use rooms and namespaces
io.to(`room:${data.roomId}`).emit('update', data); // ✅ Targeted

// Or use Redis Streams for fan-out
await redis.xadd('stream:room:123', '*', 'data', JSON.stringify(data));
```

Backpressure Handling

When message production exceeds consumption:

```javascript
const { Transform } = require('stream');

class BackpressureHandler extends Transform {
  constructor(options) {
    super({ objectMode: true, highWaterMark: 1000 });
    this.droppedMessages = 0;
  }

  _transform(message, encoding, callback) {
    // writableLength is the number of currently queued messages;
    // compare against the queue, not the fixed highWaterMark setting
    if (this.writableLength > 900) {
      // Buffer nearly full - drop non-critical messages
      if (message.priority !== 'high') {
        this.droppedMessages++;
        callback(); // Drop silently
        return;
      }
    }

    this.push(message);
    callback();
  }
}
```

Monitoring and Observability

At scale, you need comprehensive visibility:

Key Metrics

```javascript
// Prometheus metrics example (prom-client)
const { Gauge, Histogram, Counter } = require('prom-client');

const websocketConnections = new Gauge({
  name: 'websocket_connections_total',
  help: 'Total WebSocket connections',
  labelNames: ['region', 'server_id']
});

const messageLatency = new Histogram({
  name: 'websocket_message_latency_ms',
  help: 'Message processing latency',
  buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000]
});

const messageRate = new Counter({
  name: 'websocket_messages_total',
  help: 'Total messages processed',
  labelNames: ['direction'] // 'inbound' or 'outbound'
});
```

Health Checks

Implement proper health checks for load balancers:

```javascript
app.get('/health', async (req, res) => {
  const checks = await Promise.all([
    checkRedisConnection(),
    checkDatabaseConnection(),
    checkMemoryUsage()
  ]);

  const healthy = checks.every(c => c.healthy);

  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'healthy' : 'unhealthy',
    connections: io.engine.clientsCount,
    memory: process.memoryUsage(),
    checks
  });
});
```

Common Pitfalls and Solutions

1. Memory Leaks

Problem: Accumulating connection objects without cleanup.

Solution:

```javascript
io.on('connection', (socket) => {
  const userData = { /* ... */ };
  userSessions[socket.id] = userData;

  socket.on('disconnect', () => {
    // Always clean up
    delete userSessions[socket.id];
    clearInterval(socket.heartbeatInterval);
  });
});
```

2. Thundering Herd

Problem: All clients reconnect simultaneously after a server restart.

Solution: Implement exponential backoff with jitter:

```javascript
const reconnectWithBackoff = (attempt = 0) => {
  const baseDelay = 1000;
  const maxDelay = 30000;
  const jitter = Math.random() * 1000;
  const delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay) + jitter;

  setTimeout(() => {
    socket.connect();
    // On connection failure, retry with reconnectWithBackoff(attempt + 1)
  }, delay);
};
```

3. Message Ordering

Problem: Messages arrive out of order during high load.

Solution: Use sequence numbers and client-side reordering:

```javascript
// Server: Add sequence numbers
let sequence = 0;
socket.emit('message', { seq: ++sequence, data });

// Client: Reorder messages
const buffer = new Map();
let expectedSeq = 1;

socket.on('message', ({ seq, data }) => {
  buffer.set(seq, data);

  // Process in-order messages
  while (buffer.has(expectedSeq)) {
    processMessage(buffer.get(expectedSeq));
    buffer.delete(expectedSeq);
    expectedSeq++;
  }
});
```

Conclusion

Scaling WebSockets from hundreds to millions of connections requires evolving your architecture through distinct phases:

  1. Single Server: Optimize your application and understand resource limits
  2. Horizontal Scaling: Introduce Redis and stateless architecture
  3. Geographic Distribution: Deploy multi-region with smart routing
  4. Massive Scale: Leverage edge services and implement advanced patterns

The key is to implement each optimization only when needed. Premature optimization adds complexity without benefit, but being prepared for the next phase prevents painful rewrites when growth accelerates.

Start with solid fundamentals, monitor relentlessly, and scale incrementally. Your WebSocket infrastructure can handle millions of connections with the right architecture and practices.


What's your current WebSocket scaling challenge? Share your experiences and let's discuss solutions in the comments.