Real-time Multi-tenant WebSocket Gateway

A production-grade, horizontally scalable WebSocket gateway built with Node.js, TypeScript, ws, and Redis Pub/Sub — designed to serve multiple independent tenant applications over a single shared infrastructure.

🎥 Quick Demo

See the gateway in action. This shows the interactive demo UI where messages from Client A are instantly fanned out to Client B via Redis Pub/Sub:

(Note: if the video does not play on GitHub, download video.mp4 from the repository root.)

What This Is

Most WebSocket tutorials show you a single-process chat server. This project goes further:

Multi-tenancy — One gateway binary serves many independent client applications (tenant-alpha, tenant-beta, …). Each tenant is isolated by its appId; messages never bleed across tenants.
Pre-handshake auth — Credentials are validated at the HTTP upgrade event, before the WebSocket handshake completes. Unauthorised clients are rejected with a 401 and their TCP socket is destroyed — no partial WebSocket state is ever created.
Horizontal scalability — Multiple gateway instances can run behind a load balancer. A message sent to one instance is automatically broadcast to clients on every other instance via Redis Pub/Sub.
Zombie-connection prevention — A server-driven heartbeat terminates any socket that fails to answer a ping within one heartbeat interval, keeping RSS stable under high churn.

Architecture Overview

                        ┌────────────────────────────────────────┐
                        │           Load Balancer / Proxy         │
                        │    (nginx, AWS ALB, Cloudflare, …)      │
                        └───────────┬────────────────┬────────────┘
                                    │                │
                   ws://host:3001   │                │   ws://host:3002
                                    ▼                ▼
                        ┌───────────────┐  ┌───────────────┐
                        │  Gateway      │  │  Gateway      │    … N instances
                        │  Instance 1   │  │  Instance 2   │
                        │               │  │               │
                        │  clients Map  │  │  clients Map  │
                        │  rooms Map    │  │  rooms Map    │
                        └──────┬────────┘  └───────┬───────┘
                               │  PUBLISH          │  PUBLISH
                               │                   │
                        ┌──────▼───────────────────▼───────┐
                        │                                   │
                        │           Redis 7                 │
                        │   Channels:                       │
                        │   ws:msg:{appId}:{roomId}         │
                        │                                   │
                        └────────────────────────────────── ┘
                               │  SUBSCRIBE        │  SUBSCRIBE
                               │                   │
                        ┌──────▼────────┐  ┌───────▼───────┐
                        │  Instance 1   │  │  Instance 2   │
                        │  subscriber   │  │  subscriber   │
                        └───────────────┘  └───────────────┘

Request Lifecycle (Happy Path)

Client                    Gateway Instance 1                    Redis                    Instance 2
  │                               │                               │                          │
  │── ws://…?appId=X&apiKey=Y ───▶│                               │                          │
  │                               │  validateTenant()             │                          │
  │                               │  ✓ auth OK                    │                          │
  │                               │  handleUpgrade → WS handshake │                          │
  │◀─────── 101 Switching ────────│                               │                          │
  │                               │  joinRoom()                   │                          │
  │                               │──── SUBSCRIBE ws:msg:X:room ─▶│                          │
  │                               │                               │◀── SUBSCRIBE ws:msg:X:room│
  │                               │                               │                          │
  │── { type:"message", … } ─────▶│                               │                          │
  │                               │──── PUBLISH ws:msg:X:room ───▶│                          │
  │                               │                               │──── dispatch to local ───▶│
  │                               │◀─── dispatch to local ────────│                    clients│
  │◀────── message echoed ────────│  (sender excluded)            │                          │

Why Redis Pub/Sub for Horizontal Scaling

WebSocket connections are stateful and long-lived. Once a client upgrades, its TCP connection is pinned to one server process. If you run two gateway instances behind a load balancer, a message sent by a client on Instance 1 is initially known only to Instance 1's in-process memory.

Without a coordination layer, clients on Instance 2 would never see that message. Sticky sessions do not fix this: they only keep a given client pinned to the same instance, so two users in the same room (or one user with tabs open on different instances) can still end up on different processes.

Why Pub/Sub specifically (vs. other options)?

Approach	Trade-offs
Redis Pub/Sub ✓	Fire-and-forget with at-most-once delivery, no persistence overhead, low latency, and built-in fan-out. A good fit for ephemeral real-time messages.
Redis Streams	Persistent, replayable, ordered. Use when clients need message history or at-least-once delivery (via consumer groups and acknowledgements).
Kafka / RabbitMQ	Best for durable event pipelines, but operationally heavy for a pure WS gateway.
gRPC mesh	Instances talk directly to each other; no central broker needed. Requires service discovery, and the number of inter-instance connections grows quadratically with the number of instances.

How it works in this codebase

Each gateway instance creates two Redis connections:

publisher  ──▶  PUBLISH ws:msg:{appId}:{roomId}  ──▶  Redis
subscriber ◀──  message callback                 ◀──  Redis

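Separate connections are needed for publishing and subscribing because, under the default RESP2 protocol, a connection that has entered subscriber mode accepts only subscription-management commands (SUBSCRIBE, UNSUBSCRIBE, PSUBSCRIBE, PUNSUBSCRIBE) plus PING, QUIT, and RESET. It cannot run PUBLISH or any other regular command.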

When a message arrives from a client:

The message is wrapped in a RedisEnvelope tagged with the originating instanceId and connectionId.
It is published to ws:msg:{appId}:{roomId}.
Every instance — including the publisher — receives it via its subscriber.
Each instance iterates its local rooms map and calls safeSend() for every locally connected socket in that room, skipping the original sender (detected via the originInstanceId + originConnectionId pair in the envelope).

This design requires zero coordination at connection time — instances are completely independent and do not need to know about each other.
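The four steps above can be sketched as follows. This is a minimal in-memory illustration, not the project's actual code: the envelope fields other than the two origin IDs, the shape of the rooms map, and the helper names channelFor and dispatchLocal are assumptions made for the example, and the send callback stands in for safeSend().

```typescript
// Sketch only: field names and helpers are assumed, not taken from src/.
interface RedisEnvelope {
  originInstanceId: string;
  originConnectionId: string;
  appId: string;
  roomId: string;
  from: string;
  payload: unknown;
}

// Channel naming follows the ws:msg:{appId}:{roomId} pattern.
function channelFor(appId: string, roomId: string): string {
  return `ws:msg:${appId}:${roomId}`;
}

// Maps "appId:roomId" to the connection IDs held by THIS instance.
type Rooms = Map<string, Set<string>>;

// Deliver an envelope to every local member of the room except the sender.
// Returns the connection IDs that were sent to.
function dispatchLocal(
  envelope: RedisEnvelope,
  rooms: Rooms,
  localInstanceId: string,
  send: (connectionId: string, data: string) => void,
): string[] {
  const members = rooms.get(`${envelope.appId}:${envelope.roomId}`);
  if (!members) return [];

  const data = JSON.stringify({
    type: "message",
    payload: envelope.payload,
    from: envelope.from,
    roomId: envelope.roomId,
  });

  const delivered: string[] = [];
  for (const connectionId of members) {
    // The sender is identified by the (instance, connection) pair, because
    // connection IDs are only tracked per instance.
    const isSender =
      envelope.originInstanceId === localInstanceId &&
      envelope.originConnectionId === connectionId;
    if (isSender) continue;
    send(connectionId, data);
    delivered.push(connectionId);
  }
  return delivered;
}
```

Because the publishing instance also receives its own envelope, the sender check has to happen at dispatch time rather than at publish time; every other instance simply finds no matching origin pair and delivers to all of its local room members.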

How the Heartbeat Prevents Memory Leaks

WebSocket connections can die silently. A client might:

Lose network access without sending a TCP FIN/RST.
Have its device sleep or battery die.
Be behind a NAT/firewall that silently drops idle connections.

In all these cases the server never receives a FIN or RST, so the server-side socket can remain in the ESTABLISHED state indefinitely. Left unchecked, these "zombie" connections accumulate in the clients Map, consuming memory and file descriptors until the process crashes or is restarted.

The solution: server-driven heartbeat

t=0s   server sends WebSocket PING frame  ──────────────────────▶  client
t=0s   server sets client.isAlive = false
                                          ◀──────────────────────  client replies with PONG (automatic)
t=ε    server receives PONG
t=ε    server sets client.isAlive = true

t=30s  server sends next PING ──────────────────────────────────▶  client
t=30s  server sets client.isAlive = false
       … if PONG never arrives before next tick …
t=60s  client.isAlive is still false
       server calls socket.terminate()  ← IMMEDIATE TCP teardown, no handshake
       'close' event fires → clients.delete(connectionId) → leaveRoom()
       Memory is reclaimed ✓

socket.terminate() is used deliberately — not socket.close(). The close handshake sends a CLOSE frame and waits for acknowledgement, which will never come from a zombie. terminate() calls the underlying socket.destroy() immediately, reclaiming the file descriptor without waiting.
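The timeline above boils down to one check per tick. The sketch below shows that logic against a minimal socket interface; the names HeartbeatSocket, heartbeatTick, and markAlive are hypothetical, and the real code in src/websocket.ts may be organised differently. The ws library's WebSocket does provide ping() and terminate(), while isAlive is a flag the application attaches itself.

```typescript
// Minimal socket surface needed by the heartbeat (assumed shape).
interface HeartbeatSocket {
  isAlive: boolean;
  ping(): void;
  terminate(): void;
}

// Run one heartbeat tick over all tracked sockets.
// Returns the IDs of the sockets terminated on this tick.
function heartbeatTick(clients: Map<string, HeartbeatSocket>): string[] {
  const terminated: string[] = [];
  for (const [connectionId, socket] of clients) {
    if (!socket.isAlive) {
      // No pong arrived since the previous ping: treat as a zombie.
      socket.terminate();
      terminated.push(connectionId);
      continue;
    }
    // Assume dead until the pong handler proves otherwise.
    socket.isAlive = false;
    socket.ping();
  }
  return terminated;
}

// Intended to be called from the socket's 'pong' event handler.
function markAlive(socket: HeartbeatSocket): void {
  socket.isAlive = true;
}
```

In a running gateway, heartbeatTick would be driven by a setInterval using HEARTBEAT_INTERVAL_MS. With this scheme a dead client is detected between one and two intervals after it stops responding, which matches the t=60s termination in the timeline.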

Project Structure

.
├── src/
│   ├── server.ts          # Entry point — HTTP server, bootstrap, graceful shutdown
│   ├── websocket.ts       # WebSocket server, upgrade handler, heartbeat, room logic
│   ├── redis.ts           # Publisher + subscriber clients, Pub/Sub helpers
│   ├── auth.ts            # Pre-handshake tenant validation middleware
│   ├── config.ts          # Typed, validated environment configuration
│   ├── logger.ts          # Structured JSON logger (NDJSON output)
│   └── types.ts           # Shared TypeScript interfaces
│
├── docker-compose.yml     # Redis 7 with AOF persistence
├── package.json
├── tsconfig.json
├── .env.example           # Documented environment variable template
└── README.md

Configuration Reference

Copy .env.example to .env and adjust as needed:

Variable	Default	Description
`PORT`	`3000`	TCP port the gateway listens on
`INSTANCE_ID`	auto-generated	Unique name for this process; used in Redis envelopes
`REDIS_HOST`	`localhost`	Redis server hostname
`REDIS_PORT`	`6379`	Redis server port
`REDIS_PASSWORD`	(empty)	Redis AUTH password (leave blank for no auth)
`HEARTBEAT_INTERVAL_MS`	`30000`	Ping interval in milliseconds
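As an illustration of how these variables and defaults could be turned into a typed configuration object, here is a minimal sketch. The GatewayConfig shape, the loadConfig and intFromEnv helpers, and the generated instance ID format (the "instance-" prefix plus 8 hex characters) are assumptions for the example; the actual src/config.ts may validate differently.

```typescript
interface GatewayConfig {
  port: number;
  instanceId: string;
  redisHost: string;
  redisPort: number;
  redisPassword?: string;
  heartbeatIntervalMs: number;
}

// Parse a positive integer, falling back to the default when unset or blank.
function intFromEnv(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  // Assumed ID format; not cryptographically random, only meant to be unique enough.
  const suffix = Math.floor(Math.random() * 0x100000000)
    .toString(16)
    .padStart(8, "0");
  return {
    port: intFromEnv("PORT", env.PORT, 3000),
    instanceId: env.INSTANCE_ID || `instance-${suffix}`,
    redisHost: env.REDIS_HOST || "localhost",
    redisPort: intFromEnv("REDIS_PORT", env.REDIS_PORT, 6379),
    // An empty password means "no AUTH".
    redisPassword: env.REDIS_PASSWORD || undefined,
    heartbeatIntervalMs: intFromEnv("HEARTBEAT_INTERVAL_MS", env.HEARTBEAT_INTERVAL_MS, 30000),
  };
}
```

At startup this would typically be called once as loadConfig(process.env), so that a malformed value fails fast instead of surfacing later as a confusing runtime error.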

Local Setup

Prerequisites

Node.js ≥ 18
Docker & Docker Compose

Steps

# 1. Clone the repo
git clone https://github.com/YOUR_USERNAME/ws-gateway.git
cd ws-gateway

# 2. Install dependencies
npm install

# 3. Copy and review environment config
cp .env.example .env

# 4. Start Redis
docker compose up -d

# 5. Start the gateway in development mode (hot-reload via tsx)
npm run dev

You should see structured JSON logs like:

{"timestamp":"2024-01-15T10:23:41.123Z","level":"info","message":"Redis connections established","host":"localhost","port":6379}
{"timestamp":"2024-01-15T10:23:41.145Z","level":"info","message":"WS Gateway is ready","port":3000,"instanceId":"instance-a1b2c3d4"}

Build for production

npm run build   # Compiles src/ → dist/
npm start       # Runs dist/server.js

Connecting a Client

WebSocket connection URL format:

ws://localhost:3000?appId=<tenant>&apiKey=<key>&roomId=<room>&userId=<user>
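If a client builds this URL programmatically, the query values should be URL-encoded, since room or user names may contain spaces or reserved characters. A small sketch, where buildGatewayUrl and ConnectParams are hypothetical helper names rather than part of the project:

```typescript
interface ConnectParams {
  appId: string;
  apiKey: string;
  roomId: string;
  userId: string;
}

// Build a connection URL with properly encoded query parameters.
function buildGatewayUrl(base: string, params: ConnectParams): string {
  const url = new URL(base);
  url.searchParams.set("appId", params.appId);
  url.searchParams.set("apiKey", params.apiKey);
  url.searchParams.set("roomId", params.roomId);
  url.searchParams.set("userId", params.userId);
  return url.toString();
}
```

The resulting string can be passed directly to a WebSocket client constructor.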

Mock tenants (pre-configured in `src/auth.ts`)

`appId`	`apiKey`
`tenant-alpha`	`key-alpha-secret-001`
`tenant-beta`	`key-beta-secret-002`
`tenant-gamma`	`key-gamma-secret-003`

Quick test with `websocat`

# Install: https://github.com/vi/websocat
websocat "ws://localhost:3000?appId=tenant-alpha&apiKey=key-alpha-secret-001&roomId=lobby&userId=alice"

On connection you will receive a welcome system message:

{
  "type": "system",
  "payload": {
    "message": "Connected to WS Gateway",
    "connectionId": "550e8400-e29b-41d4-a716-446655440000",
    "roomId": "lobby",
    "instanceId": "instance-a1b2c3d4"
  },
  "timestamp": "2024-01-15T10:23:45.000Z"
}

Quick test with `wscat`

npm install -g wscat
wscat -c "ws://localhost:3000?appId=tenant-alpha&apiKey=key-alpha-secret-001&roomId=lobby&userId=bob"

Wire Protocol

All messages are UTF-8 encoded JSON.

Client → Server

// Send a message to your current room
{ "type": "message", "payload": { "text": "hello room!" } }

// Broadcast (semantically identical in this implementation — alias for clarity)
{ "type": "broadcast", "payload": { "text": "hey everyone!" } }

// Application-level ping (for clients that cannot send WS control frames)
{ "type": "ping" }

Server → Client

// Room message from another user
{
  "type": "message",
  "payload": { "text": "hello room!" },
  "from": "alice",
  "roomId": "lobby",
  "timestamp": "2024-01-15T10:23:50.000Z"
}

// System notification (connection events, etc.)
{
  "type": "system",
  "payload": { "message": "Connected to WS Gateway", "connectionId": "…" },
  "timestamp": "…"
}

// Error
{
  "type": "error",
  "payload": { "message": "Invalid JSON — all messages must be valid JSON objects." },
  "timestamp": "…"
}

// Pong (reply to application-level ping)
{ "type": "pong", "payload": {}, "timestamp": "…" }
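The client-to-server shapes above can be captured as a discriminated union with a small validating parser. This is a sketch under assumptions: the type and function names are hypothetical, and the error texts are shortened placeholders rather than the gateway's exact strings.

```typescript
type ClientMessage =
  | { type: "message"; payload: unknown }
  | { type: "broadcast"; payload: unknown }
  | { type: "ping" };

interface ErrorMessage {
  type: "error";
  payload: { message: string };
  timestamp: string;
}

type ParseResult =
  | { ok: true; message: ClientMessage }
  | { ok: false; error: ErrorMessage };

function makeError(message: string): ErrorMessage {
  return { type: "error", payload: { message }, timestamp: new Date().toISOString() };
}

// Validate one raw text frame from a client.
function parseClientMessage(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: makeError("Invalid JSON") };
  }
  // Only plain JSON objects are accepted; arrays and primitives are rejected.
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: makeError("Message must be a JSON object") };
  }
  const { type, payload } = data as { type?: unknown; payload?: unknown };
  if (type === "message") return { ok: true, message: { type: "message", payload } };
  if (type === "broadcast") return { ok: true, message: { type: "broadcast", payload } };
  if (type === "ping") return { ok: true, message: { type: "ping" } };
  return { ok: false, error: makeError("Unknown message type") };
}
```

Returning a result object instead of throwing lets the caller forward the error message to the client and keep the connection open.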

Rejected connection (HTTP 401)

HTTP/1.1 401 Unauthorized
Content-Type: text/plain
Connection: close

Unauthorized

Testing the Scaling Behaviour

This is the most interesting part to demo to a recruiter.

Step 1 — Spin up two gateway instances

In two separate terminals:

# Terminal 1
PORT=3001 INSTANCE_ID=instance-1 npm run dev

# Terminal 2
PORT=3002 INSTANCE_ID=instance-2 npm run dev

Both instances share the same Redis (already running via Docker Compose).

Step 2 — Connect two clients to different instances, same room

# Client A  →  Instance 1
wscat -c "ws://localhost:3001?appId=tenant-alpha&apiKey=key-alpha-secret-001&roomId=lobby&userId=alice"

# Client B  →  Instance 2 (different port = different process)
wscat -c "ws://localhost:3002?appId=tenant-alpha&apiKey=key-alpha-secret-001&roomId=lobby&userId=bob"

Step 3 — Send a message from Client A

In Client A's terminal, type and send:

{ "type": "message", "payload": { "text": "Cross-instance message!" } }

Client B receives it almost immediately, even though it is connected to a completely different Node.js process. This is the Redis Pub/Sub fan-out in action.

Step 4 — Verify tenant isolation

Connect a client to tenant-beta in the same room name:

wscat -c "ws://localhost:3001?appId=tenant-beta&apiKey=key-beta-secret-002&roomId=lobby&userId=carol"

Messages sent by Alice are not received by Carol — the Redis channel name encodes the appId, keeping tenants fully isolated.

Health Endpoint

curl http://localhost:3000/health

{
  "status": "ok",
  "instanceId": "instance-a1b2c3d4",
  "totalConnections": 3,
  "totalRooms": 2,
  "rooms": {
    "tenant-alpha:lobby": 2,
    "tenant-beta:general": 1
  },
  "uptimeSeconds": 142
}

Point your load balancer's health probe at this endpoint so that an instance that stops responding is taken out of rotation, for example while it is being replaced during a rolling update.
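The response body shown above can be derived entirely from in-process state. A minimal sketch, assuming the rooms map is keyed by "appId:roomId" as in the sample output; HealthReport and buildHealthReport are hypothetical names:

```typescript
interface HealthReport {
  status: "ok";
  instanceId: string;
  totalConnections: number;
  totalRooms: number;
  rooms: Record<string, number>;
  uptimeSeconds: number;
}

// Build the /health payload from local state only (no Redis round trip).
function buildHealthReport(
  instanceId: string,
  rooms: Map<string, Set<string>>,
  totalConnections: number,
  startedAtMs: number,
  nowMs: number,
): HealthReport {
  const roomSizes: Record<string, number> = {};
  for (const [roomKey, members] of rooms) {
    roomSizes[roomKey] = members.size;
  }
  return {
    status: "ok",
    instanceId,
    totalConnections,
    totalRooms: rooms.size,
    rooms: roomSizes,
    uptimeSeconds: Math.floor((nowMs - startedAtMs) / 1000),
  };
}
```

Note that these counts cover only the instance that answers the request, not the whole cluster.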

Design Decisions & Trade-offs

Raw `ws` over Socket.io

Socket.io is a higher-level abstraction that adds its own protocol on top of WebSocket (and falls back to HTTP long-polling). Using raw ws forces an explicit understanding of the WebSocket protocol (RFC 6455): framing, control frames (ping/pong/close), and the upgrade handshake. It also eliminates ~200 KB of transitive dependencies and makes the gateway agnostic to client language/framework.

`noServer: true` on `WebSocketServer`

By passing noServer: true, the WebSocketServer does not bind its own HTTP server or attach its own upgrade listener. This gives us full control over the upgrade event — we can reject connections with a proper HTTP response before any WebSocket state is allocated, rather than rejecting after the handshake.
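The decision made during the upgrade event can be isolated into a pure function, which keeps it easy to test. The sketch below is an assumption about how such a check might look, not the contents of src/auth.ts: checkUpgradeRequest, UpgradeDecision, and the inline tenant table are hypothetical, and the raw response mirrors the 401 shown in the Wire Protocol section.

```typescript
// Mock tenant table mirroring the README; the real one lives in src/auth.ts.
const TENANTS: ReadonlyMap<string, string> = new Map([
  ["tenant-alpha", "key-alpha-secret-001"],
  ["tenant-beta", "key-beta-secret-002"],
  ["tenant-gamma", "key-gamma-secret-003"],
]);

const UNAUTHORIZED_RESPONSE =
  "HTTP/1.1 401 Unauthorized\r\n" +
  "Content-Type: text/plain\r\n" +
  "Connection: close\r\n" +
  "\r\n" +
  "Unauthorized";

type UpgradeDecision =
  | { ok: true; appId: string; roomId: string | null; userId: string | null }
  | { ok: false; status: 401; response: string };

// requestUrl is the path-plus-query string Node exposes as request.url.
function checkUpgradeRequest(requestUrl: string): UpgradeDecision {
  // A placeholder base is needed because request.url has no scheme or host.
  const params = new URL(requestUrl, "ws://gateway.invalid").searchParams;
  const appId = params.get("appId");
  const apiKey = params.get("apiKey");
  const expectedKey = appId === null ? undefined : TENANTS.get(appId);

  if (appId === null || expectedKey === undefined || apiKey !== expectedKey) {
    return { ok: false, status: 401, response: UNAUTHORIZED_RESPONSE };
  }
  return {
    ok: true,
    appId,
    roomId: params.get("roomId"),
    userId: params.get("userId"),
  };
}
```

In the HTTP server's 'upgrade' handler, a failed decision would be written to the raw socket and followed by socket.destroy(), while a successful one would proceed to the WebSocketServer's handleUpgrade(request, socket, head, callback). A production check would also compare keys in constant time, for example with crypto.timingSafeEqual, instead of the plain comparison used here.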

Two Redis connections per instance

ioredis follows the Redis protocol specification: a connection in subscribe mode is dedicated and cannot issue PUBLISH. Two connections are therefore required per instance, regardless of how many rooms or tenants are active. The subscription count scales with distinct rooms active on this instance, not with the number of connected clients.
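One way to keep the subscription count tied to active rooms rather than to clients is to subscribe when the first local client joins a room and unsubscribe when the last one leaves. The class below is a sketch of that idea with injected callbacks; RoomSubscriptions is a hypothetical name and not necessarily how src/redis.ts is structured.

```typescript
// Tracks local room membership and issues at most one SUBSCRIBE per channel.
class RoomSubscriptions {
  private readonly members = new Map<string, Set<string>>();
  private readonly subscribe: (channel: string) => void;
  private readonly unsubscribe: (channel: string) => void;

  constructor(
    subscribe: (channel: string) => void,
    unsubscribe: (channel: string) => void,
  ) {
    this.subscribe = subscribe;
    this.unsubscribe = unsubscribe;
  }

  join(channel: string, connectionId: string): void {
    let connections = this.members.get(channel);
    if (!connections) {
      connections = new Set<string>();
      this.members.set(channel, connections);
      // First local member of this room: subscribe once.
      this.subscribe(channel);
    }
    connections.add(connectionId);
  }

  leave(channel: string, connectionId: string): void {
    const connections = this.members.get(channel);
    if (!connections || !connections.delete(connectionId)) return;
    if (connections.size === 0) {
      // Last local member left: release the subscription.
      this.members.delete(channel);
      this.unsubscribe(channel);
    }
  }

  get channelCount(): number {
    return this.members.size;
  }
}
```

With ioredis, the two callbacks would typically wrap subscriber.subscribe(channel) and subscriber.unsubscribe(channel) on the dedicated subscriber connection.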

In-memory state (not Redis for connection tracking)

Connection metadata (clients Map, rooms Map) is kept in process memory, not in Redis. This is intentional:

Read path is O(1) — a direct Map lookup.
Redis is not a bottleneck — it is only hit on message publish/receive, not on every state read.
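The trade-off is that a process crash loses its slice of state. Clients are expected to reconnect (normal behaviour for a gateway) and rejoin their rooms, at which point the instance they land on subscribes to the relevant channels again. Because Pub/Sub is fire-and-forget, messages published while a client is disconnected are not replayed; clients that need them would require a persistent store such as Redis Streams.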

`socket.terminate()` over `socket.close()`

close() initiates the WebSocket closing handshake and waits for a CLOSE frame reply. For a zombie connection (device asleep, NAT timeout, etc.) this reply will never come, so the socket lingers. terminate() calls net.Socket.destroy() immediately, bypassing the handshake, which is the correct behaviour when we have already decided the connection is dead.

Roadmap / What I Would Add Next

TLS / wss:// — Pass the HTTPS Server instance to initWebSocketServer and terminate TLS at the gateway (or upstream at the load balancer).
Redis connection tracking — Store HSET ws:connections:{appId}:{roomId} {connectionId} {instanceId} so the health endpoint can report global (cross-instance) room populations.
Rate limiting — Track message counts per connectionId in a sliding window and disconnect clients that exceed a threshold.
JWT auth — Replace the hardcoded apiKey check with jsonwebtoken verification to support short-lived credentials that expire on their own, instead of long-lived static keys.
Metrics — Expose a /metrics endpoint in Prometheus text format (connections, messages/sec, Redis round-trip latency) for Grafana dashboards.
Kubernetes manifests — Deployment + HorizontalPodAutoscaler + PodDisruptionBudget for zero-downtime rolling updates.
Integration tests — Spin up two gateway instances and a real Redis with testcontainers-node and assert cross-instance delivery in CI.

License

MIT — see LICENSE.
