Skip to content

Latest commit

 

History

History
76 lines (60 loc) · 2.73 KB

File metadata and controls

76 lines (60 loc) · 2.73 KB

Git Commit Message (Draft)

Title

fix(cluster): Fix critical bugs in cluster cross-node message routing

Description

Fixed 2 critical bugs that prevented cluster cross-node message routing from working:

Bug #1: Node address using hostname instead of localhost

File: cmd/emqx-go/main.go Issue: Nodes advertised their address using nodeID as hostname (e.g., "node2:8083"), which cannot be resolved in local multi-process deployment. Fix: Use "localhost" as hostname for local deployment (e.g., "localhost:8083").

Bug #2: Join RPC handler using request context

File: pkg/cluster/server.go Issue: Join RPC handler used the request context which gets canceled when RPC returns, causing reverse peer connections to fail immediately. Fix: Use context.Background() instead to allow peer connections to persist beyond RPC lifecycle.

Impact

These bugs completely prevented cross-node message routing:

  • Node2 could join Node1's cluster
  • Node2's subscriptions were synced to Node1
  • But when Node1 tried to forward messages to Node2, it failed with "no peer client for node"

After fixes, full bidirectional cluster communication works correctly.

Testing

Added comprehensive E2E testing infrastructure:

  • Created 3-node cluster deployment scripts (local and Docker)
  • Developed cluster testing tool in Go
  • Verified cross-node message routing works correctly

Test result: ✅ SUCCESS - Messages successfully route from Node1 to Node2

Files Changed

Modified:

  • cmd/emqx-go/main.go - Fix node address, add environment variable support
  • pkg/cluster/server.go - Fix Join handler context usage
  • README.md - Add cluster testing documentation

Added:

  • cmd/cluster-test/main.go - Enhanced cluster testing tool with improved error handling
  • cmd/cluster-test/README.md - Comprehensive documentation for test tool
  • scripts/start-cluster.sh - Start 3-node local cluster
  • scripts/stop-cluster.sh - Stop cluster
  • docker-compose-cluster.yaml - Docker cluster configuration
  • CLUSTER_E2E_TEST_REPORT.md - Detailed test report
  • CLUSTER_TESTING_GUIDE.md - Testing guide
  • CLUSTER_TEST_SUMMARY.md - Quick summary

Improvements to Test Tool:

  • Added command-line flags for all test parameters
  • Improved error handling with detailed error messages
  • Added connection timeout detection (5s)
  • Added verbose logging mode for debugging
  • Measure and display message delivery latency
  • Provide troubleshooting tips on test failure
  • Support for testing remote clusters
  • Return proper exit codes (0=success, 1=failure)

How to Test

# Start 3-node cluster
./scripts/start-cluster.sh

# Run test
go run ./cmd/cluster-test/main.go

# Stop cluster
./scripts/stop-cluster.sh

Expected output: "✓✓✓ SUCCESS! Cross-node messaging works!"