🐞 Bug Summary
test_health_mirrors_runtime_mode_state in tests/protocol_compliance/test_runtime_mode.py fails in Edge (Rust) mode because the convergence polling loop only checks effective_mode, but the subsequent assertion also requires override_active to match across endpoints.
In a multi-replica deployment (docker-compose runs 3 gateway replicas behind Nginx), the two HTTP requests (GET /health and GET /admin/runtime/mcp-mode) can land on different pods. When boot_mode is edge and one pod has received an override-to-edge via Redis pub/sub but the other hasn't yet:
- Both pods report
effective_mode="edge" → the convergence loop exits immediately
- But
override_active diverges: True on the pod that received the override, False on the pod that hasn't
🧩 Affected Component
🔁 Steps to Reproduce
- Run protocol compliance tests against a multi-replica Edge-mode deployment
pytest tests/protocol_compliance/test_runtime_mode.py::test_health_mirrors_runtime_mode_state
🤔 Expected Behavior
The test should wait for all four compared fields (boot_mode, effective_mode, override_active, cluster_propagation) to converge before asserting, not just effective_mode.
📓 Logs / Error Output
tests/protocol_compliance/test_runtime_mode.py:149: in test_health_mirrors_runtime_mode_state
assert mcp_rt.get(key) == admin.get(key), f"mcp_runtime.{key}={mcp_rt.get(key)!r} vs admin.{key}={admin.get(key)!r}"
E AssertionError: mcp_runtime.override_active=True vs admin.override_active=False
E assert True == False
🧠 Environment Info
| Key |
Value |
| Runtime mode |
Edge (Rust MCP runtime active) |
| Deployment |
docker-compose, multi-replica behind Nginx |
🧩 Additional Context
The root cause is in the convergence loop (lines 139-147 of test_runtime_mode.py):
if mcp_rt.get("effective_mode") == admin.get("effective_mode"):
break
This should check all four asserted keys, not just effective_mode. When boot_mode already equals the override target (both edge), effective_mode is identical on both pods regardless of whether the override has propagated, masking the override_active divergence.
This is not a Rust runtime bug — both Python endpoints compute override_active the same way. It's a test convergence gap exposed by multi-pod state propagation timing.
🐞 Bug Summary
test_health_mirrors_runtime_mode_stateintests/protocol_compliance/test_runtime_mode.pyfails in Edge (Rust) mode because the convergence polling loop only checkseffective_mode, but the subsequent assertion also requiresoverride_activeto match across endpoints.In a multi-replica deployment (docker-compose runs 3 gateway replicas behind Nginx), the two HTTP requests (
GET /healthandGET /admin/runtime/mcp-mode) can land on different pods. Whenboot_modeisedgeand one pod has received an override-to-edgevia Redis pub/sub but the other hasn't yet:effective_mode="edge"→ the convergence loop exits immediatelyoverride_activediverges:Trueon the pod that received the override,Falseon the pod that hasn't🧩 Affected Component
🔁 Steps to Reproduce
pytest tests/protocol_compliance/test_runtime_mode.py::test_health_mirrors_runtime_mode_state🤔 Expected Behavior
The test should wait for all four compared fields (
boot_mode,effective_mode,override_active,cluster_propagation) to converge before asserting, not justeffective_mode.📓 Logs / Error Output
🧠 Environment Info
🧩 Additional Context
The root cause is in the convergence loop (lines 139-147 of
test_runtime_mode.py):This should check all four asserted keys, not just
effective_mode. Whenboot_modealready equals the override target (bothedge),effective_modeis identical on both pods regardless of whether the override has propagated, masking theoverride_activedivergence.This is not a Rust runtime bug — both Python endpoints compute
override_activethe same way. It's a test convergence gap exposed by multi-pod state propagation timing.