Skip to content

test(rate-limiter): align integration + load tests with framework refactor#4635

Open
gandhipratik203 wants to merge 1 commit intomainfrom
test/rate-limiter-test-infra-refresh
Open

test(rate-limiter): align integration + load tests with framework refactor#4635
gandhipratik203 wants to merge 1 commit intomainfrom
test/rate-limiter-test-infra-refresh

Conversation

@gandhipratik203
Copy link
Copy Markdown
Collaborator

@gandhipratik203 gandhipratik203 commented May 7, 2026

Summary

The cpex framework refactor on main changed two contracts the existing rate-limiter tests didn't track. As a result the tests passed nothing meaningful through the gateway HTTP path: integration tests hit HTTP 400 on tool calls, and the Make-driven load tests either no-op'd or 401'd before generating traffic. This PR brings them back in line with the current contract.

What broke and why

Contract change on main Test failure mode
POST /servers/<id>/mcp now requires Mcp-Session-Id on every non-initialize call Tool-path tests returned HTTP 400 Missing session ID; load test's tools/list auto-detect failed silently → _tool_names stayed empty → @task call_tool early-returned every tick
GET /admin/plugins reports framework's internal mode label (sequential / transform / disabled) instead of the operator label set via PUT (enforce / permissive / disabled) One dynamic-behaviour assertion compared against the operator label
.env.example's JWT_SECRET_KEY lengthened to satisfy the 32-byte minimum Two locustfiles still defaulted to the old 11-char "my-test-key" → admin requests 401'd

Changes

  • tests/integration/test_rate_limiter_multi_tenant.py — add _mcp_initialize_session helper; _invoke_tool_once now does the initialize + initialized handshake and sends Mcp-Session-Id on the tool call.
  • tests/integration/test_rate_limiter_dynamic_behavior.py — same handshake helper added to _send_tool_burst; admin-API mode assertion relaxed to "reported mode is non-disabled" with a comment explaining the operator-vs-internal label mapping.
  • tests/loadtest/locustfile_rate_limiter_backend_correctness.py_auto_detect runs the MCP initialize handshake before tools/list so _tool_names actually populates.
  • tests/loadtest/locustfile_rate_limiter_redis_capacity.py + locustfile_rate_limiter_scale.pyJWT_SECRET_KEY default updated to match .env.example.

Verification

Run Result
pytest tests/integration/test_rate_limiter_multi_tenant.py --with-integration ✅ 2/2
pytest tests/integration/test_rate_limiter_dynamic_behavior.py --with-integration ✅ 5/5
make benchmark-rate-limiter ✅ tools-path enforcement under sustained load — Redis backend verdict
make benchmark-rate-limiter-redis-capacity ✅ prompt-path sustains concurrent load with 0 errors

Run against a docker-compose stack with the rate limiter enabled (mode: enforce in plugins/config.yaml) and REDIS_CONTAINER_NAME set to match the running compose project name. The mode: enforce flip is operator-side test setup, not a chart change — plugins/config.yaml is unchanged in this PR.

Out of scope

  • tests/integration/test_rate_limiter.py — 50/59 substantive tests still pass. The 9 failures are scaffolding bit-rot (renamed /api/v1/... routes, removed _store introspection attribute the Rust core no longer exposes). The plugin engine itself is well-covered by the passing tests; happy to address the bit-rot in a follow-up if it's worth a separate cleanup.
  • The dynamic plugin-bindings path (POST /v1/tools/plugin_bindings) — there's a separate observation that for RateLimiterPlugin specifically, binding config overrides aren't honored at runtime even though the same path works for OutputLengthGuardPlugin and SecretsDetection. Diagnosing that is its own thread; this PR keeps to the static-config test surface.

…actor

The cpex framework refactor on main changed two contracts the existing
rate-limiter tests didn't track:

  1. POST /servers/<id>/mcp now requires an Mcp-Session-Id header on
     every non-initialize call. Tests that hit the tool path silently
     returned HTTP 400; load tests that auto-detected tools via
     tools/list left their tool list empty and the @task no-op'd.

  2. GET /admin/plugins now returns the framework's internal mode
     label ("sequential" / "transform" / "disabled") rather than the
     operator label ("enforce" / "permissive" / "disabled") set via
     PUT. One assertion compared against the operator label and broke.

Plus a smaller drift fix in two locustfiles: the JWT_SECRET_KEY
default ("my-test-key", 11 chars) didn't match the gateway's
.env.example value ("my-test-key-but-now-longer-than-32-bytes"), so
requests 401'd before generating any traffic.

Changes
- tests/integration/test_rate_limiter_multi_tenant.py: add
  _mcp_initialize_session helper; _invoke_tool_once now does the
  initialize + initialized handshake and sends Mcp-Session-Id.
- tests/integration/test_rate_limiter_dynamic_behavior.py: same
  handshake helper added to _send_tool_burst; admin-API mode
  assertion relaxed to "reported mode is non-disabled" with a
  comment explaining the operator-vs-internal label mapping.
- tests/loadtest/locustfile_rate_limiter_backend_correctness.py:
  _auto_detect runs the MCP initialize handshake before tools/list
  so _tool_names actually populates.
- tests/loadtest/locustfile_rate_limiter_{redis_capacity,scale}.py:
  JWT_SECRET_KEY default updated to match main's .env.example value.

Verified
  pytest tests/integration/test_rate_limiter_multi_tenant.py     -- 2/2
  pytest tests/integration/test_rate_limiter_dynamic_behavior.py -- 5/5
  make benchmark-rate-limiter                -- 119/120 blocked (98.3%)
  make benchmark-rate-limiter-redis-capacity -- 290 reqs, 0 errors

Out of scope
- tests/integration/test_rate_limiter.py: 50/59 still pass; 9 failures
  are scaffolding bit-rot (renamed /api/v1/... routes, removed _store
  attribute the Rust core no longer exposes). Substantive plugin
  behaviour is covered.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Copy link
Copy Markdown
Collaborator

@msureshkumar88 msureshkumar88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear problem statement and the summary table — the three contract breaks are well-diagnosed and the fix direction is right. A few issues need to be addressed before this lands.

Blocking

R1 — _mcp_initialize_session copy-pasted verbatim into two files

tests/integration/test_rate_limiter_dynamic_behavior.py and tests/integration/test_rate_limiter_multi_tenant.py contain near-identical implementations of _mcp_initialize_session. An equivalent _initialize_session already exists in tests/live_gateway/mcp/test_mcp_plugin_parity.py:170. Please extract this to tests/integration/helpers/ (or tests/integration/conftest.py) and import from both files — the next integration test that needs the handshake will copy-paste again otherwise.

R2 — Wrong protocol version "2025-06-18" — inconsistent with codebase canonical

The canonical protocol version across this repo is "2025-11-25" (set as MCP_PROTOCOL_VERSION in tests/live_gateway/mcp/test_mcp_plugin_parity.py:29 and tests/loadtest/locustfile_echo_delay.py:108). Both new integration helpers use "2025-06-18". Worse, locustfile_rate_limiter_backend_correctness.py now has two different versions in the same file — "2025-06-18" in the new _auto_detect block and "2024-11-05" in the existing call_tool task. Please use a shared MCP_PROTOCOL_VERSION = "2025-11-25" constant, import it everywhere, and explain in a comment if "2025-06-18" is intentional (e.g. backward-compat testing).

R3 — Mode assertion relaxed too far

# Before (correct contract, wrong label):
assert plugins[PLUGIN_NAME]["mode"] == "enforce"

# After (passes for any non-disabled state — too permissive):
assert reported_mode != "disabled"

The right fix is to make the operator→internal label mapping explicit:

OPERATOR_TO_INTERNAL_MODE = {"enforce": "sequential", "permissive": "transform", "disabled": "disabled"}
assert reported_mode == OPERATOR_TO_INTERNAL_MODE["enforce"], (
    f"Expected internal mode {OPERATOR_TO_INTERNAL_MODE['enforce']!r}, got {reported_mode!r}"
)

This gives a precise assertion, documents the mapping where it's used, and fails correctly if the framework relabels modes again.

P1 — _invoke_tool_once creates a new MCP session per call — semantic error for rate-limit tests

_invoke_tool_once runs the full initialize + initialized handshake before every single tool invocation. If the rate limiter keys on session ID or resets counters per session, each call in test_tool_invocation_creates_rate_limit_keys_in_redis and test_rate_limit_keys_carry_tenant_prefix_when_tool_is_team_owned operates in a fresh window rather than accumulating against the same client. The session should be established once in the server_and_tool fixture and passed in, not re-created per call.

S1 — Hardcoded fallback JWT secret still in source

JWT_SECRET_KEY = _cfg("JWT_SECRET_KEY", "my-test-key-but-now-longer-than-32-bytes")

The old key was wrong for a different reason; the fix still bakes a known secret into source. For load tests run against real infra a missing env var silently uses this string, and any JWT signed with it is trivially forgeable. Consider:

_raw = _cfg("JWT_SECRET_KEY", "")
if not _raw:
    raise RuntimeError("JWT_SECRET_KEY env var is required for load tests")
JWT_SECRET_KEY = _raw

Or at minimum add a clear # nosec annotation and a comment explaining this is a CI-only fallback that must never reach production.

L1/L2 — Silent error swallowing in both helpers

Both _mcp_initialize_session implementations:

except requests.RequestException:
    pass  # ← no log
return sid  # returned as valid even if notification step failed

The notifications/initialized failure is swallowed silently. A failed notification with a returned sid means subsequent calls may fail with opaque errors. Add at minimum:

except requests.RequestException as exc:
    logger.warning("MCP initialized notification failed for server %s: %s", server_id, exc)

Also: all four return None paths in both helpers produce no log output. The caller then reports status=0 or errors=count with nothing in the output to explain why. Add a logger.warning before each return None.


Non-blocking suggestions

R4 — return 0 is a confusing error sentinel

_invoke_tool_once returns 0 when sid is None. The caller asserts status == 200, giving AssertionError: assert 0 == 200 with no explanation. Use pytest.fail("MCP session handshake failed") or raise a descriptive exception.

R5 — notifications/initialized missing in _auto_detect

The locustfile _auto_detect gets the sid from the initialize response but never sends the notifications/initialized POST. The integration tests do send it. Worth aligning for protocol correctness.

T1 — 9 failing tests in test_rate_limiter.py

Acknowledged as out of scope — worth opening a follow-up issue and linking it here so it doesn't get lost.

T2 — No negative-path coverage for session handshake failure

No test exercises _mcp_initialize_session returning None — what _send_tool_burst reports in that case, and whether the test suite distinguishes "rate limited" from "couldn't connect at all".

T3 — No cross-tenant session replay test

After adding session handshake, worth adding a test that a session ID from tenant A cannot be used on tenant B's endpoint (isolation regression).

D1 — tests/AGENTS.md not updated

The session handshake is now a required protocol step for any integration test on the MCP tool path. Worth documenting there.

@msureshkumar88 msureshkumar88 self-assigned this May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants