Summary
The dynamic plugin configuration and per-tool bindings added in #4068 / #4143 are the supported way to manage plugins at runtime. This issue covers two pillars of plugin runtime management:
- Global plugin enable/disable — toggle any plugin on/off at runtime via the API, verify it takes effect across all gateway instances
- RateLimiterPlugin dynamic configuration — change rate limiter settings at runtime (limits, mode, scope), verify the new config takes effect across all instances
Currently there are no tests that verify either capability works correctly across multiple gateway instances. In a 3-pod deployment, a change made via the API on pod A must be visible to pods B and C — and we don't currently test or guarantee that.
This issue covers auditing the propagation mechanism, writing tests that expose gaps, and fixing any bugs found.
Context
- The `enable_plugins()` function in `mcpgateway/plugins/framework/__init__.py` is a per-process in-memory toggle — it does not propagate across pods (see the sketch after this list)
- The plugin config loader (`mcpgateway/plugins/framework/loader/config.py`) supports Jinja2 env var resolution but reads from a static YAML file at startup
- Tool plugin bindings (`/v1/tools/plugin_bindings`) persist to Postgres — but it's unclear whether the plugin manager reads these per-request or caches them in-memory
- Rate limiter uses Redis as shared state for rate counting (works cross-pod), but the plugin enable/disable and config state may not be shared
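To make the first bullet concrete: a per-process toggle lives in the memory of whichever worker handled the admin call, so no other worker or pod ever sees it. A minimal sketch of that failure mode (the set and helpers below are hypothetical illustrations, not the actual mcpgateway internals):

```python
# Hypothetical illustration of a per-process, in-memory plugin toggle.
# Every Gunicorn worker in every pod holds its own copy of this set.
_enabled_plugins: set[str] = set()

def enable_plugin(name: str) -> None:
    # Mutates only THIS process's memory; sibling workers on the same
    # pod and all other pods keep their stale view until restart.
    _enabled_plugins.add(name)

def is_enabled(name: str) -> bool:
    return name in _enabled_plugins
```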
Related: #4222 (SQL Sanitizer e2e tests), PII Filter e2e tests — both depend on the plugin runtime management infrastructure being reliable across instances.
Approach
Test-driven hardening — write the tests first, let failures expose bugs, fix the bugs, tests become the proof.
Deliverables
1. Audit — map the state flow
Trace the plugin runtime management path end-to-end:
- API call → DB write → plugin manager reload → request pipeline
- Identify where state is per-process (in-memory) vs shared (DB/Redis)
- Document the cache TTLs and invalidation mechanisms (if any)
- Specifically: when a binding is created/updated via `/v1/tools/plugin_bindings`, how does each pod learn about it? (a cross-pod probe sketch follows this list)
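A small probe script helps with this part of the audit: query each pod directly, bypassing the load balancer, and diff their views. The pod addresses and the `/v1/plugins` listing endpoint below are assumptions to be replaced with whatever the audit finds:

```python
# Probe each gateway pod directly and diff their view of plugin state.
# Pod URLs, auth header, and the /v1/plugins response shape are assumptions.
import httpx

PODS = ["http://gateway-0:4444", "http://gateway-1:4444", "http://gateway-2:4444"]
AUTH = {"Authorization": "Bearer <admin-token>"}

def snapshot(base_url: str) -> dict[str, bool]:
    resp = httpx.get(f"{base_url}/v1/plugins", headers=AUTH, timeout=10)
    resp.raise_for_status()
    return {p["name"]: p["enabled"] for p in resp.json()}

views = {pod: snapshot(pod) for pod in PODS}
baseline = views[PODS[0]]
for pod, view in views.items():
    drift = {name: state for name, state in view.items() if baseline.get(name) != state}
    print(pod, "consistent" if not drift else f"DRIFT: {drift}")
```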
2. Integration tests (`tests/integration/test_plugin_runtime_management.py`)
Against a live gateway (not mocked):
Global enable/disable (pytest sketch after this list):
- Enable a plugin via the API, verify it takes effect on subsequent requests
- Disable a plugin via the API, verify it stops running
- Per-tool binding: bind a plugin to tool A only, call tool A (plugin runs), call tool B (plugin doesn't)
- Binding deletion: remove the binding, verify the plugin stops running
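A sketch of the shape these tests could take. The toggle endpoint, tool invocation path, and plugin name are assumptions; only `/v1/tools/plugin_bindings` comes from this issue:

```python
# tests/integration/test_plugin_runtime_management.py (sketch only)
import httpx
import pytest

BASE = "http://localhost:4444"
SSN = "123-45-6789"  # marker the PII filter should redact when active

@pytest.fixture
def client():
    with httpx.Client(base_url=BASE, headers={"Authorization": "Bearer <token>"}) as c:
        yield c

def test_disable_stops_plugin(client):
    # Endpoint shape is an assumption pending the audit.
    client.put("/v1/plugins/PIIFilterPlugin", json={"enabled": False}).raise_for_status()
    r = client.post("/v1/tools/echo/invoke", json={"args": {"text": f"ssn {SSN}"}})
    assert SSN in r.text  # plugin off: payload passes through unredacted

def test_binding_scopes_plugin_to_one_tool(client):
    client.post("/v1/tools/plugin_bindings",
                json={"tool": "tool_a", "plugin": "PIIFilterPlugin"}).raise_for_status()
    a = client.post("/v1/tools/tool_a/invoke", json={"args": {"text": f"ssn {SSN}"}})
    b = client.post("/v1/tools/tool_b/invoke", json={"args": {"text": f"ssn {SSN}"}})
    assert SSN not in a.text  # bound: plugin ran on tool A
    assert SSN in b.text      # unbound: plugin skipped tool B
```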
RateLimiterPlugin dynamic configuration (sketch after this list):
- Change the rate limit from 30/m to 100/m via the API, verify the new limit is enforced
- Switch the mode from `permissive` to `enforce` via the API, verify enforcement on subsequent requests
- Change `by_user` vs `by_tenant` settings, verify the correct scope is applied
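Continuing the sketch above, a rate-limit change could be verified by counting rejections. The config endpoint and payload keys are assumptions pending the audit:

```python
def test_rate_limit_change_applies(client):
    # Config endpoint and payload shape are assumptions.
    client.put("/v1/plugins/RateLimiterPlugin/config",
               json={"limit": "100/m", "mode": "enforce", "by_user": True}).raise_for_status()
    # 31 rapid calls as one user: all must pass under the new 100/m limit,
    # whereas the old 30/m limit would have rejected the 31st.
    statuses = [client.post("/v1/tools/echo/invoke", json={"args": {}}).status_code
                for _ in range(31)]
    assert all(code != 429 for code in statuses)
```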
Multi-instance validation (sketch after this list):
- Make an API call to one pod, verify behaviour on a different pod (the core test)
- Pod restart survival: create a binding + config change, delete a pod, verify both persist after the pod is recreated
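The core cross-pod case, continuing the integration sketch. Direct pod addresses are an assumption (the harness must reach each pod, bypassing the load balancer), and the check polls because propagation may be eventual:

```python
import time

POD_A = "http://gateway-0:4444"
POD_B = "http://gateway-1:4444"
AUTH = {"Authorization": "Bearer <token>"}

def test_change_on_pod_a_visible_on_pod_b():
    with httpx.Client(headers=AUTH) as c:
        c.put(f"{POD_A}/v1/plugins/PIIFilterPlugin", json={"enabled": True}).raise_for_status()
        # Poll pod B: allow for cache TTL / pub-sub lag before failing.
        for _ in range(20):
            r = c.post(f"{POD_B}/v1/tools/echo/invoke",
                       json={"args": {"text": f"ssn {SSN}"}})
            if SSN not in r.text:
                return  # redaction observed: pod B saw pod A's change
            time.sleep(0.5)
        pytest.fail("pod B never observed the plugin enabled on pod A")
```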
3. Load test (`tests/loadtest/locustfile_plugin_runtime_management.py`)
Locust file for multi-instance environments (sketch after this list):
- Admin user class: periodically enables/disables plugins and changes rate limits via the API
- Normal user class (100+): continuously calls tools and verifies plugins are active/inactive and rate limits match the current config
- Success criteria: 0% inconsistency across all pods under concurrent load
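A sketch of how the two user classes could fit together, reusing the hypothetical endpoints from the integration sketch. Note that `environment`-level shared state only works within a single Locust worker; a distributed run needs worker messaging instead:

```python
# tests/loadtest/locustfile_plugin_runtime_management.py (sketch only)
import random
from locust import HttpUser, task, between

SSN = "123-45-6789"

class AdminUser(HttpUser):
    weight = 1
    wait_time = between(5, 10)

    @task
    def flip_plugin(self):
        enabled = random.choice([True, False])
        self.client.put("/v1/plugins/PIIFilterPlugin", json={"enabled": enabled})
        # Publish intent for NormalUser checks (single-worker simplification).
        self.environment.expected_pii_enabled = enabled

class NormalUser(HttpUser):
    weight = 100
    wait_time = between(0.1, 0.5)

    @task
    def invoke_and_verify(self):
        expected = getattr(self.environment, "expected_pii_enabled", None)
        with self.client.post("/v1/tools/echo/invoke",
                              json={"args": {"text": f"ssn {SSN}"}},
                              catch_response=True) as r:
            if expected is None or (SSN not in r.text) == expected:
                r.success()
            else:
                # A real run should tolerate checks in flight during a flip.
                r.failure("plugin state inconsistent with last admin change")
```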
4. Bug fixes
When tests fail, fix the underlying code. Possible fixes:
- Add DB/Redis read for dynamic bindings on each request (if currently cached in-memory only)
- Add Redis pub/sub or cache invalidation for plugin state/config changes (see the sketch after this list)
- Add cache TTL configuration for plugin manager state
- Ensure dynamic config changes propagate to all Gunicorn workers within a pod (not just the one handling the API call)
- Ensure `fail_on_plugin_error` behaves correctly with dynamically enabled plugins
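For the pub/sub option, one possible shape. The channel name and the `reload_from_db()` hook are hypothetical; `redis.asyncio` is the redis-py async client:

```python
# Sketch: Redis pub/sub invalidation for plugin state/config changes.
import redis.asyncio as redis

CHANNEL = "mcpgateway:plugin_state_changed"

async def publish_change(r: redis.Redis, plugin_name: str) -> None:
    # Called by whichever worker handled the admin API write.
    await r.publish(CHANNEL, plugin_name)

async def invalidation_listener(r: redis.Redis, plugin_manager) -> None:
    # Every worker in every pod runs this task from startup; on any
    # published change it drops its cached view and re-reads the DB.
    pubsub = r.pubsub()
    await pubsub.subscribe(CHANNEL)
    async for msg in pubsub.listen():
        if msg["type"] == "message":
            await plugin_manager.reload_from_db()  # hypothetical hook
```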
Each fix ships with the test that proves it works.
5. E2E validation
Run the full test suite on a multi-instance deployment:
- `tests/integration/` via pytest against the gateway
- `tests/loadtest/` via Locust with multiple workers
- Document results and any remaining limitations
Priority order
- Single-instance global enable/disable (does the API actually change runtime behaviour?)
- Single-instance RateLimiterPlugin dynamic config (does changing settings take effect?)
- Multi-instance propagation (does pod B see pod A's change?)
- Persistence across pod restarts
- Concurrency under load
- Security (can a non-admin user toggle plugins or change rate limits?)
Environments
- Colima — single and multi-instance (docker-compose with `replicas: 3`, sketched below) for integration test development
- OCP — optional, for validation on a real cluster if available
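A minimal compose sketch for the 3-replica setup (service names and images are placeholders):

```yaml
services:
  gateway:
    image: mcpgateway:latest
    deploy:
      replicas: 3
  postgres:
    image: postgres:17
  redis:
    image: redis:7
```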