Commit 685ebe5
feat: add Rust A2A 1.0 runtime support (#3704)
* feat: add Rust A2A 1.0 runtime support
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix: resolve pylint warnings and update tests for A2A protocol refactor
- Remove redundant local `from mcpgateway.config import settings`
re-imports in a2a_service.py (W0404/W0621)
- Add pylint disable for unused `interaction_type` parameter in
build_a2a_jsonrpc_request (W0613) - part of public API
- Add missing docstrings for interrogate coverage (8 functions)
- Update test patches to target module-level settings binding at
mcpgateway.services.a2a_service.settings
- Redirect decode_auth/apply_query_param_auth test patches to
mcpgateway.services.a2a_protocol where prepare_a2a_invocation
now calls them
- Update admin test assertion for simplified test_params format
Signed-off-by: Jon Spriggs <github@sprig.gs>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix: harden A2A protocol layer and Rust runtime after review
- Restore query-param credential redaction in error messages by exposing
sensitive_query_param_names on PreparedA2AInvocation
- Route Rust sidecar 504 (upstream timeout) through the timeout
accounting path (counter, circuit-breaker, ToolTimeoutError)
- Disable redirect following in the Rust reqwest client to match the
Python path and prevent credential forwarding on redirects
- Handle v/V-prefixed protocol version strings (e.g. "v1", "V1.0")
so they correctly resolve to A2A v1 semantics
- Add full docstrings to a2a_protocol.py and rust_a2a_runtime.py
(resolves flake8 DAR101/DAR201/DAR401 warnings)
- Add version.py A2A runtime diagnostic docstrings
- Fix 6 test mock targets for module-level settings import
- Add 100% coverage tests for a2a_protocol.py, rust_a2a_runtime.py,
and version.py A2A diagnostics
- Update .secrets.baseline for test fixture false positives
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* feat(a2a-runtime): complete A2A v1.0 runtime with streaming, sessions, and caching
Production-grade Rust A2A sidecar with full v1.0 method dispatch, SSE
streaming, Redis-backed sessions, three-layer caching, and virtual
server federation. Implements all 20 items from the A2A implementation
plan across 5 phases.
Core engine (17 Rust modules, 122 tests):
- Circuit breaker with per-endpoint per-tenant isolation
- Lock-free per-agent metrics with P95 adaptive timeouts
- Bounded job queue with request coalescing
- AES-GCM auth decryption matching Python services_auth contract
- Trust validation via loopback + HMAC shared secret
A2A v1.0 method dispatch:
- SendMessage, SendStreamingMessage (SSE with event store + reconnect)
- GetTask, ListTasks, CancelTask (proxied to Python RPC)
- GetExtendedAgentCard (cached agent cards)
- Push notification CRUD (4 methods) + Rust webhook dispatch
- All methods accept both v1 (PascalCase) and legacy (slash) names
Streaming and sessions:
- SSE passthrough from agent through Rust to client
- Redis ring buffer event store (Lua atomic scripts) + async PG flush
- Client reconnect via Last-Event-ID with Redis/PG replay
- Redis-backed session management with auth fingerprinting and TTL
- Session fast-path skips Python authenticate on reuse
Three-layer caching:
- L1 in-process DashMap, L2 Redis, L3 PG via Python RPC
- Redis pub/sub invalidation for cross-worker L1 eviction
- Graceful degradation when Redis unavailable
Infrastructure:
- Nginx a2a_upstream with Rust-primary/Python-backup failover
- 15 Python /_internal/a2a/* RPC endpoints
- Virtual server federation (servers exposed as A2A agents)
- Shadow mode dual-dispatch with response comparison logging
- Progressive rollout via RUST_A2A_MODE=off|shadow|edge|full
Domain models:
- A2ATask, ServerTaskMapping, ServerInterface, A2AAgentAuth,
A2APushNotificationConfig, A2ATaskEvent (6 models, 3 migrations)
- ADR-048 (A2A v1 protocol migration), ADR-049 (multi-protocol servers)
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* test(a2a): fill testing gaps in push, config, and integration edge cases
Add unit tests for previously untested modules and integration tests
for error handling edge cases.
Unit tests:
- push.rs: webhook config filtering (enabled, events match, case-insensitive)
- config.rs: listen_target parsing, UDS preference, invalid address, defaults
Integration tests:
- Trust chain: authentication failure (401→403), empty method fallthrough
- Streaming: agent error (500), agent timeout (502/504)
- Malformed requests: invalid JSON body, empty endpoint URL
- Method routing: unknown method falls through to agent invoke
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* test(a2a): add Python unit tests for federation, service methods, and endpoints
Add 75 new Python tests covering previously untested A2A code:
- A2AServerService (12 tests): get_server_agent_card, resolve_server_agent,
select_downstream_agent, create/resolve_task_mapping
- a2a_service new methods (24 tests): cancel_task, push config CRUD,
flush/replay events, shadow mode comparison, invalidation publishing
- /_internal/a2a/* endpoints (39 tests): untrusted 403 gate for all 16
endpoints, happy-path delegation for tasks, push, events, resolve, card
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* feat(core): add title field support and SSO ADFS integration
Add title column to tools, resources, and prompts tables following MCP
2025-11-25 spec precedence (title → annotations.title → name). Implement
_resolve_tool_title() helper with comprehensive doctests.
SSO enhancements:
- Add ADFS integration with tutorial documentation
- Expand SSO service test coverage (775+ new test lines)
- Add integration tests for ADFS flows
Performance improvements:
- Optimize gateway listings with eager-loaded tool counts
- Add selectinload for tools relationship in admin queries
Testing:
- Add unit tests for demo_a2a_agent.py script
- Expand gateway_service and sso_service test coverage
- Add ADFS integration test suite
Documentation:
- Add comprehensive ADFS SSO tutorial
- Update A2A agent documentation
- Remove duplicate config.schema.json files
Database:
- Migration a7f3c9e1b2d4: add title column (idempotent)
- Update schemas to include title field validation
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix: tighten A2A runtime authz and queue handling
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* test: expand rust a2a runtime coverage
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* test: tighten rust a2a coverage checks
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix: harden A2A runtime security and fix lint issues
- Scope agent cache keys by caller's teams to prevent cross-tenant
cache poisoning in the Rust A2A runtime
- Add visibility scoping to task get/list endpoints so callers only
see tasks owned by agents they have access to
- Check agent access before building agent card to avoid loading
sensitive data for unauthorized callers
- Truncate error detail in Rust runtime client logging to avoid
leaking auth blobs
- Tighten protocol matching in A2AServerService to exact match
instead of substring contains
- Validate task_id/agent_id/state parameter types in task proxy
endpoints
- Fix stale migration docstring (Revises: field)
- Fix clippy warnings (single_match, too_many_arguments)
- Run cargo fmt, isort, and black on PR-touched files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix(security): add visibility scoping to remaining internal A2A endpoints
Task cancel, push config CRUD (create/get/list/delete), and events
replay endpoints were missing scope-aware authorization checks. A
caller could operate on tasks and configs belonging to agents outside
their team visibility.
- cancel_task now checks agent visibility before allowing cancellation
- Push config create/get/list/delete endpoints verify the caller can
access the owning agent via _check_agent_access_by_id
- Events replay checks the task's owning agent visibility before
returning event data
- Events flush is left unscoped (trusted sidecar write path, not a
read path exposing data to callers)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* perf: fix cache stampede, reduce queue allocations, and optimize visibility query
- Add single-flight deduplication to resolve_agent: concurrent cache
misses for the same scoped agent key now share one Python resolve
call instead of each hitting the backend independently
- Wrap ResolvedRequest in Arc in the job queue to avoid deep-cloning
headers and JSON body on every request submission
- Rewrite _visible_agent_ids to push visibility filtering into SQL
instead of loading all enabled agents into Python and filtering
in a loop
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix(security): harden A2A runtime trust boundaries, scoping, and validation
Security fixes:
- Add authentication to the /invoke endpoint (HMAC trust header gate)
- Add authentication to the A2A proxy catch-all before forwarding
- Require non-empty A2A_RUST_AUTH_SECRET at startup to prevent
predictable trust headers
- Use server-derived owner_email/team_id in agent registration instead
of trusting client-provided values
- Add visibility scoping to the flush_events internal endpoint
- Close visibility bypass on events/replay when task row is absent
- Close visibility bypass on push config get/list when agent_id omitted
- Validate webhook URL through Pydantic schema (SSRF protection) in
internal push config creation endpoint
- Add webhook URL validation via validate_core_url on
A2APushNotificationConfigCreate schema
- Exclude auth_token from A2APushNotificationConfigRead (Field exclude)
- Change _check_agent_access_by_id to fail-closed for deleted agents
- Make encode_auth_context a hard error instead of silent empty fallback
- Sanitize all error responses to external callers — strip internal
backend URLs, Python response bodies, and reqwest error details
- Redact Redis URL credentials in logs
- Use constant-time comparison for session fingerprint validation
- Warn at startup if HTTP listener is non-loopback
Correctness fixes:
- Pass actual JSON-RPC method name through full_authenticate
- Add explicit authz match arms for cancel, delete, create, and stream
- Remove x-forwarded-for from default session fingerprint headers
- Fix cargo fmt conflicts with pragma allowlist comments
- Cap retry backoff at 60 seconds to prevent runaway delays
Error handling improvements:
- Replace expect() in queue worker with graceful error handling for
semaphore closure, panicked JoinSet tasks, and missing results
- Add logger.exception + db.invalidate fallback to all 11 internal
A2A endpoint exception handlers
- Upgrade query param decryption failure log from debug to warning
- Log warning on auth decoding failure during agent update
- Return None from event_store.store_event on serialization failure
- Replace .ok() with logged match on proxy response JSON parsing
- Replace mutex expect("poisoned") with unwrap_or_else recovery
- Upgrade shadow mode payload mismatch log to warning with traceback
- Upgrade Redis cache invalidation failure log to warning
- Bound resolve_inflight DashMap to prevent unbounded memory growth
Comment and code cleanup:
- Fix handle_a2a_proxy doc (does not inject trust headers)
- Fix coalesce_jobs doc (does not match on timeout)
- Fix proxy_task_method doc (also handles cancel, not just reads)
- Clarify trust header as keyed SHA-256, not formal HMAC
- Renumber handle_a2a_invoke steps sequentially (1-2-3-3a-4-5)
- Document _visible_agent_ids admin-bypass divergence
- Remove leftover logger.info debug statements from register_agent
- Remove commented-out dead code from register_agent
- Remove duplicate TOOLS_MANAGE_PLUGINS constant
- Remove constant-comparison test (test_version_endpoint_redis_conditions)
Test coverage:
- Add visibility tests for cancel_task (wrong team, admin, public-only)
- Add deny-path tests for events/replay (inaccessible agent, missing task)
- Add 32 tests for _check_agent_access_by_id, _visible_agent_ids,
get_task, list_tasks, and _check_server_access
- Add tests for auth_secret startup rejection (unit + binary)
- Update integration tests for trust-gated /invoke and authenticated
proxy endpoints
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* refactor: move a2a runtime into crates workspace
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix: resolve remaining a2a ci failures
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* chore: update rust supply-chain vet exemptions
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* test: fix a2a diff-cover regressions
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix(a2a): address PR #3704 review findings — security, observability, type design
Security-critical:
- Resolve A2A task lookup ambiguity: Rust runtime pre-resolves agent from
URL path and injects agent_id into task-method proxy params; Python
service refuses to guess when (task_id, no agent_id) matches multiple
rows. Prevents cross-agent task read/cancel via shared task_id.
- Encrypt push webhook auth_token at rest via services_auth.encode_auth;
new list_push_configs_for_dispatch returns decrypted tokens for the
trusted Rust sidecar only. Fixes secret-at-rest leak.
- Upsert semantics on push-config re-registration: mutable fields
(auth_token, events, enabled) update in place instead of silently
keeping stale config. Key rotations now take effect.
- Fail-closed on query-param auth decrypt failure (matches header path).
- Refuse empty/missing A2A_RUST_AUTH_SECRET at startup; add cross-field
RuntimeConfig validation (retry budget, TTL ordering, session TTL).
- Feature-flag gate on internal A2A endpoints: reject trusted requests
when MCPGATEWAY_A2A_ENABLED=false (defense-in-depth).
- events/flush rejects events for unknown task_ids (400) to prevent
visibility bypass.
Observability:
- logger.exception on _authorize_internal_a2a_method errors.
- logger.warning on agent-visibility denial across 7 internal endpoints.
- warn! on JSON parse failures in 4 proxy sites (server.rs) and
malformed Last-Event-ID headers.
- warn! on session fingerprint mismatch (security-relevant signal).
- MetricsCollector.webhook_retry_exhausted counter + record method.
- Log Rust-cache invalidation scheduling failures instead of silent pass.
- Rollback-after-failure in task persistence now logs and calls invalidate.
Type design / schema:
- New A2ATaskState enum with terminal-state validator on A2ATaskUpdate.
- Unknown task-state protocol values now warn (passed through for
forward-compat).
- Alembic: tenant=String(255), icon_url=String(767) match the ORM
(prevents Postgres VARCHAR-length drift between fresh and migrated DBs).
- Rename two A2A migration files to hash-first convention matching the
rest of the project.
- trust.rs doc/comment clarifies the header is SHA256, not HMAC.
- BTreeMap for query-param merge ensures deterministic ordering.
- encode_auth_context uses .expect() — empty header would look anonymous.
Test coverage:
- Deny-path regressions for internal A2A endpoints: RBAC denied,
wrong-team, feature-disabled (20 new cases).
- Rust↔Python bridge: ConnectError/ConnectTimeout/ReadTimeout/PoolTimeout
now wrap as RustA2ARuntimeError with correct is_timeout flag.
- Push config: encryption round-trip, token rotation, upsert semantics,
undecryptable-legacy heal path, dispatch-listing SQL visibility scope.
- Rust concurrency: session concurrent lookup/invalidate leaves storage
consistent; concurrent create produces unique IDs; queue overflow
under concurrent producers rejects with Full.
- RuntimeConfig cross-field validation (5 scenarios).
- Rust push: webhook retry-exhaustion metric increments.
- Binary startup refuses without auth_secret.
- select_downstream_agent: ORDER BY name + enabled filter assertions.
Deferred to follow-up issues:
- Typestate wrappers for trust-boundary types (#4233)
- SSE client-disconnect integration test (#4234)
Signed-off-by: Jonathan Springer <jps@s390x.com>
* test: expand a2a runtime parity coverage
Signed-off-by: lucarlig <luca.carlig@ibm.com>
* fix: resolve rust ci regressions
Signed-off-by: lucarlig <luca.carlig@ibm.com>
---------
Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: lucarlig <luca.carlig@ibm.com>1 parent c03ec24 commit 685ebe5
File tree
79 files changed
+25151
-1809
lines changed- a2a-agents/go/a2a-echo-agent
- charts/mcp-stack/templates
- crates
- a2a_runtime
- benches
- src
- tests
- mcp_runtime
- src
- tests
- wrapper
- scripts
- src
- tests
- docs/docs/architecture/adr
- infra/nginx
- mcpgateway
- alembic/versions
- services
- supply-chain
- tests
- integration
- loadtest
- unit
- mcpgateway
- services
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
79 files changed
+25151
-1809
lines changedLarge diffs are not rendered by default.
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
| 25 | + | |
24 | 26 | | |
25 | 27 | | |
26 | 28 | | |
| |||
29 | 31 | | |
30 | 32 | | |
31 | 33 | | |
| 34 | + | |
32 | 35 | | |
33 | 36 | | |
34 | | - | |
| 37 | + | |
35 | 38 | | |
36 | 39 | | |
37 | 40 | | |
38 | | - | |
39 | 41 | | |
40 | 42 | | |
41 | 43 | | |
| |||
51 | 53 | | |
52 | 54 | | |
53 | 55 | | |
54 | | - | |
| 56 | + | |
| 57 | + | |
55 | 58 | | |
56 | 59 | | |
57 | | - | |
| 60 | + | |
58 | 61 | | |
59 | 62 | | |
60 | 63 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
54 | 58 | | |
55 | 59 | | |
56 | 60 | | |
| |||
128 | 132 | | |
129 | 133 | | |
130 | 134 | | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
131 | 143 | | |
132 | 144 | | |
133 | 145 | | |
| |||
203 | 215 | | |
204 | 216 | | |
205 | 217 | | |
| 218 | + | |
206 | 219 | | |
207 | 220 | | |
208 | 221 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
8732 | 8732 | | |
8733 | 8733 | | |
8734 | 8734 | | |
| 8735 | + | |
| 8736 | + | |
| 8737 | + | |
| 8738 | + | |
| 8739 | + | |
| 8740 | + | |
| 8741 | + | |
| 8742 | + | |
| 8743 | + | |
| 8744 | + | |
| 8745 | + | |
| 8746 | + | |
8735 | 8747 | | |
8736 | 8748 | | |
8737 | 8749 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
7 | | - | |
8 | 7 | | |
9 | 8 | | |
10 | 9 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | | - | |
| 9 | + | |
10 | 10 | | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
11 | 17 | | |
12 | 18 | | |
| 19 | + | |
13 | 20 | | |
14 | 21 | | |
15 | | - | |
| 22 | + | |
16 | 23 | | |
17 | 24 | | |
18 | 25 | | |
19 | 26 | | |
20 | 27 | | |
21 | 28 | | |
| 29 | + | |
22 | 30 | | |
23 | 31 | | |
24 | 32 | | |
25 | 33 | | |
26 | 34 | | |
27 | 35 | | |
28 | 36 | | |
29 | | - | |
| 37 | + | |
30 | 38 | | |
31 | 39 | | |
32 | 40 | | |
| |||
53 | 61 | | |
54 | 62 | | |
55 | 63 | | |
56 | | - | |
| 64 | + | |
57 | 65 | | |
58 | 66 | | |
59 | | - | |
60 | | - | |
| 67 | + | |
61 | 68 | | |
62 | | - | |
| 69 | + | |
63 | 70 | | |
64 | 71 | | |
65 | 72 | | |
| |||
0 commit comments