feat: add Rust A2A 1.0 runtime support by crivetimihai · Pull Request #3704 · IBM/mcp-context-forge

crivetimihai · 2026-03-17T09:28:37Z

Summary

- add an experimental Rust A2A sidecar and gateway delegation path for registered A2A agents
- normalize A2A v1 and legacy JSON-RPC payloads through a shared outbound protocol adapter
- move the example and compose testing path to A2A 1.0.0 by default while retaining legacy interop

Details

- add tools_rust/a2a_runtime with health and invoke endpoints, plus container and entrypoint wiring for RUST_A2A_MODE=off|shadow|edge|full
- route A2A service and A2A tool invocations through shared request preparation and optional Rust-side delegation
- rewrite the Go echo agent to serve a v1 card and task surface without the old Go SDK dependency, while still accepting legacy aliases
- replace the integration fixture with direct v1 coverage and update load and compose flows to exercise the v1 default path through the managed Rust sidecar
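
As an illustration of what "accepting legacy aliases" can mean for JSON-RPC method names, here is a minimal sketch. It is not the adapter from this PR: `LEGACY_TO_V1`, `normalize_method`, and `normalize_request` are hypothetical names, and the legacy slash-style names in the table are assumptions chosen for illustration. Unknown names are passed through untouched in this sketch.

```python
from typing import Any, Dict

# Hypothetical alias table: legacy slash-style names mapped to v1 PascalCase names.
LEGACY_TO_V1: Dict[str, str] = {
    "message/send": "SendMessage",
    "message/stream": "SendStreamingMessage",
    "tasks/get": "GetTask",
    "tasks/cancel": "CancelTask",
}
V1_METHODS = set(LEGACY_TO_V1.values())


def normalize_method(method: str) -> str:
    """Map a legacy alias to its v1 name; leave v1 and unknown names unchanged."""
    if method in V1_METHODS:
        return method
    return LEGACY_TO_V1.get(method, method)


def normalize_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a JSON-RPC request whose method uses the v1 name."""
    return {**payload, "method": normalize_method(payload["method"])}
```

Returning a copy rather than mutating the incoming payload keeps the original request available for logging or comparison.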

Refs #1624

lucarlig · 2026-03-31T08:13:19Z

I was working on the same PR last month (#3250). Check whether anything in it is useful and worth stealing, so we can close it.

dima-zakharov · 2026-04-01T13:26:09Z

The implementation of the protocol adapter and the managed lifecycle looks very solid.

That said, the idea of Python serving requests and processing them with Rust feels less efficient than the opposite approach. We should look at Nginx Unit, where a native listener handles the socket and routes traffic to Python as a service.

In that configuration, you can easily specify how many Python workers to start for the endpoints. The Nginx Unit architecture usually provides the fastest RPS, outperforming multi-worker setups like Uvicorn, Gunicorn, or Granian.
The current sidecar MCP runtime design follows a similar approach to Nginx Unit, but we can't really run multiple Python workers effectively on UDS unless we use HTTP with something like Unit or another ASGI server anyway.

lucarlig · 2026-04-08T14:04:21Z

This is a strong piece of work overall. The Rust A2A runtime direction, protocol normalization work, and sidecar boundary are all promising. I do think there are a few high-priority issues worth addressing before merge:

The Rust A2A entrypoint appears to authorize every JSON-RPC method as invoke before branching into task reads, card lookups, and push-config operations. That means the new per-method A2A authz endpoints do not seem to be exercised, so RBAC behavior for read-oriented methods looks weaker than intended.
The new trusted agent-resolve path appears to bypass agent visibility scoping. The existing Python invoke flow checks access before returning an agent, but the internal Rust resolve path looks like it can return any enabled agent by name without consulting forwarded auth context or token teams. That seems likely to expose private or team-scoped agents through the sidecar path.
The task proxy path looks like it forwards the wrong parameter shape. The Rust side passes raw JSON-RPC params through to the internal Python task endpoints, but those endpoints expect gateway-style task identifiers, while the protocol normalization and integration coverage in this PR use spec-shaped task params. As written, valid task requests may fail on the Rust path.
The new queue implementation does not appear to deliver the concurrency it advertises. The batching path acquires a semaphore permit but still awaits each invoke inline before moving to the next request group, so work stays effectively serial per worker thread. That looks like a throughput regression in the new hot path.
The default queue mode appears to be unbounded. With no explicit queue limit configured, submissions can continue accumulating even after the bridge channel is saturated, which makes overload scenarios look vulnerable to unbounded memory growth.
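
The two queue concerns can be illustrated outside the Rust code with a small asyncio sketch. This is an analogy under stated assumptions, not the PR's implementation: `invoke`, `run_serial`, `run_concurrent`, and `submit` are hypothetical names. `run_serial` takes a permit but awaits each call inline, so nothing overlaps; `run_concurrent` starts every call first and lets the semaphore cap the overlap; `submit` shows a bounded queue that rejects work when full instead of growing without limit.

```python
import asyncio
from typing import Iterable, List


async def invoke(job: int, delay: float = 0.05) -> int:
    """Stand-in for an upstream agent call."""
    await asyncio.sleep(delay)
    return job * 2


async def run_serial(jobs: Iterable[int], limit: int) -> List[int]:
    """Holds a permit but awaits inline: at most one invoke is ever in flight."""
    sem = asyncio.Semaphore(limit)
    results = []
    for job in jobs:
        async with sem:
            results.append(await invoke(job))
    return results


async def run_concurrent(jobs: Iterable[int], limit: int) -> List[int]:
    """Starts all invokes up front; the semaphore bounds how many overlap."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: int) -> int:
        async with sem:
            return await invoke(job)

    return list(await asyncio.gather(*(guarded(job) for job in jobs)))


def submit(queue: asyncio.Queue, job: int) -> bool:
    """Bounded submission: report overload instead of buffering indefinitely."""
    try:
        queue.put_nowait(job)
        return True
    except asyncio.QueueFull:
        return False
```

With four jobs and a limit of four, the serial variant takes roughly four times as long as the concurrent one even though both "use" the semaphore.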

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Signed-off-by: lucarlig <luca.carlig@ibm.com>

- Remove redundant local `from mcpgateway.config import settings` re-imports in a2a_service.py (W0404/W0621)
- Add pylint disable for unused `interaction_type` parameter in build_a2a_jsonrpc_request (W0613) - part of public API
- Add missing docstrings for interrogate coverage (8 functions)
- Update test patches to target module-level settings binding at mcpgateway.services.a2a_service.settings
- Redirect decode_auth/apply_query_param_auth test patches to mcpgateway.services.a2a_protocol where prepare_a2a_invocation now calls them
- Update admin test assertion for simplified test_params format

Signed-off-by: Jon Spriggs <github@sprig.gs>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

- Restore query-param credential redaction in error messages by exposing sensitive_query_param_names on PreparedA2AInvocation
- Route Rust sidecar 504 (upstream timeout) through the timeout accounting path (counter, circuit-breaker, ToolTimeoutError)
- Disable redirect following in the Rust reqwest client to match the Python path and prevent credential forwarding on redirects
- Handle v/V-prefixed protocol version strings (e.g. "v1", "V1.0") so they correctly resolve to A2A v1 semantics
- Add full docstrings to a2a_protocol.py and rust_a2a_runtime.py (resolves flake8 DAR101/DAR201/DAR401 warnings)
- Add version.py A2A runtime diagnostic docstrings
- Fix 6 test mock targets for module-level settings import
- Add 100% coverage tests for a2a_protocol.py, rust_a2a_runtime.py, and version.py A2A diagnostics
- Update .secrets.baseline for test fixture false positives

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
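
The version-string handling mentioned in this commit could look roughly like the sketch below. `is_a2a_v1` is a hypothetical helper, not the function in a2a_protocol.py, and treating a missing or empty version as "not v1" is an assumption of this sketch rather than documented behavior.

```python
from typing import Optional


def is_a2a_v1(version: Optional[str]) -> bool:
    """Return True when a protocol version string has major version 1.

    Accepts plain ("1", "1.0.0") and v/V-prefixed ("v1", "V1.0") forms.
    """
    if not version:
        return False  # assumption: no version means legacy semantics
    cleaned = version.strip()
    if cleaned[:1] in ("v", "V"):
        cleaned = cleaned[1:]
    major = cleaned.split(".", 1)[0]
    return major == "1"
```

Comparing only the major component means "1.0.0" and "V1.0" resolve the same way, while "0.3.0" and "10.0" do not match.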

…, and caching

Production-grade Rust A2A sidecar with full v1.0 method dispatch, SSE streaming, Redis-backed sessions, three-layer caching, and virtual server federation. Implements all 20 items from the A2A implementation plan across 5 phases.

Core engine (17 Rust modules, 122 tests):
- Circuit breaker with per-endpoint per-tenant isolation
- Lock-free per-agent metrics with P95 adaptive timeouts
- Bounded job queue with request coalescing
- AES-GCM auth decryption matching Python services_auth contract
- Trust validation via loopback + HMAC shared secret

A2A v1.0 method dispatch:
- SendMessage, SendStreamingMessage (SSE with event store + reconnect)
- GetTask, ListTasks, CancelTask (proxied to Python RPC)
- GetExtendedAgentCard (cached agent cards)
- Push notification CRUD (4 methods) + Rust webhook dispatch
- All methods accept both v1 (PascalCase) and legacy (slash) names

Streaming and sessions:
- SSE passthrough from agent through Rust to client
- Redis ring buffer event store (Lua atomic scripts) + async PG flush
- Client reconnect via Last-Event-ID with Redis/PG replay
- Redis-backed session management with auth fingerprinting and TTL
- Session fast-path skips Python authenticate on reuse

Three-layer caching:
- L1 in-process DashMap, L2 Redis, L3 PG via Python RPC
- Redis pub/sub invalidation for cross-worker L1 eviction
- Graceful degradation when Redis unavailable

Infrastructure:
- Nginx a2a_upstream with Rust-primary/Python-backup failover
- 15 Python /_internal/a2a/* RPC endpoints
- Virtual server federation (servers exposed as A2A agents)
- Shadow mode dual-dispatch with response comparison logging
- Progressive rollout via RUST_A2A_MODE=off|shadow|edge|full

Domain models:
- A2ATask, ServerTaskMapping, ServerInterface, A2AAgentAuth, A2APushNotificationConfig, A2ATaskEvent (6 models, 3 migrations)
- ADR-048 (A2A v1 protocol migration), ADR-049 (multi-protocol servers)

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
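
The reconnect behavior (a client sends Last-Event-ID and receives the events it missed) can be sketched with an in-memory stand-in for the ring buffer. This is a simplification under stated assumptions: `EventRing` is a hypothetical class, event IDs are assumed to be monotonically increasing integers, and neither Redis nor the PG fallback is modeled.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class EventRing:
    """Fixed-capacity event store: the oldest events are evicted first."""

    def __init__(self, capacity: int) -> None:
        self._events: Deque[Tuple[int, Any]] = deque(maxlen=capacity)
        self._next_id = 1

    def append(self, data: Any) -> int:
        """Store an event and return the ID it was assigned."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return event_id

    def replay_after(self, last_event_id: int) -> List[Tuple[int, Any]]:
        """Return retained events newer than the client's Last-Event-ID."""
        return [(eid, data) for eid, data in self._events if eid > last_event_id]
```

Once an event has been evicted from the ring it can no longer be replayed from memory, which is why a slower, durable store behind the ring is useful for long disconnects.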

Add unit tests for previously untested modules and integration tests for error handling edge cases.

Unit tests:
- push.rs: webhook config filtering (enabled, events match, case-insensitive)
- config.rs: listen_target parsing, UDS preference, invalid address, defaults

Integration tests:
- Trust chain: authentication failure (401→403), empty method fallthrough
- Streaming: agent error (500), agent timeout (502/504)
- Malformed requests: invalid JSON body, empty endpoint URL
- Method routing: unknown method falls through to agent invoke

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

… endpoints

Add 75 new Python tests covering previously untested A2A code:
- A2AServerService (12 tests): get_server_agent_card, resolve_server_agent, select_downstream_agent, create/resolve_task_mapping
- a2a_service new methods (24 tests): cancel_task, push config CRUD, flush/replay events, shadow mode comparison, invalidation publishing
- /_internal/a2a/* endpoints (39 tests): untrusted 403 gate for all 16 endpoints, happy-path delegation for tasks, push, events, resolve, card

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

Add title column to tools, resources, and prompts tables following MCP 2025-11-25 spec precedence (title → annotations.title → name). Implement _resolve_tool_title() helper with comprehensive doctests.

SSO enhancements:
- Add ADFS integration with tutorial documentation
- Expand SSO service test coverage (775+ new test lines)
- Add integration tests for ADFS flows

Performance improvements:
- Optimize gateway listings with eager-loaded tool counts
- Add selectinload for tools relationship in admin queries

Testing:
- Add unit tests for demo_a2a_agent.py script
- Expand gateway_service and sso_service test coverage
- Add ADFS integration test suite

Documentation:
- Add comprehensive ADFS SSO tutorial
- Update A2A agent documentation
- Remove duplicate config.schema.json files

Database:
- Migration a7f3c9e1b2d4: add title column (idempotent)
- Update schemas to include title field validation

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
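
The title precedence stated in this commit (title → annotations.title → name) can be sketched as follows. `resolve_title_sketch` is a hypothetical stand-in and its signature is an assumption; the real `_resolve_tool_title()` helper may take different arguments.

```python
from typing import Any, Mapping, Optional


def resolve_title_sketch(
    name: str,
    title: Optional[str] = None,
    annotations: Optional[Mapping[str, Any]] = None,
) -> str:
    """Pick a display title: explicit title, then annotations.title, then name."""
    if title:
        return title
    annotated = (annotations or {}).get("title")
    if annotated:
        return annotated
    return name
```

In this sketch an empty string counts as "not set", so a blank title falls through to the next source.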

Signed-off-by: lucarlig <luca.carlig@ibm.com>

- Scope agent cache keys by caller's teams to prevent cross-tenant cache poisoning in the Rust A2A runtime
- Add visibility scoping to task get/list endpoints so callers only see tasks owned by agents they have access to
- Check agent access before building agent card to avoid loading sensitive data for unauthorized callers
- Truncate error detail in Rust runtime client logging to avoid leaking auth blobs
- Tighten protocol matching in A2AServerService to exact match instead of substring contains
- Validate task_id/agent_id/state parameter types in task proxy endpoints
- Fix stale migration docstring (Revises: field)
- Fix clippy warnings (single_match, too_many_arguments)
- Run cargo fmt, isort, and black on PR-touched files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
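
Scoping a cache key by the caller's teams might be done along these lines. This is a minimal sketch, not the Rust runtime's key format: `scoped_agent_cache_key` and the `a2a:agent:` prefix are hypothetical. The idea is that two callers share a cache entry only when they have the same team set.

```python
import hashlib
import json
from typing import Iterable


def scoped_agent_cache_key(agent_name: str, teams: Iterable[str]) -> str:
    """Build a cache key that differs whenever the caller's team set differs."""
    # Sort and de-duplicate so that team order does not change the key.
    canonical = json.dumps(sorted(set(teams)))
    scope = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"a2a:agent:{agent_name}:{scope}"
```

Serializing the team list as JSON before hashing avoids ambiguity that a plain delimiter join could introduce if a team identifier contained the delimiter.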

…ints

Task cancel, push config CRUD (create/get/list/delete), and events replay endpoints were missing scope-aware authorization checks. A caller could operate on tasks and configs belonging to agents outside their team visibility.

- cancel_task now checks agent visibility before allowing cancellation
- Push config create/get/list/delete endpoints verify the caller can access the owning agent via _check_agent_access_by_id
- Events replay checks the task's owning agent visibility before returning event data
- Events flush is left unscoped (trusted sidecar write path, not a read path exposing data to callers)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

…bility query

- Add single-flight deduplication to resolve_agent: concurrent cache misses for the same scoped agent key now share one Python resolve call instead of each hitting the backend independently
- Wrap ResolvedRequest in Arc in the job queue to avoid deep-cloning headers and JSON body on every request submission
- Rewrite _visible_agent_ids to push visibility filtering into SQL instead of loading all enabled agents into Python and filtering in a loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
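
Single-flight deduplication can be sketched in asyncio as below. This is an analogy to the Rust change, not a port of it: `SingleFlight` and its `do` method are hypothetical names. Concurrent callers with the same key await one shared task; the entry is dropped when that task finishes, so later misses trigger a fresh call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlight:
    """Share one in-flight call among concurrent callers with the same key."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[Any]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the real work.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            # Remove the entry once the work completes, success or failure.
            task.add_done_callback(lambda _t, k=key: self._inflight.pop(k, None))
        # shield: cancelling one waiter must not cancel the shared call.
        return await asyncio.shield(task)
```

Because entries are removed on completion, the map holds only keys with work in progress rather than every key ever requested.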

…dation

Security fixes:
- Add authentication to the /invoke endpoint (HMAC trust header gate)
- Add authentication to the A2A proxy catch-all before forwarding
- Require non-empty A2A_RUST_AUTH_SECRET at startup to prevent predictable trust headers
- Use server-derived owner_email/team_id in agent registration instead of trusting client-provided values
- Add visibility scoping to the flush_events internal endpoint
- Close visibility bypass on events/replay when task row is absent
- Close visibility bypass on push config get/list when agent_id omitted
- Validate webhook URL through Pydantic schema (SSRF protection) in internal push config creation endpoint
- Add webhook URL validation via validate_core_url on A2APushNotificationConfigCreate schema
- Exclude auth_token from A2APushNotificationConfigRead (Field exclude)
- Change _check_agent_access_by_id to fail-closed for deleted agents
- Make encode_auth_context a hard error instead of silent empty fallback
- Sanitize all error responses to external callers: strip internal backend URLs, Python response bodies, and reqwest error details
- Redact Redis URL credentials in logs
- Use constant-time comparison for session fingerprint validation
- Warn at startup if HTTP listener is non-loopback

Correctness fixes:
- Pass actual JSON-RPC method name through full_authenticate
- Add explicit authz match arms for cancel, delete, create, and stream
- Remove x-forwarded-for from default session fingerprint headers
- Fix cargo fmt conflicts with pragma allowlist comments
- Cap retry backoff at 60 seconds to prevent runaway delays

Error handling improvements:
- Replace expect() in queue worker with graceful error handling for semaphore closure, panicked JoinSet tasks, and missing results
- Add logger.exception + db.invalidate fallback to all 11 internal A2A endpoint exception handlers
- Upgrade query param decryption failure log from debug to warning
- Log warning on auth decoding failure during agent update
- Return None from event_store.store_event on serialization failure
- Replace .ok() with logged match on proxy response JSON parsing
- Replace mutex expect("poisoned") with unwrap_or_else recovery
- Upgrade shadow mode payload mismatch log to warning with traceback
- Upgrade Redis cache invalidation failure log to warning
- Bound resolve_inflight DashMap to prevent unbounded memory growth

Comment and code cleanup:
- Fix handle_a2a_proxy doc (does not inject trust headers)
- Fix coalesce_jobs doc (does not match on timeout)
- Fix proxy_task_method doc (also handles cancel, not just reads)
- Clarify trust header as keyed SHA-256, not formal HMAC
- Renumber handle_a2a_invoke steps sequentially (1-2-3-3a-4-5)
- Document _visible_agent_ids admin-bypass divergence
- Remove leftover logger.info debug statements from register_agent
- Remove commented-out dead code from register_agent
- Remove duplicate TOOLS_MANAGE_PLUGINS constant
- Remove constant-comparison test (test_version_endpoint_redis_conditions)

Test coverage:
- Add visibility tests for cancel_task (wrong team, admin, public-only)
- Add deny-path tests for events/replay (inaccessible agent, missing task)
- Add 32 tests for _check_agent_access_by_id, _visible_agent_ids, get_task, list_tasks, and _check_server_access
- Add tests for auth_secret startup rejection (unit + binary)
- Update integration tests for trust-gated /invoke and authenticated proxy endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>
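
To make "keyed SHA-256" and "constant-time comparison" concrete, here is a minimal sketch. The exact construction used by trust.rs is not shown on this page, so the `secret:context` layout below is an assumption, and `trust_token` and `verify_trust_token` are hypothetical names. A standard HMAC (`hmac.new` in Python) would be the more conventional choice where a formal MAC is required.

```python
import hashlib
import hmac


def trust_token(secret: str, context: str) -> str:
    """Derive a trust header value as SHA-256 over secret and context.

    Rejects an empty secret, since that would make the token predictable.
    """
    if not secret:
        raise ValueError("auth secret must be non-empty")
    # Assumed layout for this sketch: "<secret>:<context>".
    return hashlib.sha256(f"{secret}:{context}".encode("utf-8")).hexdigest()


def verify_trust_token(secret: str, context: str, presented: str) -> bool:
    """Compare in constant time so timing does not reveal matching prefixes."""
    expected = trust_token(secret, context).encode("utf-8")
    return hmac.compare_digest(expected, presented.encode("utf-8"))
```

Encoding both values to bytes before `hmac.compare_digest` avoids a `TypeError` when the presented header contains non-ASCII characters.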

Signed-off-by: lucarlig <luca.carlig@ibm.com>

… type design

Security-critical:
- Resolve A2A task lookup ambiguity: Rust runtime pre-resolves agent from URL path and injects agent_id into task-method proxy params; Python service refuses to guess when (task_id, no agent_id) matches multiple rows. Prevents cross-agent task read/cancel via shared task_id.
- Encrypt push webhook auth_token at rest via services_auth.encode_auth; new list_push_configs_for_dispatch returns decrypted tokens for the trusted Rust sidecar only. Fixes secret-at-rest leak.
- Upsert semantics on push-config re-registration: mutable fields (auth_token, events, enabled) update in place instead of silently keeping stale config. Key rotations now take effect.
- Fail-closed on query-param auth decrypt failure (matches header path).
- Refuse empty/missing A2A_RUST_AUTH_SECRET at startup; add cross-field RuntimeConfig validation (retry budget, TTL ordering, session TTL).
- Feature-flag gate on internal A2A endpoints: reject trusted requests when MCPGATEWAY_A2A_ENABLED=false (defense-in-depth).
- events/flush rejects events for unknown task_ids (400) to prevent visibility bypass.

Observability:
- logger.exception on _authorize_internal_a2a_method errors.
- logger.warning on agent-visibility denial across 7 internal endpoints.
- warn! on JSON parse failures in 4 proxy sites (server.rs) and malformed Last-Event-ID headers.
- warn! on session fingerprint mismatch (security-relevant signal).
- MetricsCollector.webhook_retry_exhausted counter + record method.
- Log Rust-cache invalidation scheduling failures instead of silent pass.
- Rollback-after-failure in task persistence now logs and calls invalidate.

Type design / schema:
- New A2ATaskState enum with terminal-state validator on A2ATaskUpdate.
- Unknown task-state protocol values now warn (passed through for forward-compat).
- Alembic: tenant=String(255), icon_url=String(767) match the ORM (prevents Postgres VARCHAR-length drift between fresh and migrated DBs).
- Rename two A2A migration files to hash-first convention matching the rest of the project.
- trust.rs doc/comment clarifies the header is SHA256, not HMAC.
- BTreeMap for query-param merge ensures deterministic ordering.
- encode_auth_context uses .expect(); an empty header would look anonymous.

Test coverage:
- Deny-path regressions for internal A2A endpoints: RBAC denied, wrong-team, feature-disabled (20 new cases).
- Rust↔Python bridge: ConnectError/ConnectTimeout/ReadTimeout/PoolTimeout now wrap as RustA2ARuntimeError with correct is_timeout flag.
- Push config: encryption round-trip, token rotation, upsert semantics, undecryptable-legacy heal path, dispatch-listing SQL visibility scope.
- Rust concurrency: session concurrent lookup/invalidate leaves storage consistent; concurrent create produces unique IDs; queue overflow under concurrent producers rejects with Full.
- RuntimeConfig cross-field validation (5 scenarios).
- Rust push: webhook retry-exhaustion metric increments.
- Binary startup refuses without auth_secret.
- select_downstream_agent: ORDER BY name + enabled filter assertions.

Deferred to follow-up issues:
- Typestate wrappers for trust-boundary types (#4233)
- SSE client-disconnect integration test (#4234)

Signed-off-by: Jonathan Springer <jps@s390x.com>
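
The "refuses to guess" rule for task lookup can be sketched over plain dictionaries. This is an illustration under stated assumptions, not the Python service's code: `find_task`, `AmbiguousTaskError`, and the row shape are hypothetical. When no agent_id is supplied and the task_id matches rows for more than one agent, the lookup raises instead of returning an arbitrary row.

```python
from typing import Any, Iterable, Mapping, Optional


class AmbiguousTaskError(LookupError):
    """Raised when a task_id alone does not identify a single task."""


def find_task(
    rows: Iterable[Mapping[str, Any]],
    task_id: str,
    agent_id: Optional[str] = None,
) -> Optional[Mapping[str, Any]]:
    """Return the single matching task row, None if absent, or raise if ambiguous."""
    matches = [
        row
        for row in rows
        if row["task_id"] == task_id
        and (agent_id is None or row["agent_id"] == agent_id)
    ]
    if not matches:
        return None
    if len(matches) > 1:
        raise AmbiguousTaskError(
            f"task_id {task_id!r} matches {len(matches)} rows; agent_id is required"
        )
    return matches[0]
```

Supplying agent_id, as the Rust runtime is described as doing after resolving the agent from the URL path, removes the ambiguity before the query runs.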

crivetimihai requested review from kevalmahajan and madhav165 as code owners March 17, 2026 09:28

crivetimihai added the experimental (Experimental features, test proposed MCP Specification changes) and rust (Rust programming) labels Mar 17, 2026

crivetimihai self-assigned this Mar 17, 2026

crivetimihai added the COULD P3: Nice-to-have features with minimal impact if left out; included if time permits label Mar 20, 2026

crivetimihai added this to the Release 1.1.0 milestone Mar 20, 2026

crivetimihai force-pushed the a2a-rust branch from ea23afa to 53441cc Compare March 31, 2026 07:50

crivetimihai requested review from dima-zakharov and lucarlig as code owners March 31, 2026 07:50

jonpspri added the a2a Support for A2A protocol label Apr 1, 2026

jonpspri force-pushed the a2a-rust branch from 53441cc to d355829 Compare April 7, 2026 11:09

This was referenced Apr 7, 2026

[EPIC][A2A]: A2A Protocol v0.3.0 Full Compliance Implementation #3150

Closed

[FEATURE][RUST]: Rewrite A2A agent in Rust (#1624) #3250

Closed

jonpspri force-pushed the a2a-rust branch 4 times, most recently from 22672db to ed7413c Compare April 8, 2026 13:51

jonpspri self-assigned this Apr 12, 2026

jonpspri force-pushed the a2a-rust branch from bf87bc4 to 373fef6 Compare April 13, 2026 06:40

jonpspri mentioned this pull request Apr 13, 2026

[EPIC][A2A]: A2A Protocol v0.3.0 Full Compliance Implementation #2547

Closed

jonpspri force-pushed the a2a-rust branch from 373fef6 to 35f71c8 Compare April 13, 2026 12:31

brian-hussey self-assigned this Apr 15, 2026

lucarlig force-pushed the a2a-rust branch from 35f71c8 to a7b4ee5 Compare April 15, 2026 10:24

lucarlig requested review from brian-hussey and dawid-nowak as code owners April 15, 2026 10:24

lucarlig force-pushed the a2a-rust branch from a7b4ee5 to 35f71c8 Compare April 15, 2026 10:29

lucarlig force-pushed the a2a-rust branch from 44f0b01 to 8f55bc8 Compare April 15, 2026 14:16

lucarlig self-assigned this Apr 15, 2026

lucarlig force-pushed the a2a-rust branch from d706e11 to 16c2f08 Compare April 15, 2026 14:51

crivetimihai and others added 18 commits April 15, 2026 18:20

feat: add Rust A2A 1.0 runtime support

8a7bc66

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Signed-off-by: lucarlig <luca.carlig@ibm.com>

fix: tighten A2A runtime authz and queue handling

13b773f

Signed-off-by: lucarlig <luca.carlig@ibm.com>

test: expand rust a2a runtime coverage

3cf618b

Signed-off-by: lucarlig <luca.carlig@ibm.com>

test: tighten rust a2a coverage checks

87dac4a

Signed-off-by: lucarlig <luca.carlig@ibm.com>

refactor: move a2a runtime into crates workspace

d53e022

Signed-off-by: lucarlig <luca.carlig@ibm.com>

fix: resolve remaining a2a ci failures

ac954b9

Signed-off-by: lucarlig <luca.carlig@ibm.com>

chore: update rust supply-chain vet exemptions

45f0882

Signed-off-by: lucarlig <luca.carlig@ibm.com>

test: fix a2a diff-cover regressions

dddd41d

Signed-off-by: lucarlig <luca.carlig@ibm.com>

This was referenced Apr 15, 2026

[FEATURE]: Typestate wrappers for A2A runtime trust-boundary types #4233

Open

[TESTING]: Integration test for SSE client-disconnect cancellation in Rust A2A runtime #4234

Open

jonpspri force-pushed the a2a-rust branch from 9703b5a to a7d21ac Compare April 15, 2026 18:57

jonpspri force-pushed the a2a-rust branch from a7d21ac to 522cc1f Compare April 15, 2026 19:58

crivetimihai wants to merge 19 commits into main from a2a-rust


5 participants