feat: Durable A2A (Agent2Agent) protocol — client, server, observability by v1r3n · Pull Request #1195 · conductor-oss/conductor

v1r3n · 2026-06-20T01:26:52Z

Summary

Adds Agent2Agent (A2A) protocol support to Conductor in both directions, with durability as the differentiator: a remote agent call survives a server crash/restart and resumes from persisted execution state.

Two directions

Client — call a remote agent from a workflow (conductor.integrations.ai.enabled=true)

AGENT (poll / streaming-SSE / push), GET_AGENT_CARD, CANCEL_AGENT.
Durable: deterministic messageId (stable across retries/restarts), in-flight state lives in the persisted task output (no held thread in poll mode), absolute-deadline + consecutive-poll-failure liveness guards, push backstop poll.
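One way to get a messageId that is stable across retries and restarts is to derive it from identifiers that are already persisted with the execution. The sketch below is an illustration only, not the PR's Java implementation: the namespace value, the function name, and the choice of inputs (workflow id, task reference name, iteration) are assumptions.

```python
import uuid

# Hypothetical fixed namespace for this sketch; any constant UUID works
# as long as it never changes between deployments.
A2A_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/a2a-message-id")


def deterministic_message_id(workflow_id: str, task_ref_name: str, iteration: int = 0) -> str:
    """Derive the same messageId for the same logical call, however often it is retried."""
    # Every input is assumed to be persisted with the execution, so a
    # restarted server recomputes exactly the same value.
    name = f"{workflow_id}:{task_ref_name}:{iteration}"
    return str(uuid.uuid5(A2A_NAMESPACE, name))
```

Because the id is a pure function of persisted identifiers, a retry after a crash resends the same messageId, which lets a remote agent that deduplicates on it treat the resend as the original request.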

Server — expose any workflow as an A2A agent (conductor.a2a.server.enabled=true)

One agent per workflow at {basePath}/{workflow}: Agent Card + JSON-RPC message/send / tasks/get / tasks/cancel.
Idempotent start (messageId → idempotencyKey, RETURN_EXISTING) → server-side effectively-once.
Multi-turn resume: a follow-up message/send carrying the task id completes the paused HUMAN/WAIT task and resumes the same execution (no duplicate workflow).
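The start-or-resume behaviour described in this list can be modelled in a few lines. This is a simplified in-memory sketch, assuming the messageId is used directly as the idempotency key and the A2A task id equals the execution id; the class and method names are hypothetical and do not come from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Execution:
    execution_id: str
    status: str = "RUNNING"  # RUNNING | PAUSED (simplified for the sketch)
    messages: List[str] = field(default_factory=list)


class AgentEndpoint:
    """In-memory stand-in for one workflow exposed as an agent."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Execution] = {}  # idempotency key -> execution
        self._by_id: Dict[str, Execution] = {}   # task id -> execution

    def message_send(self, message_id: str, text: str,
                     task_id: Optional[str] = None) -> Execution:
        if task_id is not None:
            # Follow-up turn: resume the paused execution, never start a new one.
            execution = self._by_id[task_id]
            if execution.status != "PAUSED":
                raise ValueError(f"task {task_id} is not waiting for input")
            execution.messages.append(text)
            execution.status = "RUNNING"
            return execution
        # First turn: a repeated messageId returns the existing execution
        # (the RETURN_EXISTING behaviour) instead of starting a duplicate.
        existing = self._by_key.get(message_id)
        if existing is not None:
            return existing
        execution = Execution(execution_id=f"exec-{len(self._by_id) + 1}", messages=[text])
        self._by_key[message_id] = execution
        self._by_id[execution.execution_id] = execution
        return execution

    def pause(self, task_id: str) -> None:
        """Simulate the workflow reaching a HUMAN/WAIT task."""
        self._by_id[task_id].status = "PAUSED"
```

Sending the same first message twice yields one execution, and a follow-up that carries the task id appends to that same execution. Deduplicating a resent follow-up message is left out of the sketch.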

Security

SSRF guard on outbound agentUrl: blocks loopback, RFC 1918, link-local, IPv6 ULA, and cloud-metadata targets (metadata is always blocked) and disables redirects; conductor.a2a.client.allow-private-network is an opt-in for trusted/dev environments. Optional server api-key with constant-time comparison (later moved to the enterprise build; see the note after "Verify locally"). Push callbacks use single-use bearer tokens with an embedded 24h expiry.
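The address classes named above map onto a small check over the resolved IP addresses of the target host. The following is a sketch under stated assumptions rather than the PR's code: the function name and the exact network list are illustrative, and it reads "metadata always blocked" as "blocked even when the private-network opt-in is set".

```python
import ipaddress
from typing import Iterable

# Cloud metadata endpoints: blocked regardless of the private-network opt-in.
METADATA_ADDRESSES = {
    ipaddress.ip_address("169.254.169.254"),  # common IPv4 instance-metadata address
    ipaddress.ip_address("fd00:ec2::254"),    # AWS IPv6 instance-metadata address
}

PRIVATE_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),     # IPv4 loopback
    ipaddress.ip_network("::1/128"),         # IPv6 loopback
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
    ipaddress.ip_network("169.254.0.0/16"),  # IPv4 link-local
    ipaddress.ip_network("fe80::/10"),       # IPv6 link-local
    ipaddress.ip_network("fc00::/7"),        # IPv6 unique local addresses (ULA)
]


def is_blocked(resolved: Iterable[str], allow_private_network: bool = False) -> bool:
    """Return True if any resolved address of the target must not be contacted."""
    for raw in resolved:
        addr = ipaddress.ip_address(raw)
        # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the IPv4 rules.
        mapped = getattr(addr, "ipv4_mapped", None)
        if mapped is not None:
            addr = mapped
        if addr in METADATA_ADDRESSES:
            return True
        if not allow_private_network and any(addr in net for net in PRIVATE_NETWORKS):
            return True
    return False
```

The check takes already-resolved addresses so it stays free of network I/O. A real guard would also need to connect to the address it validated, otherwise a second DNS lookup could return a different target.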

Observability

Micrometer counters via the shared Monitors registry (a2a_client_calls, a2a_client_poll_failures, a2a_rpc_errors, a2a_ssrf_blocked, a2a_server_requests, a2a_server_resumes) + MDC correlation keys across A2A code paths.

Tests

Unit/wire (MockWebServer), embedded-agent e2e, push callback, server dispatch + multi-turn resume, mapper/worker, observability: 86 A2A tests in the ai module.
A2ASdkInteropTest — drives the client against the official a2a-sdk reference agent (subprocess): discovery, send, poll, streaming, message-mode. Self-skips when a2a-sdk is unavailable.
A2ADurableEngineEndToEndTest (test-harness) — AGENT through the real decider + AsyncSystemTaskExecutor + Redis, proving crash/restart resume from persistence. Runs in the existing test-harness CI job (Docker/Redis).
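The crash/restart property exercised by the durable end-to-end test depends on poll-mode state living in the persisted task output rather than in a thread. A minimal sketch of one poll step with both liveness guards follows; the field names (`remoteTaskId`, `deadline`, `consecutivePollFailures`), the status strings, and the failure threshold are assumptions, not the PR's schema.

```python
from typing import Any, Callable, Dict


def poll_once(state: Dict[str, Any], now: float,
              fetch_state: Callable[[str], str],
              max_failures: int = 3) -> Dict[str, Any]:
    """Advance one poll; everything needed to resume lives in the returned dict."""
    next_state = dict(state)
    # Guard 1: absolute deadline, checked before any network call.
    if now >= state["deadline"]:
        next_state["status"] = "FAILED"
        next_state["reason"] = "absolute deadline exceeded"
        return next_state
    try:
        remote = fetch_state(state["remoteTaskId"])
    except Exception as exc:
        # Guard 2: give up after too many polls in a row have failed.
        failures = state.get("consecutivePollFailures", 0) + 1
        next_state["consecutivePollFailures"] = failures
        if failures >= max_failures:
            next_state["status"] = "FAILED"
            next_state["reason"] = f"{failures} consecutive poll failures: {exc}"
        return next_state
    # A successful poll resets the failure counter.
    next_state["consecutivePollFailures"] = 0
    if remote == "completed":
        next_state["status"] = "COMPLETED"
    elif remote in ("failed", "canceled"):
        next_state["status"] = "FAILED"
        next_state["reason"] = f"remote task {remote}"
    return next_state
```

Since the function only reads and returns plain data, the dict can be serialized between polls. A process that restarts mid-call reloads it and continues with the same remote task id, deadline, and failure count.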

Docs / UI / examples

docs/devguide/ai/a2a-integration.md (AI Cookbook nav) with worked examples: call/expose, multi-turn resume request/response, push end-to-end.
ai/examples/*a2a* (call / get-card / server / streaming / push / multi-turn / cancel) + indexed in the examples README; runnable interop and durable demos under ai/src/test/resources/a2a/.
ui-next: registered AGENT / GET_AGENT_CARD / CANCEL_AGENT (typecheck + build green).
design/a2a/: protocol study, durability proposal, server design.

⚠️ Follow-up needed (separate, needs `workflow` token scope)

To make A2ASdkInteropTest run in CI (it self-skips otherwise), add to the build job in .github/workflows/ci.yml after the JDK setup:

      - name: Set up Python (for the A2A interop test)
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install a2a-sdk (lets A2ASdkInteropTest run against the reference agent)
        run: pip install "a2a-sdk>=0.2,<0.3" uvicorn

(Couldn't be pushed here — the PAT lacks the workflow scope.)

Out of scope (documented follow-ups)

Server-side streaming (message/stream) + push-config endpoints; Agent Card JWS signing/verification (+ signatures/extensions passthrough); getAuthenticatedExtendedCard; per-skill OAuth scopes; gRPC / HTTP+JSON transports; full v1.0 ProtoJSON.

Verify locally

./gradlew :conductor-ai:test --tests '*a2a*'
A2A_PYTHON=<venv>/bin/python ./gradlew :conductor-ai:test --tests '*A2ASdkInteropTest'
./gradlew :conductor-test-harness:test --tests '*A2ADurableEngineEndToEndTest'

Note: inbound A2A server auth (api-key) was moved to the enterprise build; the OSS server is open by default (front with a gateway/firewall). Client→remote auth headers, SSRF guard, push-callback tokens, and cross-agent isolation remain in OSS.

Add Agent2Agent (A2A) protocol support to the AI module, in both directions.

Client: CALL_AGENT / GET_AGENT_CARD / CANCEL_AGENT_TASK system tasks call remote A2A agents over JSON-RPC with poll, streaming (SSE), and push modes. Durable by design: deterministic messageId, state in the execution (not a thread), liveness guards, and a push backstop. SSRF-guarded (IPv6 ULA + cloud-metadata; redirects off).

Server: expose any workflow as an A2A agent (one agent per workflow), idempotent message/send -> startWorkflow (RETURN_EXISTING), tasks/get / tasks/cancel, and multi-turn resume (a follow-up message/send completes the paused HUMAN/WAIT task instead of starting a duplicate).

Observability: Micrometer counters via the shared Monitors registry and MDC correlation keys across the A2A code paths.

Gated by conductor.integrations.ai.enabled (client) and conductor.a2a.server.enabled (server).

- Unit/wire tests (MockWebServer), embedded-agent e2e, push callback, server JSON-RPC dispatch + multi-turn resume, mapper/worker, and observability.
- A2ASdkInteropTest: drives the client against the official a2a-sdk reference agent launched as a subprocess (discovery, send, poll, streaming, message-mode); self-skips when no Python with a2a-sdk is available.
- A2ADurableEngineEndToEndTest (test-harness): CALL_AGENT through the real decider + AsyncSystemTaskExecutor + Redis, proving crash/restart resume from persistence.

- docs/devguide/ai/a2a-integration.md (AI Cookbook nav) with worked examples: call/expose, multi-turn resume request/response, and push end-to-end.
- ai/examples: call / get-card / server + streaming / push / multi-turn / cancel, indexed in the examples README; runnable interop + durable demos under test resources.
- ui-next: register CALL_AGENT / GET_AGENT_CARD / CANCEL_AGENT_TASK task types.
- design/a2a: protocol study, durability proposal, and server design notes.

Note: a CI step to install a2a-sdk (so A2ASdkInteropTest runs in the build job) is left as a follow-up; it needs a token with the `workflow` scope to land.

…fault)

The A2A server's optional shared-secret api-key (conductor.a2a.server.api-key) is removed from OSS: the server is now open by default, matching OSS Conductor REST. Inbound authentication (API keys, OAuth/OIDC, mTLS, per-skill scopes, signed Agent Cards) belongs to the enterprise build; front OSS with a gateway/firewall.

Unchanged and kept in OSS as safe-by-default guards: client SSRF protection, push-callback token auth, cross-agent execution isolation, and client→remote per-call auth headers.

Removes authorized()/apiKey + the two api-key tests; docs and design notes updated.

…-turn + LLM-pick examples

Docs: add an error-handling & retries section (FAILED vs FAILED_WITH_TERMINAL_ERROR mapping for HTTP/JSON-RPC/SSRF/liveness) + a troubleshooting table; flesh out the client multi-turn section with a worked SWITCH-on-input-required snippet; add an "orchestrating multiple agents" subsection.

Examples (validated by ExampleWorkflowValidationTest):
- 27-a2a-multi-agent: FORK_JOIN calling agents in parallel → JOIN.
- 28-a2a-llm-pick-skill: GET_AGENT_CARD → LLM_CHAT_COMPLETE → CALL_AGENT.
- 29-a2a-client-multi-turn: SWITCH on input-required, re-call with same context/taskId.

…NT + add agentType

Generalize the agent task types ahead of multi-runtime support:
- CALL_AGENT → AGENT (class CallAgentTask → AgentTask), CANCEL_AGENT_TASK → CANCEL_AGENT. GET_AGENT_CARD kept (discovery/"Agent Card" is A2A-specific).
- New input field `agentType` (default "a2a") on all three tasks, the extension point for native runtimes (langgraph, openai, …). Unknown values are rejected with a clear error; only "a2a" is implemented today.

Updated across enum, handlers, request models, mapper, callback, tests, test-harness, examples, docs, design notes, and the ui-next task registration.

BREAKING CHANGE: workflows using type "CALL_AGENT"/"CANCEL_AGENT_TASK" must switch to "AGENT"/"CANCEL_AGENT".
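For existing workflow definitions affected by this breaking change, the rename can be applied mechanically to the definition JSON. The helper below is a hypothetical migration sketch, not something shipped in the PR. It assumes every task object carries a `taskReferenceName` and uses that to avoid touching unrelated `type` keys, for example inside `inputParameters`.

```python
from typing import Any

# Old task type -> new task type, as stated in the commit message.
RENAMES = {"CALL_AGENT": "AGENT", "CANCEL_AGENT_TASK": "CANCEL_AGENT"}


def migrate_task_types(node: Any) -> Any:
    """Return a copy of a workflow definition with legacy agent task types renamed."""
    if isinstance(node, list):
        return [migrate_task_types(item) for item in node]
    if isinstance(node, dict):
        # Recurse first so nested tasks (fork branches, switch cases) are covered.
        migrated = {key: migrate_task_types(value) for key, value in node.items()}
        task_type = migrated.get("type")
        # Only rename objects that look like tasks, not arbitrary dicts with a "type" key.
        if isinstance(task_type, str) and task_type in RENAMES and "taskReferenceName" in migrated:
            migrated["type"] = RENAMES[task_type]
        return migrated
    return node
```

The function returns a new structure and leaves its input untouched, so the result can be diffed against the original before the definition is re-registered.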

…sing)

Fix 'A AGENT'→'An AGENT' grammar (A2AMetrics, design), reword example descriptions and a doc sentence left awkward by the mechanical rename, and update legacy example workflow names (a2a_call_agent_* → a2a_agent_*).

v1r3n added 3 commits June 19, 2026 18:24

v1r3n mentioned this pull request Jun 20, 2026 in: feat(a2a): server-side message/stream (SSE) #1196 (Open)

v1r3n added 4 commits June 19, 2026 23:00

v1r3n wants to merge 7 commits into main from feat/durable-a2a-protocol
