# A2A v1 Migration Plan

## Goal

Move kagent from the current `trpc-a2a-go` / A2A 0.3-era protocol shape to official A2A v1 without requiring a maintenance window.

The final target state is:

- Go ADK, Python ADK, UI, and `go/core` speak official A2A v1 on the wire.
- New `task` and `push_notification` rows are stored as official A2A v1 JSON.
- Existing legacy rows remain readable during the migration.
- Users upgrade normally across releases (`0.10.0` → `0.11.0` → migrate on `0.11.0` → `0.12.0`) instead of requiring a maintenance-window storage flip.

## Version Signals

The migration uses two separate version signals:

- **Wire version:** selected per request from request metadata.
  - A missing `A2A-Version` header means legacy (current kagent) A2A behavior.
  - `A2A-Version: 1.0` means official A2A v1 behavior.
  - An unknown version fails with a clear unsupported-version error.
- **Storage version:** selected per row from the `protocol_version` column.
  - `protocol_version IS NULL` or a legacy value means the row holds `trpc-a2a-go` JSON.
  - `protocol_version = "1.0"` means the row holds official A2A v1 JSON.

This keeps wire compatibility independent of persisted-data migration.

## Required Upgrade Path

The supported zero-downtime path is:

```text
0.9.x -> 0.10.0 -> 0.11.0 -> (run migrate a2a-v1) -> 0.12.0
```

Direct upgrades from `0.9.x` to `0.11.0` or `0.12.0` should be rejected, or documented as unsupported, unless the installation first passes through the `0.10.0` bridge release. Likewise, `0.10.0` installations must pass through `0.11.0` before moving to `0.12.0`.

Before upgrading to `0.12.0`, every installation that upgraded from an earlier kagent release must run the historical storage migration CLI while still on `0.11.0` (see [Optional Historical Migration](#optional-historical-migration)). Fresh installs of `0.11.0` or later that never had legacy rows may skip the CLI.
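
The path rule can be expressed as a small check. The Go helper below is hypothetical: `checkUpgrade` is not an existing kagent function. The plan does not say where such a check would run. A Helm pre-upgrade hook or a controller-startup comparison against a recorded previous version are both possibilities, and both are assumptions. The sketch only encodes "advance at most one minor release while crossing `0.10` to `0.12`". It does not check the separate requirement that no legacy rows remain before `0.12.0`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf extracts MINOR from a "0.MINOR.PATCH" version string.
func minorOf(version string) (int, error) {
	parts := strings.Split(version, ".")
	if len(parts) != 3 || parts[0] != "0" {
		return 0, fmt.Errorf("unexpected version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// checkUpgrade enforces the supported path 0.9.x -> 0.10 -> 0.11 -> 0.12:
// when the target is inside that window, one upgrade may advance at most one
// minor release. An empty `from` means a fresh install.
func checkUpgrade(from, to string) error {
	if from == "" {
		return nil // fresh install: no older controllers or legacy rows
	}
	f, err := minorOf(from)
	if err != nil {
		return err
	}
	t, err := minorOf(to)
	if err != nil {
		return err
	}
	if t >= 10 && t <= 12 && t-f > 1 {
		return fmt.Errorf("unsupported upgrade %s -> %s: upgrade to 0.%d.x first", from, to, f+1)
	}
	return nil
}
```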

## Release 0.10.0: Bridge Release

`0.10.0` makes every running controller capable of understanding both legacy and v1 data before any v1 storage writes begin.

High-level behavior:

- Controller can read legacy and v1 `task` / `push_notification` rows.
- Controller always writes legacy `trpc-a2a-go` storage.
- Controller relies on the official A2A SDK's compatibility support for public A2A wire handling where possible; it should not add custom JSON-RPC decoding for official v0.3 traffic.
- Controller serves missing-header and v0.3 callers with legacy A2A wire responses.
- Controller serves `A2A-Version: 1.0` callers with v1 wire responses.
- Controller continues selecting A2A `0.3` interfaces when proxying to managed agents.
- First-party UI and managed agent runtimes stay on the existing legacy/v0.3 behavior in this release.
- Controller can convert in both directions as needed:
  - legacy storage -> v1 wire,
  - legacy storage -> legacy wire,
  - v1 storage -> v1 wire,
  - v1 storage -> legacy wire.
- AgentCards advertise both legacy and v1 interfaces, preferably with the same URL and different `protocolVersion` values.
- CORS configuration, proxies, and gRPC metadata pass `A2A-Version` through unchanged (CORS must list it as an allowed request header).
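
One way to cover the four conversion paths is to decode every row into a single in-memory v1 model and then render that model for the negotiated wire format. With this approach only two converters are needed. That hub-style design is an assumption, since the plan only asks for centralized conversion. The structs below are placeholders with made-up field names, not the real `trpc-a2a-go` or A2A v1 schemas.

```go
package main

import "encoding/json"

// Placeholder shapes with invented field names; the real task schemas have
// many more fields and are not reproduced here.
type legacyTask struct {
	ID    string `json:"id"`
	State string `json:"legacyState"`
}

type v1Task struct {
	ID    string `json:"id"`
	State string `json:"state"`
}

func legacyToV1(t legacyTask) v1Task { return v1Task{ID: t.ID, State: t.State} }
func v1ToLegacy(t v1Task) legacyTask { return legacyTask{ID: t.ID, State: t.State} }

// readTask decodes a stored row into the canonical v1 model, choosing the
// parser from protocol_version (nil/NULL or a legacy value => trpc-a2a-go JSON).
func readTask(raw []byte, protocolVersion *string) (v1Task, error) {
	if protocolVersion != nil && *protocolVersion == "1.0" {
		var t v1Task
		err := json.Unmarshal(raw, &t)
		return t, err
	}
	var lt legacyTask
	if err := json.Unmarshal(raw, &lt); err != nil {
		return v1Task{}, err
	}
	return legacyToV1(lt), nil
}

// renderTask encodes the canonical model for the negotiated wire format.
// Combined with readTask, this covers all four storage -> wire combinations.
func renderTask(t v1Task, wireV1 bool) ([]byte, error) {
	if wireV1 {
		return json.Marshal(t)
	}
	return json.Marshal(v1ToLegacy(t))
}

// writeTask010 is the 0.10.0 write path: always legacy JSON, NULL protocol_version.
func writeTask010(t v1Task) (raw []byte, protocolVersion *string, err error) {
	raw, err = json.Marshal(v1ToLegacy(t))
	return raw, nil, err
}
```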

Why this is safe:

- Old controller pods may exist during rollout.
- New controller pods still write legacy storage.
- Therefore old controller pods never need to read rows written in v1 format.

## Release 0.11.0: v1 Write Release

`0.11.0` assumes the installation already passed through `0.10.0`, so every controller that may read new rows is compatibility-capable. Fresh installs of `0.11.0`, with no existing kagent deployment, are also supported.

High-level behavior:

- Controller still dual-reads legacy and v1 rows.
- Controller writes new `task` and `push_notification` rows as official A2A v1 JSON.
- New v1 rows get `protocol_version = "1.0"`.
- UI moves to the A2A v1 SDK/types, selects the `protocolVersion: 1.0` interface, and sends `A2A-Version: 1.0`.
- Managed Go and Python agent runtimes move to v1 interfaces.
- Controller switches upstream managed-agent client selection from A2A `0.3` interfaces to A2A `1.0` interfaces.
- Legacy wire compatibility remains available for missing-header callers through this release; it is removed in `0.12.0`.
- Historical legacy rows do not need to be rewritten to serve traffic on `0.11.0`, but they must be migrated with `kagent migrate a2a-v1` before upgrading to `0.12.0`.
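
Selecting an interface from an AgentCard can be sketched as a lookup by `protocolVersion`. `AgentInterface` is a placeholder that models only a URL and a protocol version, not the exact AgentCard schema. In this sketch the controller would pass `"0.3"` in `0.10.0` and `"1.0"` from `0.11.0` on.

```go
package main

import "fmt"

// AgentInterface is a placeholder for one AgentCard interface entry; only the
// two fields this sketch needs are modeled.
type AgentInterface struct {
	URL             string
	ProtocolVersion string
}

// selectInterface returns the first interface that advertises the wanted
// protocol version, or an error if the agent does not offer it.
func selectInterface(ifaces []AgentInterface, want string) (AgentInterface, error) {
	for _, iface := range ifaces {
		if iface.ProtocolVersion == want {
			return iface, nil
		}
	}
	return AgentInterface{}, fmt.Errorf("agent advertises no A2A %s interface", want)
}
```

This sketch fails outright when no matching interface exists. The plan does not specify whether a `0.11.0` controller should instead fall back to `0.3` for agents that have not yet restarted during the rollout.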

Why this is safe:

- Any controller still running during the `0.11.0` rollout is on `0.10.0` or `0.11.0`, and both have the dual-read support introduced in `0.10.0`.
- New v1 writes are therefore readable by all supported controllers in this upgrade path.
- Existing legacy rows continue to be converted on read until `0.12.0`.

## Optional Historical Migration

Historical row migration is not required to serve traffic on `0.11.0`, but it **is required** before upgrading to `0.12.0` for any installation that still has legacy `task` or `push_notification` rows. Run it while still on `0.11.0`:

```bash
# Report what would be migrated without writing anything.
kagent migrate a2a-v1 --dry-run
# Convert the remaining legacy rows.
kagent migrate a2a-v1
```

The command converts legacy `task` and `push_notification` rows to official A2A v1 JSON and sets `protocol_version = "1.0"`.

It should be:

- batch-based,
- idempotent,
- restartable,
- safe against concurrent row changes,
- explicit about migrated/skipped/failed counts.

The controller keeps dual-read compatibility through `0.11.0`, so traffic continues to be served while the CLI runs. `0.12.0` removes that compatibility; do not upgrade until no legacy rows remain (or the installation never had legacy rows).
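
The five properties above can be illustrated with an in-memory sketch. Rows are read in ID-ordered batches. Each write is a compare-and-swap that succeeds only if the row is still the legacy row that was read. The run reports migrated, skipped, and failed counts. Failed or skipped rows stay legacy, so a later run picks them up. The `store` type, integer IDs, the `data` column name in the SQL comment, and `convertLegacy`'s field names are all assumptions for illustration; a real CLI would issue the guarded `UPDATE` against the database.

```go
package main

import (
	"encoding/json"
	"sort"
)

// row models one task or push_notification row; a nil ProtocolVersion is SQL NULL.
type row struct {
	Data            string
	ProtocolVersion *string
}

func isLegacy(r row) bool { return r.ProtocolVersion == nil || *r.ProtocolVersion != "1.0" }

// store is an in-memory stand-in for the database (hypothetical). In SQL the
// guarded write in compareAndSwap would be roughly (column names assumed):
//
//	UPDATE task SET data = $new, protocol_version = '1.0'
//	WHERE id = $id AND data = $old
//	  AND (protocol_version IS NULL OR protocol_version <> '1.0')
type store struct{ rows map[int]row }

// legacyBatch returns up to limit legacy row IDs greater than afterID, in ID order.
func (s *store) legacyBatch(afterID, limit int) []int {
	var ids []int
	for id, r := range s.rows {
		if id > afterID && isLegacy(r) {
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	if len(ids) > limit {
		ids = ids[:limit]
	}
	return ids
}

// compareAndSwap writes the converted row only if it is still the legacy row we read.
func (s *store) compareAndSwap(id int, old, converted string) bool {
	r, ok := s.rows[id]
	if !ok || !isLegacy(r) || r.Data != old {
		return false
	}
	v := "1.0"
	s.rows[id] = row{Data: converted, ProtocolVersion: &v}
	return true
}

type counts struct{ Migrated, Skipped, Failed int }

// migrate walks legacy rows in ID-ordered batches. Re-running it is safe:
// converted rows no longer match the legacy filter.
func migrate(s *store, batchSize int, dryRun bool, convert func(string) (string, error)) counts {
	var c counts
	after := -1 // IDs are assumed to be non-negative
	for {
		ids := s.legacyBatch(after, batchSize)
		if len(ids) == 0 {
			return c
		}
		for _, id := range ids {
			after = id
			old := s.rows[id].Data
			converted, err := convert(old)
			switch {
			case err != nil:
				c.Failed++ // row stays legacy; the cursor still advances
			case dryRun:
				c.Migrated++ // would migrate; nothing is written
			case s.compareAndSwap(id, old, converted):
				c.Migrated++
			default:
				c.Skipped++ // changed concurrently since it was read
			}
		}
	}
}

// convertLegacy is a placeholder converter with made-up field names.
func convertLegacy(old string) (string, error) {
	var in struct {
		ID    string `json:"id"`
		State string `json:"legacyState"`
	}
	if err := json.Unmarshal([]byte(old), &in); err != nil {
		return "", err
	}
	out, err := json.Marshal(struct {
		ID    string `json:"id"`
		State string `json:"state"`
	}{in.ID, in.State})
	return string(out), err
}
```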

## Component Changes

### Controller / Core

- Add nullable `protocol_version` columns to the `task` and `push_notification` tables.
- Centralize conversion between legacy `trpc-a2a-go` data and official A2A v1 types.
- Use the official A2A SDK's compatibility support for official v0.3/v1 wire handling where possible.
- Negotiate the wire format from AgentCard interface selection and the `A2A-Version` header.
- Select the storage parser from `protocol_version`.
- In `0.10.0`, write legacy storage only.
- In `0.10.0`, continue selecting managed-agent A2A `0.3` interfaces.
- In `0.11.0`, write v1 storage by default.
- In `0.11.0`, switch managed-agent interface selection to A2A `1.0`.
- Keep dual-read compatibility through `0.11.0`; remove legacy storage parsers and dual-read in `0.12.0`.
- In `0.12.0`, remove legacy wire handling and `trpc-a2a-go` dependencies from `go/core` (see [Release 0.12.0](#release-0120-cleanup-release)).
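
The per-release switches above can be tabulated along with the two invariants that make each step safe. `releaseBehavior`, `behaviorFor`, and `safeStep` are hypothetical names. The sketch shows why `0.10.0 -> 0.12.0` is rejected: `0.10.0` pods still write legacy rows that a `0.12.0` controller could not read.

```go
package main

import "fmt"

// releaseBehavior is a hypothetical summary of the controller switches above;
// kagent does not necessarily model releases this way.
type releaseBehavior struct {
	ReadV1Storage        bool   // can parse protocol_version = "1.0" rows
	ReadLegacyStorage    bool   // can parse trpc-a2a-go rows
	WriteV1Storage       bool   // new rows are written as A2A v1 JSON
	ManagedAgentProtocol string // protocolVersion selected from managed-agent AgentCards
	LegacyWireByDefault  bool   // missing-header callers get legacy responses
}

var behaviors = map[string]releaseBehavior{
	"0.10.0": {ReadV1Storage: true, ReadLegacyStorage: true, WriteV1Storage: false, ManagedAgentProtocol: "0.3", LegacyWireByDefault: true},
	"0.11.0": {ReadV1Storage: true, ReadLegacyStorage: true, WriteV1Storage: true, ManagedAgentProtocol: "1.0", LegacyWireByDefault: true},
	"0.12.0": {ReadV1Storage: true, ReadLegacyStorage: false, WriteV1Storage: true, ManagedAgentProtocol: "1.0", LegacyWireByDefault: false},
}

func behaviorFor(release string) (releaseBehavior, error) {
	b, ok := behaviors[release]
	if !ok {
		return releaseBehavior{}, fmt.Errorf("no A2A migration behavior defined for %s", release)
	}
	return b, nil
}

// safeStep checks the rollout invariants between consecutive releases: pods
// still on prev must read whatever next writes, and next may drop legacy reads
// only if prev already stopped producing legacy rows (historical rows are
// handled separately by `kagent migrate a2a-v1`).
func safeStep(prev, next releaseBehavior) bool {
	if next.WriteV1Storage && !prev.ReadV1Storage {
		return false
	}
	if !next.ReadLegacyStorage && !prev.WriteV1Storage {
		return false
	}
	return true
}
```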

### UI

- Stay on legacy/v0.3 behavior in `0.10.0`.
- Move to the A2A v1 SDK/types in `0.11.0`.
- Select the `protocolVersion: 1.0` interface and send `A2A-Version: 1.0` in `0.11.0`.
- Consume v1 task/message/event shapes in `0.11.0`.
- Rely on the controller for legacy persisted-data compatibility through `0.11.0` only.

### Go and Python Runtimes

- Stay on legacy/v0.3 behavior in `0.10.0`.
- Move runtime A2A servers/clients to official A2A v1 in `0.11.0`.
- In `0.12.0`, drop legacy/v0.3 wire paths and serve v1 only.
- Preserve kagent behavior for HITL, ask-user, tool calls, subagent activity, usage metadata, tracing, and session IDs.
- Avoid runtime-specific compatibility with historical DB formats; that belongs in `go/core`.

## Release 0.12.0: Cleanup Release

`0.12.0` assumes the installation already passed through `0.11.0`. For any upgrade from an earlier release, it also assumes that `kagent migrate a2a-v1` was run on `0.11.0`, so no legacy `task` or `push_notification` rows remain (`protocol_version IS NULL` count is zero). Fresh installs of `0.11.0` or later with no legacy history may upgrade directly.

High-level behavior:

- Controller reads and writes official A2A v1 storage only; legacy `trpc-a2a-go` parsers and dual-read paths are removed.
- Legacy wire compatibility for missing-header or A2A `0.3` callers is removed (or reduced to an explicit opt-in compatibility flag if product support still requires it).
- AgentCards and managed-agent client selection use A2A `1.0` interfaces only.
- `trpc-a2a-go` runtime dependencies are removed from `go/core` where they are no longer needed for serving.
- `protocol_version` remains the persisted storage format marker (`"1.0"`).
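
Because `0.12.0` has no legacy parser, a controller could refuse to start while legacy rows remain rather than failing on the first read. The startup check below is a hypothetical sketch. `legacyCounter` stands in for a database query. The check counts any non-`1.0` value as legacy, not only `NULL`, to match the storage-version rule.

```go
package main

import "fmt"

// legacyCounter is a hypothetical hook returning the number of legacy rows in
// a table, e.g. via:
//
//	SELECT count(*) FROM task WHERE protocol_version IS NULL OR protocol_version <> '1.0'
type legacyCounter func(table string) (int, error)

// checkNoLegacyRows refuses to start a 0.12.0 controller while legacy rows
// remain, because 0.12.0 cannot parse them.
func checkNoLegacyRows(count legacyCounter) error {
	for _, table := range []string{"task", "push_notification"} {
		n, err := count(table)
		if err != nil {
			return fmt.Errorf("counting legacy %s rows: %w", table, err)
		}
		if n > 0 {
			return fmt.Errorf("%d legacy %s rows remain; run `kagent migrate a2a-v1` on 0.11.0 before upgrading", n, table)
		}
	}
	return nil
}
```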

Why this is safe:

- `0.10.0` introduced dual-read and `0.11.0` introduced v1 writes, so every controller in the supported path can read v1 rows.
- Requiring `kagent migrate a2a-v1` on `0.11.0` ensures historical legacy rows are rewritten before `0.12.0` drops legacy storage support.
- One release (`0.11.0`) that both dual-reads and writes v1 gives operators time to run the CLI without a maintenance window.

## Alternatives Considered

1. Deploy v1 agents and UI alongside the new controller in the `0.10.0` release

This would remove some compatibility code and simplify some `go/core` changes, but it would not be strictly zero-downtime, because the new UI cannot talk to old controller instances. Similarly, with multiple controller instances, new instances would start upgrading agents to v1, and old controller instances would then fail to talk to those agents.

2. Start writing v1 data in the `0.10.0` release

This does not work with multiple controller instances: old instances would crash when they encounter v1 data in the database. v1 writes must wait until every instance runs the compatible (dual-read) code, which is only guaranteed in the next release, `0.11.0`.

3. Simple migration with a maintenance window

This would be simpler (a data migration script plus direct v1 changes to the agents, UI, and controller), but it would not be zero-downtime.