Description
Problem
The current protocol matrix is designed to be unauthenticated. It is good at answering:
- Does the agent start?
- Does `initialize` succeed?
- What `authMethods` and capabilities are advertised?
- Do post-`initialize` methods return something reasonable before login?
However, it does not validate protocol behavior that only becomes visible after authentication. This means we can miss important regressions such as:
- `session/new` working differently after login
- `session/list` / `session/resume` response shape changes
- `session/set_model` no longer updating session state correctly
- methods returning generic errors instead of protocol-level errors
- advertised capabilities diverging from real authenticated behavior
Goal
Add a way to run post-auth protocol checks for agents that can be authenticated safely in CI.
This should complement the existing public/unauthenticated matrix, not replace it.
Proposed direction
1. Keep the current unauthenticated matrix
The existing nightly matrix should continue to validate the public, pre-auth contract.
That matrix is still valuable because it verifies:
- startup behavior
- `initialize`
- advertised capabilities
- auth boundary behavior (`auth_required`, method availability, timeouts, process stability)
2. Add a separate authenticated matrix/workflow
Introduce a second workflow for agents that support non-interactive authentication in CI.
Possible examples:
- env-var token
- config file seeded from a secret
- service account / API key
- device/code flow only if it can be safely automated
Agents that require fully interactive browser login may remain unsupported for authenticated CI checks.
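As a rough illustration of the env-var token case, the sketch below decides whether an agent can be authenticated in CI at all. The metadata keys (`kind`, `secret_name`, `env_var`) and the function name `resolve_auth_env` are hypothetical and not part of any existing registry format; where this metadata should live is still an open question.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_auth_env(
    auth: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[Dict[str, str]]:
    """Return the env vars to pass to the agent process, or None.

    None means "skip authenticated checks for this agent": either the
    agent needs interactive login or the CI secret is not configured.
    All metadata keys here are assumptions made for this sketch.
    """
    if auth.get("kind") != "env_token":
        # Browser-only or otherwise interactive auth: unsupported in CI.
        return None
    secret = environ.get(auth["secret_name"])
    if not secret:
        # Secret missing (for example on a fork PR): skip rather than fail.
        return None
    # Expose the secret under the variable name the agent expects.
    return {auth["env_var"]: secret}
```

Returning `None` instead of raising keeps "unsupported in CI" distinct from a real failure, which matches the idea of reporting such agents separately.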
3. Reuse the same probe engine, but add post-auth flow checks
Instead of only checking individual methods, validate small flows after login, for example:
- `initialize -> session/new`
- `session/new -> session/list`
- `session/new -> session/resume`
- `session/new -> session/set_model -> session/resume`
- `session/new -> session/stop -> session/resume`
This would let us detect both:
- contract drift (response shape / error semantics)
- state drift (behavior across a sequence of calls)
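A flow check could be as small as the sketch below. It assumes a `send(method, params)` callable that returns a parsed JSON-RPC response, and it assumes later calls accept the `sessionId` returned by `session/new`; both are simplifications made for illustration, not the probe engine's actual interface. `fake_agent` is a stand-in used only to exercise the runner.

```python
from typing import Any, Callable, Dict, List

Send = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_flow(send: Send, steps: List[str]) -> List[Dict[str, Any]]:
    """Call each method in order and stop at the first error response."""
    outcomes: List[Dict[str, Any]] = []
    session_id = None
    for method in steps:
        # Assumption: session-scoped methods take a "sessionId" parameter.
        params = {"sessionId": session_id} if session_id else {}
        response = send(method, params)
        ok = "error" not in response
        outcomes.append({"method": method, "ok": ok, "response": response})
        if not ok:
            break
        result = response.get("result")
        if isinstance(result, dict):
            # Carry the session forward so later steps act on the same state.
            session_id = result.get("sessionId", session_id)
    return outcomes


def fake_agent(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Toy agent: supports session/new and session/resume only."""
    if method == "session/new":
        return {"result": {"sessionId": "s-1"}}
    if method == "session/resume" and params.get("sessionId") == "s-1":
        return {"result": {"sessionId": "s-1"}}
    # -32601 is the JSON-RPC 2.0 "method not found" code.
    return {"error": {"code": -32601, "message": "method not found"}}
```

Calling `run_flow(fake_agent, ["session/new", "session/resume"])` yields two successful steps, while inserting `session/list` in the middle stops the flow there and records the error for later comparison.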
4. Record normalized protocol signatures
To avoid snapshot noise, store normalized response signatures rather than full raw payloads.
Examples:
- `result.sessionId: string`
- `result.models.currentModelId: string`
- `error.code: int`
- `error.message: string`
This should focus comparisons on protocol structure, not volatile values.
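One possible way to derive such signatures is to flatten each response into dotted paths and keep only a coarse type label per path. The helper names `type_name` and `signature`, the type labels, and the `[]` suffix for array elements are choices made for this sketch, not an agreed format.

```python
from typing import Any, Dict


def type_name(value: Any) -> str:
    """Map a JSON value to a coarse type label."""
    if value is None:
        return "null"
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def signature(payload: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten a payload into {dotted.path: type}, discarding the values."""
    if isinstance(payload, dict) and payload:
        sig: Dict[str, str] = {}
        for key, value in payload.items():
            path = f"{prefix}.{key}" if prefix else key
            sig.update(signature(value, path))
        return sig
    if isinstance(payload, list) and payload:
        sig = {}
        for item in payload:
            # All elements share one "[]" path; with mixed element types,
            # the last element seen determines the recorded type.
            sig.update(signature(item, f"{prefix}[]"))
        return sig
    # Scalars, plus empty objects and arrays, are recorded as leaves.
    return {prefix: type_name(payload)}
```

For example, `signature({"error": {"code": -32000, "message": "x"}})` produces `{"error.code": "int", "error.message": "string"}`, so a changed message text or session ID does not show up as drift.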
5. Compare authenticated results against previous snapshots
We should explicitly detect regressions such as:
- required field disappeared
- field type changed
- previously supported flow now fails
- `auth_required` changed to a generic error
- capability is still advertised but no longer works after auth
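The first two regression types can be detected with a plain comparison of two path-to-type mappings, as in the sketch below. The function name `diff_signatures` and the finding strings are illustrative only; flow failures and capability mismatches would need separate checks.

```python
from typing import Dict, List


def diff_signatures(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Compare two {path: type} snapshots and describe structural drift."""
    findings: List[str] = []
    for path, old_type in sorted(previous.items()):
        if path not in current:
            # A field that existed in the last snapshot is gone.
            findings.append(f"removed: {path} ({old_type})")
        elif current[path] != old_type:
            findings.append(f"type changed: {path} {old_type} -> {current[path]}")
    for path in sorted(set(current) - set(previous)):
        # Additions are usually harmless but still worth reporting.
        findings.append(f"added: {path} ({current[path]})")
    return findings
```

Reporting additions separately from removals and type changes would make it easy to treat only the latter two as failures.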
Suggested output structure
Keep public and authenticated results separate, for example:
- `publicProbes`
- `authenticatedProbes`
- `flowChecks`
- `protocolDrift`
- `authenticatedCoverage`
This would let us distinguish:
- public compatibility
- authenticated compatibility
- unsupported-in-CI auth cases
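A report along these lines might be assembled as follows. The top-level keys come from the list above; the nested `covered` / `unsupportedInCi` keys and the `build_report` signature are assumptions made for this sketch.

```python
from typing import Any, Dict, List


def build_report(
    public: Dict[str, Any],
    authenticated: Dict[str, Any],
    unsupported: List[str],
    flow_checks: Dict[str, Any] = None,
    drift: Dict[str, List[str]] = None,
) -> Dict[str, Any]:
    """Keep public and authenticated results in separate sections."""
    return {
        "publicProbes": public,
        "authenticatedProbes": authenticated,
        "flowChecks": flow_checks or {},
        "protocolDrift": drift or {},
        # Hypothetical shape: which agents were covered after login and
        # which were skipped because CI cannot authenticate them.
        "authenticatedCoverage": {
            "covered": sorted(authenticated),
            "unsupportedInCi": sorted(unsupported),
        },
    }
```

Keeping skipped agents in their own list means an agent without CI credentials is reported as uncovered rather than as passing or failing.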
Open questions
- Which registered agents can support non-interactive CI authentication today?
- Do we want authenticated checks to be opt-in per agent?
- Where should auth metadata live: registry entry, workflow config, or a separate file?
- Should authenticated regressions fail CI, or only produce warnings at first?
- Do we want a single combined report, or separate public vs authenticated reports?
Non-goals
At least initially, this does not need to:
- automate browser-only login flows for every agent
- store raw full protocol transcripts for all methods
- replace the current unauthenticated matrix
Why this matters
Today we mostly verify the protocol boundary up to authentication. That is important, but incomplete.
If ACP behavior changes only after login, we currently have little visibility into it. Adding authenticated post-initialize checks would help us catch real compatibility regressions earlier and make the protocol matrix much more useful as an interoperability signal.