Add authenticated post-initialize protocol checks to the protocol matrix #157

@ignatov

Problem

The current protocol matrix only runs unauthenticated probes. It answers these questions well:

  • Does the agent start?
  • Does initialize succeed?
  • What authMethods and capabilities are advertised?
  • Do post-initialize methods return a sensible response before login (for example auth_required rather than a crash or timeout)?

However, it does not validate protocol behavior that only becomes visible after authentication. This means we can miss important regressions such as:

  • session/new working differently after login
  • session/list / session/resume response shape changes
  • session/set_model no longer updating session state correctly
  • methods returning generic errors instead of protocol-level errors
  • advertised capabilities diverging from real authenticated behavior

Goal

Add a way to run post-auth protocol checks for agents that can be authenticated safely in CI.

This should complement the existing public/unauthenticated matrix, not replace it.

Proposed direction

1. Keep the current unauthenticated matrix

The existing nightly matrix should continue to validate the public, pre-auth contract.

That matrix is still valuable because it verifies:

  • startup behavior
  • initialize
  • advertised capabilities
  • auth boundary behavior (auth_required, method availability, timeouts, process stability)

2. Add a separate authenticated matrix/workflow

Introduce a second workflow for agents that support non-interactive authentication in CI.

Possible examples:

  • env-var token
  • config file seeded from a secret
  • service account / API key
  • device/code flow only if it can be safely automated

Agents that require fully interactive browser login may remain unsupported for authenticated CI checks.
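As a rough illustration of how a workflow might decide which agents qualify, here is a minimal sketch. `AuthSpec`, `ci_auth_status`, the `kind` strings, and the status strings are all hypothetical names, not part of the existing matrix. Device/code flow is treated as unsupported by default, since the proposal only allows it when it can be safely automated.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AuthSpec:
    """Hypothetical per-agent auth description; field names are assumptions."""
    agent: str
    kind: str  # e.g. "env-token", "config-file", "api-key", "device-code", "browser"
    env_var: Optional[str] = None  # name of the secret-backed variable, if any


# Kinds assumed to be automatable without a human in the loop.
NON_INTERACTIVE = {"env-token", "config-file", "api-key"}


def ci_auth_status(spec: AuthSpec, env: Mapping[str, str] = os.environ) -> str:
    """Classify an agent as ready, missing its secret, or unsupported in CI."""
    if spec.kind not in NON_INTERACTIVE:
        return "unsupported-in-ci"
    if spec.env_var and not env.get(spec.env_var):
        return "missing-secret"
    return "ready"
```

Keeping "missing-secret" distinct from "unsupported-in-ci" would let a report separate a misconfigured run from an agent that cannot be covered at all.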

3. Reuse the same probe engine, but add post-auth flow checks

Instead of only checking individual methods, validate small flows after login, for example:

  • initialize -> session/new
  • session/new -> session/list
  • session/new -> session/resume
  • session/new -> session/set_model -> session/resume
  • session/new -> session/stop -> session/resume

This would let us detect both:

  • contract drift (response shape / error semantics)
  • state drift (behavior across a sequence of calls)
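One possible shape for such flow checks, as a sketch only: the flow names, `run_flow`, and the injected `call` transport are hypothetical, and the request parameters are placeholders rather than the real ACP schema. The runner threads a `sessionId` from one step to the next and stops at the first error response.

```python
from typing import Any, Callable, Dict, List, Tuple

# The transport is injected; a real runner would send JSON-RPC to the agent process.
Call = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Hypothetical flow names mapped to the method sequences listed above.
FLOWS: Dict[str, List[str]] = {
    "new-then-list": ["session/new", "session/list"],
    "new-then-resume": ["session/new", "session/resume"],
    "set-model-then-resume": ["session/new", "session/set_model", "session/resume"],
    "stop-then-resume": ["session/new", "session/stop", "session/resume"],
}


def run_flow(steps: List[str], call: Call) -> List[Tuple[str, str]]:
    """Run steps in order and return (method, "ok" | "error") pairs.

    Stops after the first error so later steps are not reported as failures
    that were really caused by an earlier one.
    """
    outcomes: List[Tuple[str, str]] = []
    session_id = None
    for method in steps:
        # Placeholder params: only the session id is carried between steps.
        params = {"sessionId": session_id} if session_id else {}
        response = call(method, params)
        if "error" in response:
            outcomes.append((method, "error"))
            break
        outcomes.append((method, "ok"))
        session_id = response.get("result", {}).get("sessionId", session_id)
    return outcomes
```

Injecting the transport keeps the flow logic testable against a fake agent, independent of how the probe engine actually talks to the process.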

4. Record normalized protocol signatures

To avoid snapshot noise, store normalized response signatures rather than full raw payloads.

Examples:

  • result.sessionId: string
  • result.models.currentModelId: string
  • error.code: int
  • error.message: string

This should focus comparisons on protocol structure, not volatile values.
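A minimal sketch of such a normalization, assuming responses are already parsed JSON: it flattens a payload into a mapping from dotted field path to type name, dropping the values themselves. The function names and the type labels (`string`, `int`, and so on) are illustrative choices.

```python
from typing import Any, Dict


def type_name(value: Any) -> str:
    """Map a JSON scalar to a short type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def signature(value: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten a JSON-like value into {dotted.path: type label}."""
    if isinstance(value, dict):
        out: Dict[str, str] = {}
        for key in sorted(value):
            path = f"{prefix}.{key}" if prefix else key
            out.update(signature(value[key], path))
        return out or {prefix: "object"}
    if isinstance(value, list):
        # All elements share one "[]" path, so list length is ignored.
        # If element types disagree, the last one wins in this sketch.
        out = {}
        for item in value:
            out.update(signature(item, f"{prefix}[]"))
        return out or {prefix: "array"}
    return {prefix: type_name(value)}
```

For a hypothetical response `{"result": {"sessionId": "abc"}}` this yields `{"result.sessionId": "string"}`, which stays stable across runs even though the session id changes.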

5. Compare authenticated results against previous snapshots

We should explicitly detect regressions such as:

  • required field disappeared
  • field type changed
  • previously supported flow now fails
  • auth_required changed to a generic error
  • capability is still advertised but no longer works after auth
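The first two regression types could be detected with a comparison along these lines. This is a sketch under two assumptions: each snapshot is a flat mapping from field path to type name, and every field present in the previous snapshot is treated as required. `diff_signatures` and the finding strings are hypothetical.

```python
from typing import Dict, List


def diff_signatures(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Report fields that vanished or changed type since the previous snapshot.

    Newly added fields are not reported, on the assumption that additions
    are backward compatible.
    """
    findings: List[str] = []
    for path, old_type in sorted(previous.items()):
        if path not in current:
            findings.append(f"field disappeared: {path}")
        elif current[path] != old_type:
            findings.append(f"type changed: {path} ({old_type} -> {current[path]})")
    return findings
```

An empty result means no drift of these two kinds. Flow failures and error-semantics changes would need separate checks, since they are not visible in a single response signature.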

Suggested output structure

Keep public and authenticated results separate, for example:

  • publicProbes
  • authenticatedProbes
  • flowChecks
  • protocolDrift
  • authenticatedCoverage

This would let us distinguish:

  • public compatibility
  • authenticated compatibility
  • unsupported-in-CI auth cases
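To make the separation concrete, here is one way the report could be assembled. The top-level keys come from the list above; everything else is an assumption for illustration, including the per-agent input shape, the `covered` / `unsupportedInCi` split inside `authenticatedCoverage`, and the `build_report` name.

```python
from typing import Any, Dict


def build_report(agents: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble one report from per-agent results.

    Each agent entry may carry "public", "authenticated", "flows" and "drift".
    An agent without "authenticated" results is counted as unsupported in CI.
    """
    report: Dict[str, Any] = {
        "publicProbes": {},
        "authenticatedProbes": {},
        "flowChecks": {},
        "protocolDrift": [],
        "authenticatedCoverage": {"covered": [], "unsupportedInCi": []},
    }
    for name in sorted(agents):
        data = agents[name]
        # Public results are recorded for every agent.
        report["publicProbes"][name] = data.get("public", {})
        if data.get("authenticated") is None:
            report["authenticatedCoverage"]["unsupportedInCi"].append(name)
            continue
        report["authenticatedCoverage"]["covered"].append(name)
        report["authenticatedProbes"][name] = data["authenticated"]
        report["flowChecks"][name] = data.get("flows", {})
        report["protocolDrift"].extend(
            {"agent": name, "finding": finding} for finding in data.get("drift", [])
        )
    return report
```

Because public results are always recorded, an agent that cannot authenticate in CI still appears in the public section and is listed explicitly as uncovered rather than silently missing.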

Open questions

  • Which registered agents can support non-interactive CI authentication today?
  • Do we want authenticated checks to be opt-in per agent?
  • Where should auth metadata live: registry entry, workflow config, or a separate file?
  • Should authenticated regressions fail CI, or only produce warnings at first?
  • Do we want a single combined report, or separate public vs authenticated reports?

Non-goals

At least initially, this does not need to:

  • automate browser-only login flows for every agent
  • store raw full protocol transcripts for all methods
  • replace the current unauthenticated matrix

Why this matters

Today we mostly verify the protocol only up to the authentication boundary. That is important, but incomplete.

If ACP behavior changes only after login, we currently have little visibility into it. Adding authenticated post-initialize checks would help us catch real compatibility regressions earlier and make the protocol matrix much more useful as an interoperability signal.
