Add authenticated post-`initialize` protocol checks to the protocol matrix #157

## Problem

The current protocol matrix runs entirely without authentication. It answers questions such as:

- Does the agent start?
- Does `initialize` succeed?
- What `authMethods` and capabilities are advertised?
- Do post-`initialize` methods fail cleanly before login (for example with `auth_required` rather than a crash or a timeout)?

However, it does **not** validate protocol behavior that only becomes visible after authentication. This means we can miss important regressions such as:

- `session/new` working differently after login
- `session/list` / `session/resume` response shape changes
- `session/set_model` no longer updating session state correctly
- methods returning generic errors instead of protocol-level errors
- advertised capabilities diverging from real authenticated behavior

## Goal

Add a way to run **post-auth protocol checks** for agents that can be authenticated safely in CI.

This should complement the existing public/unauthenticated matrix, not replace it.

## Proposed direction

### 1. Keep the current unauthenticated matrix

The existing nightly matrix should continue to validate the public, pre-auth contract.

That matrix is still valuable because it verifies:

- startup behavior
- `initialize`
- advertised capabilities
- auth boundary behavior (`auth_required`, method availability, timeouts, process stability)
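Checking the auth boundary, and spotting the "generic error instead of protocol-level error" regressions from the problem statement, both depend on classifying every response the same way. A minimal Python sketch follows. `classify_response` and its outcome labels are hypothetical. The JSON-RPC codes are the standard ones from the JSON-RPC 2.0 specification, while the `auth_required` code (-32000) is an assumption that should be checked against the ACP schema.

```python
from typing import Any, Mapping

# Standard JSON-RPC 2.0 error codes, as defined by the JSON-RPC 2.0 specification.
JSONRPC_ERRORS = {
    -32700: "parse_error",
    -32600: "invalid_request",
    -32601: "method_not_found",
    -32602: "invalid_params",
    -32603: "internal_error",
}

# ASSUMPTION: the error code an agent uses for `auth_required`.
# Check this against the ACP schema before relying on it.
AUTH_REQUIRED_CODE = -32000


def classify_response(message: Mapping[str, Any],
                      auth_required_code: int = AUTH_REQUIRED_CODE) -> str:
    """Map one JSON-RPC response to a coarse outcome label (hypothetical helper)."""
    if "result" in message:
        return "ok"
    error = message.get("error")
    if not isinstance(error, Mapping) or not isinstance(error.get("code"), int):
        # Neither a result nor a well-formed error object.
        return "malformed"
    code = error["code"]
    if code == auth_required_code:
        return "auth_required"
    # Known JSON-RPC codes keep their name; anything else is an unclassified error.
    return JSONRPC_ERRORS.get(code, "other_error")
```

Making the `auth_required` code a parameter keeps the classifier usable if the actual code turns out to differ from the assumed default.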

### 2. Add a separate authenticated matrix/workflow

Introduce a second workflow for agents that support non-interactive authentication in CI.

Possible mechanisms:

- a token passed in an environment variable
- a config file seeded from a secret
- a service account / API key
- a device-code flow, but only if it can be safely automated

Agents that require fully interactive browser login may remain unsupported for authenticated CI checks.
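One way to keep authenticated checks opt-in per agent is to attach a small auth descriptor to each agent and derive from it whether that agent can run in CI. The sketch below assumes a hypothetical `AgentAuthConfig`, with a `mode` that mirrors the mechanisms listed above and a secret exposed to the job as an environment variable. It deliberately leaves open where that metadata should live (see the open questions).

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class AuthMode(Enum):
    ENV_TOKEN = "env-token"                      # token passed in an environment variable
    CONFIG_FILE = "config-file"                  # config file seeded from a secret
    API_KEY = "api-key"                          # service account / API key
    DEVICE_CODE = "device-code"                  # allowed only if safely automatable
    INTERACTIVE_BROWSER = "interactive-browser"  # unsupported for authenticated CI checks


@dataclass(frozen=True)
class AgentAuthConfig:
    agent_id: str
    mode: AuthMode
    secret_env: Optional[str] = None        # env var that carries the CI secret
    device_flow_automatable: bool = False


def ci_auth_status(cfg: AgentAuthConfig, env: Mapping[str, str] = os.environ) -> str:
    """Return 'ready', 'missing-secret', or 'unsupported-in-ci' for one agent."""
    if cfg.mode is AuthMode.INTERACTIVE_BROWSER:
        return "unsupported-in-ci"
    if cfg.mode is AuthMode.DEVICE_CODE and not cfg.device_flow_automatable:
        return "unsupported-in-ci"
    # ASSUMPTION: every CI-capable mode needs a secret exposed as an env var.
    if not cfg.secret_env or not env.get(cfg.secret_env):
        return "missing-secret"
    return "ready"
```

An agent with no descriptor would simply be skipped, which keeps the authenticated workflow opt-in. Separating `missing-secret` from `unsupported-in-ci` distinguishes a CI configuration gap from an agent that cannot be authenticated non-interactively at all.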

### 3. Reuse the same probe engine, but add post-auth flow checks

Instead of only checking individual methods, validate small flows after login, for example:

- `initialize -> session/new`
- `session/new -> session/list`
- `session/new -> session/resume`
- `session/new -> session/set_model -> session/resume`
- `session/new -> session/stop -> session/resume`

This would let us detect both:

- contract drift (response shape / error semantics)
- state drift (behavior across a sequence of calls)
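A flow can be modelled as an ordered list of calls, where later steps reuse values such as `sessionId` from earlier ones and may carry an optional state check. The sketch below is a hypothetical runner over an abstract `send(method, params)` transport. The request parameter shapes (`cwd`, `mcpServers`, `modelId`) are assumptions about the ACP schema, and the in-memory fake agent exists only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Json = Dict[str, Any]
Send = Callable[[str, Json], Json]  # (method, params) -> JSON-RPC response object


@dataclass
class Step:
    method: str
    # Builds request params from the flow context (values captured by earlier steps).
    params: Callable[[Json], Json] = lambda ctx: {}
    # Optional state check on the result; returns a problem description or None.
    check: Optional[Callable[[Json, Json], Optional[str]]] = None


@dataclass
class FlowResult:
    name: str
    passed: bool
    failed_step: Optional[str] = None
    reason: Optional[str] = None


def run_flow(name: str, steps: List[Step], send: Send) -> FlowResult:
    ctx: Json = {}
    for step in steps:
        params = step.params(ctx)
        ctx[step.method] = params  # remember what was sent, for later checks
        response = send(step.method, params)
        if "result" not in response:
            return FlowResult(name, False, step.method, f"error: {response.get('error')}")
        result = response["result"]
        if step.check is not None:
            problem = step.check(ctx, result)
            if problem:
                return FlowResult(name, False, step.method, problem)
        if isinstance(result, dict) and "sessionId" in result:
            ctx["sessionId"] = result["sessionId"]  # reused by later steps
    return FlowResult(name, True)


def model_persisted(ctx: Json, result: Json) -> Optional[str]:
    """State check: the model chosen via session/set_model survives session/resume."""
    expected = ctx["session/set_model"]["modelId"]
    actual = result.get("models", {}).get("currentModelId")
    return None if actual == expected else f"currentModelId {actual!r} != {expected!r}"


SET_MODEL_THEN_RESUME = [
    Step("session/new", lambda ctx: {"cwd": "/tmp/probe", "mcpServers": []}),
    Step("session/set_model", lambda ctx: {"sessionId": ctx["sessionId"], "modelId": "model-b"}),
    Step("session/resume", lambda ctx: {"sessionId": ctx["sessionId"]}, check=model_persisted),
]


def make_fake_agent(persist_model: bool = True) -> Send:
    """In-memory stand-in for an authenticated agent, for illustration only."""
    state = {"model": "model-a"}

    def send(method: str, params: Json) -> Json:
        if method == "session/new":
            return {"result": {"sessionId": "s1"}}
        if method == "session/set_model":
            if persist_model:
                state["model"] = params["modelId"]
            return {"result": {}}
        if method == "session/resume":
            return {"result": {"models": {"currentModelId": state["model"]}}}
        return {"error": {"code": -32601, "message": "Method not found"}}

    return send
```

With `make_fake_agent(persist_model=False)`, each call in the flow succeeds individually, but the state check flags `session/resume`. This is the kind of state drift that single-method probes cannot see.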

### 4. Record normalized protocol signatures

To avoid snapshot noise, store normalized response signatures rather than full raw payloads.

Examples:

- `result.sessionId: string`
- `result.models.currentModelId: string`
- `error.code: int`
- `error.message: string`

This keeps comparisons focused on protocol structure rather than on volatile values such as session IDs or message text.
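Normalizing a decoded response into `path: type` lines could look like the sketch below. It is a hypothetical helper, not an existing API. Nested object keys are joined with `.`, array items collapse into a single `[]` segment so list length does not matter, and only JSON types are kept, never values. Distinguishing integers from non-integral numbers (`int` vs `number`) is a design choice that may itself turn out to be noisy.

```python
from typing import Any, Set


def json_type(value: Any) -> str:
    """Name the JSON type of a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int in Python, so check it first
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def signature(value: Any, path: str = "") -> Set[str]:
    """Flatten a decoded JSON value into a set of 'path: type' lines, dropping values."""
    if isinstance(value, dict):
        if not value:
            return {f"{path}: object"} if path else set()
        sigs: Set[str] = set()
        for key, child in value.items():
            sigs |= signature(child, f"{path}.{key}" if path else key)
        return sigs
    if isinstance(value, list):
        sigs = {f"{path}: array"}
        for item in value:
            # Every element shares the same '[]' path, so list length is ignored.
            sigs |= signature(item, f"{path}[]")
        return sigs
    return {f"{path}: {json_type(value)}"}
```

For example, `signature({"error": {"code": -32603, "message": "Internal error"}})` yields `{"error.code: int", "error.message: string"}`, which matches the form of the examples above.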

### 5. Compare authenticated results against previous snapshots

We should explicitly detect regressions such as:

- required field disappeared
- field type changed
- previously supported flow now fails
- `auth_required` changed to a generic error
- capability is still advertised but no longer works after auth
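Given normalized signatures and per-probe outcome labels from a previous run, most of the regressions listed above could be detected roughly as shown below. `diff_signatures` and `diff_outcomes` are hypothetical helpers. They assume signatures in the `path: type` form from the previous section and outcome labels such as `ok`, `auth_required`, and `internal_error`. Which labels count as "generic" is an assumption. The advertised-capability check is left out because it depends on how capabilities map to flows.

```python
from typing import Dict, List, Set

# ASSUMPTION: outcome labels treated as "generic" errors rather than protocol-level ones.
GENERIC_ERRORS = {"internal_error", "other_error"}


def _types_by_path(sigs: Set[str]) -> Dict[str, Set[str]]:
    """Group 'path: type' lines by path (a path may legitimately have several types)."""
    grouped: Dict[str, Set[str]] = {}
    for line in sigs:
        path, _, json_type = line.rpartition(": ")
        grouped.setdefault(path, set()).add(json_type)
    return grouped


def diff_signatures(old: Set[str], new: Set[str]) -> List[str]:
    """Report fields that disappeared or changed type since the previous snapshot."""
    before, after = _types_by_path(old), _types_by_path(new)
    findings = [f"field removed: {p}" for p in sorted(before.keys() - after.keys())]
    for p in sorted(before.keys() & after.keys()):
        if before[p] != after[p]:
            findings.append(f"type changed: {p} {sorted(before[p])} -> {sorted(after[p])}")
    return findings


def diff_outcomes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Compare probe/flow outcome labels (e.g. 'ok', 'auth_required') between runs."""
    findings = []
    for name in sorted(old.keys() & new.keys()):
        was, now = old[name], new[name]
        if was == "ok" and now != "ok":
            findings.append(f"previously passing now fails: {name} ({now})")
        elif was == "auth_required" and now in GENERIC_ERRORS:
            findings.append(f"auth_required became a generic error: {name} ({now})")
    return findings
```

Newly added fields are not reported. Additive changes are usually backward compatible, so treating them as drift would mostly add noise.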

## Suggested output structure

Keep public and authenticated results separate, for example:

- `publicProbes`
- `authenticatedProbes`
- `flowChecks`
- `protocolDrift`
- `authenticatedCoverage`

This would let us distinguish:

- public compatibility
- authenticated compatibility
- unsupported-in-CI auth cases
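The suggested output keys could map onto a per-agent record like the sketch below, which also derives the three-way distinction above. `AgentReport`, its `summary()` rules, the coverage values (`ready`, `missing-secret`, `unsupported-in-ci`), and the extra `agentId` and `summary` keys are all hypothetical. Treating `auth_required` as an acceptable pre-auth outcome is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentReport:
    agent_id: str
    public_probes: Dict[str, str] = field(default_factory=dict)         # method -> outcome before login
    authenticated_probes: Dict[str, str] = field(default_factory=dict)  # method -> outcome after login
    flow_checks: Dict[str, bool] = field(default_factory=dict)          # flow name -> passed
    protocol_drift: List[str] = field(default_factory=list)             # findings vs previous snapshot
    authenticated_coverage: str = "unsupported-in-ci"                   # or 'ready' / 'missing-secret'

    def summary(self) -> Dict[str, str]:
        """Keep public and authenticated verdicts separate."""
        # ASSUMPTION: before login, 'ok' and 'auth_required' are both acceptable outcomes.
        public_ok = all(o in ("ok", "auth_required") for o in self.public_probes.values())
        if self.authenticated_coverage != "ready":
            authenticated = self.authenticated_coverage  # e.g. 'unsupported-in-ci'
        elif self.protocol_drift or not all(self.flow_checks.values()):
            authenticated = "regressed"
        else:
            authenticated = "compatible"
        return {"public": "compatible" if public_ok else "regressed",
                "authenticated": authenticated}

    def to_json(self) -> str:
        """Serialize using the camelCase keys proposed for the report."""
        return json.dumps({
            "agentId": self.agent_id,
            "publicProbes": self.public_probes,
            "authenticatedProbes": self.authenticated_probes,
            "flowChecks": self.flow_checks,
            "protocolDrift": self.protocol_drift,
            "authenticatedCoverage": self.authenticated_coverage,
            "summary": self.summary(),
        }, indent=2, sort_keys=True)
```

Reporting `unsupported-in-ci` as its own verdict, rather than as a pass or a failure, keeps browser-only agents visible in the report without turning them into regressions.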

## Open questions

- Which registered agents can support non-interactive CI authentication today?
- Do we want authenticated checks to be opt-in per agent?
- Where should auth metadata live: registry entry, workflow config, or a separate file?
- Should authenticated regressions fail CI, or only produce warnings at first?
- Do we want a single combined report, or separate public vs authenticated reports?

## Non-goals

At least initially, this does **not** need to:

- automate browser-only login flows for every agent
- store raw full protocol transcripts for all methods
- replace the current unauthenticated matrix

## Why this matters

Today we mostly verify the protocol boundary up to authentication. That is important, but incomplete.

If ACP behavior changes only after login, we currently have little visibility into it. Adding authenticated post-`initialize` checks would help us catch real compatibility regressions earlier and make the protocol matrix much more useful as an interoperability signal.