[PROPOSAL] Cookbook: Reconciling an agent's outputs across contexts before they ship #720

Description

@WGlynn

An agent that works across several separate contexts, sessions, or sub-agents can end up contradicting itself: a decision made in one place is unknown to another, and the two outputs disagree. The contradiction is usually invisible at the moment it is produced, because each context looks internally consistent.

The pattern that helps is a reconciliation pass before a result is treated as final. Instead of trusting the current context alone, the agent enumerates the other contexts that could invalidate the claim, checks the claim against each, and only then delivers, with the conflicts it found resolved or flagged. It is cheap because it runs once at the end, not continuously.
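To make the shape of the pass concrete, here is a minimal sketch, not the notebook's final code. It assumes each context can be reduced to a flat set of key/value facts, which real sessions will not do cleanly; `Context`, `Reconciled`, and `reconcile` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """One session or sub-agent, reduced to the facts it holds (an assumption)."""
    name: str
    facts: dict[str, str]


@dataclass
class Reconciled:
    key: str
    value: str | None          # None when there is no single agreed value
    status: str                # "consistent", "conflict", or "unknown"
    sources: dict[str, str] = field(default_factory=dict)


def reconcile(key: str, contexts: list[Context]) -> Reconciled:
    """Check one claim against every context that bears on it."""
    # Enumerate the contexts that actually say something about this claim.
    sources = {c.name: c.facts[key] for c in contexts if key in c.facts}
    distinct = set(sources.values())
    if not distinct:
        return Reconciled(key, None, "unknown", sources)
    if len(distinct) == 1:
        return Reconciled(key, next(iter(distinct)), "consistent", sources)
    # Contexts disagree: do not pick a winner silently, flag it with sources.
    return Reconciled(key, None, "conflict", sources)


if __name__ == "__main__":
    session_a = Context("session_a", {"db": "postgres"})
    session_b = Context("session_b", {"db": "sqlite", "cache": "redis"})
    for claim in ("db", "cache", "queue"):
        print(reconcile(claim, [session_a, session_b]))
```

The point of the sketch is the control flow: the pass runs once per claim at delivery time, and a disagreement becomes an explicit `conflict` result carrying its sources instead of a silent choice.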

What the notebook would cover:

  • A setup that reliably produces the failure: the same question answered in two contexts that each hold half the relevant facts, so a naive answer is confidently wrong.
  • The reconciliation step: list the contexts that bear on the claim, cross-check each, and produce a single answer that accounts for all of them. The notebook would include a measured before-and-after on a batch, scored by contradiction rate.
  • Where it pays off and where it does not: multi-session and multi-agent work benefit most; a single short context rarely needs it, and the cookbook should show that so the pass is applied where it earns its cost.
  • The failure mode to watch: a reconciliation that just picks the most recent context instead of genuinely cross-checking, and how to tell the two apart in the output.
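The scoring and the failure-mode check could look roughly like the sketch below. It rests on assumptions that are not settled in this proposal: each batch item records the value each context held (oldest first), the final answer, and whether the output explicitly acknowledged a disagreement (resolved with a reason, or flagged). `Item`, `contradiction_rate`, and `recency_pick_rate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    context_values: list[str]  # value held by each context, oldest first
    final: str                 # the answer that shipped
    acknowledged: bool         # output mentions the disagreement explicitly


def _disputed(item: Item) -> bool:
    """True when the contexts do not all hold the same value."""
    return len(set(item.context_values)) > 1


def contradiction_rate(items: list[Item]) -> float:
    """Share of the batch that shipped over an unacknowledged disagreement."""
    if not items:
        return 0.0
    bad = sum(1 for it in items if _disputed(it) and not it.acknowledged)
    return bad / len(items)


def recency_pick_rate(items: list[Item]) -> float:
    """Among disputed items, share that silently took the latest context."""
    disputed = [it for it in items if _disputed(it)]
    if not disputed:
        return 0.0
    picks = sum(
        1
        for it in disputed
        if not it.acknowledged and it.final == it.context_values[-1]
    )
    return picks / len(disputed)


if __name__ == "__main__":
    batch = [
        Item(["a", "a"], "a", False),  # contexts agree
        Item(["a", "b"], "b", False),  # silently took the latest value
        Item(["a", "b"], "a", True),   # disagreement acknowledged
        Item(["x", "y"], "y", True),   # latest value, but acknowledged
    ]
    print(contradiction_rate(batch), recency_pick_rate(batch))
```

A high `recency_pick_rate` alongside a low acknowledgement count is one signal that the pass is defaulting to the newest context. It is only a proxy: the latest value can also be the right one, so the notebook would still need to inspect the stated reasons.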

It runs on the messages API and pairs naturally with the handoff proposal, since the saved state from one session is exactly the other context a later session must reconcile against. I would target the agents section.
