An agent usually has no view of its own behavior over time. Each turn starts fresh, so a slow drift (more retries, more tool errors, longer paths to the same result) is invisible to it even though the signal is sitting in its own logs. A small self-observation step closes that gap: the agent periodically reads its own recent activity and produces a short report on how it is doing, including flags for things that look like degradation.
The point is that the report is derived from telemetry, not from the model's impression of itself. Counting tool calls, errors, and retries from the actual log is grounded; asking the model whether it thinks it did well is not.
What the notebook would cover:
- A lightweight activity log (tool calls, outcomes, timing) accumulated across a run, and a periodic step that summarizes it into a few honest numbers.
- Two or three signals worth surfacing: rising error or retry rate, a task taking more steps than the same task took earlier, repeated identical actions that produce no new result.
- Turning a signal into an action: when repeated identical outputs are detected, stop rather than loop; when error rate climbs, surface it instead of pushing on. The detector is only useful if something downstream consumes it.
- The honest caveat: this measures process, not correctness. An agent can look healthy on these numbers and still be wrong, so self-observation is an early-warning layer, not a quality guarantee.
It runs on the messages API with a small logging wrapper around the tool loop. I would target the agents section, and it pairs with the guardrail proposal: guardrails stop a known-bad action, self-observation notices a slow drift the rules did not anticipate.
An agent usually has no view of its own behavior over time. Each turn starts fresh, so a slow drift (more retries, more tool errors, longer paths to the same result) is invisible to it even though the signal is sitting in its own logs. A small self-observation step closes that gap: the agent periodically reads its own recent activity and produces a short report on how it is doing, including flags for things that look like degradation.
The point is that the report is derived from telemetry, not from the model's impression of itself. Counting tool calls, errors, and retries from the actual log is grounded; asking the model whether it thinks it did well is not.
What the notebook would cover:
It runs on the messages API with a small logging wrapper around the tool loop. I would target the agents section, and it pairs with the guardrail proposal: guardrails stop a known-bad action, self-observation notices a slow drift the rules did not anticipate.