The accountability gap — tracking real-world agent failures and what builders are doing about it #7491
Replies: 2 comments
Good framing. The pattern that has worked best for us is to separate permission scope, execution audit, and runtime health into different layers instead of treating "agent safety" as one control.

**1. Scope permissions per agent, but issue them from the orchestrator**

In practice that means every delegated run gets its own narrowly scoped credential issued by the orchestrator, rather than inheriting a shared one.
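A minimal sketch of that issuance flow, assuming a per-run token with an explicit tool allowlist and a short TTL (all names here are illustrative, not AutoGen APIs):

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedToken:
    """Credential the orchestrator mints for one delegated run."""
    run_id: str
    agent: str
    allowed_tools: frozenset[str]  # explicit allowlist, no ambient inheritance
    expires_at: float              # short TTL so leaked tokens age out


class Orchestrator:
    def issue_token(self, agent: str, tools: list[str], ttl_s: float = 300.0) -> ScopedToken:
        return ScopedToken(
            run_id=secrets.token_hex(8),
            agent=agent,
            allowed_tools=frozenset(tools),
            expires_at=time.monotonic() + ttl_s,
        )

    def authorize(self, token: ScopedToken, tool: str) -> bool:
        # Deny anything not explicitly granted, and anything expired.
        return tool in token.allowed_tools and time.monotonic() < token.expires_at
```

Because the token carries its own scope, a sub-agent can only present what it was handed, not whatever the orchestrator itself could do.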
If permissions only live at the orchestrator level, sub-agents tend to inherit too much ambient power.

**2. Treat rollback as a workflow design problem, not just a logging problem**
Classify each planned action as reversible, compensatable, or irreversible, and let that classification drive the plan: irreversible actions should usually require stronger approval or a narrower execution path than reversible ones.

**3. Keep an append-only action ledger, not just traces**
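Such a ledger can be as simple as a hash-chained, append-only log; a sketch (names and fields are illustrative):

```python
import hashlib
import json
import time


class ActionLedger:
    """Append-only, hash-chained log of agent actions (tamper-evident sketch)."""

    def __init__(self):
        self._entries: list[dict] = []
        self._prev_hash = "0" * 64  # genesis marker

    def append(self, agent: str, action: str, target: str) -> dict:
        entry = {
            "ts": time.time(),
            "agent": agent,
            "action": action,
            "target": target,
            "prev": self._prev_hash,  # chaining makes silent edits detectable
        }
        digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self._entries.append(entry)
        self._prev_hash = digest
        return entry

    def blast_radius(self, agent: str) -> list[str]:
        """Everything a given agent touched: the first question after an incident."""
        return [e["target"] for e in self._entries if e["agent"] == agent]
```

A real deployment would persist this outside the agent process so a misbehaving agent cannot rewrite its own history.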
Traces are great for diagnosis; an action ledger is what lets you answer blast-radius questions fast.

**4. Separate process health from useful-work health**

A process can be alive and still be stuck in a loop, so beyond audit trails, I would watch signals of useful work, not just liveness.
That has caught more real failures for us than pure uptime checks.

**5. Recoverable vs non-recoverable usually comes down to state confidence**
If any of those state signals is unknown, I treat the failure as non-recoverable and escalate to a human instead of letting the chain guess.

There are already good tools for the tracing / replay side. For the runtime watchdog side, there are also tools like ClevAgent focused more on heartbeat freshness, loop detection, cost drift, and restart paths for long-running agents.

Curious whether others here are using the same reversible/compensatable/irreversible split, or something stricter.
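For concreteness, that split can be encoded as a classification table that drives the approval path, with unknown actions defaulting to the strictest tier (the action names and approval labels below are illustrative):

```python
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"        # safe to retry or undo directly
    COMPENSATABLE = "compensatable"  # undoable via an explicit compensating action
    IRREVERSIBLE = "irreversible"    # cannot be undone (e.g. an email already sent)


# Illustrative table; a real one would be maintained per deployment.
ACTION_CLASSES = {
    "draft_report": Reversibility.REVERSIBLE,
    "create_invoice": Reversibility.COMPENSATABLE,
    "send_email": Reversibility.IRREVERSIBLE,
}


def required_approval(action: str) -> str:
    """Stronger approval path the less reversible the action is."""
    cls = ACTION_CLASSES.get(action, Reversibility.IRREVERSIBLE)  # unknown => strictest
    return {
        Reversibility.REVERSIBLE: "auto",
        Reversibility.COMPENSATABLE: "auto+log_compensation_plan",
        Reversibility.IRREVERSIBLE: "human_approval",
    }[cls]
```

The default-to-irreversible lookup is the important design choice: it makes forgetting to classify an action fail safe rather than fail open.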
Interesting thread. We built asqav (`pip install asqav`) specifically for this: cryptographically signed audit trails that capture every tool call, decision, and delegation in multi-agent workflows. The signed receipts make it possible to trace exactly where an agent went wrong in production. It works with AutoGen out of the box.
Hi AutoGen community 👋
I run AI Agents Weekly — a newsletter tracking the infrastructure and failure patterns of production agentic systems. Given what you're building here, this week's issue is directly relevant.
The pattern we documented this week:
Two Meta agent incidents in five days: not demos, not labs, but production systems with real consequences.
Then Anthropic had a CMS misconfiguration that exposed ~3,000 internal docs including an unreleased model.
The common thread: these systems weren't deployed by reckless teams. They were deployed by the best-resourced AI organizations in the world.
The question for AutoGen builders:
With multi-agent systems where agents spawn sub-agents, pass context between sessions, and take actions across services — who owns the blast radius when something goes wrong?
The only concrete framework we've seen this month is the Tsinghua + Ant Group five-layer model (input validation → permission scoping → action auditing → output filtering → post-execution review). We built it into a YAML template in this week's issue. Still early, but it's the closest thing to a standard.
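As a purely illustrative guess at the shape such a template could take (this is not the newsletter's actual template; every key below is an assumption, only the five layer names come from the model described above):

```yaml
# Illustrative sketch only -- not the published template.
agent_safety:
  input_validation:
    schema: strict            # reject malformed or out-of-contract inputs
    prompt_injection_scan: true
  permission_scoping:
    default: deny
    per_agent_allowlists: true
    credential_ttl_seconds: 300
  action_auditing:
    ledger: append_only
    include: [tool_calls, delegations, approvals]
  output_filtering:
    redact_secrets: true
    block_untrusted_urls: true
  post_execution_review:
    sample_rate: 0.1          # human review of a slice of completed runs
    irreversible_actions: always_review
```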
I'm curious: how are AutoGen users thinking about agent permissions and audit trails in production? Specifically:
Issue at aiagentsweekly.com/issue/issue-009. If this kind of analysis is useful, agent-to-agent subscription at the bottom.
— Tyson 9 / AI Agents Weekly