[CRITICAL] Deleting all agents leaves messages behind — verify FK cascade runs on existing DBs vs. silent UI refresh failure

## Summary

Deleting **all** agents from the VSIX did **not** remove all messages — message rows/UI entries survived after every agent was gone. Either (A) the DB is **not actually cascade-deleting** messages on the live database, or (B) the **UI is not reflecting** the deletion (stale view / silent refresh failure). Both are plausible given the evidence below. **Investigate and confirm which before fixing.** Per request: **do NOT fix yet** — this issue is to nail the root cause.

---

## Hypothesis A — FK cascade isn't actually running on the live DB

The schema *declares* the right cascades, and a **fresh** DB passes E2E — but an **existing** `data.db` may not, because of how SQLite stores FK actions.

**Schema is correct** — `packages/too-many-cooks/prisma/schema.prisma`:
- `Message.from` (`from_agent`) and `Message.to` (`to_agent`) both declare `onDelete: Cascade` (lines ~47-48)
- `Lock.identity` (line ~34) and `Plan.identity` (line ~73) both declare `onDelete: Cascade`

**Pragma is set per-connection** — `packages/too-many-cooks/src/db-sqlite.ts:153`: `db.pragma("foreign_keys = ON")`.

**Delete relies purely on cascade** — `db-sqlite.ts:917-945` `adminDeleteAgent` runs a single `DELETE FROM identity WHERE agent_name = ?` and trusts the DB to cascade.

**Fresh-DB E2E passes** — `too_many_cooks_vscode_extension/test/suite/deleteAllAgents.test.ts` proves cascade works on a newly created DB.

### Why an existing DB can still leak messages
The `messages.to_agent` cascade was added **only recently** — migration `packages/too-many-cooks/prisma/migrations/20260525000000_add_to_agent_fk_cascade/migration.sql`. SQLite **bakes FK actions into the table DDL at CREATE time**; `PRAGMA foreign_keys = ON` cannot retroactively add a cascade. That migration is a full table rebuild (`RedefineTables`: create `new_messages` with both FKs → copy → drop → rename). So whether a given `data.db` cascades inbound messages depends **entirely on that migration having actually run** against it.

**Drift risk between two schema-apply paths:**
- Boot path uses **`prisma migrate deploy`** — `db-sqlite.ts:148` (`applyMigrations`), error string "Prisma migrate deploy failed", log "Schema applied via prisma migrate deploy".
- But `packages/too-many-cooks/src/migrate.ts` uses **`prisma db push --accept-data-loss`**.

Any DB that was ever created via `db push` has **no `_prisma_migrations` history**. A later `migrate deploy` can then fail or skip applying `20260525...`, leaving the **old `messages` table without the `to_agent` cascade**. Result: deleting an agent removes the agent and (via the `from_agent` cascade) its outbound messages, but **orphans every message addressed *to* it**, which is exactly the reported symptom (messages survive agent deletion). This assumes the old table had no `to_agent` FK at all; a non-cascading FK would instead make the `DELETE` fail while `foreign_keys` is ON. CLAUDE.md states there is **no legacy DB migration support** ("delete the stale DB and recreate"), which makes a pre-cascade `data.db` a live hazard rather than a handled case.
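
A minimal sketch of how that drift could be classified from the contents of `_prisma_migrations` while investigating. The helper name `cascadeMigrationStatus`, the `MigrationStatus` labels, and the convention of passing `null` when the table does not exist are assumptions made for illustration; the sample migration name `20260101000000_init` is invented. Only the column names come from Prisma's migration table.

```typescript
// Subset of the columns Prisma stores in `_prisma_migrations`.
interface PrismaMigrationRow {
  migration_name: string;
  finished_at: string | number | null; // null while a migration is unfinished or failed
  rolled_back_at: string | number | null; // non-null once marked as rolled back
}

type MigrationStatus = "applied" | "not-applied" | "no-history";

// Hypothetical helper: `rows` is null when `_prisma_migrations` does not exist
// (for example, a DB that was only ever created via `prisma db push`).
function cascadeMigrationStatus(
  rows: readonly PrismaMigrationRow[] | null,
  namePrefix: string,
): MigrationStatus {
  if (rows === null) {
    return "no-history";
  }
  const applied = rows.some(
    (row) =>
      row.migration_name.startsWith(namePrefix) &&
      row.finished_at !== null &&
      row.rolled_back_at === null,
  );
  return applied ? "applied" : "not-applied";
}

// Illustrative inputs only, not data from a real database.
const pushOnlyDb = cascadeMigrationStatus(null, "20260525");
const staleDb = cascadeMigrationStatus(
  [
    {
      migration_name: "20260101000000_init", // invented name
      finished_at: "2026-01-01 00:00:00",
      rolled_back_at: null,
    },
  ],
  "20260525",
);
```

Both `"no-history"` and `"not-applied"` would point at Hypothesis A; `"applied"` only says the migration was recorded, so the table DDL still needs to be checked.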

### Investigation steps (read-only)
On an affected `data.db`:
```sql
SELECT sql FROM sqlite_master WHERE name = 'messages';   -- does to_agent FK say ON DELETE CASCADE?
PRAGMA foreign_key_list('messages');                     -- both from_agent AND to_agent present with cascade?
SELECT * FROM _prisma_migrations WHERE migration_name LIKE '20260525%';  -- was the cascade migration applied?
PRAGMA foreign_keys;                                     -- is it ON for this connection?
```
If the `messages` DDL lacks `ON DELETE CASCADE` on `to_agent` (or the migration row is missing) → **Hypothesis A confirmed**: the live schema, not the code, is the bug, and the `migrate deploy` vs `db push` drift is the cause.
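
The `PRAGMA foreign_key_list` check can also be expressed as a pure predicate over the rows it returns, which would make it usable in a regression test. This is a sketch under assumptions: the function name `columnsMissingCascade` is hypothetical, the rows are assumed to be fetched elsewhere, and the parent column `agent_name` is taken from the `DELETE` statement quoted earlier.

```typescript
// Row shape returned by SQLite's `PRAGMA foreign_key_list(<table>)`.
interface ForeignKeyRow {
  id: number;
  seq: number;
  table: string; // referenced (parent) table
  from: string; // column in the child table
  to: string | null; // column in the parent table
  on_update: string;
  on_delete: string; // e.g. "CASCADE", "NO ACTION", "SET NULL"
  match: string;
}

// Hypothetical helper: returns the child columns that were expected to
// cascade on delete but either have no FK or have a non-cascading one.
function columnsMissingCascade(
  rows: readonly ForeignKeyRow[],
  expectedColumns: readonly string[],
): string[] {
  return expectedColumns.filter(
    (column) =>
      !rows.some(
        (row) => row.from === column && row.on_delete.toUpperCase() === "CASCADE",
      ),
  );
}

// Illustrative pre-cascade table: only `from_agent` has a cascading FK.
const preCascadeRows: ForeignKeyRow[] = [
  {
    id: 0,
    seq: 0,
    table: "identity",
    from: "from_agent",
    to: "agent_name",
    on_update: "NO ACTION",
    on_delete: "CASCADE",
    match: "NONE",
  },
];

const missingCascades = columnsMissingCascade(preCascadeRows, ["from_agent", "to_agent"]);
```

For the illustrative rows above, `missingCascades` contains only `to_agent`, which is the shape Hypothesis A predicts for an affected `data.db`.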

---

## Hypothesis B — UI not reactive / silent refresh failure

The delete path doesn't mutate messages locally; it refetches server truth — but that refetch can **silently no-op**, leaving stale messages on screen.

**Delete path** — `too_many_cooks_vscode_extension/src/services/storeManager.ts:238-257`:
`deleteAgent` / `deleteAllAgents` POST `/admin/delete-agent` (once per agent), then call `refreshStatus()`.

**`refreshStatus()` swallows failures silently** — `storeManager.ts:196-227`:
- Lines 204 and 210: a `refreshSeq` race guard early-`return`s if a newer refresh has started. If requests overlap, an in-flight refresh can bail without ever dispatching `SetMessages`.
- Lines 205-208: a **non-ok HTTP response is logged and swallowed** (`return;` with **no error surfaced and no state update**). The UI keeps showing the pre-delete messages and the user sees no indication that anything failed.

Only on the happy path does it `dispatch({ messages, type: 'SetMessages' })` (line 225) with server truth.
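
To make the failure mode concrete, here is a simplified, synchronous model of those two early-return paths. It is not the code in `storeManager.ts`: the class `RefreshModel`, its `begin`/`complete` split, the `StatusResponse` shape, and the returned `RefreshOutcome` labels are all hypothetical, introduced so that overlapping refreshes can be stepped through deterministically.

```typescript
type RefreshOutcome = "applied" | "superseded" | "http-error";

// Hypothetical stand-in for the server response the refresh reads.
interface StatusResponse {
  ok: boolean;
  messages: readonly string[];
}

class RefreshModel {
  private refreshSeq = 0;
  private messages: readonly string[];

  constructor(initialMessages: readonly string[]) {
    this.messages = initialMessages;
  }

  // Start a refresh and remember which sequence number it owns.
  begin(): number {
    this.refreshSeq += 1;
    return this.refreshSeq;
  }

  // Finish a refresh once its response has arrived.
  complete(seq: number, response: StatusResponse): RefreshOutcome {
    if (seq !== this.refreshSeq) {
      return "superseded"; // race guard: a newer refresh started meanwhile
    }
    if (!response.ok) {
      return "http-error"; // modelled on the swallowed non-ok branch
    }
    this.messages = response.messages; // happy path: server truth wins
    return "applied";
  }

  current(): readonly string[] {
    return this.messages;
  }
}

// Two overlapping refreshes after a delete: the first is superseded,
// the second gets a non-ok response, so nothing is ever applied.
const model = new RefreshModel(["msg-1", "msg-2"]);
const firstSeq = model.begin();
const secondSeq = model.begin();
const firstOutcome = model.complete(firstSeq, { ok: true, messages: [] });
const secondOutcome = model.complete(secondSeq, { ok: false, messages: [] });
const stillShown = model.current();
```

In this scenario the server has no messages left, yet `stillShown` keeps both pre-delete entries. Returning an outcome instead of `void` is a choice of the sketch to make the silent paths observable, not a description of the current code.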

**Latent reducer landmine** — `too_many_cooks_vscode_extension/src/state/store.ts:13-25`:
the `RemoveAgent` reducer filters `agents`, `locks`, and `plans` for the removed agent **but not `messages`**. It's currently **never dispatched** (dead branch — grep finds no `dispatch({ type: 'RemoveAgent' })`), so it isn't the active cause, but if anyone later wires optimistic single-agent removal to it, it will leave orphaned messages in the store. Should be fixed for consistency.
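
For the eventual fix, a `RemoveAgent` branch that also drops messages could look roughly like the sketch below. The state shape is an assumption: `AppState`, `AgentOwned`, and the field names `agentName`, `fromAgent`, and `toAgent` are hypothetical and may not match `store.ts`.

```typescript
// Hypothetical state shape; the real types in store.ts may differ.
interface AgentOwned {
  agentName: string;
}

interface Message {
  id: string;
  fromAgent: string;
  toAgent: string;
}

interface AppState {
  agents: readonly AgentOwned[];
  locks: readonly AgentOwned[];
  plans: readonly AgentOwned[];
  messages: readonly Message[];
}

// Returns a new state with everything tied to `agentName` removed,
// including messages sent by or addressed to that agent.
function removeAgent(state: AppState, agentName: string): AppState {
  const notOwned = (item: AgentOwned): boolean => item.agentName !== agentName;
  return {
    agents: state.agents.filter(notOwned),
    locks: state.locks.filter(notOwned),
    plans: state.plans.filter(notOwned),
    messages: state.messages.filter(
      (message) => message.fromAgent !== agentName && message.toAgent !== agentName,
    ),
  };
}

// Illustrative data only.
const stateBefore: AppState = {
  agents: [{ agentName: "A" }, { agentName: "B" }, { agentName: "C" }],
  locks: [{ agentName: "A" }],
  plans: [{ agentName: "B" }],
  messages: [
    { id: "m1", fromAgent: "A", toAgent: "B" },
    { id: "m2", fromAgent: "B", toAgent: "A" },
    { id: "m3", fromAgent: "B", toAgent: "C" },
  ],
};

const stateAfter = removeAgent(stateBefore, "A");
```

Filtering on both sender and recipient mirrors the two cascading FKs in the schema, so the store and the DB would agree on which messages disappear with an agent.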

---

## State-architecture audit (re: "is everything on screen using signals / centralized state?")

- **State IS centralized** in a single immutable store — `src/state/store.ts` (`Store` class, `getState`/`dispatch`/`subscribe`, immutable spread updates). Single source of truth. ✅
- It is **NOT signal-based** — it's a hand-rolled Redux-style `EventEmitter`. Tree views subscribe and re-render on **every** dispatch: e.g. `MessagesTreeProvider` fires `onDidChangeTreeData` on any store change (`src/ui/tree/messagesTreeProvider.ts:27-30`) and re-reads `selectMessages(state)` in `getChildren`. So reactivity wiring **is** present and centralized. ✅
- The gap is **not** scattered/global mutable UI state; it's (1) the silent `refreshStatus` failure path and (2) the incomplete `RemoveAgent` reducer. So if the symptom is UI-side, the root cause is a **silent refresh no-op**, not a missing-signal problem.

---

## Repro plan (do NOT fix yet)
1. Reproduce against an **existing** `data.db` that predates `20260525...` (or one created via `db push`). Send messages between agents A→B and B→A, delete all agents, then query the DB directly (Hypothesis A queries above) **and** observe the VSIX message tree. Compare DB rows vs. UI.
2. If DB still has message rows → **A** (cascade not applied on this DB / migrate-deploy-vs-db-push drift).
3. If DB rows are gone but UI still shows them → **B** (refresh silently failed or didn't fire). Check the extension log for `refreshStatus: response not ok` / a swallowed return.

## Acceptance criteria (for the eventual fix)
- Deleting all agents leaves **zero** message rows in the DB **and zero** messages in the UI, verified on a DB that predates the `to_agent` cascade migration (not just a fresh DB).
- A single source of truth for schema application (no `migrate deploy` vs `db push` divergence) OR an explicit guard that detects a pre-cascade `messages` table and rebuilds it.
- `refreshStatus` surfaces failures instead of swallowing them (no silent stale UI).
- `RemoveAgent` reducer also filters `messages` (consistency, even though currently unused).
- Regression tests covering **both** a fresh DB and a simulated pre-cascade DB.

---
**Do not fix in this issue — confirm the root cause first.** Marked critical: deleting agents leaving live message rows is a data-integrity / privacy concern.

