Commit a5adff5

feat(shared,extension): add next-step review feedback
Persist local review feedback, suppress dismissed or snoozed review items across dashboard and Chickens, and record the corresponding feature-pack/design/config updates.
1 parent fee338b commit a5adff5

43 files changed

Lines changed: 1779 additions & 77 deletions

.codex/config.toml

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 # See also: AGENTS.md (shared instructions), .claude/ (Claude Code config)

 # Model defaults
-model = "gpt-5.4"
+model = "gpt-5.5"

 # Sandbox: keep work inside the repo by default; approval covers anything broader.
 sandbox_mode = "workspace-write"
@@ -25,7 +25,7 @@ multi_agent = true
 model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
-approval_policy = "never"
+approval_policy = "on-request"

 [profiles.quick]
 model = "gpt-4.1-mini"
@@ -35,13 +35,13 @@ model_reasoning_effort = "low"
 model = "gpt-5.4"
 model_reasoning_effort = "high"
 sandbox_mode = "read-only"
-approval_policy = "never"
+approval_policy = "on-request"

 [profiles.triage]
 model = "gpt-4.1-mini"
 model_reasoning_effort = "low"
 sandbox_mode = "read-only"
-approval_policy = "never"
+approval_policy = "on-request"

 [profiles.migrate]
 model = "gpt-5.4"

.plans/features/agent-knowledge-sandbox/eval/qa-report.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@
 - Retrieval and provenance surfaces need an honest regression pass before this pack can move beyond
 blocked QA.
 - Earlier plan notes described a more ambitious graph backend than the one currently implemented.
+- QA must also verify the newer memory-provenance contract: observed, inferred, user-confirmed,
+imported, and stale memories should not collapse into the same retrieval or UI behavior.

 ## QA Pass 1: Codex

.plans/features/agent-knowledge-sandbox/qa/qa-codex.todo.md

Lines changed: 5 additions & 1 deletion
@@ -17,7 +17,7 @@ skills:
 qa_order: 1
 handoff_in: handoff/qa-codex/agent-knowledge-sandbox
 handoff_out: handoff/qa-claude/agent-knowledge-sandbox
-updated: 2026-04-19
+updated: 2026-05-07
 ---

 # QA Pass 1 — Codex
@@ -75,10 +75,14 @@ Triggered by: UI + State lanes complete
 - [ ] Positive precedents boost confidence >= 0.05
 - [ ] Negative precedents decrease confidence >= 0.05
 - [ ] Trace quota enforcement (max 500) works with oldest-first pruning
+- [ ] Memory write-back records provenance label, confirmation status, source channel, provider/model use, trace/task ID, confidence, and unresolved questions where applicable
+- [ ] Inferred but unconfirmed memory is retrievable as context but is not treated as instruction-like guidance
+- [ ] Stale memory is visibly labeled and ranked below current observed or user-confirmed context

 ## Integration

 - [ ] All existing unit tests pass (zero regressions)
 - [ ] All existing eval cases pass at current thresholds
 - [ ] Agent cycle time within 120% of baseline
 - [ ] Flat agentMemories table still works (backwards compat)
+- [ ] Recommendation UI can answer "where did this come from?" without exposing raw prompts, model output, or internal trace payloads in simple mode
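
Reviewer note: the provenance checks added in this checklist read as a small retrieval contract. The sketch below is one possible reading, not the repository's implementation. `MemoryEntry`, `LABEL_WEIGHT`, `rankMemories`, and `canGuideAsInstruction` are hypothetical names, and the numeric weights are assumptions chosen only to illustrate the ordering the checklist asks for.

```typescript
// Hypothetical types: the commit names the five labels but not a concrete schema.
type ProvenanceLabel = "observed" | "inferred" | "user-confirmed" | "imported" | "stale";

interface MemoryEntry {
  id: string;
  label: ProvenanceLabel;
  confirmed: boolean; // confirmation status, set when a member confirms the memory
  relevance: number;  // retrieval similarity score, assumed to be in [0, 1]
}

// Assumed weights; they only break ties among non-stale entries.
const LABEL_WEIGHT: Record<ProvenanceLabel, number> = {
  "user-confirmed": 1.0,
  observed: 0.9,
  imported: 0.7,
  inferred: 0.6,
  stale: 0.3,
};

// Stale entries always sort after non-stale ones; within each tier,
// entries are ordered by relevance scaled by the label weight.
function rankMemories(entries: MemoryEntry[]): MemoryEntry[] {
  return [...entries].sort((a, b) => {
    const aStale = a.label === "stale" ? 1 : 0;
    const bStale = b.label === "stale" ? 1 : 0;
    if (aStale !== bStale) return aStale - bStale;
    return b.relevance * LABEL_WEIGHT[b.label] - a.relevance * LABEL_WEIGHT[a.label];
  });
}

// Assumption: only confirmed, non-stale memory may act as instruction-like
// guidance. Everything else remains retrievable as plain context.
function canGuideAsInstruction(entry: MemoryEntry): boolean {
  return entry.confirmed && entry.label !== "stale";
}
```

Sorting by tier first, rather than by weighted score alone, means a stale entry with relevance 0.99 still lands after an observed entry with relevance 0.2, which is what "ranked below" appears to require.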

.plans/features/agent-knowledge-sandbox/spec.md

Lines changed: 27 additions & 3 deletions
@@ -19,6 +19,9 @@ The "YouTube Kids for agents" model: sandbox the knowledge, not just the executi
 - The agent has no way to learn from YouTube channels, GitHub repos, or RSS feeds that the coop follows
 - Reasoning traces evaporate after skill execution — no institutional memory, no precedent system
 - The Neo4j Context Graph talk validated that graph memory + reasoning traces + hybrid retrieval is production-ready and the patterns map directly to Coop's architecture
+- The durable-agent runtime lesson is clear: if models can change, memory needs independent
+provenance, confirmation status, and retrieval rules so Coop behaves continuously without
+turning model output into unaccountable truth.

 ## Scope

@@ -57,7 +60,7 @@ The "YouTube Kids for agents" model: sandbox the knowledge, not just the executi
 - Context assembly for skill prompts (token-budgeted)
 - No LLM calls during retrieval (hard requirement)

-**Phase 6 — Reasoning Traces + Compound Loop** (learning)
+**Phase 6 — Reasoning Traces + Memory Provenance** (learning)
 - Record decision traces as precedent nodes linked to skill runs
 - Precedent query by observation similarity
 - Confidence adjustment based on past decision outcomes
@@ -66,6 +69,14 @@ The "YouTube Kids for agents" model: sandbox the knowledge, not just the executi
 - Rejection weakens: decrease edge confidence, temporal invalidation (not deletion)
 - Validated insight entities: approved draft summaries become first-class graph nodes
 - Append-only activity log: chronological record of ingests, queries, lint passes
+- Memory write-back labels each durable entry as `observed`, `inferred`, `user-confirmed`,
+`imported`, or `stale`
+- Confirmation status controls retrieval weight and whether a memory can be used as instruction-like
+context later
+- Retrieval-before-work gathers relevant sources, decisions, prior failures, open questions, and
+constraints before meaningful agent work starts
+- Write-back-after-work records output summary, source channel, provider/model use, trace or task ID,
+confidence, unresolved questions, and confirmation status

 **Phase 7 — Lint + Integration** (wiring + health)
 - Knowledge lint skill: orphan entities, stale sources, contradictions, coverage gaps, graph health
@@ -91,12 +102,14 @@ The "YouTube Kids for agents" model: sandbox the knowledge, not just the executi
 - Add YouTube channels, GitHub repos, RSS feeds, subreddits, NPM packages as knowledge sources
 - See what the agent knows (topic bars) and why it recommended something (sourced from + track record)
 - Review agent decisions with full provenance (which sources, which precedents)
+- Distinguish user-confirmed memory from inferred or imported memory without opening an agent log
 - See source health at a glance (popup dot, Nest freshness indicators)

 **Operators can:**
 - Configure source allowlists per coop
 - Monitor graph size and entity counts
 - Review agent decision history with reasoning traces
+- Audit which memories are model-inferred, source-observed, imported, stale, or confirmed by a member
 - See cascade effects before removing sources

 **What stays the same:**
@@ -118,6 +131,10 @@ The "YouTube Kids for agents" model: sandbox the knowledge, not just the executi
 - Entity extraction must use existing inference cascade — no new model infrastructure
 - No LLM calls during graph retrieval (performance requirement)
 - Source adapters must go through `assertAllowedSource()` — no direct fetch from unapproved URLs
+- Provenance labels extend the existing memory/trace contracts; do not introduce a parallel
+repo-level memory truth surface.
+- Raw source fetch state stays local. Only approved outputs and user-confirmed memory projections may
+become shared coop memory.

 ### New dependencies
 - `@kuzu/kuzu-wasm` — embedded graph DB (or Vela-Engineering fork for concurrent writes)
@@ -192,18 +209,25 @@ The "YouTube Kids for agents" model: sandbox the knowledge, not just the executi
 - [ ] Precedent query finds similar past decisions
 - [ ] Positive precedents boost confidence >= 0.05
 - [ ] Negative precedents decrease confidence >= 0.05
+- [ ] Memory write-back records provenance label, confirmation status, source channel, provider or
+model use, trace/task ID, confidence, and unresolved questions where applicable
+- [ ] Unconfirmed inferred memory is retrievable as context but never elevated to instruction-like
+guidance without member confirmation
+- [ ] Stale memory is visible as stale and ranked below current observed or confirmed context

 ### Phase 7 — Integration
 - [ ] All existing eval cases pass at or above current thresholds (zero regression)
 - [ ] Agent cycle time stays within 120% of baseline
 - [ ] Graph-enhanced skills show >= 10% quality improvement (A/B evaluation)
 - [ ] UI surfaces work: Nest Sources, Roost Knowledge, DraftCard provenance, Popup pulse
+- [ ] Recommendation surfaces can answer "where did this come from?" without exposing raw prompts,
+model output, or internal trace payloads in simple mode

 ## Validation Plan

 - **Unit**: Source registry CRUD, adapter parsing, entity extraction, graph CRUD, retrieval relevance, temporal correctness, reasoning traces
-- **Integration**: Full pipeline: source → adapter → extraction → graph → retrieval → skill context → output
-- **E2E**: Member adds source → agent ingests → agent uses in recommendation → member sees provenance
+- **Integration**: Full pipeline: source → adapter → extraction → graph → retrieval → skill context → output → provenance-labeled write-back
+- **E2E**: Member adds source → agent ingests → agent uses in recommendation → member sees provenance and can distinguish inferred from confirmed memory
 - **A/B**: Baseline (flat memory) vs graph-enhanced (graph retrieval) quality comparison on eval corpus
 - **Regression**: All existing skill eval cases + unit tests must pass at pre-implementation thresholds
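
Reviewer note: the write-back-after-work bullet in the Phase 6 hunk lists the fields a durable entry should carry. A minimal sketch of a completeness check follows, assuming a hypothetical `WriteBackRecord` shape. The field names and the `missingWriteBackFields` helper are illustrative and do not come from the commit.

```typescript
// Hypothetical shape: field names are illustrative, derived from the spec's
// write-back list (output summary, source channel, provider/model use, ...).
interface WriteBackRecord {
  outputSummary: string;
  provenanceLabel: "observed" | "inferred" | "user-confirmed" | "imported" | "stale";
  confirmationStatus: "unconfirmed" | "confirmed";
  sourceChannel: string;
  providerModel: string;
  traceOrTaskId: string;
  confidence: number;            // assumed to be in [0, 1]
  unresolvedQuestions: string[]; // may be empty, since the spec says "where applicable"
}

// Fields treated as mandatory in this sketch; unresolvedQuestions is optional.
const REQUIRED_FIELDS: (keyof WriteBackRecord)[] = [
  "outputSummary",
  "provenanceLabel",
  "confirmationStatus",
  "sourceChannel",
  "providerModel",
  "traceOrTaskId",
  "confidence",
];

// Returns the names of required fields that are absent or empty strings.
function missingWriteBackFields(
  record: Partial<WriteBackRecord>,
): (keyof WriteBackRecord)[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = record[field];
    return value === undefined || value === "";
  });
}
```

A check like this could run in the integration test for the "provenance-labeled write-back" stage, failing when the returned list is non-empty.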

.plans/features/agent-knowledge-sandbox/status.json

Lines changed: 4 additions & 3 deletions
@@ -60,8 +60,9 @@
 "No LLM during retrieval (hard perf requirement)",
 "Vellum material language for all UI surfaces",
 "7-phase dependency-ordered build with gates",
-"Entity extraction as new skill using existing cascade"
+"Entity extraction as new skill using existing cascade",
+"Memory provenance and confirmation labels are part of the product contract: observed, inferred, user-confirmed, imported, and stale memories must behave differently in retrieval and UI."
 ],
-"updated_at": "2026-04-19",
-"notes": "UI + state lanes materially landed; graph backend still snapshot-persisted (Kuzu-WASM deferred). QA pass 1 is now ready to run."
+"updated_at": "2026-05-07",
+"notes": "UI + state lanes materially landed; graph backend still snapshot-persisted (Kuzu-WASM deferred). QA pass 1 is now ready to run, including provenance, confirmation, and retrieval-before-work checks."
 }

.plans/features/next-gen-model-readiness/context.md

Lines changed: 18 additions & 0 deletions
@@ -8,6 +8,24 @@ Analysis based on Nate B Jones transcript re: Claude Mythos and four parallel au
 3. Validation pipeline redundancy analysis
 4. Multi-agent coordination overhead assessment

+Additional source: "Dive into Claude Code: The Design Space of Today's and Future AI Agent
+Systems" (arXiv:2604.14228v1). The relevant takeaway is not to copy Claude Code wholesale; it is
+to preserve deterministic harness boundaries while simplifying model-facing scaffolding.
+
+## Harness Guardrail Classification
+
+Use this table during Phase 1 cleanup before removing or rewriting prompt/context material.
+
+| Classification | Examples in this repo | Where it must live |
+|---|---|---|
+| `deterministic-gate` | permission checks, deny/allow rules, hook enforcement, schema validation, `bun run test` enforcement, release gates | Code, hook config, schemas, validators, tests, or `scripts/validate.ts` |
+| `repo-constraint` | barrel imports, root `.env.local`, no Dexie access from views, MV3 service-worker constraints | One canonical repo instruction or rule file |
+| `product-intent` | local-first, passkey-first, explicit publish, community/project framing, friendly non-console UX | `CLAUDE.md`, product context, or current plan spec |
+| `soft-guidance` | library tutorials, static file maps, generic debugging recipes, boilerplate test snippets | Remove or replace with file pointers |
+
+If a cleanup step touches a `deterministic-gate`, the implementation note must name the executable
+surface that still enforces it after the prose is removed.
+
 ## Key Files by Phase

 ### Phase 1: Prompt Surface
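
Reviewer note: the guardrail rule added in this file (a cleanup step that touches a `deterministic-gate` must name the executable surface that still enforces it) could be checked mechanically. The sketch below assumes a hypothetical `CleanupNote` record per removed instruction. Neither the type nor `findUnenforcedGates` exists in the commit.

```typescript
// The four classifications come from the table in context.md.
type GuardrailClass =
  | "deterministic-gate"
  | "repo-constraint"
  | "product-intent"
  | "soft-guidance";

// Hypothetical implementation-note entry for one removed or shortened instruction.
interface CleanupNote {
  instruction: string;
  classification: GuardrailClass;
  executableSurface?: string; // e.g. a hook, schema, test, or validation script
}

// Returns deterministic gates whose note does not name an executable surface.
// A non-empty result means the cleanup should stop or record the gap.
function findUnenforcedGates(notes: CleanupNote[]): CleanupNote[] {
  return notes.filter(
    (note) =>
      note.classification === "deterministic-gate" &&
      !note.executableSurface?.trim(),
  );
}
```

Only `deterministic-gate` entries are checked here. Extending the same filter to require a named home for `repo-constraint` and `product-intent` entries would be a small change.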

.plans/features/next-gen-model-readiness/lanes/api.claude.todo.md

Lines changed: 7 additions & 2 deletions
@@ -21,14 +21,17 @@ done_when:
 skills:
 - architecture
 - testing
-updated: 2026-04-02
+updated: 2026-05-07
 ---

 # Phase 3: Prepare the Agent Pipeline for Model Upgrade

 Target: Introduce a "capable model" code path alongside the 0.5B legacy path. Define tools from deterministic skills. Collapse output handlers into generic tools. Feature-flagged via `VITE_COOP_AGENT_MODE`.

-**Principle**: The legacy path (0.5B + heuristic fallbacks) continues working unchanged. The autonomous path is additive — same observation lifecycle, same approval gates, same memory system, different execution strategy.
+**Principle**: The legacy path (0.5B + heuristic fallbacks) continues working unchanged. The
+autonomous path is additive — same observation lifecycle, same approval gates, same memory system,
+same trace evidence, same fallback semantics, different execution strategy. Model routing remains
+internal runtime evidence; simple mode should not become a provider-management surface.

 ## Step 1: Add `VITE_COOP_AGENT_MODE` environment variable

@@ -162,6 +165,8 @@ export async function runAutonomousAgentCycle(options: {
 - The plan goes through the same approval gate as the legacy path
 - Memory is queried the same way (via `queryMemoriesForSkill`)
 - Observations are created/updated the same way
+- Provider/model details are persisted for traces, benchmarks, and advanced diagnostics, but user
+workflow state stays provider-independent

 **Model bridge interface** (abstract — concrete implementation depends on model provider):
 ```typescript

.plans/features/next-gen-model-readiness/lanes/docs.claude.todo.md

Lines changed: 23 additions & 1 deletion
@@ -22,13 +22,25 @@ done_when:
 - rules/tests.md lines < 100
 skills:
 - architecture
-updated: 2026-04-02
+updated: 2026-05-07
 ---

 # Phase 1: Simplify the Prompt Surface

 Target: ~8,500 lines → ~3,000 lines. Remove library docs, static code maps, procedural recipes, duplicated constraints. Keep outcomes, constraints, anti-patterns, product intent.

+## Required guardrail
+
+Before deleting or shortening any instruction, classify it using `../context.md`:
+
+- `deterministic-gate`: keep or move to code, hooks, schemas, validators, tests, or validation scripts.
+- `repo-constraint`: keep once in the canonical repo instruction/rule surface.
+- `product-intent`: preserve in product context or another short pointer.
+- `soft-guidance`: remove or replace with a source pointer.
+
+Do not convert a deterministic gate into prompt-only guidance. If the old prose was the only place a
+gate existed, stop and route it to an executable surface or record the gap.
+
 ## Step 1: Replace context code maps with pointer files

 Replace `.claude/context/app.md` (163 lines), `extension.md` (466 lines), `shared.md` (448 lines) with ~15-line pointer files. Each file becomes:
@@ -53,6 +65,9 @@ Read the source files above for architecture details.

 **Verify**: Each pointer file < 20 lines. No constraint lost (cross-reference against rules/ files).

+**Guardrail audit**: Record any removed `deterministic-gate`, `repo-constraint`, or `product-intent`
+in the implementation notes with its new canonical home.
+
 ## Step 2: Reduce skills to constraint cards

 For each `.claude/skills/*/SKILL.md`, reduce to 30-50 lines:
@@ -93,6 +108,9 @@ For each `.claude/skills/*/SKILL.md`, reduce to 30-50 lines:

 **Verify**: `wc -l .claude/skills/*/SKILL.md` shows each file 20-50 lines.

+**Guardrail audit**: For each skill, keep Coop-specific constraints and anti-patterns; remove generic
+methodology only after confirming enforcement-sensitive behavior lives in a rule, hook, schema, or test.
+
 ## Step 3: Delete meta-documentation

 - [ ] Delete `.claude/skills/index.md` (238 lines) — meta-documentation about the prompt system itself. The model discovers skills via registry, not an index file.
@@ -139,6 +157,9 @@ Remove ~120 lines from CLAUDE.md:

 **Verify**: `wc -l CLAUDE.md` shows < 200 lines. `bun run validate quick` passes.

+**Guardrail audit**: Any removed release, permission, env, or command rule must still be enforced by a
+hook, rule file, validator, test, or script.
+
 ## Step 6: Deduplicate constraints

 For each constraint in the deduplication map (see context.md), ensure it appears in exactly one canonical location:
@@ -166,4 +187,5 @@ Review each `.claude/agents/*.md` for duplicated Coop rules:
 - [ ] `bun run validate quick` passes
 - [ ] Total `.claude/` line count < 3,500 (down from ~8,500)
 - [ ] No constraint was removed without being present elsewhere (audit trail in commit message)
+- [ ] Every removed `deterministic-gate`, `repo-constraint`, and `product-intent` has an implementation-note entry naming its durable home
 - [ ] `product.md` unchanged
