Skip to content

Commit 359acac

Browse files
committed
chore(shared,client): align workspace with latest coop updates
1 parent 228ed37 commit 359acac

420 files changed

Lines changed: 35155 additions & 17226 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.claude/context/product.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,36 @@ Someone using the paired mobile web app:
6464
- Captures sync to the extension via WebSocket relay
6565
- Captured content enters the private intake queue before review
6666

67+
## Persona & Tone Quick-Reference
68+
69+
Each extension surface and app page has a primary audience. Match tone and vocabulary accordingly.
70+
71+
| Archetype | Persona | Perceives Coop As | Tone | Avoid |
72+
|-----------|---------|-------------------|------|-------|
73+
| Regular Member | Sam | "A tab organizer that helps my group" | Warm, plain-language, action-oriented | Jargon, options overload, blockchain vocabulary |
74+
| Trusted Member | Nia | "Our community's shared memory" | Encouraging, collaborative, direct | Condescension, assumed technical literacy |
75+
| Operator | Kai | "My community's command center" | Professional, efficient, task-focused | Oversimplification, hiding complexity |
76+
| Mobile Contributor | Luz | "Quick capture from my phone" | Minimal, fast, zero-friction | Long explanations, multi-step flows |
77+
78+
### Surface → Persona Mapping
79+
80+
| Surface | Primary Persona | Tone Notes |
81+
|---------|----------------|------------|
82+
| **Popup** | Sam (Regular) | Quick, casual, action-first. 1-2 words per button. No explanation needed. |
83+
| **Chickens tab** | Sam / Nia | Minimal, calm, Notion-like. 3-4 info pieces per card. Inline actions. |
84+
| **Coops tab** | Nia (Trusted) | Collaborative, evidence-based. Show sync state, archive receipts. |
85+
| **Roost tab** | Nia / Kai | Workspace-focused. Show progress, garden state, submissions. |
86+
| **Nest tab** | Kai (Operator) | Admin-precise. Member list, invite codes, receiver pairing, settings. |
87+
| **Landing page** | Sam (first visit) | Warm, atmospheric. Explain value in one scroll. Zero blockchain jargon. |
88+
| **Receiver PWA** | Luz (Mobile) | Ultra-minimal. Capture button dominant. Status indicators only. |
89+
90+
### Vocabulary Constraints
91+
92+
- **Regular Member / Mobile**: Zero blockchain vocabulary. 6th-grade reading level. Say "save" not "publish to chain", "group" not "multisig".
93+
- **Trusted Member**: May reference "shared feed", "archive", "sync". Never Solidity internals.
94+
- **Operator**: May reference "Safe address", "invite codes", "trusted member". Never raw transaction details.
95+
- All surfaces: Prefer chicken metaphors where they clarify (Loose Chickens, Coop Feed), drop them where they obscure.
96+
6797
## Brand Direction
6898

6999
### Chicken Metaphors

.claude/evals/README.md

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
# Agent Evaluation Framework
2+
3+
Structured test cases for verifying agent quality. Run evals after modifying agent definitions or skills.
4+
5+
## Structure
6+
7+
```
8+
.claude/evals/
9+
├── README.md # This file
10+
├── results.md # Historical results table
11+
├── triage/ # Triage agent evals
12+
├── code-reviewer/ # Code reviewer evals
13+
└── oracle/ # Oracle agent evals
14+
```
15+
16+
## Running Evals
17+
18+
Evals are run manually by spawning the agent with a test scenario and comparing output against expected results.
19+
20+
1. Pick a test case from the agent's eval directory
21+
2. Spawn the agent with the scenario prompt
22+
3. Compare output against the expected output and scoring rubric
23+
4. Record results in `results.md`
24+
25+
## Scoring
26+
27+
- **triage**: Classification accuracy (correct severity, type, package routing)
28+
- **code-reviewer**: Finding precision (true positives vs false positives, correct severity levels)
29+
- **oracle**: Research quality (sources cited, synthesis depth, actionability of conclusions)
30+
31+
## Maturity
32+
33+
| Agent | Test Cases | Status |
34+
|-------|-----------|--------|
35+
| triage | 3 | ready |
36+
| code-reviewer | 2 | ready |
37+
| oracle | 2 | ready |
38+
39+
## When to Run
40+
41+
- After modifying `.claude/agents/*.md`
42+
- After modifying `.claude/skills/*/SKILL.md` for skills referenced by agents
43+
- Before upgrading model versions (compare scores across models)
44+
- Quarterly regression check
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# Code Reviewer Eval: Clean Refactor
2+
3+
**Status**: ready
4+
**Last run**: —
5+
**Last score**: —
6+
7+
## Scenario
8+
9+
Review a PR that extracts a utility function from a component into shared/src/utils/. No behavior change. Tests pass. The extraction follows barrel import conventions.
10+
11+
```typescript
12+
// Before: inline in extension/src/views/Popup/TabList.tsx
13+
function formatTabAge(createdAt: number): string {
14+
const diff = Date.now() - createdAt;
15+
if (diff < 60000) return 'just now';
16+
if (diff < 3600000) return `${Math.floor(diff / 60000)}m ago`;
17+
return `${Math.floor(diff / 3600000)}h ago`;
18+
}
19+
20+
// After: extracted to shared/src/utils/format-time.ts
21+
export function formatTabAge(createdAt: number): string { /* same logic */ }
22+
23+
// shared/src/index.ts updated with re-export
24+
export { formatTabAge } from './utils/format-time';
25+
26+
// TabList.tsx now imports from @coop/shared
27+
import { formatTabAge } from '@coop/shared';
28+
```
29+
30+
## Expected Output
31+
32+
- **must_fix**: 0
33+
- **should_fix**: 0-1 (optional: could suggest `formatRelativeTime` as a more general name)
34+
- **recommendation**: APPROVE
35+
36+
## Eval Criteria
37+
38+
| Criterion | Weight | Pass | Fail |
39+
|-----------|--------|------|------|
40+
| Approves clean refactor | 35% | APPROVE recommendation | REQUEST_CHANGES on a clean PR |
41+
| No false must_fix | 30% | Zero must_fix findings | Invents issues that don't exist |
42+
| Recognizes barrel import correctness | 20% | Notes the import follows conventions | Flags import pattern as issue |
43+
| Optional naming suggestion only | 15% | At most a should_fix naming suggestion | Demands changes for style |
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# Code Reviewer Eval: Missing Error Handling
2+
3+
**Status**: ready
4+
**Last run**: —
5+
**Last score**: —
6+
7+
## Scenario
8+
9+
Review a PR that adds a new async function fetching coop data without try/catch, no error boundary wrapping the consuming component, and raw error objects shown to the user via `toast.error(error.message)`.
10+
11+
```typescript
12+
// New function in shared/src/modules/coop/index.ts
13+
export async function fetchCoopMembers(coopId: string) {
14+
const response = await fetch(`/api/coops/${coopId}/members`);
15+
const data = await response.json();
16+
return data.members;
17+
}
18+
19+
// Usage in extension popup
20+
function MemberList({ coopId }: { coopId: string }) {
21+
const [members, setMembers] = useState([]);
22+
useEffect(() => {
23+
fetchCoopMembers(coopId).then(setMembers);
24+
}, [coopId]);
25+
return members.map(m => <MemberCard key={m.id} member={m} />);
26+
}
27+
```
28+
29+
## Expected Output
30+
31+
- **must_fix**: 1 — Unhandled promise rejection (no .catch() or try/catch on the fetch call)
32+
- **should_fix**: 1 — No user-friendly error message (should use categorizeError + toast)
33+
- **recommendation**: REQUEST_CHANGES
34+
35+
## Eval Criteria
36+
37+
| Criterion | Weight | Pass | Fail |
38+
|-----------|--------|------|------|
39+
| Catches unhandled promise | 30% | Flags the missing .catch() as must_fix | Misses it or downgrades to should_fix |
40+
| Flags raw error exposure | 25% | Notes missing categorizeError/user-friendly message | Ignores error UX |
41+
| Correct severity levels | 20% | must_fix for crash, should_fix for UX | Wrong severity assignments |
42+
| No false positives | 15% | Doesn't flag correct patterns as issues | Flags style/naming as must_fix |
43+
| Recommendation = REQUEST_CHANGES | 10% | Correctly recommends changes | Approves despite must_fix |
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
# Oracle Eval: Architecture Question
2+
3+
**Status**: ready
4+
**Last run**: —
5+
**Last score**: —
6+
7+
## Scenario
8+
9+
> "Why does Coop use y-webrtc instead of y-websocket as the primary sync transport?"
10+
11+
## Expected Output
12+
13+
Must address:
14+
1. **Privacy/local-first rationale**: y-webrtc sends data directly peer-to-peer, so the server never sees document content — aligning with Coop's local-first principle
15+
2. **y-websocket as fallback**: Coop does use y-websocket (via the API server at `/yws`) as a fallback for document sync when WebRTC fails
16+
3. **Signaling vs transport distinction**: The API server provides signaling (peer discovery) but doesn't relay document data in the primary path
17+
18+
Must cite at least 2 sources from the codebase (e.g., CLAUDE.md, ADR-007, shared/storage module, API server ws handler).
19+
20+
## Eval Criteria
21+
22+
| Criterion | Weight | Pass | Fail |
23+
|-----------|--------|------|------|
24+
| Explains privacy rationale | 30% | Connects y-webrtc to local-first/privacy | Generic "it's faster" answer |
25+
| Mentions y-websocket fallback | 25% | Identifies y-websocket as fallback, not absent | Says Coop doesn't use y-websocket |
26+
| Cites 2+ sources | 20% | References specific files or docs | Only reads one file |
27+
| Accurate technical details | 15% | Correct on signaling vs transport | Confuses signaling with relay |
28+
| Actionable conclusion | 10% | Summarizes when each transport is used | Ends without synthesis |
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
# Oracle Eval: Cross-Module Investigation
2+
3+
**Status**: ready
4+
**Last run**: —
5+
**Last score**: —
6+
7+
## Scenario
8+
9+
> "How does the capture flow work from tab detection to published artifact?"
10+
11+
## Expected Output
12+
13+
Must trace the full path through multiple packages:
14+
15+
1. **Extension (capture)**: Background service worker detects tabs, creates TabCandidate entries in Dexie
16+
2. **Shared/agent (refine)**: Agent pipeline runs 16-skill observation on candidates, produces CoopInterpretation
17+
3. **Extension/Popup + Chickens (review)**: User reviews candidates/drafts, decides what to push
18+
4. **Shared/coop (publish)**: Explicit push creates Artifact in shared Yjs document
19+
5. **Shared/storage + sync (sync)**: Artifact syncs to peers via y-webrtc, persisted to Dexie
20+
6. **Shared/archive (archive)**: Optional archive to Storacha/Filecoin with receipt
21+
22+
Must identify the key boundary: everything before step 4 is local-only, step 4 is the explicit push that makes data shared.
23+
24+
Must cite specific file paths in at least 3 packages.
25+
26+
## Eval Criteria
27+
28+
| Criterion | Weight | Pass | Fail |
29+
|-----------|--------|------|------|
30+
| Covers 3+ packages | 25% | Traces through extension, shared, and at least one more | Only covers one package |
31+
| Identifies local → shared boundary | 25% | Calls out explicit push as the privacy boundary | Treats everything as shared |
32+
| Specific file paths | 20% | Names actual source files | Only mentions package names |
33+
| Correct data flow order | 20% | Capture → refine → review → publish → sync → archive | Wrong sequence or missing steps |
34+
| Synthesis/summary | 10% | Ends with a clear mental model | Dumps findings without conclusion |

.claude/evals/results.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
# Eval Results
2+
3+
| Date | Model | Agent | Task | Score | Turns | Notes |
4+
|------|-------|-------|------|-------|-------|-------|
5+
||||||| No evals run yet |
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
# Triage Eval: P1 Critical Bug
2+
3+
**Status**: ready
4+
**Last run**: —
5+
**Last score**: —
6+
7+
## Scenario
8+
9+
> User reports: "Extension popup is blank white screen after update, console shows TypeError: Cannot read properties of undefined (reading 'map')"
10+
11+
## Expected Output
12+
13+
- **Severity**: P1 (service down for users)
14+
- **Type**: bug
15+
- **Affected package**: extension
16+
- **Root cause hint**: Likely a data shape change in shared breaking a .map() call in popup
17+
- **Route to**: cracked-coder with `--mode tdd_bugfix`
18+
- **Not**: oracle (no investigation needed — clear stack trace), not P2 (popup is primary surface)
19+
20+
## Eval Criteria
21+
22+
| Criterion | Weight | Pass | Fail |
23+
|-----------|--------|------|------|
24+
| Severity = P1 | 30% | Correctly identifies as P1 | Classifies as P2 or lower |
25+
| Package = extension | 20% | Identifies extension as affected | Points to shared or app |
26+
| Routes to cracked-coder | 25% | Routes to cracked-coder with TDD | Routes to oracle or /debug |
27+
| Concise brief | 15% | Brief is < 5 lines | Over-explains or speculates |
28+
| Correct type = bug | 10% | Identifies as bug | Classifies as enhancement |
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
# Triage Eval: P2 Sync Persistence Issue
2+
3+
**Status**: ready
4+
**Last run**: —
5+
**Last score**: —
6+
7+
## Scenario
8+
9+
> User reports: "Yjs sync works between two browsers but published artifacts don't appear in the coop feed for the third member who joins later"
10+
11+
## Expected Output
12+
13+
- **Severity**: P2 (degraded functionality, not total failure)
14+
- **Type**: bug
15+
- **Affected package**: shared (storage/sync modules)
16+
- **Root cause hint**: Likely a persistence gap — Yjs state not persisted to Dexie or y-websocket fallback not loading historical state for late joiners
17+
- **Route to**: oracle first (investigate sync/persistence boundary), then cracked-coder
18+
- **Not**: P1 (sync works for active peers), not extension-only (this is a shared module issue)
19+
20+
## Eval Criteria
21+
22+
| Criterion | Weight | Pass | Fail |
23+
|-----------|--------|------|------|
24+
| Severity = P2 | 25% | Correctly identifies as P2 | Classifies as P1 or P3 |
25+
| Identifies sync/persistence boundary | 25% | Points to storage/sync modules | Treats as pure UI issue |
26+
| Routes oracle → cracked-coder | 25% | Two-step: investigate then fix | Goes straight to cracked-coder |
27+
| Package = shared | 15% | Identifies shared as affected | Points only to extension |
28+
| Concise | 10% | Brief is < 5 lines | Over-explains |
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# Triage Eval: P3 Enhancement Request
2+
3+
**Status**: ready
4+
**Last run**: —
5+
**Last score**: —
6+
7+
## Scenario
8+
9+
> User asks: "Can we add a dark mode toggle to the popup header?"
10+
11+
## Expected Output
12+
13+
- **Severity**: P3 (nice-to-have improvement)
14+
- **Type**: enhancement
15+
- **Affected package**: extension (popup surface)
16+
- **Route to**: `/plan` with react + ui-compliance skills
17+
- **Not**: P2 (not a degradation), not cracked-coder directly (needs planning first)
18+
19+
## Eval Criteria
20+
21+
| Criterion | Weight | Pass | Fail |
22+
|-----------|--------|------|------|
23+
| Severity = P3 | 30% | Correctly identifies as P3 | Classifies as P2 or P4 |
24+
| Type = enhancement | 20% | Identifies as enhancement | Classifies as bug |
25+
| Routes to /plan | 25% | Routes to /plan for planning | Routes directly to cracked-coder |
26+
| Suggests skills | 15% | Mentions react and/or ui-compliance | No skill suggestion |
27+
| Concise | 10% | Brief is < 5 lines | Over-explains |

0 commit comments

Comments
 (0)