
Commit e5338b7

feat(intent): run commit-intent debugger on real git history + doc accuracy pass (v1.1.89)
IntentDebugger was a fixture-only prototype. It now reads real commits:

- new Tauri command list_commit_intents(repo_path, limit) parses `git log --numstat` with control-char separators, classifies each file's surface + agent-vs-human authorship, and derives a test-evidence signal per commit
- IntentDebugger.tsx gains a repo picker, shows the real commit subject as the card title (previously dropped), and keeps fixtures as a browser fallback
- gate the "agent-authored UI change" risk on uiFileCount>0 so non-UI agent commits get a generic intent-check risk instead of a false UI flag
- reachable via links on the Roadmap page (no new top-nav tab)

Docs accuracy pass:

- README gaps table reflects real state (synthetic-QA + intent debugger are no longer "not implemented")
- landing page corrected to Astro (deployed dir is apps/landing-page-astro, not the legacy Next.js one); CI section documents auto-release.yml
- dropped the orphaned "2022 Themesberg" license note (no such code in repo); fixed the Tauri v1 -> v2 prerequisites link
- agents.md nav list updated to the 8 tabs + URL-only surfaces

Verification: cargo test (classify_surface/classify_author) + tsc + eslint + test:intent-debugger all green; report quality checked against this repo's real commits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent daccbb6 commit e5338b7
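
Reviewer note: the commit message describes parsing `git log` output with control-character separators. The sketch below illustrates that idea in isolation, assuming the pretty format shown in the `git.rs` diff (`%x1e` starts a record, `%x1f` splits header fields, `%x02` ends the header). `Header` and `parse_headers` are hypothetical names introduced for illustration; the shipped command builds JSON values instead of a struct.

```rust
/// Hypothetical container for the header fields of one commit record.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub sha: String,
    pub subject: String,
    pub body: String,
}

/// Split raw `git log` output into commit headers.
/// Assumes a pretty format of the form
/// `%x1ecommit%x1f%H%x1f%s%x1f%an%x1f%ae%x1f%b%x02`.
pub fn parse_headers(stdout: &str) -> Vec<Header> {
    let mut out = Vec::new();
    for record in stdout.split('\u{1e}') {
        // Anything before the first record separator is not a commit.
        if !record.starts_with("commit") {
            continue;
        }
        // Text before \x02 is the header; the numstat block follows it.
        let header = record.split('\u{02}').next().unwrap_or("");
        let parts: Vec<&str> = header.split('\u{1f}').collect();
        // Expect at least: "commit", sha, subject, author name, author email.
        if parts.len() < 5 {
            continue;
        }
        let sha = parts[1].trim();
        if sha.is_empty() {
            continue;
        }
        out.push(Header {
            sha: sha.to_string(),
            subject: parts[2].trim().to_string(),
            // The body may be absent; it is kept verbatim, newlines included.
            body: parts.get(5).copied().unwrap_or("").to_string(),
        });
    }
    out
}
```

Because the separators are bytes that do not normally appear in commit text, a subject containing `|` or a multi-line body passes through without being split.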

10 files changed

Lines changed: 426 additions & 65 deletions


README.md

Lines changed: 16 additions & 15 deletions
@@ -23,10 +23,10 @@ The near-term wedge is not beating Claude, Codex, or hosted PR bots at generic r
2323
|---|---|---|
2424
| Code review | Review tab runs local diffs through CLI agents and persists findings. | Needs multi-pass specialist review, better AGENTS.md/project-context ingestion, and benchmarked catch-rate evidence. |
2525
| Bug finding | Findings, severity, code viewer, and re-review loop exist. | Needs runtime evidence from tests/browser sessions/logs, not only static diff judgment. |
26-
| Agent-written code verification | Product is aimed at agent output and can fix/re-review selected findings. | Needs agent provenance: which agent changed what, prompt/task context, and whether the fix actually resolved the original user goal. |
26+
| Agent-written code verification | Aimed at agent output; fixes/re-reviews selected findings and emits a full verification handoff proof (`review-proof` + `agent-fix-packet`: per-finding evidence, fixed/reproduced/unchecked tallies, and a copyable reviewer handoff). | Needs to close the intent loop: did the fix actually resolve the original user goal, and which agent/prompt produced the change. |
2727
| Debugging/replay | History indexes Claude/Codex sessions and can replay conversations. | Replay is not connected to files, diffs, failures, screenshots, tests, or review findings. |
28-
| Synthetic user QA | Not implemented as a first-class workflow. | Needs browser/app automation that performs user tasks, captures screenshots/traces, and converts failures into review findings. |
29-
| AI step-through debugger | Not implemented. | Needs an execution timeline across agent actions, file edits, commands, test failures, and UI observations. |
28+
| Synthetic user QA | Prototype — `QaReplay` (`/qa-replay`, linked from Roadmap) runs fixture-backed synthetic-QA loops with a live agent-runner track. | Needs real browser/app automation that drives the actual product, captures screenshots/traces, and converts failures into review findings. |
29+
| AI step-through debugger | Commit-intent debugger (`/intent-debugger`, linked from Roadmap) now runs over **real** recent commits — pick a repo, and it infers intent, risks, verification gaps, and agent-vs-human authorship per commit. | Still per-commit static analysis; needs a full execution timeline across agent actions, file edits, commands, test failures, and UI observations. |
3030
| Codebase history explainer | Repo Unpacked generates repo briefs; History indexes agent sessions. | Needs commit/decision mining tied to touched files so reviews can catch intent regressions. |
3131

3232
The product should prefer narrow, evidence-backed loops over broad "code intelligence" surfaces. A feature is on-strategy when it helps answer: "What changed, why did the agent change it, what could break, can we reproduce it, and did the fix actually work?"
@@ -36,11 +36,11 @@ The product should prefer narrow, evidence-backed loops over broad "code intelli
3636
| Concern | Service |
3737
|---------|---------|
3838
| Desktop app | GitHub Releases — Tauri 2 macOS build, with `@tauri-apps/plugin-updater` auto-updater (`latest.json` manifest) |
39-
| Landing page | Cloudflare Pages (`codevetter`, codevetter.com) — static Next.js export |
39+
| Landing page | Cloudflare Pages (`codevetter`, codevetter.com) — static Astro export |
4040
| Database | Local SQLite via `@tauri-apps/plugin-sql` (desktop only, no server) |
4141
| Auth | None — LLM provider API keys stored in user settings |
4242
| AI | User-supplied keys (Anthropic / OpenAI / OpenRouter) |
43-
| CI/CD | GitHub Actions — `release.yml` builds Tauri binaries on GitHub release; `deploy-landing.yml` deploys the landing page to Cloudflare Pages on push to `main` |
43+
| CI/CD | GitHub Actions — `auto-release.yml` cuts a `v<version>` release when `apps/desktop/src-tauri/tauri.conf.json`'s version changes on `main`, which dispatches `release.yml` to build/sign/upload the Tauri binaries; `deploy-landing.yml` deploys the landing page to Cloudflare Pages on push to `main` |
4444

4545
## Installation
4646

@@ -66,7 +66,7 @@ cd CodeVetter
6666
npm install
6767
```
6868

69-
> Requires [Rust + Tauri prerequisites](https://tauri.app/v1/guides/getting-started/prerequisites) for the desktop app.
69+
> Requires the [Rust + Tauri 2 prerequisites](https://v2.tauri.app/start/prerequisites/) for the desktop app.
7070
7171
## Quick Start
7272

@@ -77,32 +77,33 @@ npm install
7777
```
7878
3. Open the Review tab, pick a local repository, and run your first review through an installed CLI agent.
7979

80-
## Usage Examples
80+
## Common Tasks
8181

82-
**Run the desktop app (dev mode)**
82+
**Build a production desktop binary**
8383
```bash
8484
cd apps/desktop
85-
npm run tauri:dev
85+
npm run tauri:build
8686
```
8787

88-
**Run Playwright end-to-end tests for the desktop app**
88+
**Run the Playwright end-to-end suite**
8989
```bash
9090
cd apps/desktop
9191
npm test
9292
```
9393

9494
**Build the landing page**
9595
```bash
96-
cd apps/landing-page
96+
cd apps/landing-page-astro
9797
npm run build
9898
```
9999

100100
## Monorepo Structure
101101

102102
```
103103
apps/
104-
desktop/ Tauri 2 + React 19 + Vite desktop app — the core product
105-
landing-page/ Next.js marketing site (static export, deployed to Cloudflare Pages — codevetter.com)
104+
desktop/ Tauri 2 + React 19 + Vite desktop app — the core product
105+
landing-page-astro/ Astro marketing site (static export, deployed to Cloudflare Pages — codevetter.com)
106+
landing-page/ Legacy Next.js marketing site — superseded by landing-page-astro, no longer deployed
106107
```
107108

108109
## Tech Stack
@@ -112,13 +113,13 @@ apps/
112113
| Desktop frontend | React 19, Vite, Tailwind CSS, shadcn/ui |
113114
| Desktop backend | Rust (Tauri 2), SQLite |
114115
| Review engine | TypeScript — runs in the webview, no server required |
115-
| Landing page | Next.js 15 (static export → Cloudflare Pages) |
116+
| Landing page | Astro 5 (static export → Cloudflare Pages) |
116117
| Testing | Playwright (e2e) |
117118
| Package manager | npm workspaces |
118119

119120
## License
120121

121-
ISC (root package); MIT (landing-page template — Copyright 2022 Themesberg)
122+
ISC — see the root `package.json`.
122123

123124
<!-- ACTIVE-AI-TASK-LOG:START -->
124125
## Active AI Task Log

agents.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -53,8 +53,9 @@ npm install # Install all workspace deps
5353
- **Tauri IPC**: all Rust commands called via typed wrappers in `src/lib/tauri-ipc.ts` → `invoke()` → `src-tauri/src/commands/`.
5454
- **`isTauriAvailable()` guard**: all IPC calls wrapped so React code also works in plain browser.
5555
- **FIXED**: Dead `@code-reviewer/*` workspace deps removed — `packages/` dir no longer exists and is no longer referenced. Build passes.
56-
- **Active screens**: Dashboard (usage/token analytics), History (session search), Review (`/review` — AI code review with diff + fix), Repo Unpacked (`/unpack` — whole-repo evidence-backed system brief, scanner in `src-tauri/src/commands/unpack.rs`, page in `apps/desktop/src/pages/RepoUnpacked.tsx`, persisted to `repo_unpacked_reports` table). Other tabs (Board, Workspaces) are legacy — do not invest in them.
57-
- **GH Actions**: `ci.yml` runs lint + Playwright; `release.yml` builds platform binaries and uploads to GitHub Releases.
56+
- **Nav (8 tabs)**: Home (`/` — usage/token analytics + session history), Review (`/review` — AI code review with diff + fix), Roadmap (`/roadmap` — shipped/verification telemetry dashboard), Unpack (`/unpack` — whole-repo evidence-backed system brief; scanner in `src-tauri/src/commands/unpack.rs`, page in `apps/desktop/src/pages/RepoUnpacked.tsx`, persisted to `repo_unpacked_reports` table), Intel (`/intel`), Fleet (`/fleet` — SaaS Maker fleet projects + repo↔project linking), T-Rex (`/trex`), Settings (`/settings` — also hosts Ops, Memories, Rubrics, usage, about).
57+
- **URL-only surfaces** (reachable but intentionally off the top nav after the v1.1.86 declutter): Rubrics (`/rubrics`, linked from Review), IntentDebugger (`/intent-debugger` — commit-intent analysis over real git commits), QaReplay (`/qa-replay` — synthetic-QA fixture/live runner). The old Ask/Personas tabs and their Rust backend were removed in v1.1.87.
58+
- **GH Actions**: `ci.yml` runs lint + Playwright; `auto-release.yml` cuts a `v<version>` release on `tauri.conf.json` version bump → dispatches `release.yml` to build/sign/upload binaries; `deploy-landing.yml` deploys `apps/landing-page-astro` to Cloudflare Pages.
5859
- Husky pre-commit runs lint-staged on `apps/desktop/src/**/*.{ts,tsx}`; pre-push hook also configured.
5960

6061
<!-- FLEET-GUIDANCE:START -->

apps/desktop/package.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
{
22
"name": "@code-reviewer/desktop",
3-
"version": "1.1.88",
3+
"version": "1.1.89",
44
"private": true,
55
"scripts": {
66
"dev": "lsof -ti:1420 | xargs kill -9 2>/dev/null; vite",

apps/desktop/src-tauri/src/commands/git.rs

Lines changed: 210 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -114,6 +114,183 @@ pub async fn list_pull_requests(repo_path: String) -> Result<Value, String> {
114114
Ok(json!({ "pull_requests": prs }))
115115
}
116116

117+
/// Analyze the last `limit` real git commits and shape them for the commit-intent
118+
/// debugger. For each commit returns sha, subject, an agent-vs-human author class,
119+
/// changed files with per-file additions/deletions and a coarse surface class, and
120+
/// an evidence signal derived from whether the commit touched test files. The
121+
/// frontend feeds these straight into `buildCommitIntentReport` (same shape as the
122+
/// old fixtures), so the prototype now runs on real history instead of canned data.
123+
#[tauri::command]
124+
pub async fn list_commit_intents(repo_path: String, limit: Option<u32>) -> Result<Value, String> {
125+
let n = limit.unwrap_or(8).clamp(1, 50);
126+
// Use control chars as separators so commit subjects/bodies containing '|' or
127+
// newlines never corrupt parsing: %x1e starts a record, %x1f splits header
128+
// fields, %x02 ends the header (numstat block follows on the next lines).
129+
let log = StdCommand::new("git")
130+
.args([
131+
"log",
132+
"-n",
133+
&n.to_string(),
134+
"--no-merges",
135+
"--numstat",
136+
"--pretty=format:%x1ecommit%x1f%H%x1f%s%x1f%an%x1f%ae%x1f%b%x02",
137+
])
138+
.current_dir(&repo_path)
139+
.output()
140+
.map_err(|e| format!("failed to run git log: {e}"))?;
141+
if !log.status.success() {
142+
return Err(format!(
143+
"git log failed: {}",
144+
String::from_utf8_lossy(&log.stderr).trim()
145+
));
146+
}
147+
148+
let stdout = String::from_utf8_lossy(&log.stdout);
149+
let mut commits: Vec<Value> = Vec::new();
150+
151+
for record in stdout.split('\u{1e}') {
152+
if !record.starts_with("commit") {
153+
continue;
154+
}
155+
let mut split = record.splitn(2, '\u{02}');
156+
let header = split.next().unwrap_or("");
157+
let numstat = split.next().unwrap_or("");
158+
159+
// header_parts[0] == "commit"; then sha, subject, author name, email, body
160+
let header_parts: Vec<&str> = header.split('\u{1f}').collect();
161+
if header_parts.len() < 5 {
162+
continue;
163+
}
164+
let sha = header_parts[1].trim();
165+
let subject = header_parts[2].trim();
166+
let author_name = header_parts[3].trim();
167+
let author_email = header_parts[4].trim();
168+
let commit_body = header_parts.get(5).copied().unwrap_or("");
169+
if sha.is_empty() {
170+
continue;
171+
}
172+
173+
let mut changed_files: Vec<Value> = Vec::new();
174+
let mut test_files = 0u32;
175+
for fl in numstat.lines() {
176+
let fl = fl.trim();
177+
if fl.is_empty() {
178+
continue;
179+
}
180+
let cols: Vec<&str> = fl.split('\t').collect();
181+
if cols.len() < 3 {
182+
continue;
183+
}
184+
// Binary files report "-" for additions/deletions.
185+
let additions: u32 = cols[0].parse().unwrap_or(0);
186+
let deletions: u32 = cols[1].parse().unwrap_or(0);
187+
let path = cols[2].trim();
188+
if path.is_empty() {
189+
continue;
190+
}
191+
let surface = classify_surface(path);
192+
if surface == "test" {
193+
test_files += 1;
194+
}
195+
changed_files.push(json!({
196+
"path": path,
197+
"additions": additions,
198+
"deletions": deletions,
199+
"surface": surface,
200+
}));
201+
}
202+
203+
// Real commits carry no linked verification runs; the one honest signal we
204+
// have is whether the commit shipped test changes alongside the code.
205+
let evidence: Vec<Value> = if test_files > 0 {
206+
vec![json!({
207+
"kind": "test",
208+
"label": format!(
209+
"{} test file{} changed in this commit",
210+
test_files,
211+
if test_files == 1 { "" } else { "s" }
212+
),
213+
"status": "pass",
214+
})]
215+
} else {
216+
Vec::new()
217+
};
218+
219+
let short = if sha.len() > 8 { &sha[..8] } else { sha };
220+
commits.push(json!({
221+
"id": sha,
222+
"author": classify_author(author_name, author_email, commit_body),
223+
"sha": short,
224+
"message": subject,
225+
"changedFiles": changed_files,
226+
"evidence": evidence,
227+
}));
228+
}
229+
230+
Ok(json!({ "commits": commits }))
231+
}
232+
233+
/// Coarse surface classification for a changed file path, mirroring the frontend's
234+
/// `inferReviewSurfaces` priority so commit and review reports read consistently.
235+
fn classify_surface(path: &str) -> &'static str {
236+
let p = path.to_ascii_lowercase();
237+
if p.contains("/tests/")
238+
|| p.starts_with("tests/")
239+
|| p.contains(".test.")
240+
|| p.contains(".spec.")
241+
|| p.contains("__tests__")
242+
{
243+
return "test";
244+
}
245+
if p.ends_with(".tsx")
246+
|| p.ends_with(".jsx")
247+
|| p.ends_with(".css")
248+
|| p.contains("/components/")
249+
|| p.contains("/pages/")
250+
{
251+
return "ui";
252+
}
253+
if p.contains("src-tauri")
254+
|| p.contains("commands/")
255+
|| p.ends_with(".rs")
256+
|| p.contains("/api")
257+
|| p.contains("server")
258+
{
259+
return "api";
260+
}
261+
if p.ends_with(".md") || p.contains("docs/") {
262+
return "docs";
263+
}
264+
"config"
265+
}
266+
267+
/// Classify a commit as agent- or human-authored from author identity and trailers
268+
/// (e.g. the `Co-Authored-By: Claude` trailer this repo uses for agent commits).
269+
fn classify_author(name: &str, email: &str, body: &str) -> &'static str {
270+
let hay = format!(
271+
"{}\n{}\n{}",
272+
name.to_ascii_lowercase(),
273+
email.to_ascii_lowercase(),
274+
body.to_ascii_lowercase()
275+
);
276+
const AGENT_MARKERS: [&str; 9] = [
277+
"co-authored-by: claude",
278+
"noreply@anthropic.com",
279+
"claude",
280+
"codex",
281+
"cursor",
282+
"github-actions",
283+
"[bot]",
284+
"aider",
285+
"devin",
286+
];
287+
if AGENT_MARKERS.iter().any(|m| hay.contains(m)) {
288+
"agent"
289+
} else {
290+
"human"
291+
}
292+
}
293+
117294
/// Check GitHub authentication status.
118295
/// Tries: 1) saved token in preferences, 2) GH_TOKEN env, 3) `gh auth status`.
119296
/// Returns connection info including username, auth method, and scopes.
@@ -2527,4 +2704,37 @@ mod tests {
25272704
let s = build_compact_history_section_for_prompt("/tmp/x", &[], &conn);
25282705
assert!(s.is_empty());
25292706
}
2707+
2708+
#[test]
2709+
fn classify_surface_buckets_paths() {
2710+
assert_eq!(classify_surface("apps/desktop/src/pages/Home.tsx"), "ui");
2711+
assert_eq!(classify_surface("apps/desktop/src/lib/foo.test.ts"), "test");
2712+
assert_eq!(classify_surface("apps/desktop/tests/e2e/app.spec.ts"), "test");
2713+
assert_eq!(classify_surface("src-tauri/src/commands/git.rs"), "api");
2714+
assert_eq!(classify_surface("README.md"), "docs");
2715+
assert_eq!(classify_surface("docs/architecture.md"), "docs");
2716+
assert_eq!(classify_surface("apps/desktop/src-tauri/tauri.conf.json"), "api");
2717+
assert_eq!(classify_surface("package.json"), "config");
2718+
assert_eq!(classify_surface(".github/workflows/ci.yml"), "config");
2719+
}
2720+
2721+
#[test]
2722+
fn classify_author_detects_agents() {
2723+
assert_eq!(
2724+
classify_author("Sarthak Agrawal", "sarthak@example.com", ""),
2725+
"human"
2726+
);
2727+
assert_eq!(
2728+
classify_author(
2729+
"Sarthak Agrawal",
2730+
"sarthak@example.com",
2731+
"feat: thing\n\nCo-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>"
2732+
),
2733+
"agent"
2734+
);
2735+
assert_eq!(
2736+
classify_author("github-actions[bot]", "actions@github.com", ""),
2737+
"agent"
2738+
);
2739+
}
25302740
}
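
Reviewer note: the numstat handling in `list_commit_intents` relies on each `--numstat` line being `additions<TAB>deletions<TAB>path`, with `-` in both count columns for binary files. A minimal standalone sketch of that per-line step follows. `FileStat` and `parse_numstat_line` are hypothetical names, and unlike the diff above this version uses `splitn(3, ..)`, so any later tab stays in the path rather than being dropped.

```rust
/// Hypothetical per-file change counts parsed from one numstat line.
#[derive(Debug, PartialEq)]
pub struct FileStat<'a> {
    pub additions: u32,
    pub deletions: u32,
    pub path: &'a str,
}

/// Parse one `git log --numstat` line of the form
/// `additions<TAB>deletions<TAB>path`.
/// Returns `None` for blank or malformed lines.
pub fn parse_numstat_line(line: &str) -> Option<FileStat<'_>> {
    let line = line.trim();
    if line.is_empty() {
        return None;
    }
    // At most three columns; everything after the second tab is the path.
    let mut cols = line.splitn(3, '\t');
    let additions = cols.next()?;
    let deletions = cols.next()?;
    let path = cols.next()?.trim();
    if path.is_empty() {
        return None;
    }
    Some(FileStat {
        // Binary files report "-" here, which fails to parse and becomes 0.
        additions: additions.parse().unwrap_or(0),
        deletions: deletions.parse().unwrap_or(0),
        path,
    })
}
```

Mapping unparsable counts to zero keeps binary files in the changed-file list, so they still contribute a surface classification even though they add nothing to the line totals.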

apps/desktop/src-tauri/src/main.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -314,6 +314,7 @@ fn main() {
314314
commands::git::list_git_branches,
315315
commands::git::get_git_remote_info,
316316
commands::git::list_pull_requests,
317+
commands::git::list_commit_intents,
317318
commands::git::check_github_auth,
318319
commands::git::sync_github_token,
319320
commands::git::get_repo_history_context,

apps/desktop/src-tauri/tauri.conf.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
"$schema": "https://raw.githubusercontent.com/tauri-apps/tauri/dev/crates/tauri-utils/schema.json",
33
"identifier": "com.codevetter.desktop",
44
"productName": "CodeVetter",
5-
"version": "1.1.88",
5+
"version": "1.1.89",
66
"build": {
77
"beforeDevCommand": "npm run dev",
88
"beforeBuildCommand": "npm run build",

apps/desktop/src/lib/intent-debugger/report.ts

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -282,7 +282,11 @@ function inferReviewRisks(
282282

283283
function inferRisks(fixture: CommitIntentFixture, totalChanged: number, uiFileCount: number) {
284284
const risks: string[] = [];
285-
if (fixture.author === "agent") risks.push("Agent-authored UI change may satisfy static review while missing user-flow proof.");
285+
if (fixture.author === "agent" && uiFileCount > 0) {
286+
risks.push("Agent-authored UI change may satisfy static review while missing user-flow proof.");
287+
} else if (fixture.author === "agent") {
288+
risks.push("Agent-authored change; confirm it matches the intended task, not just a plausible diff.");
289+
}
286290
if (uiFileCount > 0) risks.push("UI surface changed; screenshot or browser replay should exist before shipping.");
287291
if (totalChanged > 120) risks.push("Large diff for one intent; inspect for accidental refactor drift.");
288292
if (fixture.changedFiles.some((file) => file.surface === "config")) risks.push("Config changed; verify deploy/build assumptions.");
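
Reviewer note: the gating change in `inferRisks` reduces to a small decision table. The shipped logic is the TypeScript above; the sketch below restates only the agent branch in Rust, to match the backend language of this commit. `agent_risk` is a hypothetical name, and the sketch returns at most one message rather than pushing onto a list.

```rust
/// Hypothetical Rust restatement of the agent branch of `inferRisks`.
/// Returns the agent-related risk message, if any, for a commit.
pub fn agent_risk(author: &str, ui_file_count: usize) -> Option<&'static str> {
    match (author, ui_file_count) {
        // Agent commit that touched UI files: keep the UI-specific warning.
        ("agent", n) if n > 0 => Some(
            "Agent-authored UI change may satisfy static review while missing user-flow proof.",
        ),
        // Agent commit with no UI files: generic intent check, no UI flag.
        ("agent", _) => Some(
            "Agent-authored change; confirm it matches the intended task, not just a plausible diff.",
        ),
        // Human commits get no agent-specific risk.
        _ => None,
    }
}
```

Before this commit the UI-specific message was emitted for every agent commit; with the gate, a docs-only or Rust-only agent commit gets the generic message instead.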

apps/desktop/src/lib/tauri-ipc.ts

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ import {
66
sendNotification,
77
} from "@tauri-apps/plugin-notification";
88

9+
import type { CommitIntentFixture } from "@/lib/intent-debugger/types";
910
import { buildActiveStandardsContext } from "@/lib/review-service";
1011

1112
// ─── Helpers ────────────────────────────────────────────────────────────────
@@ -1496,6 +1497,24 @@ export async function listPullRequests(
14961497
return resp.pull_requests;
14971498
}
14981499

1500+
// ─── Commit Intent (real git history → intent debugger) ─────────────────────
1501+
1502+
/**
1503+
* Analyze the last `limit` real commits in a repo and return them in the
1504+
* CommitIntentFixture shape the intent debugger renders. Replaces the canned
1505+
* COMMIT_INTENT_FIXTURES with actual git history.
1506+
*/
1507+
export async function listCommitIntents(
1508+
repoPath: string,
1509+
limit = 8
1510+
): Promise<CommitIntentFixture[]> {
1511+
const resp = await safeInvoke<{ commits: CommitIntentFixture[] }>(
1512+
"list_commit_intents",
1513+
{ repoPath, limit }
1514+
);
1515+
return resp.commits;
1516+
}
1517+
14991518
// ─── GitHub Auth ────────────────────────────────────────────────────────────
15001519

15011520
export interface GitHubAuthStatus {
