feat(intent): run commit-intent debugger on real git history + doc accuracy pass (v1.1.89)

sarthakagrawal927 · claude · sarthakagrawal927 · commit e5338b7db0ac · 2026-06-19T11:13:24.000+05:30
IntentDebugger was a fixture-only prototype. It now reads real commits:
- new Tauri command list_commit_intents(repo_path, limit) parses
  `git log --numstat` with control-char separators, classifies each
  file's surface + agent-vs-human authorship, and derives a test-evidence
  signal per commit
- IntentDebugger.tsx gains a repo picker, shows the real commit subject as
  the card title (previously dropped), and keeps fixtures as a browser
  fallback
- gate the "agent-authored UI change" risk on uiFileCount > 0 so non-UI
  agent commits get a generic intent-check risk instead of a false UI flag
- reachable via links on the Roadmap page (no new top-nav tab)
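
The control-character separators described in the first bullet can be illustrated with a minimal Rust sketch. It assumes the log text has already been captured from `git log --numstat` with the format string used in this commit; `ParsedCommit` and `parse_log` are hypothetical names chosen for illustration, not the names used in `git.rs`, and the author and body fields are skipped for brevity.

```rust
/// One parsed commit: header fields plus its numstat rows.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct ParsedCommit {
    sha: String,
    subject: String,
    /// (additions, deletions, path). Binary files report "-" and count as 0.
    files: Vec<(u32, u32, String)>,
}

/// Parse text shaped like the output of
/// `git log --numstat --pretty=format:%x1ecommit%x1f%H%x1f%s%x1f%an%x1f%ae%x1f%b%x02`.
/// 0x1e starts a record, 0x1f splits header fields, 0x02 ends the header.
fn parse_log(stdout: &str) -> Vec<ParsedCommit> {
    let mut commits = Vec::new();
    for record in stdout.split('\u{1e}') {
        // Anything before the first record separator is not a commit.
        if !record.starts_with("commit") {
            continue;
        }
        // Everything before 0x02 is the header; the rest is the numstat block.
        let mut halves = record.splitn(2, '\u{02}');
        let header = halves.next().unwrap_or("");
        let numstat = halves.next().unwrap_or("");

        // fields[0] == "commit"; then sha, subject, author name, email, body.
        let fields: Vec<&str> = header.split('\u{1f}').collect();
        if fields.len() < 5 {
            continue;
        }
        let sha = fields[1].trim();
        if sha.is_empty() {
            continue;
        }

        let mut files = Vec::new();
        for line in numstat.lines() {
            let cols: Vec<&str> = line.trim().split('\t').collect();
            if cols.len() < 3 || cols[2].trim().is_empty() {
                continue; // blank line or malformed row
            }
            // "-" (binary file) fails to parse and falls back to 0.
            let additions: u32 = cols[0].parse().unwrap_or(0);
            let deletions: u32 = cols[1].parse().unwrap_or(0);
            files.push((additions, deletions, cols[2].trim().to_string()));
        }

        commits.push(ParsedCommit {
            sha: sha.to_string(),
            subject: fields[2].trim().to_string(),
            files,
        });
    }
    commits
}
```

Because the separators are control characters, a subject such as `fix: a | b` or a multi-line body passes through the header split untouched, which a `|`-delimited format would not guarantee.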

Docs accuracy pass:
- README gaps table reflects real state (synthetic-QA + intent debugger
  are no longer "not implemented")
- landing page corrected to Astro (deployed dir is apps/landing-page-astro,
  not the legacy Next.js one); CI section documents auto-release.yml
- dropped the orphaned "2022 Themesberg" license note (no such code in repo);
  fixed the Tauri v1 -> v2 prerequisites link
- agents.md nav list updated to the 8 tabs + URL-only surfaces

Verification: cargo test (classify_surface/classify_author) + tsc + eslint
+ test:intent-debugger all green; report quality checked against this
repo's real commits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -23,10 +23,10 @@ The near-term wedge is not beating Claude, Codex, or hosted PR bots at generic r
 |---|---|---|
 | Code review | Review tab runs local diffs through CLI agents and persists findings. | Needs multi-pass specialist review, better AGENTS.md/project-context ingestion, and benchmarked catch-rate evidence. |
 | Bug finding | Findings, severity, code viewer, and re-review loop exist. | Needs runtime evidence from tests/browser sessions/logs, not only static diff judgment. |
-| Agent-written code verification | Product is aimed at agent output and can fix/re-review selected findings. | Needs agent provenance: which agent changed what, prompt/task context, and whether the fix actually resolved the original user goal. |
+| Agent-written code verification | Aimed at agent output; fixes/re-reviews selected findings and emits a full verification handoff proof (`review-proof` + `agent-fix-packet`: per-finding evidence, fixed/reproduced/unchecked tallies, and a copyable reviewer handoff). | Needs to close the intent loop: did the fix actually resolve the original user goal, and which agent/prompt produced the change. |
 | Debugging/replay | History indexes Claude/Codex sessions and can replay conversations. | Replay is not connected to files, diffs, failures, screenshots, tests, or review findings. |
-| Synthetic user QA | Not implemented as a first-class workflow. | Needs browser/app automation that performs user tasks, captures screenshots/traces, and converts failures into review findings. |
-| AI step-through debugger | Not implemented. | Needs an execution timeline across agent actions, file edits, commands, test failures, and UI observations. |
+| Synthetic user QA | Prototype — `QaReplay` (`/qa-replay`, linked from Roadmap) runs fixture-backed synthetic-QA loops with a live agent-runner track. | Needs real browser/app automation that drives the actual product, captures screenshots/traces, and converts failures into review findings. |
+| AI step-through debugger | Commit-intent debugger (`/intent-debugger`, linked from Roadmap) now runs over **real** recent commits — pick a repo, and it infers intent, risks, verification gaps, and agent-vs-human authorship per commit. | Still per-commit static analysis; needs a full execution timeline across agent actions, file edits, commands, test failures, and UI observations. |
 | Codebase history explainer | Repo Unpacked generates repo briefs; History indexes agent sessions. | Needs commit/decision mining tied to touched files so reviews can catch intent regressions. |
 
 The product should prefer narrow, evidence-backed loops over broad "code intelligence" surfaces. A feature is on-strategy when it helps answer: "What changed, why did the agent change it, what could break, can we reproduce it, and did the fix actually work?"
@@ -36,11 +36,11 @@ The product should prefer narrow, evidence-backed loops over broad "code intelli
 | Concern | Service |
 |---------|---------|
 | Desktop app | GitHub Releases — Tauri 2 macOS build, with `@tauri-apps/plugin-updater` auto-updater (`latest.json` manifest) |
-| Landing page | Cloudflare Pages (`codevetter`, codevetter.com) — static Next.js export |
+| Landing page | Cloudflare Pages (`codevetter`, codevetter.com) — static Astro export |
 | Database | Local SQLite via `@tauri-apps/plugin-sql` (desktop only, no server) |
 | Auth | None — LLM provider API keys stored in user settings |
 | AI | User-supplied keys (Anthropic / OpenAI / OpenRouter) |
-| CI/CD | GitHub Actions — `release.yml` builds Tauri binaries on GitHub release; `deploy-landing.yml` deploys the landing page to Cloudflare Pages on push to `main` |
+| CI/CD | GitHub Actions — `auto-release.yml` cuts a `v<version>` release when `apps/desktop/src-tauri/tauri.conf.json`'s version changes on `main`, which dispatches `release.yml` to build/sign/upload the Tauri binaries; `deploy-landing.yml` deploys the landing page to Cloudflare Pages on push to `main` |
 
 ## Installation
 
@@ -66,7 +66,7 @@ cd CodeVetter
 npm install
 ```
 
-> Requires [Rust + Tauri prerequisites](https://tauri.app/v1/guides/getting-started/prerequisites) for the desktop app.
+> Requires the [Rust + Tauri 2 prerequisites](https://v2.tauri.app/start/prerequisites/) for the desktop app.
 
 ## Quick Start
 
@@ -77,32 +77,33 @@ npm install
    ```
 3. Open the Review tab, pick a local repository, and run your first review through an installed CLI agent.
 
-## Usage Examples
+## Common Tasks
 
-**Run the desktop app (dev mode)**
+**Build a production desktop binary**
 ```bash
 cd apps/desktop
-npm run tauri:dev
+npm run tauri:build
 ```
 
-**Run Playwright end-to-end tests for the desktop app**
+**Run the Playwright end-to-end suite**
 ```bash
 cd apps/desktop
 npm test
 ```
 
 **Build the landing page**
 ```bash
-cd apps/landing-page
+cd apps/landing-page-astro
 npm run build
 ```
 
 ## Monorepo Structure
 
 ```
 apps/
-  desktop/          Tauri 2 + React 19 + Vite desktop app — the core product
-  landing-page/     Next.js marketing site (static export, deployed to Cloudflare Pages — codevetter.com)
+  desktop/             Tauri 2 + React 19 + Vite desktop app — the core product
+  landing-page-astro/  Astro marketing site (static export, deployed to Cloudflare Pages — codevetter.com)
+  landing-page/        Legacy Next.js marketing site — superseded by landing-page-astro, no longer deployed
 ```
 
 ## Tech Stack
@@ -112,13 +113,13 @@ apps/
 | Desktop frontend | React 19, Vite, Tailwind CSS, shadcn/ui |
 | Desktop backend | Rust (Tauri 2), SQLite |
 | Review engine | TypeScript — runs in the webview, no server required |
-| Landing page | Next.js 15 (static export → Cloudflare Pages) |
+| Landing page | Astro 5 (static export → Cloudflare Pages) |
 | Testing | Playwright (e2e) |
 | Package manager | npm workspaces |
 
 ## License
 
-ISC (root package); MIT (landing-page template — Copyright 2022 Themesberg)
+ISC — see the root `package.json`.
 
 <!-- ACTIVE-AI-TASK-LOG:START -->
 ## Active AI Task Log
diff --git a/agents.md b/agents.md
@@ -53,8 +53,9 @@ npm install           # Install all workspace deps
 - **Tauri IPC**: all Rust commands called via typed wrappers in `src/lib/tauri-ipc.ts` → `invoke()` → `src-tauri/src/commands/`.
 - **`isTauriAvailable()` guard**: all IPC calls wrapped so React code also works in plain browser.
 - **FIXED**: Dead `@code-reviewer/*` workspace deps removed — `packages/` dir no longer exists and is no longer referenced. Build passes.
-- **Active screens**: Dashboard (usage/token analytics), History (session search), Review (`/review` — AI code review with diff + fix), Repo Unpacked (`/unpack` — whole-repo evidence-backed system brief, scanner in `src-tauri/src/commands/unpack.rs`, page in `apps/desktop/src/pages/RepoUnpacked.tsx`, persisted to `repo_unpacked_reports` table). Other tabs (Board, Workspaces) are legacy — do not invest in them.
-- **GH Actions**: `ci.yml` runs lint + Playwright; `release.yml` builds platform binaries and uploads to GitHub Releases.
+- **Nav (8 tabs)**: Home (`/` — usage/token analytics + session history), Review (`/review` — AI code review with diff + fix), Roadmap (`/roadmap` — shipped/verification telemetry dashboard), Unpack (`/unpack` — whole-repo evidence-backed system brief; scanner in `src-tauri/src/commands/unpack.rs`, page in `apps/desktop/src/pages/RepoUnpacked.tsx`, persisted to `repo_unpacked_reports` table), Intel (`/intel`), Fleet (`/fleet` — SaaS Maker fleet projects + repo↔project linking), T-Rex (`/trex`), Settings (`/settings` — also hosts Ops, Memories, Rubrics, usage, about).
+- **URL-only surfaces** (reachable but intentionally off the top nav after the v1.1.86 declutter): Rubrics (`/rubrics`, linked from Review), IntentDebugger (`/intent-debugger` — commit-intent analysis over real git commits), QaReplay (`/qa-replay` — synthetic-QA fixture/live runner). The old Ask/Personas tabs and their Rust backend were removed in v1.1.87.
+- **GH Actions**: `ci.yml` runs lint + Playwright; `auto-release.yml` cuts a `v<version>` release on `tauri.conf.json` version bump → dispatches `release.yml` to build/sign/upload binaries; `deploy-landing.yml` deploys `apps/landing-page-astro` to Cloudflare Pages.
 - Husky pre-commit runs lint-staged on `apps/desktop/src/**/*.{ts,tsx}`; pre-push hook also configured.
 
 <!-- FLEET-GUIDANCE:START -->
diff --git a/apps/desktop/package.json b/apps/desktop/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@code-reviewer/desktop",
-  "version": "1.1.88",
+  "version": "1.1.89",
   "private": true,
   "scripts": {
     "dev": "lsof -ti:1420 | xargs kill -9 2>/dev/null; vite",
diff --git a/apps/desktop/src-tauri/src/commands/git.rs b/apps/desktop/src-tauri/src/commands/git.rs
@@ -114,6 +114,183 @@ pub async fn list_pull_requests(repo_path: String) -> Result<Value, String> {
     Ok(json!({ "pull_requests": prs }))
 }
 
+/// Analyze the last `limit` real git commits and shape them for the commit-intent
+/// debugger. For each commit returns sha, subject, an agent-vs-human author class,
+/// changed files with per-file additions/deletions and a coarse surface class, and
+/// an evidence signal derived from whether the commit touched test files. The
+/// frontend feeds these straight into `buildCommitIntentReport` (same shape as the
+/// old fixtures), so the prototype now runs on real history instead of canned data.
+#[tauri::command]
+pub async fn list_commit_intents(repo_path: String, limit: Option<u32>) -> Result<Value, String> {
+    let n = limit.unwrap_or(8).clamp(1, 50);
+    // Use control chars as separators so commit subjects/bodies containing '|' or
+    // newlines never corrupt parsing: %x1e starts a record, %x1f splits header
+    // fields, %x02 ends the header (numstat block follows on the next lines).
+    let log = StdCommand::new("git")
+        .args([
+            "log",
+            "-n",
+            &n.to_string(),
+            "--no-merges",
+            "--numstat",
+            "--pretty=format:%x1ecommit%x1f%H%x1f%s%x1f%an%x1f%ae%x1f%b%x02",
+        ])
+        .current_dir(&repo_path)
+        .output()
+        .map_err(|e| format!("failed to run git log: {e}"))?;
+    if !log.status.success() {
+        return Err(format!(
+            "git log failed: {}",
+            String::from_utf8_lossy(&log.stderr).trim()
+        ));
+    }
+
+    let stdout = String::from_utf8_lossy(&log.stdout);
+    let mut commits: Vec<Value> = Vec::new();
+
+    for record in stdout.split('\u{1e}') {
+        if !record.starts_with("commit") {
+            continue;
+        }
+        let mut split = record.splitn(2, '\u{02}');
+        let header = split.next().unwrap_or("");
+        let numstat = split.next().unwrap_or("");
+
+        // header_parts[0] == "commit"; then sha, subject, author name, email, body
+        let header_parts: Vec<&str> = header.split('\u{1f}').collect();
+        if header_parts.len() < 5 {
+            continue;
+        }
+        let sha = header_parts[1].trim();
+        let subject = header_parts[2].trim();
+        let author_name = header_parts[3].trim();
+        let author_email = header_parts[4].trim();
+        let commit_body = header_parts.get(5).copied().unwrap_or("");
+        if sha.is_empty() {
+            continue;
+        }
+
+        let mut changed_files: Vec<Value> = Vec::new();
+        let mut test_files = 0u32;
+        for fl in numstat.lines() {
+            let fl = fl.trim();
+            if fl.is_empty() {
+                continue;
+            }
+            let cols: Vec<&str> = fl.split('\t').collect();
+            if cols.len() < 3 {
+                continue;
+            }
+            // Binary files report "-" for additions/deletions.
+            let additions: u32 = cols[0].parse().unwrap_or(0);
+            let deletions: u32 = cols[1].parse().unwrap_or(0);
+            let path = cols[2].trim();
+            if path.is_empty() {
+                continue;
+            }
+            let surface = classify_surface(path);
+            if surface == "test" {
+                test_files += 1;
+            }
+            changed_files.push(json!({
+                "path": path,
+                "additions": additions,
+                "deletions": deletions,
+                "surface": surface,
+            }));
+        }
+
+        // Real commits carry no linked verification runs; the one honest signal we
+        // have is whether the commit shipped test changes alongside the code.
+        let evidence: Vec<Value> = if test_files > 0 {
+            vec![json!({
+                "kind": "test",
+                "label": format!(
+                    "{} test file{} changed in this commit",
+                    test_files,
+                    if test_files == 1 { "" } else { "s" }
+                ),
+                "status": "pass",
+            })]
+        } else {
+            Vec::new()
+        };
+
+        let short = if sha.len() > 8 { &sha[..8] } else { sha };
+        commits.push(json!({
+            "id": sha,
+            "author": classify_author(author_name, author_email, commit_body),
+            "sha": short,
+            "message": subject,
+            "changedFiles": changed_files,
+            "evidence": evidence,
+        }));
+    }
+
+    Ok(json!({ "commits": commits }))
+}
+
+/// Coarse surface classification for a changed file path, mirroring the frontend's
+/// `inferReviewSurfaces` priority so commit and review reports read consistently.
+fn classify_surface(path: &str) -> &'static str {
+    let p = path.to_ascii_lowercase();
+    if p.contains("/tests/")
+        || p.starts_with("tests/")
+        || p.contains(".test.")
+        || p.contains(".spec.")
+        || p.contains("__tests__")
+    {
+        return "test";
+    }
+    if p.ends_with(".tsx")
+        || p.ends_with(".jsx")
+        || p.ends_with(".css")
+        || p.contains("/components/")
+        || p.contains("/pages/")
+    {
+        return "ui";
+    }
+    if p.contains("src-tauri")
+        || p.contains("commands/")
+        || p.ends_with(".rs")
+        || p.contains("/api")
+        || p.contains("server")
+    {
+        return "api";
+    }
+    if p.ends_with(".md") || p.contains("docs/") {
+        return "docs";
+    }
+    "config"
+}
+
+/// Classify a commit as agent- or human-authored from author identity and trailers
+/// (e.g. the `Co-Authored-By: Claude` trailer this repo uses for agent commits).
+fn classify_author(name: &str, email: &str, body: &str) -> &'static str {
+    let hay = format!(
+        "{}\n{}\n{}",
+        name.to_ascii_lowercase(),
+        email.to_ascii_lowercase(),
+        body.to_ascii_lowercase()
+    );
+    const AGENT_MARKERS: [&str; 9] = [
+        "co-authored-by: claude",
+        "noreply@anthropic.com",
+        "claude",
+        "codex",
+        "cursor",
+        "github-actions",
+        "[bot]",
+        "aider",
+        "devin",
+    ];
+    if AGENT_MARKERS.iter().any(|m| hay.contains(m)) {
+        "agent"
+    } else {
+        "human"
+    }
+}
+
 /// Check GitHub authentication status.
 /// Tries: 1) saved token in preferences, 2) GH_TOKEN env, 3) `gh auth status`.
 /// Returns connection info including username, auth method, and scopes.
@@ -2527,4 +2704,37 @@ mod tests {
         let s = build_compact_history_section_for_prompt("/tmp/x", &[], &conn);
         assert!(s.is_empty());
     }
+
+    #[test]
+    fn classify_surface_buckets_paths() {
+        assert_eq!(classify_surface("apps/desktop/src/pages/Home.tsx"), "ui");
+        assert_eq!(classify_surface("apps/desktop/src/lib/foo.test.ts"), "test");
+        assert_eq!(classify_surface("apps/desktop/tests/e2e/app.spec.ts"), "test");
+        assert_eq!(classify_surface("src-tauri/src/commands/git.rs"), "api");
+        assert_eq!(classify_surface("README.md"), "docs");
+        assert_eq!(classify_surface("docs/architecture.md"), "docs");
+        assert_eq!(classify_surface("apps/desktop/src-tauri/tauri.conf.json"), "api");
+        assert_eq!(classify_surface("package.json"), "config");
+        assert_eq!(classify_surface(".github/workflows/ci.yml"), "config");
+    }
+
+    #[test]
+    fn classify_author_detects_agents() {
+        assert_eq!(
+            classify_author("Sarthak Agrawal", "sarthak@example.com", ""),
+            "human"
+        );
+        assert_eq!(
+            classify_author(
+                "Sarthak Agrawal",
+                "sarthak@example.com",
+                "feat: thing\n\nCo-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>"
+            ),
+            "agent"
+        );
+        assert_eq!(
+            classify_author("github-actions[bot]", "actions@github.com", ""),
+            "agent"
+        );
+    }
 }
diff --git a/apps/desktop/src-tauri/src/main.rs b/apps/desktop/src-tauri/src/main.rs
@@ -314,6 +314,7 @@ fn main() {
             commands::git::list_git_branches,
             commands::git::get_git_remote_info,
             commands::git::list_pull_requests,
+            commands::git::list_commit_intents,
             commands::git::check_github_auth,
             commands::git::sync_github_token,
             commands::git::get_repo_history_context,
diff --git a/apps/desktop/src-tauri/tauri.conf.json b/apps/desktop/src-tauri/tauri.conf.json
@@ -2,7 +2,7 @@
   "$schema": "https://raw.githubusercontent.com/tauri-apps/tauri/dev/crates/tauri-utils/schema.json",
   "identifier": "com.codevetter.desktop",
   "productName": "CodeVetter",
-  "version": "1.1.88",
+  "version": "1.1.89",
   "build": {
     "beforeDevCommand": "npm run dev",
     "beforeBuildCommand": "npm run build",
diff --git a/apps/desktop/src/lib/intent-debugger/report.ts b/apps/desktop/src/lib/intent-debugger/report.ts
@@ -282,7 +282,11 @@ function inferReviewRisks(
 
 function inferRisks(fixture: CommitIntentFixture, totalChanged: number, uiFileCount: number) {
   const risks: string[] = [];
-  if (fixture.author === "agent") risks.push("Agent-authored UI change may satisfy static review while missing user-flow proof.");
+  if (fixture.author === "agent" && uiFileCount > 0) {
+    risks.push("Agent-authored UI change may satisfy static review while missing user-flow proof.");
+  } else if (fixture.author === "agent") {
+    risks.push("Agent-authored change; confirm it matches the intended task, not just a plausible diff.");
+  }
   if (uiFileCount > 0) risks.push("UI surface changed; screenshot or browser replay should exist before shipping.");
   if (totalChanged > 120) risks.push("Large diff for one intent; inspect for accidental refactor drift.");
   if (fixture.changedFiles.some((file) => file.surface === "config")) risks.push("Config changed; verify deploy/build assumptions.");
diff --git a/apps/desktop/src/lib/tauri-ipc.ts b/apps/desktop/src/lib/tauri-ipc.ts
@@ -6,6 +6,7 @@ import {
   sendNotification,
 } from "@tauri-apps/plugin-notification";
 
+import type { CommitIntentFixture } from "@/lib/intent-debugger/types";
 import { buildActiveStandardsContext } from "@/lib/review-service";
 
 // ─── Helpers ────────────────────────────────────────────────────────────────
@@ -1496,6 +1497,24 @@ export async function listPullRequests(
   return resp.pull_requests;
 }
 
+// ─── Commit Intent (real git history → intent debugger) ─────────────────────
+
+/**
+ * Analyze the last `limit` real commits in a repo and return them in the
+ * CommitIntentFixture shape the intent debugger renders. Replaces the canned
+ * COMMIT_INTENT_FIXTURES with actual git history.
+ */
+export async function listCommitIntents(
+  repoPath: string,
+  limit = 8
+): Promise<CommitIntentFixture[]> {
+  const resp = await safeInvoke<{ commits: CommitIntentFixture[] }>(
+    "list_commit_intents",
+    { repoPath, limit }
+  );
+  return resp.commits;
+}
+
 // ─── GitHub Auth ────────────────────────────────────────────────────────────
 
 export interface GitHubAuthStatus {
diff --git a/apps/desktop/src/pages/IntentDebugger.tsx b/apps/desktop/src/pages/IntentDebugger.tsx
diff --git a/apps/desktop/src/pages/Roadmap.tsx b/apps/desktop/src/pages/Roadmap.tsx
