Status: Draft / Research
Context: This document outlines the feature roadmap for stet after the core "Defect-Focused" implementation (Phase 6) and "Polish" (Phase 7) are complete. Roadmap phases below (Ecosystem, Adaptive, Deep Context) are thematic and distinct from implementation plan phases 0–9.
Objective: Evolve stet from a "local CLI tool" into a "universal review agent" that integrates with AI IDEs and learns from user behavior.
- Precision-focused design: Stet defaults to fewer, high-confidence, actionable findings (abstention, FP kill list, prompt shadowing, optimizer). Some false positives are expected; industry benchmarks for AI code review are roughly 5–15% FP rate, with precision-focused tools often at 5–8%. Stet aims for the lower end via context, filters, and feedback; monitor and tune (strictness presets, `stet optimize`, dismiss reasons) to keep noise acceptable.
- Human-in-the-loop: Treat stet output as first-pass review; the human makes the final call. Dismiss reasons and history improve future runs. This aligns with best practice: AI complements, not replaces, human review.
- Context-aware review: Git intent, hunk expansion, RAG-lite, and (roadmap) cross-file impact are core FP-reduction strategies; broader context reduces superficial or irrelevant flags.
The following exist today; this roadmap only lists work not yet implemented.
- CLI: `stet start`, `run`, `rerun`, `finish`, `status`, `list`, `dismiss`, `optimize`, `doctor`, `cleanup`.
- Session and findings: Active findings (Findings minus DismissedIDs), JSON/human output, stable finding IDs.
- History: Append to `.review/history.jsonl` on dismiss (with reasons) and on finish; schema supports optimizer.
- RAG-lite: Symbol definitions injected per language (Go, JS/TS, Python, Swift, Java); config `rag_symbol_max_*`.
- Hunk expansion: `expand.ExpandHunk` for enclosing function (Go) and N-line fallback.
- Optimizer: `stet optimize` writes `system_prompt_optimized.txt`; no suggested config output yet.
- Strictness and nitpicky: Presets and config/env overrides.
- Impact reporting (`stet stats`: volume, quality, energy) is specified in implementation-plan.md Phase 9 (sub-phases 9.1–9.8). Implement from that document.
Goal: Support multiple LLM backends via a pluggable interface. Initially two backends: Ollama (direct API, existing) and OpenCode (via server API). Using OpenCode gives access to 75+ providers (OpenAI, Anthropic, OpenRouter, etc.) as OpenCode adds them, without stet maintaining integrations.
- Status: Not started.
- Goal: Define an `LLMClient` interface and refactor the review path to use it. Wrap the existing Ollama client behind this interface.
- Entry points: New package `cli/internal/llm` with interface; cli/internal/ollama implements it; cli/internal/review/review.go and cli/internal/run/run.go accept the interface.
- Interface: `Generate(ctx, model, systemPrompt, userPrompt string, opts) (*GenerateResult, error)`, `Check(ctx, model) (*CheckResult, error)`, optional `Show(ctx, model) (*ShowResult, error)` for context length. `GenerateResult` includes response text and `Usage` (prompt/completion tokens, durations).
- Code to reuse: ollama.Client with `Check`, `Show`, `Generate`; review.ReviewHunk (line 75) and run loop in run.Run (line 654) call `client.Generate`.
- Implementation chunks:
  - Define `LLMClient` interface and shared types (`GenerateResult`, `Usage`, `CheckResult`) in `cli/internal/llm`.
  - Create `ollama.Adapter` (or make `ollama.Client` implement the interface) wrapping existing logic.
  - Refactor `ReviewHunk` and run package to accept `llm.Client` interface instead of `*ollama.Client`.
  - Add config `backend` (`ollama` | `opencode`) and backend selection factory.
- Config and env: `backend` (default `ollama`), `ollama_base_url` (unchanged). Env: `STET_BACKEND`, `STET_OLLAMA_BASE_URL`.
- Acceptance criteria: With `backend=ollama`, behavior unchanged. Dry-run and full review path work as today.
- Tests: Unit tests for adapter; integration tests with mock Ollama unchanged.
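
As a rough illustration of the interface and backend factory described above, the following Go sketch may help. It is not the project's implementation: `GenerateOptions`, the field names inside `Usage` and `CheckResult`, `fakeClient`, and `NewClient` are hypothetical names chosen for the example, and the optional `Show` method is omitted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Usage holds token counts; the exact field names are assumptions.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
}

// GenerateResult carries the response text plus usage, as the roadmap describes.
type GenerateResult struct {
	Response string
	Usage    Usage
}

// CheckResult is a hypothetical shape; the roadmap does not define its fields.
type CheckResult struct {
	Reachable bool
}

// GenerateOptions is hypothetical; the roadmap only says "opts".
type GenerateOptions struct {
	Temperature float64
}

// Client is the backend-neutral interface the review path would accept.
type Client interface {
	Generate(ctx context.Context, model, systemPrompt, userPrompt string, opts *GenerateOptions) (*GenerateResult, error)
	Check(ctx context.Context, model string) (*CheckResult, error)
}

// fakeClient stands in for a real Ollama or OpenCode adapter.
type fakeClient struct{ name string }

func (f fakeClient) Generate(ctx context.Context, model, systemPrompt, userPrompt string, opts *GenerateOptions) (*GenerateResult, error) {
	// A real adapter would call the backend; this returns an empty findings array.
	return &GenerateResult{Response: "[]", Usage: Usage{PromptTokens: len(systemPrompt) + len(userPrompt)}}, nil
}

func (f fakeClient) Check(ctx context.Context, model string) (*CheckResult, error) {
	return &CheckResult{Reachable: true}, nil
}

// ErrUnknownBackend is returned for unsupported backend values.
var ErrUnknownBackend = errors.New("unknown backend")

// NewClient selects a backend from the config value; empty means the default, ollama.
func NewClient(backend string) (Client, error) {
	switch backend {
	case "", "ollama":
		return fakeClient{name: "ollama"}, nil
	case "opencode":
		return fakeClient{name: "opencode"}, nil
	default:
		return nil, fmt.Errorf("%w: %q", ErrUnknownBackend, backend)
	}
}

func main() {
	client, err := NewClient("ollama")
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	res, err := client.Generate(context.Background(), "example-model", "system", "user", nil)
	if err != nil {
		fmt.Println("generate error:", err)
		return
	}
	fmt.Println(res.Response, res.Usage.PromptTokens)
}
```

Keeping the factory as the only place that knows concrete backends is one way to let `ReviewHunk` and the run loop depend on the interface alone.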
- Status: Not started.
- Goal: Implement the `LLMClient` interface using the OpenCode server API. Users run `opencode serve` and configure stet to use it; stet gets access to all providers OpenCode supports (75+ via Models.dev).
- Entry points: New package `cli/internal/llm/opencode`; implement `LLMClient` using github.com/sst/opencode-sdk-go.
- Flow: Create session via `Session.New`; for each hunk call `Session.Prompt` with system + user prompt; extract text from response `parts` (TextPart); parse JSON with existing `ParseFindingsResponse`. Map `AssistantMessage.Tokens` (Input, Output, Reasoning) to stet `Usage`.
- Code to reuse: opencode-sdk-go (`Session.New`, `Session.Prompt`); review.ParseFindingsResponse for JSON parsing.
- Implementation chunks:
  - OpenCode client adapter: configure SDK (base URL, basic auth via `OPENCODE_SERVER_PASSWORD`).
  - `Check`: `GET /global/health` for server reachability; optionally verify model via `/config/providers`.
  - `Generate`: create session, send message with `system` and `parts` (TextPart for user prompt), extract text from response parts, map tokens to Usage.
  - Handle session lifecycle (create per run or per hunk; document that `opencode serve` should run from repo root for project context).
  - Error mapping: connection/server errors → `ErrUnreachable` equivalent for `stet doctor` and exit code 2.
- Config and env: `opencode_base_url` (default `http://localhost:4096`), `opencode_password` (optional, for `OPENCODE_SERVER_PASSWORD`). Env: `STET_OPENCODE_BASE_URL`, `STET_OPENCODE_PASSWORD`. Model uses OpenCode model ID (e.g. `anthropic/claude-sonnet-4`, `ollama/qwen3-coder:30b`).
- Acceptance criteria: With `backend=opencode` and `opencode serve` running (with providers configured), `stet start` and `stet run` complete using OpenCode's selected model. `stet doctor` checks OpenCode server health when backend is opencode.
- Tests: Unit tests with mock HTTP server simulating OpenCode API; optional integration test with real `opencode serve`.
- Documentation: User must run `opencode serve` from repo root; configure providers in OpenCode (`/connect`, `opencode.json`).
- Go: opencode-sdk-go requires Go 1.22+ (stet already targets this).
- OpenCode: User must have OpenCode installed and run `opencode serve` when using the opencode backend.
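
The reachability check and error mapping for the OpenCode backend could look roughly like the sketch below. It uses plain `net/http` rather than the SDK so that it stays self-contained. The `ErrUnreachable` variable, the `CheckHealth` and `demoServer` helpers, the 3-second timeout, and the basic-auth username `opencode` are assumptions made for illustration; only the `GET /global/health` path comes from the roadmap.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// ErrUnreachable stands in for stet's existing "backend unreachable" error.
var ErrUnreachable = errors.New("llm backend unreachable")

// CheckHealth calls GET {baseURL}/global/health and maps connection and
// server (5xx) errors to ErrUnreachable.
func CheckHealth(baseURL, password string) error {
	url := strings.TrimRight(baseURL, "/") + "/global/health"
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return fmt.Errorf("opencode health check: %v", err)
	}
	if password != "" {
		// Assumption: the username is a placeholder; check the OpenCode docs.
		req.SetBasicAuth("opencode", password)
	}
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("%w: %v", ErrUnreachable, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 500 {
		return fmt.Errorf("%w: health returned %s", ErrUnreachable, resp.Status)
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("opencode health check: unexpected status %s", resp.Status)
	}
	return nil
}

// IsUnreachable reports whether err should map to exit code 2.
func IsUnreachable(err error) bool { return errors.Is(err, ErrUnreachable) }

// demoServer simulates the health endpoint for local experiments and tests.
func demoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/global/health" {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
}

func main() {
	srv := demoServer()
	defer srv.Close()
	fmt.Println("healthy:", CheckHealth(srv.URL, "") == nil)
}
```

A mock server of this kind is also the shape the unit tests for the adapter would likely take.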
Goal: Transform stet from a standalone tool into a service that AI Editors (Cursor, Windsurf, Claude) can consume directly.
- Status: Not started.
- Goal: Expose stet via the Model Context Protocol so IDEs can trigger reviews and read findings without manual CLI execution.
- Entry points: New binary (e.g. `cmd/stet-mcp/`) or subcommand (e.g. `stet mcp` stdio). If subcommand: register with `rootCmd.AddCommand(newMcpCmd())` in cli/cmd/stet/main.go (around line 170). Document the chosen entry point.
- Inputs: MCP JSON-RPC over stdio. Tool `run_review(scope: string)` receives scope (e.g. staged, commit, branch). Tool `get_findings(min_confidence: float)` receives optional confidence threshold. State dir from env `STET_STATE_DIR` or default `.review` relative to repo root.
- Outputs: Resource `stet://latest_report` returns last review JSON (session + findings). Tools return JSON per MCP spec (e.g. findings array). Transport: stdio JSON-RPC; specify MCP spec version in docs.
- Code to reuse: session.Load(stateDir); pattern for active findings and writing findings JSON in cli/cmd/stet/main.go (e.g. `activeFindings`, `writeFindingsJSON`). run package for triggering review (Start/Run). Repo root from cwd via git.RepoRoot.
- Implementation chunks:
  - MCP server skeleton: stdio transport, JSON-RPC request/response dispatch. Unit test with canned stdin/stdout.
  - Resource `stet://latest_report`: read session and findings from state dir; return JSON. Test: write fixture session, assert resource content.
  - Tool `run_review(scope)`: invoke existing start/run flow (or subprocess of stet CLI). Test: mock or integration with dry-run.
  - Tool `get_findings(min_confidence)`: load session, filter findings by confidence (see findings.FindingConfidence), return JSON. Test: fixture session, assert filtering.
  - Document MCP capability and IDE integration (Cursor, etc.).
- Config and env: `STET_STATE_DIR` for state directory when running as MCP process. No new config file keys required for minimal version.
- Acceptance criteria: An MCP client can connect via stdio, request `stet://latest_report` and get valid JSON; call `run_review(scope)` and receive review result; call `get_findings(min_confidence)` and receive filtered findings.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
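
The filtering behind `get_findings(min_confidence)` can be sketched as below. The `Finding` struct here is a reduced, hypothetical stand-in for findings.Finding, and the helper names `ActiveFindings`, `FilterByConfidence`, and `GetFindingsJSON` are invented for the example; the MCP transport itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Finding is a reduced, hypothetical stand-in for findings.Finding.
type Finding struct {
	ID         string  `json:"id"`
	File       string  `json:"file"`
	Line       int     `json:"line"`
	Message    string  `json:"message"`
	Confidence float64 `json:"confidence"`
}

// ActiveFindings returns findings whose ID is not in dismissed
// (Findings minus DismissedIDs).
func ActiveFindings(all []Finding, dismissed []string) []Finding {
	skip := make(map[string]bool, len(dismissed))
	for _, id := range dismissed {
		skip[id] = true
	}
	active := make([]Finding, 0, len(all))
	for _, f := range all {
		if !skip[f.ID] {
			active = append(active, f)
		}
	}
	return active
}

// FilterByConfidence keeps findings with Confidence >= threshold.
func FilterByConfidence(fs []Finding, threshold float64) []Finding {
	out := make([]Finding, 0, len(fs))
	for _, f := range fs {
		if f.Confidence >= threshold {
			out = append(out, f)
		}
	}
	return out
}

// GetFindingsJSON builds the JSON payload a get_findings tool call might return.
func GetFindingsJSON(all []Finding, dismissed []string, threshold float64) (string, error) {
	b, err := json.Marshal(FilterByConfidence(ActiveFindings(all, dismissed), threshold))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	all := []Finding{
		{ID: "a", File: "main.go", Line: 10, Message: "possible nil dereference", Confidence: 0.9},
		{ID: "b", File: "main.go", Line: 42, Message: "unused parameter", Confidence: 0.5},
	}
	out, err := GetFindingsJSON(all, nil, 0.8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```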
- Status: Not started.
- Goal: Run a fast local linter on changed files and inject linter output into the review prompt so the LLM can explain and suggest fixes for syntax/static issues.
- Entry points: No new CLI command. Integrate into the review pipeline. Call sites are where the system or user prompt is built: cli/internal/review/review.go (ReviewHunk) and cli/internal/run/run.go (where hunks are iterated). Run linter per changed file (or per language) before or at start of review.
- Inputs: Changed files from diff (repo root + list of file paths). Config: linter command per language (e.g. Go → `golangci-lint`, JS/TS → `eslint`), or empty to disable. Repo root from git.RepoRoot.
- Outputs: Linter output (file, line, code, message) injected into the prompt as a "Static analyzer reported: …" block (user or system prompt). No change to findings.Finding schema unless desired.
- Code to reuse: Diff pipeline for changed files (e.g. cli/internal/diff); in cli/internal/prompt, add `AppendLinterFindings(systemOrUserPrompt string, linterResults []LinterResult) string` or equivalent. review.ReviewHunk and run loop in run.Run to run linter and call the new append function.
- Implementation chunks:
  - Config for linter commands per language (e.g. `linter_go`, `linter_js` in config.Config or env `STET_LINTER_GO`, `STET_LINTER_JS`). Load in config and pass to review path.
  - Run linter on changed files (by language from extension); parse stdout into (file, line, code, message). Prefer existing linter output formats (e.g. golangci-lint JSON, eslint JSON). Unit test: parse canned linter output.
  - Append linter block to prompt in review path (e.g. new function in prompt package, called from review.go after system prompt load). Integration test: mock linter output, assert prompt contains the block.
  - Document prompt shape and config keys.
- Config and env: e.g. `[linter]` section or keys `linter_go`, `linter_js` in config.Config; or env `STET_LINTER_GO`, `STET_LINTER_JS`. Empty or missing = disabled.
- Acceptance criteria: With linter enabled and a changed file that has a linter error, the review prompt includes a "Static analyzer reported: …" section and the model can reference it in findings or explanations.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
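
One possible shape for the prompt-append step is sketched here. The function signature follows the one named above, while the `LinterResult` fields and the exact line format inside the block are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// LinterResult holds one parsed linter diagnostic (file, line, code, message).
// The field names are assumptions based on the tuple the roadmap lists.
type LinterResult struct {
	File    string
	Line    int
	Code    string
	Message string
}

// AppendLinterFindings appends a "Static analyzer reported:" block to the
// prompt. With no results the prompt is returned unchanged.
func AppendLinterFindings(prompt string, results []LinterResult) string {
	if len(results) == 0 {
		return prompt
	}
	var b strings.Builder
	b.WriteString(strings.TrimRight(prompt, "\n"))
	b.WriteString("\n\nStatic analyzer reported:\n")
	for _, r := range results {
		// The bullet layout is illustrative, not a documented prompt shape.
		fmt.Fprintf(&b, "- %s:%d [%s] %s\n", r.File, r.Line, r.Code, r.Message)
	}
	return b.String()
}

func main() {
	results := []LinterResult{
		{File: "cli/internal/run/run.go", Line: 12, Code: "errcheck", Message: "error return value not checked"},
	}
	fmt.Print(AppendLinterFindings("Review this hunk.", results))
}
```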
- Status: Not started.
- Goal: Use active findings (after user dismisses false positives) as input to a smaller local LLM to generate targeted code fixes; optionally apply and commit, or loop (refine) until review is clean.
- Entry points: New commands `stet fix` and `stet refine`. Register in cli/cmd/stet/main.go with `rootCmd.AddCommand(newFixCmd())` and `rootCmd.AddCommand(newRefineCmd())` (around line 170).
- Inputs: Session from state dir (active findings = session.Session Findings minus DismissedIDs). Optional `--finding-id ID` to limit to one finding. Flags: `--dry-run`, `--apply`, `--commit`, `--max-iterations N` (refine), `--message "..."` (commit message). Repo root from cwd. Config: fix_model, fix_temperature (see Config and env).
- Outputs: Without `--apply`/`--commit`: proposed patches (unified diff or code blocks) to stdout. With `--apply`: apply patches to working tree. With `--commit`: after apply, create one commit per fix (or per refine iteration) with attribution line in body: `written by <MODEL> after review from stet`. Session and refs/notes/stet unchanged by fix; stet finish continues to write notes as today.
- Code to reuse: session.Load(stateDir); findings.Finding (File, Line, Range, Message, Suggestion, Category). expand.ExpandHunk for hunk-based expansion; add `ExpandAtLocation(repoRoot, filePath, startLine, endLine, maxTokens int) (string, error)` in cli/internal/expand for finding-based context (enclosing function for Go, else N-line window). ollama.Client `Generate` with fix model. Token estimation in cli/internal/tokens or equivalent. Resolve finding by prefix: findings.ResolveFindingIDByPrefix.
- Implementation chunks:
  - Add `expand.ExpandAtLocation(repoRoot, filePath, startLine, endLine, maxTokens)` with same semantics as roadmap table (enclosing function for Go; ±50/30/20/10 lines fallback; minimal ±2–5). Unit tests with fixture files.
  - Fix pipeline (new package or under run): load session, filter to active findings or single finding by ID; for each finding compute token budget, extract context (ExpandAtLocation or N-line window), build fix prompt (system: "You are a code fix assistant…"; user: file, line/range, message, suggestion, category, code context); call Ollama with fix model; parse response (code block or unified diff). Unit tests with mock Ollama.
  - `stet fix` CLI: flags, call fix pipeline, print patches; with `--apply` apply to worktree; with `--commit` run git commit with attribution line. Integration tests (e.g. dry-run, apply in temp repo).
  - `stet refine` CLI: loop of fix → apply & commit → run review (invoke run package); repeat until no active findings or `--max-iterations`. Default `--max-iterations 3`. Tests with dry-run and mock.
  - Config and env: add `fix_model` and `fix_temperature` to config.Config (see Config and env below for defaults).
  - Optional: when git-ai CLI is available and repo uses it, record fix in `refs/notes/ai` per Git AI Standard v3.0.0 (agent tool=stet, model=fix_model, session id e.g. stet-{session_id}-{finding_short_id}).
- Context extraction (reference): Largest-first, fallback-to-smaller. Chunk levels: Enclosing function (Go via expand); ±50 lines; ±30/20/10; minimal ±2–5. Algorithm: token budget = contextLimit - systemPrompt - findingPayload - responseReserve; try largest chunk; if over budget, fall back to smaller N-line windows; last resort minimal range.
- Commit message and attribution: Commit body must include line: `written by <MODEL> after review from stet`. User may add subject/body via `--message`. Refine: one commit per iteration; cap with `--max-iterations`; "passes" means stet review reports no active findings (stet does not run tests/linters/CI).
- Prompt design (reference): System: "You are a code fix assistant. Given a code snippet and a code review finding, output ONLY the corrected code. Do not explain." User: file path, line/range, message, suggestion (if any), category, code context. Output format: code block or unified diff for parsing.
- Config and env: `fix_model` (default `qwen2.5-coder:7b`), `fix_temperature` (default 0.1). Env: `STET_FIX_MODEL`, `STET_FIX_TEMPERATURE`. Add to config.Config and Overrides.
- Acceptance criteria: `stet fix` without `--apply`/`--commit` prints patches to stdout. With `--apply` patches are applied to worktree. With `--commit` one commit is created with attribution line. `stet refine` runs fix → commit → review until no active findings or max-iterations; each iteration produces one commit with attribution.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
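
The largest-first, fallback-to-smaller context extraction can be illustrated with the sketch below. It covers only the N-line window levels (the enclosing-function level for Go is skipped), and it assumes a crude estimate of about four bytes per token. The names `estimateTokens`, `window`, `TokenBudget`, `ExtractContext`, and `makeLines` are hypothetical, and the choice of ±5 then ±2 for the "minimal" range is one reading of "±2–5".

```go
package main

import (
	"fmt"
	"strings"
)

// estimateTokens is a rough heuristic (about 4 bytes per token); stet's real
// estimator may differ.
func estimateTokens(s string) int { return (len(s) + 3) / 4 }

// window returns lines start-pad..end+pad (1-based, inclusive), clamped to the file.
func window(lines []string, start, end, pad int) string {
	lo, hi := start-pad, end+pad
	if lo < 1 {
		lo = 1
	}
	if hi > len(lines) {
		hi = len(lines)
	}
	if lo > hi {
		return ""
	}
	return strings.Join(lines[lo-1:hi], "\n")
}

// TokenBudget applies the formula from the roadmap:
// contextLimit - systemPrompt - findingPayload - responseReserve.
func TokenBudget(contextLimit, systemPrompt, findingPayload, responseReserve int) int {
	return contextLimit - systemPrompt - findingPayload - responseReserve
}

// ExtractContext tries the largest window first and falls back to smaller
// ones; it returns the context and the padding that was used.
func ExtractContext(lines []string, start, end, budget int) (string, int) {
	for _, pad := range []int{50, 30, 20, 10, 5, 2} {
		ctx := window(lines, start, end, pad)
		if estimateTokens(ctx) <= budget {
			return ctx, pad
		}
	}
	// Last resort: the minimal window, even if it exceeds the budget.
	return window(lines, start, end, 2), 2
}

// makeLines builds a synthetic file of n identical 7-byte lines for demos.
func makeLines(n int) []string {
	lines := make([]string, n)
	for i := range lines {
		lines[i] = "abcdefg"
	}
	return lines
}

func main() {
	budget := TokenBudget(8192, 1000, 200, 1024)
	ctx, pad := ExtractContext(makeLines(200), 100, 100, budget)
	fmt.Printf("budget=%d pad=%d lines=%d\n", budget, pad, strings.Count(ctx, "\n")+1)
}
```

Returning the padding alongside the context makes it easy for tests to assert which fallback level was chosen.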
Goal: Reduce "False Positive Fatigue" by learning from dismissals and team rules.
- Status: Done.
- Goal: If the user repeatedly dismisses similar feedback, stet should reduce or stop offering similar findings (via prompt injection or post-process filter).
- Entry points: No new command. Integrate into the review path. Call sites (when building the system prompt or when post-processing findings): cli/internal/review/review.go and/or cli/internal/run/run.go. Prefer Option A (prompt injection) first; Option B (post-process similarity) can follow as an optional chunk.
- Inputs: Last N dismissed findings from `.review/history.jsonl`. Schema: history.Record with UserAction.Dismissals and ReviewOutput; see history.ReadRecords (or equivalent) for reading in chronological order. Config: enable/disable, N (e.g. 50).
- Outputs: Fewer findings shown to the user: either the system prompt includes "Do not report issues similar to: [examples]" (Option A) or new findings are filtered by similarity to dismissed (Option B). Config flag to disable suppression.
- Code to reuse: history (ReadRecords / append order; history.Dismissal and Finding message for building examples). In prompt, add `AppendSuppressionExamples(systemPrompt string, examples []string) string`. Wire into review.go after SystemPrompt and before or after other append steps.
- Implementation chunks:
  - Read last N history records (e.g. 50) from state dir; extract Dismissals and corresponding finding message (and file:line) for each. Unit test: fixture history.jsonl, assert extracted examples.
  - Build short "example" text per dismissal (e.g. message + file:line). Deduplicate or limit total length. Add `AppendSuppressionExamples(systemPrompt, examples)` in prompt package; fixed section header e.g. "Do not report issues similar to:".
  - In review path (review.go and/or run.go), call AppendSuppressionExamples when suppression enabled and examples non-empty. Config flag: e.g. `suppression_enabled` (default true or false), `suppression_history_count` (default 50).
  - Unit and integration tests: with suppression on and fixture history, assert system prompt contains the section; optionally assert fewer findings in dry-run. Optional later chunk: vector store and post-process similarity filter (similarity > threshold → suppress).
- Config and env: e.g. `suppression_enabled` (bool), `suppression_history_count` (int, default 50) in config.Config; env `STET_SUPPRESSION_ENABLED`, `STET_SUPPRESSION_HISTORY_COUNT`.
- Acceptance criteria: After dismissing several findings and running review again, the system prompt includes "Do not report issues similar to" with recent examples (when enabled). With suppression disabled, behavior unchanged. No new CLI commands.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
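
For Option A, the append step might be sketched as follows. This is an illustration rather than the shipped code: the cap of 20 examples (`maxSuppressionExamples`) and the bullet layout are assumptions, while the function name and section header come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// maxSuppressionExamples caps the section length; the value is an assumption.
const maxSuppressionExamples = 20

// AppendSuppressionExamples appends a "Do not report issues similar to:"
// section with deduplicated examples. With no usable examples the prompt is
// returned unchanged.
func AppendSuppressionExamples(systemPrompt string, examples []string) string {
	seen := make(map[string]bool)
	var kept []string
	for _, ex := range examples {
		ex = strings.TrimSpace(ex)
		if ex == "" || seen[ex] {
			continue
		}
		seen[ex] = true
		kept = append(kept, ex)
		if len(kept) == maxSuppressionExamples {
			break
		}
	}
	if len(kept) == 0 {
		return systemPrompt
	}
	var b strings.Builder
	b.WriteString(strings.TrimRight(systemPrompt, "\n"))
	b.WriteString("\n\nDo not report issues similar to:\n")
	for _, ex := range kept {
		b.WriteString("- " + ex + "\n")
	}
	return b.String()
}

func main() {
	examples := []string{
		"unused variable (a.go:3)",
		"unused variable (a.go:3)", // duplicate, dropped
		"missing doc comment (b.go:9)",
	}
	fmt.Print(AppendSuppressionExamples("You are a code reviewer.", examples))
}
```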
- Status: Not started.
- Goal: Allow teams to enforce natural-language rules (e.g. naming, no fmt.Printf in production) by injecting them into the system prompt as high-priority constraints.
- Entry points: No new command. Load file when building the system prompt. Call site: same chain as Cursor rules. cli/internal/review/review.go loads system prompt then calls AppendCursorRules; add a step to load and append rulebook (e.g. before or after Cursor rules). Same integration point in cli/internal/run/run.go where system prompt is built.
- Inputs: File at repo root `.stet/rules.md` (or configurable path via config). Encoding: UTF-8. If file is missing or unreadable, skip (no error). Reasonable size limit (e.g. 64 KiB) to avoid huge prompts.
- Outputs: Rules content appended to the system prompt under a fixed section header (e.g. "## High Priority Constraints"). No change to findings schema.
- Code to reuse: prompt.SystemPrompt, prompt.AppendCursorRules. Review chain in review.ReviewHunk (SystemPrompt → InjectUserIntent → AppendCursorRules → …). Add `AppendRulebook(systemPrompt, rulebookPath string) string` in prompt package; if rulebookPath is non-empty and file exists, read and append with header. Repo root from git.RepoRoot.
- Implementation chunks:
  - Resolve path: repo root + `.stet/rules.md` or config `rules_file` if present. If config, add optional `rules_file` to config.Config.
  - Read file; validate (exists, size within limit). Return empty string if missing. Add `AppendRulebook(systemPrompt, rulebookPath string) string` in cli/internal/prompt.
  - Wire in review.go and run.go: after SystemPrompt (and optionally after InjectUserIntent), call AppendRulebook with resolved path.
  - Document format (Markdown) and precedence vs Cursor rules (e.g. rulebook = global team; Cursor rules = file-glob specific).
  - Unit tests: AppendRulebook with missing file, empty file, and valid content; integration test: run with fixture .stet/rules.md, assert prompt contains section.
- Config and env: Optional `rules_file` in config (path relative to repo or absolute); if unset, use repo root `.stet/rules.md`.
- Acceptance criteria: When `.stet/rules.md` exists at repo root, the system prompt includes "## High Priority Constraints" and the file contents. When file is missing, no error and no section added.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
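
A minimal sketch of `AppendRulebook` under the constraints listed above follows. Skipping (rather than truncating) a file over the 64 KiB limit is an assumption, as is treating an empty file like a missing one; `writeTempRules` is a hypothetical helper used only to demonstrate the function.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// maxRulebookBytes is the size limit suggested in the roadmap (64 KiB).
const maxRulebookBytes = 64 * 1024

// AppendRulebook appends the rulebook under "## High Priority Constraints".
// A missing, unreadable, empty, or oversized file leaves the prompt unchanged.
func AppendRulebook(systemPrompt, rulebookPath string) string {
	if rulebookPath == "" {
		return systemPrompt
	}
	info, err := os.Stat(rulebookPath)
	if err != nil || info.IsDir() || info.Size() > maxRulebookBytes {
		// Assumption: oversized files are skipped rather than truncated.
		return systemPrompt
	}
	data, err := os.ReadFile(rulebookPath)
	if err != nil {
		return systemPrompt
	}
	rules := strings.TrimSpace(string(data))
	if rules == "" {
		return systemPrompt
	}
	return strings.TrimRight(systemPrompt, "\n") + "\n\n## High Priority Constraints\n\n" + rules + "\n"
}

// writeTempRules writes content to rules.md in a fresh temp dir (demo helper).
func writeTempRules(content string) (string, error) {
	dir, err := os.MkdirTemp("", "stet-rules")
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, "rules.md")
	return path, os.WriteFile(path, []byte(content), 0o644)
}

func main() {
	path, err := writeTempRules("- No fmt.Printf in production code\n")
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	defer os.RemoveAll(filepath.Dir(path))
	fmt.Print(AppendRulebook("You are a code reviewer.", path))
}
```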
- Status: Not started.
- Goal: Use dismissal (and optionally acceptance) history to suggest configuration changes (RAG symbol limits, strictness) that correlate with better acceptance or lower false-positive dismissal; suggest-only, no auto-apply.
- Entry points: Extend `stet optimize` or add a new command (e.g. `stet suggest-config`). Current optimizer: cli/cmd/stet/main.go `runOptimize` invokes an external script that writes `system_prompt_optimized.txt`. Either extend the script contract to also write suggested config, or add a Go path that reads history and writes suggested config snippet (e.g. `.review/suggested_config.toml` or stdout).
- Inputs: .review/history.jsonl with Record (ReviewOutput, UserAction.Dismissals, RunConfig). RunConfig (RunConfigSnapshot) already has Strictness, RAGSymbolMaxDefinitions, RAGSymbolMaxTokens. Ensure records are tagged with run config when available (append path may already support RunConfig).
- Outputs: Suggested config snippet: e.g. "suggested rag_symbol_max_definitions: 8", "suggested strictness: lenient", or a file `.review/suggested_config.toml` (or printed to stdout). No auto-apply; user merges manually.
- Code to reuse: history.ReadRecords for chronological read; Record.RunConfig, Record.UserAction.Dismissals. Aggregate by config buckets; compute dismissal rate (e.g. false_positive / total) and acceptance rate; suggest RAG/strictness values that correlate with higher acceptance or lower FP rate. config key names for output.
- Implementation chunks:
  - Confirm history schema and append path populate RunConfig when available; extend if needed. No new CLI flags required for this chunk.
  - Analyzer: read history, group by RunConfig (or bins), compute per-group dismissal rate and acceptance rate; choose suggested values (e.g. config with best acceptance in last N records). Scope: RAG symbol options and strictness only.
  - Output: write `.review/suggested_config.toml` or print to stdout (e.g. `stet suggest-config`). Document format.
  - If extending `stet optimize`: document that script may write both system_prompt_optimized.txt and suggested_config.toml; CLI does not need to read suggested_config unless adding a merge command later.
  - Unit tests: fixture history with varied RunConfig and dismissals, assert suggested values. Integration test optional.
- Config and env: No new config keys; output is suggested config for user to merge.
- Acceptance criteria: After enough history with varied run config, running the suggest path produces suggested rag_symbol_max_definitions, rag_symbol_max_tokens, and/or strictness. User can copy into .review/config.toml. No automatic application.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
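
The analyzer step (group by run config, compute dismissal rate, pick the best group) could be sketched as below. The `Record` type is a simplified stand-in for history.Record that carries only counts, `SuggestConfig` and the `minFindings` cutoff are hypothetical, and "lowest dismissal rate wins" is just one possible selection rule.

```go
package main

import "fmt"

// RunConfig mirrors the three fields the roadmap says RunConfigSnapshot has.
type RunConfig struct {
	Strictness              string
	RAGSymbolMaxDefinitions int
	RAGSymbolMaxTokens      int
}

// Record is a simplified, hypothetical stand-in for history.Record.
type Record struct {
	RunConfig RunConfig
	Findings  int // findings reported in the run
	Dismissed int // findings the user dismissed
}

type totals struct{ findings, dismissed int }

// SuggestConfig groups records by RunConfig and returns the config with the
// lowest dismissal rate among groups with at least minFindings findings.
// The bool is false when no group has enough data.
func SuggestConfig(records []Record, minFindings int) (RunConfig, float64, bool) {
	byConfig := make(map[RunConfig]*totals)
	var order []RunConfig // first-seen order keeps tie-breaking deterministic
	for _, r := range records {
		t, ok := byConfig[r.RunConfig]
		if !ok {
			t = &totals{}
			byConfig[r.RunConfig] = t
			order = append(order, r.RunConfig)
		}
		t.findings += r.Findings
		t.dismissed += r.Dismissed
	}
	var best RunConfig
	bestRate, found := 0.0, false
	for _, cfg := range order {
		t := byConfig[cfg]
		if t.findings == 0 || t.findings < minFindings {
			continue
		}
		rate := float64(t.dismissed) / float64(t.findings)
		if !found || rate < bestRate {
			best, bestRate, found = cfg, rate, true
		}
	}
	return best, bestRate, found
}

func main() {
	a := RunConfig{Strictness: "default", RAGSymbolMaxDefinitions: 10, RAGSymbolMaxTokens: 2000}
	b := RunConfig{Strictness: "lenient", RAGSymbolMaxDefinitions: 8, RAGSymbolMaxTokens: 2000}
	records := []Record{{a, 10, 5}, {a, 10, 3}, {b, 12, 2}}
	if cfg, rate, ok := SuggestConfig(records, 5); ok {
		fmt.Printf("suggested strictness: %s (dismissal rate %.2f)\n", cfg.Strictness, rate)
		fmt.Printf("suggested rag_symbol_max_definitions: %d\n", cfg.RAGSymbolMaxDefinitions)
	}
}
```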
Goal: Detect when a change in one file breaks logic in another file that wasn’t touched ("spooky action at a distance").
- Status: Not started.
- Goal: Emit findings when a change in File A (e.g. signature of a function) likely breaks File B that uses that symbol but was not updated (e.g. test file not in diff).
- Entry points: No new command. Integrate into the review pipeline as an extra pass or extra findings. Call site: after or alongside per-hunk review in cli/internal/run/run.go. For changed hunks, extract public symbols, search repo for usages in other files; if a referencing file is not in the diff, emit a finding.
- Inputs: Changed hunks from diff (from existing diff pipeline); repo root. Symbol extraction: public symbols (e.g. function names, type names) in changed hunks; use Tree-sitter or extend existing rag symbol layer. Reference search: grep/ripgrep or LSP to find usages of those symbols in other files.
- Outputs: Additional findings.Finding values (e.g. "You changed Login signature; auth_test.go is stale. This will likely break the build.") added to session or streamed with same schema. No new severity/category required; use existing.
- Code to reuse: diff for hunks; rag for symbol definitions (extend for "symbols defined in hunk" and "references in repo" if needed). findings.Finding for constructing new findings. Optionally Tree-sitter for parsing changed files.
- Implementation chunks:
  - Symbol extraction from changed hunks (per language; start with Go). Produce list of (symbol name, file, line). Unit test: fixture hunk, assert extracted symbols.
  - Reference search: for each symbol, find references in repo (e.g. grep/ripgrep by symbol name, or LSP). List (file, line) of references; exclude files that are in the diff.
  - For each reference (file not in diff): generate one finding (e.g. "You changed X; <file> uses X and was not updated. This will likely break the build."). Merge into review output (append to findings in run.go).
  - Config to enable/disable (e.g. `cross_file_impact_enabled`). Default off for initial rollout.
  - Unit and integration tests: fixture repo with changed function and untouched caller, assert finding produced when enabled.
- Config and env: e.g. `cross_file_impact_enabled` (bool, default false) in config.Config; env `STET_CROSS_FILE_IMPACT_ENABLED`.
- Acceptance criteria: When enabled, changing a function signature (or renamed symbol) in a file and not updating another file that references it produces an actionable finding. When disabled, no cross-file findings.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
- Status: Not started.
- Goal: Group findings that represent the same underlying issue (same file + similar message or category) so users see conceptual groups instead of many near-duplicate lines. Display-only; session unchanged.
- Entry points: `stet list --grouped` and optional `stet list --grouped --verify`. Extend newListCmd (list command, around line 913); add flags `--grouped` and `--verify`.
- Inputs: Active findings from session (same as current `stet list`): load session, compute active = Findings minus DismissedIDs. Use findings.Finding (File, Line, Range, Message, Category). With `--verify`, optional LLM call for borderline groups (narrow yes/no prompt).
- Outputs: Display-only; session and findings unchanged. When `--grouped`: print grouped format, i.e. for each group a line `[Group: <canonical description>]` then member lines (shortID, file:line, severity, message). Same info as list, reordered and grouped. With `--verify`, optionally merge groups that LLM says are same issue.
- Code to reuse: hunkid.messageStem, hunkid.collapseWhitespace for message normalization. findings.ShortID. New package e.g. `cli/internal/consolidate` for grouping logic (pure functions: GroupFindings(findings) → groups).
- Implementation chunks:
  - Grouping logic in `cli/internal/consolidate`: (a) Same file + message stem similarity (use messageStem/collapseWhitespace; Jaccard on word stems or Levenshtein below threshold). (b) Same file + shared category + overlapping keywords. (c) Same file + nearby lines (e.g. within 20) + similar message stem. Return list of groups, each with canonical description (e.g. first finding’s message or merged) and member findings. Unit tests with fixture findings.
  - `stet list --grouped`: load active findings, call GroupFindings, print format below. No `--verify` yet. Integration test: session with multiple similar findings, assert output format.
  - Optional `--verify`: for borderline groups (e.g. similarity in range 0.6–0.85), call small LLM with prompt "Are these N findings the same underlying issue? Answer Yes or No." Merge groups when Yes. Gate behind `--verify`; default off.
  - Document output format and thresholds (config optional: e.g. similarity threshold).
- Output format (reference):
[Group: Potential nil pointer dereference in StreamOut]
ca1b234 cli/cmd/stet/main.go:301 warning Potential nil pointer dereference in StreamOut assignment
3a0ed5c cli/cmd/stet/main.go:498 warning Potential nil pointer dereference in StreamOut assignment
[Group: Duplicate function definition for newRunCmd]
f0b4f0c cli/cmd/stet/main.go:356 error Duplicate function definition for newRunCmd
54e82de cli/cmd/stet/main.go:363 error Duplicate function definition for newRunCmd
- Config and env: Optional similarity threshold in config; `--verify` is opt-in flag. No required config for minimal version.
- Acceptance criteria: `stet list --grouped` prints findings grouped by same file + similar message/category; session and dismiss behavior unchanged. `stet list` without `--grouped` unchanged. With `--verify`, borderline groups may be merged after LLM confirmation.
- Tests: New and changed code must meet project coverage: 77% project, 72% per file (see AGENTS.md).
| Topic | Goal | Complexity |
|---|---|---|
| AST-preserving minification (non-Go) | Go is implemented (in-repo, per-line whitespace reduction). Extend to JS/TS, Python, Java, C#, Swift, etc.: either in-repo via Go parsers (if available) or external tools (e.g. Node for JS/TS, Python for Python, Uncrustify for C-family). Same token-saving behavior as Go when reviewing those languages. | Medium–High |
| Search-replace diff format | `--search-replace` is an experimental flag to compare search-replace vs unified diff for token usage and finding quality. Once we have data, decide whether to make it default or remove. | Low |
| Local vector stores | Evaluate sqlite-vss vs chromadb for storing dismissal history locally without heavy dependencies. | Medium |
| LSP integration | Use running Language Server (LSP) instead of or in addition to Tree-sitter where possible. | High |
| Review summarization | Generate a "PR Description" from findings (auto-draft PR). | Low |
| Documentation quality | Use commit (and future PR) description for intent context; document that clear author-side docs improve stet accuracy. | Low |
| Evaluation corpus | Fixed set of hunks (known-good / known-bad) to track precision/recall as prompts and optimizer change. | Medium |
Adoption: Pilot on one team or repo; collect dismiss reasons and run stet optimize periodically; document when to use default vs. strict vs. nitpicky so rollout aligns with feedback-driven improvement.