This document defines the output and exit-code contract between the Stet CLI and the Cursor extension (Phase 5). The extension spawns the CLI and parses its output.
- `stet start [ref]` — On success, writes findings to stdout (format depends on `--output`).
- `stet run` — On success, writes findings to stdout (format depends on `--output`).
- The `--dry-run` flag skips the LLM and emits deterministic findings for CI.
- The `--nitpicky` flag enables convention- and typo-aware review: the system prompt is augmented to report style, typos, and grammar, and the FP kill list is not applied. Can be set in config (`nitpicky = true`) or env (`STET_NITPICKY=1`). When set on `stet start`, the value is persisted so `stet run` uses it unless overridden.
Output and progress:
- Default: Progress (worktree path, partition summary, per-hunk lines) is printed to stderr. Stdout is human-readable (one line per finding: `id file:line severity message`, then a summary line). The id is abbreviated (e.g. first 7 characters) as in `stet list`.
- Machine output: Use `--output=json` or `--json` for machine-parseable JSON on stdout. When `--json` or `--stream` is used, progress on stderr is suppressed automatically (so `--quiet` is optional). Use `--quiet` explicitly to suppress progress when using human-readable output. Example: `stet start --dry-run --json` (no need for `--quiet`).
- Streaming: Use `--stream` together with `--output=json` or `--json` to receive NDJSON events (one JSON object per line) so the extension can show progress and findings incrementally. `--stream` requires JSON output; without `--json` the CLI returns an error. Progress on stderr is suppressed when streaming.
With --output=json or --json (required for the extension and any script that parses findings):
Without `--stream`: On success (exit code 0), the CLI writes exactly one JSON object, followed by a newline:

```
{"findings": [ ... ]}
```

- `findings`: Array of finding objects. May be empty (e.g. nothing to review).
- Each element has the following fields (see `cli/internal/findings/finding.go` for the canonical schema):
  - `id` (string, optional): Stable identifier for the finding. In JSON output the id is always the full stable identifier. In human-readable output (e.g. `stet list`, `stet status --ids`) the CLI shows an abbreviated form (e.g. first 7 characters). Commands that take a finding id (e.g. `stet dismiss`) accept either the full id or a unique prefix of at least 4 characters.
  - `file` (string): Relative file path.
  - `line` (number, optional): Line number.
  - `range` (object, optional): `{"start": n, "end": m}` for a multi-line span.
  - `severity` (string): `"error"`, `"warning"`, `"info"`, or `"nitpick"`.
  - `category` (string): Canonical set for the Defect-Focused pipeline and extension: `"security"`, `"correctness"`, `"performance"`, `"maintainability"`, `"best_practice"`. Existing values (`"bug"`, `"style"`, `"testing"`, `"documentation"`, `"design"`, `"accessibility"`) are retained for backward compatibility. The `"accessibility"` category covers UI/UX accessibility concerns (e.g. missing labels, contrast, keyboard navigation); use it when the finding relates to assistive-technology or inclusive-design requirements.
  - `confidence` (number): Float 0.0–1.0; the model's certainty. The CLI always emits this (default 1.0 when omitted from model output).
  - `message` (string): Description of the finding.
  - `suggestion` (string, optional): Suggested fix.
  - `cursor_uri` (string, optional): Deep link (e.g. `file://` or `cursor://`). When the CLI sets it (when the model omits it), it uses `file://` with an absolute path and line (or range) so the extension can open at the location.
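A consumer of the non-streaming JSON output can parse the single object directly. A minimal Python sketch; the sample finding is illustrative, with made-up values that follow the schema above:

```python
import json

def parse_findings(stdout_text: str) -> list[dict]:
    """Parse the single {"findings": [...]} object stet writes without --stream."""
    return json.loads(stdout_text).get("findings", [])

# Illustrative payload; ids and messages are invented for the example.
sample = (
    '{"findings": [{"id": "abc1234def0", "file": "cli/main.go", "line": 10,'
    ' "severity": "info", "category": "maintainability", "confidence": 1.0,'
    ' "message": "example finding"}]}'
)
for f in parse_findings(sample):
    # Mimic the human-readable line shape: abbreviated id, then location.
    print(f"{f['id'][:7]} {f['file']}:{f['line']} {f['severity']} {f['message']}")
```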
With `--stream` (and `--output=json`/`--json`): On success, the CLI writes NDJSON to stdout: one JSON object per line. Each object has a `type` field. No final `{"findings": [...]}` object is written when streaming.
| `type` | Other fields | Description |
|---|---|---|
| `progress` | `msg` (string) | Progress message (e.g. "N hunks to review", "Reviewing hunk 1/3: path"). |
| `finding` | `data` (object) | One finding; same shape as an element of the `findings` array above. |
| `done` | (none) | End of stream; no more events. |
Example stream (abbreviated):
```
{"type":"progress","msg":"2 hunks to review"}
{"type":"progress","msg":"Reviewing hunk 1/2: cli/main.go"}
{"type":"finding","data":{"id":"...","file":"cli/main.go","line":10,"severity":"info","category":"maintainability","confidence":1.0,"message":"..."}}
{"type":"progress","msg":"Reviewing hunk 2/2: pkg/foo.go"}
{"type":"done"}
```

Without `--output=json`, stdout is human-readable (one line per finding: `id file:line severity message`, then a summary); the format may change. Do not parse it programmatically.
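An NDJSON stream like the one above can be folded into progress messages and findings with a line-by-line loop. A minimal Python sketch; event payloads are abbreviated:

```python
import json

def collect_stream(lines):
    """Fold --stream NDJSON events into (progress_messages, findings)."""
    progress, findings = [], []
    for line in lines:
        event = json.loads(line)
        if event["type"] == "progress":
            progress.append(event["msg"])
        elif event["type"] == "finding":
            findings.append(event["data"])
        elif event["type"] == "done":
            break  # end of stream; stop scanning
    return progress, findings

# Abbreviated sample events (real findings carry the full schema).
events = [
    '{"type":"progress","msg":"1 hunks to review"}',
    '{"type":"finding","data":{"file":"cli/main.go","line":10,"severity":"info","message":"..."}}',
    '{"type":"done"}',
]
progress, findings = collect_stream(events)
```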
Human-readable error and diagnostic messages go to stderr. When not using `--quiet`, progress (worktree path, hunk counts, per-hunk "Reviewing hunk N/M") is also written to stderr. The extension should surface these when the process exits with a non-zero code. For some conditions (e.g. uncommitted changes, or an existing review session), the CLI also prints a one-line recovery hint (e.g. `Hint: Run 'stet finish'...`) so the extension or user can suggest the next command.

On `stet start` failure, the CLI may print one of the following hints to stderr before the error line:
| Condition | Hint |
|---|---|
| Uncommitted changes | Hint: Commit or stash your changes, then run 'stet start' again. |
| Worktree already exists | Hint: Run 'stet finish' to end the current review and remove the worktree, then run 'stet start' again. |
Exit codes:

| Code | Meaning |
|---|---|
| 0 | Success; with --output=json/--json, stdout contains the findings JSON (or NDJSON when --stream). |
| 1 | Usage error or other failure (e.g. not a git repo, no session, model not found). |
| 2 | LLM unreachable (configured backend not running or not reachable) or LLM bad request (e.g. HTTP 4xx from the server). Applies to both Ollama and OpenAI-compat providers. |
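An invoking script can branch on these codes before deciding whether to parse stdout or surface stderr. A minimal sketch; the condition names are illustrative, not part of the CLI contract:

```python
def classify_exit(code: int) -> str:
    """Map a stet exit code to a coarse condition per the table above."""
    if code == 0:
        return "success"  # stdout holds findings JSON (or NDJSON with --stream)
    if code == 2:
        return "llm_unreachable_or_bad_request"  # backend down or HTTP 4xx
    return "error"  # usage error, no session, model not found, etc.
```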
- `stet status` — Reports baseline, last_reviewed_at, worktree path, finding count, and dismissed count. When the session has them (set at `stet start`), also reports strictness, rag_symbol_max_definitions, and rag_symbol_max_tokens. Exits 1 with "No active session" if no session. Use `--ids` or `-i` to list active finding IDs (ID, file:line, severity, message) for use with `stet dismiss`.
- `stet list` — Lists active findings with IDs (same format as `status --ids`). Exits 1 if no active session. Use to copy IDs for `stet dismiss`.
- `stet dismiss <id> [reason]` — Adds the finding ID to the session's dismissed list so it does not resurface in findings output. The optional reason (one of `false_positive`, `already_correct`, `wrong_suggestion`, `out_of_scope`) is recorded for the optimizer. For when to use each reason, see review-quality.md. Idempotent. Exits 1 if no active session; exits 1 if the reason is provided and invalid. Findings can also be auto-dismissed when a re-review of the same code (e.g. after the user fixes issues) no longer reports them, so the list shrinks as issues are fixed.
- `stet finish` — Ends the session and removes the worktree. Exits 1 if no active session.
- `stet cleanup` — Removes orphan stet worktrees (worktrees named `stet-*` that are not the current session's worktree). Optional; exits 0 when there are no orphans. Exits 1 on error (e.g. not a git repo or `git worktree remove` failure).
The optional stet optimize command invokes an external script (e.g. a Python DSPy optimizer) to improve the system prompt from user feedback. The Go CLI has no Python or DSPy dependency; it only runs the configured command. To run the optimizer you need a Python environment with DSPy (or whatever your script requires); the CLI does not install or depend on Python.
- When to run: e.g. weekly or after enough feedback has been collected in `.review/history.jsonl`.
- Input: The script reads `.review/history.jsonl` (see State storage and history below). The CLI passes the state directory via the `STET_STATE_DIR` environment variable when invoking the script.
- Output: The script should write `.review/system_prompt_optimized.txt`. When that file exists, the CLI uses it as the system prompt for review (see Phase 3.3). Optimized prompts must request the same JSON finding shape (file, line, range, severity, category, confidence, message, suggestion, cursor_uri, and optional evidence_lines) so the CLI parser and validators continue to work.
- Configuration: Set the command to run via `STET_OPTIMIZER_SCRIPT` or `optimizer_script` in repo/global config (e.g. `python3 scripts/optimize.py` or a path to your script). If unset, `stet optimize` exits 1 with a message asking you to configure it.
- Exit codes: 0 = success; non-zero = failure (script missing, Python/DSPy error, invalid history, etc.). The CLI propagates the script's exit code when in 0–255.
For optimizing toward actionable findings, see Review quality and actionability and docs/review-quality.md.
Precedence: CLI flags > environment variables > repo config (.review/config.toml) > global config (~/.config/stet/config.toml or XDG equivalent) > defaults. Canonical defaults and types: cli/internal/config/config.go.
| Key / env | Default | Description |
|---|---|---|
| `provider` / `STET_PROVIDER` | `ollama` | LLM backend: `ollama` or `openai` (OpenAI-compatible HTTP API, e.g. LM Studio local server). |
| `model` / `STET_MODEL` | `qwen3-coder:30b` | Model name for the configured backend (Ollama tag or the id your OpenAI-compat server expects). |
| `ollama_base_url` / `STET_OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API base URL. Used when provider is `ollama`. |
| `openai_base_url` / `STET_OPENAI_BASE_URL` | `http://localhost:1234/v1` | OpenAI-compatible API base URL (include `/v1` if your server uses that path). Used when provider is `openai`. |
| `max_completion_tokens` / `STET_MAX_COMPLETION_TOKENS` | `4096` | OpenAI-compat only: maps to request `max_tokens` (completion/output cap). Not derived from `num_ctx` or the context window. Ollama ignores this field. |
| `context_limit` / `STET_CONTEXT_LIMIT` | `32768` | Token context limit for prompts. |
| `warn_threshold` / `STET_WARN_THRESHOLD` | `0.9` | Warn when estimated tokens exceed this fraction of the context limit. |
| `timeout` / `STET_TIMEOUT` | `15m` | Per-request timeout for LLM HTTP requests (Go duration or integer seconds). Use `--timeout` on `stet start`, `stet run`, or `stet rerun` to override. |
| `state_dir` / `STET_STATE_DIR` | (empty → `.review` in repo) | Directory for session, lock, history, optimized prompt. |
| `worktree_root` / `STET_WORKTREE_ROOT` | (empty → `repo/.review/worktrees`) | Directory for stet worktrees. |
| `temperature` / `STET_TEMPERATURE` | `0.2` | Sampling temperature (0–2). Passed to the configured backend (Ollama generate options; OpenAI-compat where supported). |
| `num_ctx` / `STET_NUM_CTX` | `32768` | Model context window size (tokens) for sizing prompts and warnings. For Ollama, also passed to `/api/generate` (0 = use model default). Stet does not bump context from the server; effective values come from config/env/flags/session. For OpenAI-compat, output length is capped by `max_completion_tokens`, not by `num_ctx`. |
| `optimizer_script` / `STET_OPTIMIZER_SCRIPT` | (none) | Command for `stet optimize` (e.g. `python3 scripts/optimize.py`). |
| `rag_symbol_max_definitions` / `STET_RAG_SYMBOL_MAX_DEFINITIONS` | `10` | Max symbol definitions to inject (0 = disable). |
| `rag_symbol_max_tokens` / `STET_RAG_SYMBOL_MAX_TOKENS` | `0` | Max tokens for the symbol-definitions block (0 = no cap). |
| `strictness` / `STET_STRICTNESS` | `default` | Review strictness preset: `strict`, `default`, `lenient`, or `strict+`, `default+`, `lenient+`. Controls confidence thresholds (strict = 0.6/0.7, default = 0.8/0.9, lenient = 0.9/0.95) and whether the false-positive kill list is applied. The "+" presets use the same thresholds but do not apply the FP kill list (more findings shown). |
The + presets (strict+, default+, lenient+) show more findings by not filtering messages that match the built-in FP kill list.
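Putting several of these keys together, a repo-level `.review/config.toml` might look like the following. Values are illustrative; any key from the table above may be set or omitted:

```toml
# .review/config.toml — illustrative repo-level config
provider = "ollama"
model = "qwen3-coder:30b"
ollama_base_url = "http://localhost:11434"
timeout = "30m"
num_ctx = 32768
strictness = "default+"
rag_symbol_max_definitions = 10
```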
RAG symbol options can also be set via --rag-symbol-max-definitions and --rag-symbol-max-tokens on stet start and stet run; when set, they override config and env. Strictness can also be set via --strictness on stet start and stet run; when set, it overrides config and env.
Strictness and RAG symbol options set on stet start are stored in the session. stet run uses those stored values when the corresponding flag is not set. Explicit flags on stet run override for that run only; the next run without flags again uses the session values from start.
Context window can be set via --context (preset: 4k, 8k, 16k, 32k, 64k, 128k, 256k) or --num-ctx (exact tokens). Both set context_limit and num_ctx; if both flags are given, --num-ctx wins. Values set on stet start are stored in the session and used by stet run until stet finish. stet commitmsg also accepts --context and --num-ctx (for the message suggestion and for the review when --commit-and-review is used).
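The preset names map to token counts; this document only states `32k` = `--num-ctx 32768` explicitly, so the remaining entries below are an assumption (the same power-of-two pattern) and should be verified against the CLI:

```python
# Assumed mapping of --context presets to num_ctx token counts.
# Only 32k = 32768 is stated in this document; the rest follow the
# same power-of-two pattern and are assumptions.
CONTEXT_PRESETS = {
    "4k": 4096,
    "8k": 8192,
    "16k": 16384,
    "32k": 32768,
    "64k": 65536,
    "128k": 131072,
    "256k": 262144,
}
```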
Using --context 256k or 128k requires sufficient RAM/VRAM and can lead to long-running requests. If the review fails with a timeout (e.g. "context deadline exceeded"), try:
- A smaller context: `--context 32k` (or `--num-ctx 32768`).
- A higher per-request timeout: set `STET_TIMEOUT` (e.g. `export STET_TIMEOUT=45m`) or `timeout` in `.review/config.toml` (see Config schema), or use `--timeout 45m` on `stet start`, `stet run`, or `stet rerun`.
The CLI shows a hint when a request times out; follow it to adjust timeout or context.
For each hunk, stet can look up symbols referenced in the hunk (functions, types, etc.) and inject their definitions (signature + optional docstring) into the prompt as a "## Symbol definitions" block. This gives the model cross-file context and is implemented per language (Go, TypeScript, Python, Swift, Java) in cli/internal/rag/rag.go. The pipeline step is described in review-process-internals.md §7.7.
What each setting does when modified:
- `rag_symbol_max_definitions` (and `--rag-symbol-max-definitions`): Max number of symbol definitions injected per hunk. 0 = RAG disabled (no lookup, smaller/faster prompts). Increase (e.g. 15–20) = more context, larger prompts.
- `rag_symbol_max_tokens` (and `--rag-symbol-max-tokens`): If > 0, the entire symbol-definitions block is truncated to this many tokens before appending to the prompt (`prompt.AppendSymbolDefinitions`); definitions are first limited by count, then the combined text is capped by this. 0 = no cap (only the overall context limit applies).
When and how to tune:
- Disable RAG: Set `rag_symbol_max_definitions` to 0 (or `--rag-symbol-max-definitions=0`) for faster runs, smaller prompts, or to compare impact (e.g. efficacy-tests.md A10: RAG Ablation).
- Increase definitions: If reviews miss cross-file or "what does this symbol do?" context, try raising `rag_symbol_max_definitions` (e.g. 15–20); watch prompt size and context warnings.
- Cap symbol block size: If you hit context-limit or `warn_threshold` issues, set `rag_symbol_max_tokens` to a value (e.g. 500–2000) so the symbol block is bounded and the rest of the prompt (hunk, rules, etc.) fits.
Per-hunk adaptive (planned): A future release may compute the RAG token cap per hunk from the effective context limit minus base prompt size and response reserve, so each hunk gets as much symbol context as fits. When implemented, config rag_symbol_max_tokens and rag_symbol_max_definitions will act as upper bounds or explicit overrides when set; when unset (or 0 for tokens), the per-hunk budget is used. See implementation-plan.md Phase 6.11.
Context limit and num_ctx come from config, environment, --context / --num-ctx, and session persistence. Token warnings and RAG budgeting use the configured context limit only (they are not bumped from Ollama /api/show; see cli/internal/run/run.go). On stet run and stet rerun, session values from stet start are used when those flags are not set. With provider = openai, completion output is capped by max_completion_tokens (OpenAI max_tokens), independent of num_ctx / --context.
The CLI must be run from the repository root (or from a directory under the repo) so that git rev-parse --show-toplevel succeeds. Invoke from repo root (e.g. cd /path/to/repo && stet start --dry-run).
If multiple stet worktrees remain after interrupted runs (e.g. git worktree list shows entries under .review/worktrees/stet-*), run stet cleanup to remove orphan stet worktrees. Alternatively, run stet finish to remove the current session’s worktree, then remove any remaining paths with git worktree remove <path>.
State lives under .review/ (or the path given by state_dir). Artifacts:
- `session.json` — Session state (baseline ref, last_reviewed_at, findings, dismissed_ids, prompt_shadows, and optionally strictness, RAG symbol options, and context_limit/num_ctx from `stet start`).
- `lock` — Advisory lock for a single active session.
- `config.toml` — Repo-level config (optional).
- `history.jsonl` — Active feedback log for the optimizer and prompt shadowing (see below). Rotated-out lines may be written to `history.jsonl.<n>.gz` (e.g. `history.jsonl.1.gz`, `history.jsonl.2.gz`); at most 5 archives are kept.
- `system_prompt_optimized.txt` — Written by `stet optimize`; used as the system prompt when present.
- `worktrees/` — Directory for stet worktrees (default `repo/.review/worktrees`, or `worktree_root`). Each entry is `stet-<short-sha>`.
The .review/ directory is in .gitignore by default so state does not pollute version control; it can be removed from .gitignore if the team wants to commit state.
Session state (.review/session.json) includes prompt_shadows: on dismiss, the CLI stores { "finding_id": "...", "prompt_context": "..." } for each dismissed finding so it can be used as a negative few-shot in future prompts. The internal finding_prompt_context map (finding ID → hunk content) is populated during review and used when the user dismisses to record the code context that produced the finding.
The CLI appends to .review/history.jsonl on user feedback (on dismiss via stet dismiss, on auto-dismiss when re-review no longer reports a finding at that location, and on finish when there are findings). History layout: The active file is history.jsonl. When rotation runs (after the file exceeds the record cap), dropped lines are written to a gzipped archive history.jsonl.<n>.gz (e.g. history.jsonl.1.gz, history.jsonl.2.gz); at most 5 such archives are kept. Scripts that read full history (e.g. the optimizer or stet stats) should read sorted archives first (by numeric <n>, ascending), then the active history.jsonl, so that the combined stream is chronological (oldest first). The CLI provides history.ReadRecords(stateDir) for this order.
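A script that needs the full, chronological history can apply the same order as the CLI's `history.ReadRecords(stateDir)`: archives by ascending `<n>`, then the active file. A minimal Python sketch:

```python
import gzip
import json
import pathlib
import re

def read_history_records(state_dir: str) -> list[dict]:
    """Read rotated archives (ascending <n>), then the active history.jsonl,
    so records come out oldest first."""
    root = pathlib.Path(state_dir)
    records: list[dict] = []
    archives = sorted(
        root.glob("history.jsonl.*.gz"),
        key=lambda p: int(re.search(r"\.(\d+)\.gz$", p.name).group(1)),
    )
    for arc in archives:
        with gzip.open(arc, "rt") as f:
            records += [json.loads(l) for l in f if l.strip()]
    active = root / "history.jsonl"
    if active.exists():
        records += [json.loads(l) for l in active.read_text().splitlines() if l.strip()]
    return records
```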
Each line is one JSON object with:
- `diff_ref`: Ref or SHA for the reviewed scope (e.g. the HEAD at the last review run, i.e. `last_reviewed_at`).
- `review_output`: Array of finding objects (same shape as stdout findings).
- `user_action`: Object with:
  - `dismissed_ids` (array of strings): Finding IDs the user dismissed.
  - `dismissals` (optional): Array of `{ "finding_id": "...", "reason": "...", "prompt_context": "..." }` for per-finding reasons. `reason` values: `false_positive`, `already_correct`, `wrong_suggestion`, `out_of_scope`. `prompt_context` (optional): the hunk/code that produced the finding; set when the user supplies a reason and context exists, i.e. the CLI has the hunk content in session state from the review run (the `finding_prompt_context` map). Omitted if the user did not run a review in this session or the finding was added by other means.
  - `finished_at` (optional): When the session was finished (e.g. ISO8601).
- `run_config` (optional): Snapshot of run config for tuning correlation: `model`, `strictness`, `rag_symbol_max_definitions`, `rag_symbol_max_tokens`, `nitpicky`.
- `prompt_tokens`, `completion_tokens`, `eval_duration_ns` (optional): Token and duration data for the run that produced the findings; set on finish records when `STET_CAPTURE_USAGE` is enabled (default). Omitted when not captured.
Rotation keeps the last N records (default 1000) in the active file to avoid unbounded growth. The schema is suitable for future export/upload for org-wide aggregation. Canonical types: cli/internal/history/schema.go.
On stet finish, the CLI writes a Git note to refs/notes/stet at the commit that is current HEAD. The environment variable STET_CAPTURE_USAGE (default true) controls whether usage fields (model, prompt_tokens, completion_tokens, eval_duration_ns) are captured and written to the note; when set to false, those fields are omitted. The note body is a single JSON object with:
| Field | Type | Description |
|---|---|---|
| `session_id` | string | Unique id for the review session (from the session or generated on finish). |
| `baseline_sha` | string | Full SHA of the baseline ref. |
| `head_sha` | string | Full SHA of HEAD at finish time. |
| `findings_count` | number | Number of findings in the session. |
| `dismissals_count` | number | Number of dismissed finding IDs. |
| `tool_version` | string | Stet CLI version (e.g. `dev` or set at build via `-ldflags`). |
| `finished_at` | string | When the session finished (RFC3339 UTC). |
| `hunks_reviewed` | number | (optional) Number of hunks reviewed. Zero when not available. |
| `lines_added` | number | (optional) Lines added in the reviewed diff. Zero when not available. |
| `lines_removed` | number | (optional) Lines removed in the reviewed diff. Zero when not available. |
| `chars_added` | number | (optional) Characters added. Zero when not available. |
| `chars_deleted` | number | (optional) Characters deleted. Zero when not available. |
| `chars_reviewed` | number | (optional) Total characters in hunks reviewed. Zero when not available. |
| `model` | string | (optional) Model name used. Omitted when not available. |
| `prompt_tokens` | number | (optional) Prompt token count. Omitted when `STET_CAPTURE_USAGE=false`. |
| `completion_tokens` | number | (optional) Completion token count. Omitted when `STET_CAPTURE_USAGE=false`. |
| `eval_duration_ns` | number | (optional) Evaluation duration in nanoseconds. Omitted when `STET_CAPTURE_USAGE=false`. |
You can push or fetch this ref (e.g. git push origin refs/notes/stet, git fetch origin refs/notes/stet:refs/notes/stet) for integration with git-ai or impact analytics. If you run stet finish again at the same HEAD, the existing note is overwritten.
Use stet stats volume to report review volume over a ref range. It reads refs/notes/stet and aggregates scope (hunks, lines, chars) and session count. Example: stet stats volume --since=main --until=HEAD (defaults) or stet stats volume --since="30 days ago" --format=json. Flags: --since, --until, --format=human|json. See implementation plan Phase 9.
Git AI integration: When refs/notes/ai exists (the repo uses git-ai), stet stats volume includes an optional git_ai object in JSON output with commits_with_ai_note and total_ai_authored_lines. Human output adds a "Git AI (refs/notes/ai)" section. The feature is auto-detected; no flag. Parsing follows Git AI Standard v3.0.0.
Use stet stats quality to report review quality from .review/history.jsonl. It aggregates total findings, total dismissed, and per-reason breakdown, and outputs: dismissal rate, acceptance rate, false positive rate, actionability, clean commit rate, finding density (when token data is available), and category breakdown. Example: stet stats quality or stet stats quality --format=json. Metric definitions are in the implementation plan Phase 9 appendix ("Impact reporting metric definitions").
Use stet stats energy to report local energy (kWh) and cloud cost avoided ($) from refs/notes/stet. It aggregates eval_duration_ns, prompt_tokens, and completion_tokens. Flags: --watts=30 (assumed power draw in watts for local kWh calculation), --cloud-model=NAME (preset: claude-sonnet, gpt-4o-mini) or --cloud-model=NAME:in_per_million:out_per_million (custom), --since, --until, --format. Example: stet stats energy --cloud-model=gpt-4o-mini or stet stats energy --cloud-model=my-model:1:2 --format=json. Caveats: estimates only; model equivalence heuristic; local energy estimate excludes electricity cost.
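The local-kWh arithmetic behind `--watts` reduces to power times time. A sketch of the assumed model (the CLI's exact formula may differ; this only mirrors the stated inputs):

```python
def local_kwh(eval_duration_ns: int, watts: float = 30.0) -> float:
    """Estimate local energy as power (W) x duration (h) / 1000 = kWh,
    using the assumed draw from `stet stats energy --watts` (default 30 W)."""
    hours = eval_duration_ns / 1e9 / 3600.0  # ns -> s -> h
    return watts * hours / 1000.0
```

For example, one hour of evaluation at 30 W comes out to 0.03 kWh.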
A finding is actionable when the reported issue is real (not already fixed or by design), the suggestion is correct and safe, and the change is within project scope. The default system prompt instructs the model to report only actionable issues and to prefer fewer, high-confidence findings. For the full definition, prompt guidelines, and optional lessons (e.g. common false positives), see docs/review-quality.md.
For pipelines or multiple commands (e.g. stet doctor ; stet start), STET_PROVIDER, STET_OLLAMA_BASE_URL, STET_OPENAI_BASE_URL, STET_TEMPERATURE, STET_NUM_CTX, STET_MAX_COMPLETION_TOKENS, and other STET_* variables must be exported (or set in the shell before both commands) so every stet invocation sees the same config. Command-prefixed env (e.g. VAR=value cmd1 ; cmd2) only applies to the first command; the second process will not see that variable and may fall back to defaults (e.g. http://localhost:11434 for Ollama), which can cause exit code 2 (LLM unreachable) even when the first command succeeded.
- Spawn the CLI with `--quiet --json` (or `--quiet --output=json`). For incremental panel updates, add `--stream` so stdout is NDJSON (one event per line). Example: `stet start --dry-run --quiet --json --stream` or `stet run --quiet --json --stream`, from the repository root.
- On exit code 0: if not streaming, read stdout and parse the single JSON object; use `findings` to populate the panel. If streaming, read stdout line by line; parse each line as a JSON object, and on `type: "progress"` show progress, on `type: "finding"` append the `data` finding and refresh the panel, on `type: "done"` stop scanning.
- On non-zero exit: read stderr for the error message; use exit code 2 to show a specific "LLM unreachable" or "bad request" message if desired (same code for Ollama and OpenAI-compat).