
v12-v13: session-mined repos + adaptive hint format #68

Merged
dpetrou-continua merged 4 commits into main from dpetrou/v12-terse-hints on Feb 22, 2026


@dpetrou-continua
Contributor

Summary

Session mining + v12/v13 hint formats + 2 new benchmark repos.

What's new

Session mining: analyzed 300 real Pi sessions (~95K tool calls, ~2,275 categorized errors) to identify the top 14 recurring agent sad paths. Top patterns: format-before-lint (533x), build target syntax (368x), hallucinated tool names (92x).
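The PR doesn't include the mining pipeline itself. As a rough illustration of the counting step only, here is a minimal Python sketch that buckets tool-call errors into sad-path categories by regex and tallies them. The record schema (a dict with an optional `error` string), the function names, and every regex are hypothetical; the real classifier may work quite differently.

```python
import re
from collections import Counter
from typing import Iterable, Mapping, Optional

# Hypothetical category -> regex table; illustrative only, not the PR's classifier.
SAD_PATH_PATTERNS = {
    "format-before-lint": re.compile(r"would reformat|formatting check failed", re.I),
    "build-target-syntax": re.compile(r"invalid (build )?target", re.I),
    "hallucinated-tool": re.compile(r"unknown tool|tool \S+ not found", re.I),
    "missing-module": re.compile(r"no module named", re.I),
}


def categorize(error_text: str) -> Optional[str]:
    """Return the first matching sad-path category, or None if nothing matches."""
    for category, pattern in SAD_PATH_PATTERNS.items():
        if pattern.search(error_text):
            return category
    return None


def mine_sad_paths(tool_calls: Iterable[Mapping[str, Optional[str]]]) -> Counter:
    """Tally categorized errors across tool calls (assumed schema: optional 'error' key)."""
    counts: Counter = Counter()
    for call in tool_calls:
        error = call.get("error")
        if not error:
            continue  # successful call, nothing to categorize
        category = categorize(error)
        if category is not None:
            counts[category] += 1
    return counts


# Hypothetical sample records, one per tool call.
calls = [
    {"error": "error: invalid target '//pkg:lib'"},
    {"error": "ModuleNotFoundError: No module named 'yaml'"},
    {"error": None},
    {"error": "unknown tool: read_files"},
    {"error": "error: invalid build target"},
]
print(mine_sad_paths(calls).most_common())
```

Errors that match no pattern are simply dropped here; a real pass would likely keep an "uncategorized" bucket to find new families.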

2 new repos from real sad paths:

  • monobuild — format-before-lint trap + build target syntax trap
  • toolhub — hallucinated tool name trap + missing module workaround

3 hint format variants tested:

  • v11 verbose (best general): ledgerkit −9%, logparse −20% (median)
  • v12 terse: webutil −2% (best), but ledgerkit +19% regression
  • v13 adaptive (terse for simple commands, verbose for discovery tasks): middle-of-the-road results
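The deltas above are reported as medians. One plausible reading, sketched below, is the percent change of the hinted runs' median relative to the baseline median for the same task. The PR doesn't state the metric (wall time, turns, tokens) or the baseline, so both are assumptions here, as is the helper name `median_delta_pct`.

```python
from statistics import median
from typing import Sequence


def median_delta_pct(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """Percent change of the treatment median relative to the baseline median.

    Negative values mean the hinted (treatment) runs used less of the metric.
    """
    base = median(baseline)
    if base == 0:
        raise ValueError("baseline median must be non-zero")
    return (median(treatment) - base) / base * 100.0


# Hypothetical per-run measurements for one task (e.g. seconds to completion).
no_hints = [10, 12, 14]
with_hints = [8, 9, 10]
print(f"{median_delta_pct(no_hints, with_hints):+.0f}%")  # prints "-25%"
```

Using medians rather than means keeps a single runaway run from dominating a small sample of benchmark runs.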

Testing

  • 27 test files, 158 tests pass
  • ~200 benchmark runs across v12/v13 + new repos
  • All 10 repos build and run correctly

HAPPY_PATHS_HINT_FORMAT=terse strips the explanation paragraph and
emits only: 💡 Try: <fix command>

Hypothesis: less text for the model to parse = faster action.
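A minimal sketch of how a hint renderer might honor `HAPPY_PATHS_HINT_FORMAT`. Only the variable name, the `terse` value, and the `💡 Try: <fix command>` output come from the commit message above; the function name `format_hint`, the `verbose` default, and the verbose layout (explanation line followed by the same `Try:` line) are assumptions.

```python
import os
from typing import Mapping, Optional


def format_hint(fix_command: str, explanation: str,
                env: Optional[Mapping[str, str]] = None) -> str:
    """Render a hint; HAPPY_PATHS_HINT_FORMAT=terse drops the explanation."""
    env = os.environ if env is None else env
    mode = env.get("HAPPY_PATHS_HINT_FORMAT", "verbose")  # "verbose" default is assumed
    if mode == "terse":
        return f"💡 Try: {fix_command}"
    # Assumed verbose layout: explanation paragraph, then the same Try line.
    return f"{explanation}\n💡 Try: {fix_command}"


print(format_hint("pip install pyyaml", "The yaml module isn't installed in this env.",
                  env={"HAPPY_PATHS_HINT_FORMAT": "terse"}))
```

Passing `env` explicitly keeps the function testable without mutating the process environment.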
New repos mined from 300 real Pi sessions (~2,275 errors):
- monobuild: format-before-lint (533x) + build target syntax (368x)
- toolhub: hallucinated tools (92x) + missing modules (88x)

v13 adaptive: terse for specific commands (pip install X),
verbose for discovery tasks (find setup scripts).
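The v13 rule can be sketched as a per-hint choice between the two formats. The terse `💡 Try:` line follows the v12 commit message; the PR doesn't say how a hint is classified as a specific command versus a discovery task, so the prefix heuristic below (`find `, `look for `, `search `, `locate `) and the function names are hypothetical.

```python
# Hypothetical heuristic: hints whose fix starts with a discovery verb get the verbose form.
DISCOVERY_PREFIXES = ("find ", "look for ", "search ", "locate ")


def is_discovery(fix: str) -> bool:
    """True if the fix reads like an open-ended discovery task rather than one command."""
    return fix.strip().lower().startswith(DISCOVERY_PREFIXES)


def format_adaptive_hint(fix: str, explanation: str) -> str:
    """Adaptive rendering as sketched here: terse for specific commands, verbose for discovery."""
    if is_discovery(fix):
        return f"{explanation}\n💡 Try: {fix}"
    return f"💡 Try: {fix}"


print(format_adaptive_hint("pip install pyyaml", "The yaml module isn't installed."))
print(format_adaptive_hint("find setup scripts", "This repo bootstraps via scripts."))
```

One side effect of a prefix test is that a literal shell `find` command would also get the verbose treatment; tagging each trap's hint explicitly as command or discovery would avoid that ambiguity.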

Now 10 repos, 40 tasks, 19 unique traps.
…tasks

13 benchmark iterations, 600+ runs total.
v11 remains best general policy (-9% ledgerkit, -20% logparse median).
Session mining: 14 sad path families from 300 real sessions.
2 new repos from top unmapped sad paths (monobuild, toolhub).
dpetrou-continua merged commit 8b834a3 into main on Feb 22, 2026
2 checks passed
