Fork of pi (the earendil-works agent harness) applying LemmaScript to its context-compaction cut-point selector — verified with both backends: Dafny, and Lean 4 (Velvet/Loom), from the same //@-annotated TypeScript. Annotations are added in-place — everything goes through //@ comments; the one exception is a behavior-identical control-flow edit in truncateTail (the byte-limit exit moved from a break into the loop guard, validated by a 117,900-case differential fuzz against the original). View as diff.
When the context window fills, pi compacts history: it picks a cut point and discards everything before it. A provider API rejects a message list whose retained prefix contains an orphaned toolResult — a tool result with no preceding tool call still in context. So the cut must never land such that the kept suffix starts with an orphaned tool result, and never splits a tool-use/tool-result run. We prove exactly those properties, in place, for both functions of the selector in packages/agent/src/harness/compaction/compaction.ts, and in the shipped CLI's own copy packages/coding-agent/src/core/compaction/compaction.ts.
check.sh dafny → 8 verified, 0 errors per compaction file. check.sh lean → lake build green, zero sorry: the headline no-orphan theorem, the turn-split property, the changelog semver core, and both tool-output truncators also carry machine-checked Lean proofs.
| File | Properties | Dafny | Lean |
|---|---|---|---|
agent/.../compaction.ts |
no-orphan cut, in-range, turn-split | ✓ | ✓ |
coding-agent/.../core/compaction/compaction.ts |
same (CLI's own copy) | ✓ | ✓ (//@ lean-module) |
coding-agent/.../utils/changelog.ts |
semver ordering trichotomy | ✓ | ✓ |
coding-agent/.../core/tools/truncate.ts |
line/byte-limit bounds for head & tail truncation | ✓ | ✓ |
coding-agent/.../core/tools/edit-diff.ts, ai/.../models.ts |
(see file annotations) | ✓ | — (//@ backend dafny) |
findValidCutPoints — compaction.ts:265
Scans entries[startIndex..endIndex) and returns the indices that are safe places to start a retained suffix. Two //@ ensures:
- In range. Every returned index is in
[startIndex, endIndex). - No orphan at the cut. No returned index points at a
messagewhose role istoolResult— the retained suffix can never begin with an orphaned tool result.
The second is the one with teeth: the source switch uses C-style fall-through (six roles share one push), and only toolResult is excluded. Two loop //@ invariants carry both properties.
findCutPoint — compaction.ts:341
Walks back from the end accumulating an (opaque) token estimate until it reaches keepRecentTokens, snaps to the nearest valid cut point, then snaps backward over metadata entries. Three //@ ensures over the chosen firstKeptEntryIndex:
- The snap can't undo safety.
firstKeptEntryIndexis in[startIndex, endIndex)and is not atoolResultmessage — even after the backward snap moves the cut earlier past metadata. (The subtle part: proving the snap breaks on any message, so it can't slide onto a tool result.) - No retained
toolResultis orphaned. EverytoolResultin the kept suffix has its preceding tool-use turn retained too — the cut never splits a tool-use/tool-result run. This holds because the cut lands on a non-toolResult, so it can never fall inside a run. - A split turn names a real boundary. When the cut falls mid-turn (
isSplitTurn), the reportedturnStartIndexis an in-range turn boundary at or before the cut — auser/bashExecutionmessage or a branch-summary/custom-message entry. Proven via the (verified, non-opaque)findTurnStartIndex.
Both carry an honest fallback caveat: when there are no valid cut points at all, the function returns startIndex unchanged, and the ensures disjoins on || firstKeptEntryIndex === startIndex rather than pretending the degenerate fallback is safe.
truncateHead / truncateTail — truncate.ts
The shared tool-output truncators (line limit and byte limit, whichever hits first). Proven on both backends: the output never exceeds the line budget or the total line count, and when truncation was byte-driven (and no partial line was taken) the output is strictly under the line budget. truncateTail's partial-last-line edge case is where Dafny's path-sensitive break reasoning and Lean's invariant-at-every-exit discipline diverge — the guard-fold edit noted above makes one loop shape both provers accept.
Both backends regenerate their artifacts from the annotated TypeScript and check them; the generated files are committed and CI fails if they're stale.
Dafny (needs Node ≥ 18 and Dafny ≥ 4.x, with a LemmaScript clone as a sibling directory):
git clone https://github.com/midspiral/LemmaScript.git ../LemmaScript
(cd ../LemmaScript/tools && npm ci)
../LemmaScript/tools/check.sh dafnyLean (additionally needs elan and the Loom/Velvet forks as siblings):
git clone https://github.com/namin/loom.git -b lemma ../loom
git clone https://github.com/namin/velvet.git -b lemma ../velvet
../LemmaScript/tools/check.sh lean # regenerates *.lean and runs `lake build`Per source file the Lean layout is foo.types.lean + foo.def.lean (generated) and foo.proof.lean (hand-written prove_correct tactics); files marked //@ backend dafny are skipped on the Lean pass. CI runs both backends (lemmascript.yml).
These properties hold relative to assumptions made explicit in the annotations — nothing is hidden behind the proof:
- Type shadows. The unreachable
@earendil-works/pi-aimessage graph is shadowed down to what the selector reads://@ declare-type Role = "user" | "assistant" | "toolResult" | …and//@ declare-type AgentMessage { role: Role }.SessionTreeEntryitself auto-resolves; itscustom_message.content(an unmodelable union) becomes an opaque type, present but uninspectable. - Opaque helpers.
estimateTokensis//@ extern— uninterpreted (Dafny:function {:axiom}; Lean: anopaquedeclaration). It reads message fields outside the shadow, so it must be opaque; the safety proofs don't depend on its value, only on which entries are messages.truncate.tssimilarly treatsBuffer.byteLengthand the UTF-8-aware partial-line cut as contracted externs. - Valid-input
requires.0 <= startIndex,endIndex <= entries.length, and — for the no-orphan result — an explicit ordering precondition: within the range, atoolResultis always immediately preceded by a message (its tool-use turn). This is the assumption verification forced into the open; if pi's session tree can ever violate it, the no-orphan guarantee is where that would surface.
An approximate-budget bound (the kept suffix is close to keepRecentTokens) is not proven. Stating anything about retained tokens needs a recursive token-sum over a range, which can't be expressed as an inline //@ spec, can't be a post-return-visible ghost local, and would require adding a helper to the production file (breaking in-place). It would need a spec-level ghost-function feature in LemmaScript. The bound also isn't clean — the forward snap can keep slightly less than the budget — so it's low value on its own.
The CLI's own compaction copy is now proven on both backends, same properties as the agent copy. It shares the agent copy's filename (compaction.ts), and Lean module names are flat/global, so it carries a //@ lean-module compaction-cli directive that re-bases its Lean modules (compaction-cli.types/.def/.proof) to avoid a collision — Dafny keeps the file basename.
Wiring it for Lean also made a backend asymmetry concrete: the per-loop //@ decreases measures, the //@ type c nat on the inner counter, and the cutPoints loop //@ invariants are needed for the Lean proof but not for Dafny — this copy Dafny-verified 8/8 before those annotations were added. Dafny infers loop variants and those invariants; Lean's total-termination and invariant-at-every-exit discipline need them written out. The annotations are additive (Dafny still verifies with them present), so one annotated source drives both backends — Lean just requires a superset.