11# Paginated Reads and Data Prelude — Plan
22
3+ **Status (2026-06-14): partially implemented / still relevant.** The first
4+ core blocker is shipped: in-eval tool-call ledger compaction landed in
5+ `209b4bdf`, with large paged-result coverage in `bd7dba65`. A concrete
6+ large-file MCP smoke path also exists through
7+ `examples/large_file_log_introspection/` and e2e tests. What remains is the
8+ general `data/` prelude, page-source conventions as a reusable API, M2 A/B
9+ measurement, and a rooted/chunk-capable file source suitable for benchmark
10+ integrity.
11+
312## Context
413
514The planted-anomaly pilot (` ~/ptc-bench-comparison/notes/planted-pilot-results-2026-06-13.md ` )
@@ -14,8 +23,8 @@ killed the design we first reached for:
1423- A 540 KB JSONL file parses to ~ 1.6 MB of maps; eager ` json/parse-lines ` of the
1524 whole content blows the 10 MB sandbox max_heap. ** You cannot read a large
1625 result whole.**
17- - The first design tried to hold the raw result * off-budget* as a refc binary
18- and slice it. ** That is false against the code:** the sandbox arms
26+ - The first design tried to hold a raw result *off-budget* and slice it later.
27+ **That is false against the code:** the sandbox arms
1928 ` max_heap_size ` with ` include_shared_binaries: true `
2029 (` lib/ptc_runner/sandbox.ex:181 ` ), so binaries acquired * during* eval are
2130 billed to the eval; the rebaseline only exempts data present * before* eval
@@ -24,33 +33,33 @@ killed the design we first reached for:
2433 binary bytes included. Plus the transport already decodes and caps responses
2534 at 2 MiB (` lib/ptc_runner/upstream/runtime.ex:14 ` ), and the evaluator records
2635 every tool result in ` eval_ctx.tool_calls ` — so hiding it from the return
27- value is not enough. Host-side capture + cursor + slicing is a large, risky
28- change built on a wrong assumption.
36+ value is not enough.
2937
30- So this plan does ** not** add a generic cursor/` tool/next ` /result-capture
31- mechanism. Pagination is a ** tool concern** : read a large source through a
32- paginated upstream tool, and fold over the pages in PTC-Lisp.
38+ Therefore pagination is a **tool concern**: read a large source through a
39+ paginated upstream tool, and fold over the pages in PTC-Lisp. Page position is
40+ ordinary program data: an offset, chunk index, or continuation token passed in
41+ the next `tool/call`.
3342
34- A second codex review (round 2) caught the matching retention bug on the
35- fold side and is the reason this plan needs ** one** core change: `(tool/call
36- ...)` stores the ** full result value** of every call in the in-eval tool ledger
43+ The matching retention bug on the fold side was caught before implementation:
44+ `(tool/call ...)` stores the **full result value** of every call in the in-eval tool ledger
3745(` tool_call = %{..., result: result} ` , ` lib/ptc_runner/lisp/eval.ex:1216 ` ,
3846appended to ` eval_ctx.tool_calls ` ). So a fold "discards" a page from its
3947variables but the ledger keeps it — N pages become O(total bytes) of live eval
4048state, billed to max_heap. Paging does not bound memory until the ledger stops
41- retaining full values (see "The one core change"). Everything else is pure
42- tool-arg threading.
49+ retaining full values. That core change is now shipped; everything else in this
50+ plan is pure tool-arg threading plus Prelude V1 library code.
4351
4452This is the M2 candidate for
4553[ ` turn-log-and-prelude-derivation.md ` ] ( turn-log-and-prelude-derivation.md ) :
4654a human-written prelude should pay for itself before P4 derivation starts.
4755
48- ## The one core change: bound the in-eval tool ledger
56+ ## Shipped core change: bound the in-eval tool ledger
4957
50- The in-eval ` eval_ctx.tool_calls ` ledger retains every call's full result value
51- (` eval.ex:1216 ` ) and is only compacted * after* the eval (for the response
58+ The in-eval `eval_ctx.tool_calls` ledger used to retain every call's full
59+ result value (`eval.ex:1216`) and was only compacted *after* the eval (for the response
5260envelope). Fatal for a page fold: each page stays live in the ledger. The
53- change: ** compact the ledger as it grows** — past a per-eval bytes/entries cap,
61+ change, now shipped in `209b4bdf`: **compact the ledger as it grows**. Past
62+ a per-eval bytes/entries cap,
5463keep each call's metadata (name, args hash, outcome, duration) and a bounded
5564preview, and drop the full result value ** and large ` :args ` ** . The program holds
5665the value as the ` tool/call ` return, so the ledger never needed it.
@@ -74,11 +83,10 @@ not blanket-dropped:
7483 metadata** (needed for the trace hierarchy) while dropping bulk result bytes.
7584
7685This is a ** public ` Step.tool_calls ` contract change** : ` call.result ` may now be
77- a bounded preview, not the full value. Acceptable in 0.x, but update the tests
78- and envelope expectations that assume full ` result ` . Byte accounting must be
86+ a bounded preview, not the full value. The matching tests and envelope
87+ expectations were updated with the shipped change. Byte accounting must remain
7988real (e.g. ` :erlang.external_size/1 ` over result + args + previews + list
80- overhead), not "bounded in name only." Likely site: ` EvalContext.append_tool_call ` ,
81- reusing the existing ` max_session_tool_call_bytes ` /` _entries ` budgets in-eval.
89+ overhead), not "bounded in name only."
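
As a language-neutral illustration of the compaction described in this section, here is a minimal Python sketch. It is not the shipped Elixir code: `Ledger`, `compact`, `approx_bytes`, `PREVIEW_CHARS`, and the compact-everything-once-over-cap policy are all hypothetical, and JSON-encoded length stands in for a real byte measure such as `:erlang.external_size/1`.

```python
import hashlib
import json

PREVIEW_CHARS = 64  # hypothetical bound on the retained result preview


def approx_bytes(value) -> int:
    """Byte-size proxy (assumption): length of the UTF-8 JSON encoding."""
    return len(json.dumps(value, sort_keys=True).encode("utf-8"))


def compact(entry: dict) -> dict:
    """Keep metadata plus a bounded preview; drop the full result and args."""
    args_json = json.dumps(entry["args"], sort_keys=True)
    result_json = json.dumps(entry["result"], sort_keys=True)
    return {
        "name": entry["name"],
        "args_hash": hashlib.sha256(args_json.encode("utf-8")).hexdigest()[:16],
        "outcome": entry["outcome"],
        "duration_ms": entry["duration_ms"],
        "result_preview": result_json[:PREVIEW_CHARS],
        "compacted": True,
    }


class Ledger:
    """Tool-call ledger that compacts itself as it grows (illustrative only)."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.entries = []

    def total_bytes(self) -> int:
        # Measure what is actually retained, previews and metadata included.
        return sum(approx_bytes(e) for e in self.entries)

    def append(self, entry: dict) -> None:
        self.entries.append(entry)
        if self.total_bytes() > self.max_bytes:
            # Over the cap: compact every entry that still holds full values.
            self.entries = [
                e if e.get("compacted") else compact(e) for e in self.entries
            ]
```

In this sketch the caller still holds each page as the return value of its own call, so dropping the ledger copy loses nothing the program can reach. Compaction alone does not give a hard bound, because compacted entries still accumulate; that is why the text pairs the bytes cap with an entries cap.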
8290
8391## Core design: paginate at the tool, fold in Lisp
8492
@@ -94,10 +102,9 @@ whole — read it a page at a time through a paginated tool.**
941023 . The whole parsed population never exists. Each page's result is small —
95103 well under the 2 MiB transport cap and well under max_heap when parsed.
96104
97- The "cursor" is ** ordinary program state the fold carries** (the next offset, or
98- the continuation token from the previous page). There is ** no host-side cursor,
99- no ` tool/next ` , no result capture, no off-budget hold.** Each page is a normal
100- upstream tool call.
105+ The page position is ordinary program state the fold carries: the next offset,
106+ chunk index, or continuation token from the previous page. Each page is a
107+ normal upstream tool call.
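
The fold shape can be sketched in Python against a fake paginated tool. This is an illustration only: the real fold would be written in PTC-Lisp, and `make_fake_tool`, `fold_pages`, and the page fields (`lines`, `total_chunks`) are hypothetical names, not the schema of any actual server.

```python
import json


def make_fake_tool(lines, calls):
    """Build a fake chunked reader over an in-memory list of JSONL lines."""

    def read_chunk(chunk_index, lines_per_chunk):
        calls.append(chunk_index)  # record each upstream call for inspection
        start = chunk_index * lines_per_chunk
        total_chunks = -(-len(lines) // lines_per_chunk)  # ceiling division
        return {
            "lines": lines[start:start + lines_per_chunk],
            "total_chunks": total_chunks,
        }

    return read_chunk


def fold_pages(tool, lines_per_chunk, step, acc):
    """Fold `step` over every row, holding only one page at a time.

    The page position (chunk_index) is plain program state carried by the
    loop; nothing outside the fold tracks it.
    """
    chunk_index = 0
    while True:
        page = tool(chunk_index, lines_per_chunk)
        for line in page["lines"]:
            acc = step(acc, json.loads(line))
        chunk_index += 1
        if chunk_index >= page["total_chunks"]:
            return acc


def count_errors(acc, row):
    """Example accumulator step: count rows whose status is "error"."""
    return acc + (1 if row["status"] == "error" else 0)
```

For example, `fold_pages(make_fake_tool(lines, []), 20, count_errors, 0)` walks a 95-line source in five calls and keeps only a running integer between pages.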
101108
102109A program that ignores this and reads the whole source fails closed at max_heap
103110(demonstrated) — and that fail-closed teaches the model to use the paginated
@@ -117,21 +124,35 @@ The one gap for the M2 benchmark: the default filesystem MCP server
117124(` @modelcontextprotocol/server-filesystem ` ) has only ` head ` /` tail ` , ** no
118125offset** — it cannot forward-page. So M2 needs ** one** chunk-capable line-read
119126tool that returns a bounded page per call (offset/limit, or a chunk-index +
120- lines-per-chunk equivalent). Such MCP servers exist off the shelf — a probe of
121- one (chunk-index + lines-per-chunk, total-chunks/total-lines in every page so
122- the fold bounds its loop exactly and ` :done ` is trivial) confirmed the
123- paginated-read + Lisp-fold design works end to end. It is a bounded tool, not
124- host infrastructure, and authority stays behind the normal tool grant.
127+ lines-per-chunk equivalent). `@willianpinho/large-file-mcp` provides this
128+ shape through `read_large_file_chunk` (`filePath`, `chunkIndex`,
129+ `linesPerChunk`) plus file search/navigation helpers. The repo now has an e2e
130+ smoke path using that server for turn-log introspection, confirming the
131+ paginated-read + Lisp-fold design works end to end. It is a bounded upstream
132+ tool, not host infrastructure, and authority stays behind the normal tool
133+ grant.
125134
126135** Hard integrity requirement: the read tool must be rooted to the corpus.**
127- The probed server took unrestricted absolute paths (it read ` /etc/hosts ` ) ,
136+ `large-file-mcp` takes unrestricted absolute paths, and the probe read a file outside the corpus,
128137which would let an agent read the manifests/scorer by path and defeat the A/B.
129138The chosen tool must confine reads to the corpus directory (like
130139` server-filesystem ` 's allowed-dir), or be wrapped/sandboxed to it.
131140
141+ ## Relationship to P3b large-file `log/` backend
142+
143+ [`turn-log-and-prelude-derivation.md`](turn-log-and-prelude-derivation.md)'s
144+ P3b is the narrow proving lane for this architecture. It keeps the existing
145+ semantic `log/` API and swaps only the backend: instead of the host-bound
146+ `TraceLog.Introspection.tools/1` backend, an example prelude reads turn-log
147+ JSONL pages through `@willianpinho/large-file-mcp` and projects
148+ `log/sessions`, `log/programs`, and `log/tool-calls` in PTC-Lisp.
149+
150+ This plan is the generalization: a reusable `data/` prelude over paginated
151+ sources. There is no conflict. P3b should avoid growing one-off paging helpers
152+ that cannot later be factored into the `data/` source-spec/fold conventions.
153+
132154## Non-Goals
133155
134- - No generic cursor / ` tool/next ` / host-held result buffer.
135156- No capture of full tool results off-budget (the sandbox bills them anyway).
136157- No change to ordinary ` (tool/call ...) ` * call* semantics or the 2 MiB
137158 transport cap. (The one core change is to ledger * retention* , not call
@@ -175,8 +196,8 @@ each call.
175196
176197## Page size — the central tuning knob
177198
178- With no host cursor, ** each page is a real upstream call** , so page size trades
179- two limits against each other:
199+ Each page is a real upstream call, so page size trades two limits against each
200+ other:
180201
181202- ** Too small** → many calls. Two ceilings, and the timeout is the tighter one:
182203 the per-eval upstream-call cap (default 50, the upstream ` RunContext ` cap at
@@ -198,14 +219,12 @@ prelude can size pages **adaptively** — `linesPerChunk ≈ ceil(totalLines / N
198219for a target page count N under the call cap, capped so a parsed page fits
199220max_heap — rather than a fixed default.
200221
201- This is the honest cost of dropping the host cursor: pagination is N upstream
202- round-trips per fold, bounded by the call-cap and (more tightly) the 1 s
203- timeout. Lean to ** few large pages** , not many small ones. Fine for the
204- benchmark sizes if a parsed page fits max_heap; the scaling limit for very large
205- sources. Mitigations if needed: raise the per-fold call cap and/or the eval
206- timeout for paged reads, or a host-side cached paginated source (deferred — that
207- is where a host cursor would re-enter, and it needs the off-budget accounting the
208- sandbox does not give mid-eval today).
222+ Pagination is N upstream round-trips per fold, bounded by the call cap and
223+ (more tightly) the 1 s timeout. Lean toward **few large pages**, not many small
224+ ones. That is fine for the benchmark sizes as long as a parsed page fits
225+ max_heap, but it is the scaling limit for very large sources. Mitigations if needed: raise the per-fold
226+ call cap and/or the eval timeout for paged reads, or move a specific workload
227+ to a specialized upstream that performs more aggregation server-side.
209228
210229## Data Prelude
211230
@@ -367,18 +386,15 @@ rediscover the page-fold pattern from recorded runs.
367386## Sequencing
368387
3693881 . ** Chunk-capable read-lines tool.** Use an existing chunked-read MCP server
370- (offset/limit or chunk-index + lines-per-chunk; off-the-shelf ones exist —
371- one was probed and works). ** Must be rooted to the corpus** (the probed one
372- was not — integrity requirement above). Gates everything else (the default
373- fileserver cannot page). Note the page envelope may be double-wrapped (MCP
374- text block holding a JSON string whose field holds the ` \n ` -joined lines), so
375- the prelude's row extraction is: unwrap → ` json/parse-string ` → take the
376- lines field → ` json/parse-lines ` .
377- 2 . ** Core change: bound the in-eval tool ledger** (drop full result values past
378- a bytes/entries cap, keep metadata + preview). Without this, paging does not
379- bound memory — the ledger retains every page. Smallest, highest-leverage
380- item; also a latent-bug fix for any tool-heavy eval. Test: a fold of many
381- page calls stays within max_heap (it does not today).
389+ (offset/limit or chunk-index + lines-per-chunk). The e2e smoke uses
390+ `@willianpinho/large-file-mcp`, but the benchmark source must be rooted to
391+ the corpus or wrapped/sandboxed before M2. Note the page envelope may be
392+ double-wrapped (MCP text block holding a JSON string whose field holds the
393+ `\n`-joined lines), so the prelude's row extraction is: unwrap →
394+ `json/parse-string` → take the lines field → `json/parse-lines`.
395+ 2. **Done: bound the in-eval tool ledger** (drop full result values past a
396+ bytes/entries cap, keep metadata + preview). This landed in `209b4bdf`, with
397+ large paged-result coverage in `bd7dba65`.
3823983 . ** ` data/ ` prelude** (fold + offset/token conventions in ` :args ` + field-first
383399 helpers), tested against a fake paginated tool. Authority is runtime-enforced
384400 (call fails closed if the tool is not granted), not attach-proven, for the
@@ -395,7 +411,7 @@ rediscover the page-fold pattern from recorded runs.
395411
396412** Verified sound (rounds 2–3):**
397413
398- - No host-side cursor/capture is added .
414+ - Page position is ordinary upstream/tool state threaded through `:args`.
399415- The in-eval ledger retains full result values (` eval.ex:1216 ` ); ** no in-eval
400416 code re-reads them** (result returned directly at ` eval.ex:1264 ` ; ledger is
401417 side-effect state at ` context.ex:326 ` ) — so the compaction is
@@ -412,23 +428,22 @@ rediscover the page-fold pattern from recorded runs.
412428- Dynamic ` (tool/call (page-call ...)) ` is not attach-proven (literal-only
413429 inference, ` compiler.ex:814 ` ).
414430
415- ** Still unproven (resolve during implementation):**
431+ **Still unproven / remaining (resolve during implementation):**
416432
417433- That a fold of the needed page count fits the 1 s timeout for the benchmark's
418434 local stdio tool — ** measure before relying on it.**
419435- That a chosen page size keeps every parsed page under max_heap for the corpus
420436 (depends on parse-expansion ratio) — measure; it fails closed if wrong.
421- - That the in-eval ledger bound, once added, fully bounds a long fold's memory —
422- test with a many-page fold over a multi-MB source.
423- - That metadata + preview is enough for every ` Step.tool_calls ` consumer — this
424- is a ** public contract change** (` call.result ` may be a preview); update tests
425- and envelope expectations. ` tool_cache ` and ` child_step ` retain full data via
426- separate paths and are out of scope / preserved respectively.
437+ - That the in-eval ledger bound fully bounds the intended M2 data-prelude fold
438+ under realistic page sizes and stdio latency. The core mechanism is covered;
439+ the benchmark workload still needs measurement.
440+ - That metadata + preview is enough for every downstream `Step.tool_calls`
441+ consumer outside the tests already updated. `tool_cache` and `child_step`
442+ retain full data via separate paths and are out of scope / preserved
443+ respectively.
427444
428445## Explicitly deferred
429446
430- - Host-side cached paginated source (would re-introduce a host cursor and needs
431- off-budget mid-eval accounting the sandbox does not give today).
432447- Approximate-state structures and host-side accumulator spill for O(n)
433448 analyses.
434449- Raising the per-fold upstream-call cap for very large sources (only if a real