<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>What yoyo shares with Recursive Language Models, and what it does not</title>
  <style>
    :root {
      --bg: #f4f1ea;
      --paper: #fbf8f2;
      --ink: #171512;
      --muted: #5f5a53;
      --line: #d8d1c6;
      --accent: #2f5e4e;
      --warm: #8b4b2f;
    }

    * {
      box-sizing: border-box;
    }

    body {
      margin: 0;
      background: var(--bg);
      color: var(--ink);
      font-family: "Avenir Next", "Segoe UI", Helvetica, Arial, sans-serif;
      line-height: 1.68;
    }

    a {
      color: var(--ink);
      text-decoration: none;
      border-bottom: 1px solid rgba(23, 21, 18, 0.22);
    }

    a:hover {
      border-bottom-color: rgba(23, 21, 18, 0.55);
    }

    code,
    pre {
      font-family: "SFMono-Regular", Menlo, Consolas, monospace;
    }

    .page {
      max-width: 780px;
      margin: 0 auto;
      padding: 24px 24px 80px;
    }

    .nav {
      display: flex;
      justify-content: space-between;
      align-items: baseline;
      gap: 20px;
      padding-bottom: 28px;
      border-bottom: 1px solid var(--line);
    }

    .brand {
      font-size: 1rem;
      font-weight: 700;
      letter-spacing: 0.01em;
    }

    .brand span {
      color: var(--warm);
    }

    .nav-links {
      display: flex;
      flex-wrap: wrap;
      gap: 14px;
      font-size: 0.94rem;
      color: var(--muted);
    }

    header {
      padding: 54px 0 28px;
      border-bottom: 1px solid var(--line);
    }

    .eyebrow {
      margin-bottom: 14px;
      color: var(--warm);
      font-size: 0.76rem;
      font-weight: 700;
      letter-spacing: 0.12em;
      text-transform: uppercase;
    }

    h1,
    h2 {
      font-family: "Iowan Old Style", "Palatino Linotype", Georgia, serif;
      font-weight: 600;
      letter-spacing: -0.03em;
    }

    h1 {
      margin: 0 0 16px;
      font-size: clamp(2.5rem, 6vw, 4.2rem);
      line-height: 0.98;
    }

    .deck {
      max-width: 42rem;
      color: var(--muted);
      font-size: 1.08rem;
      margin: 0 0 12px;
    }

    .meta {
      margin-top: 20px;
      color: var(--muted);
      font-size: 0.94rem;
    }

    main {
      padding-top: 26px;
    }

    section {
      padding: 26px 0;
      border-bottom: 1px solid var(--line);
    }

    section:last-of-type {
      border-bottom: 0;
    }

    h2 {
      margin: 0 0 14px;
      font-size: 1.85rem;
      line-height: 1.05;
    }

    p {
      margin: 0 0 14px;
      font-size: 1rem;
    }

    .note {
      padding: 18px 20px;
      background: var(--paper);
      border: 1px solid var(--line);
      margin: 18px 0;
    }

    .note strong {
      display: block;
      margin-bottom: 8px;
      font-size: 0.96rem;
    }

    ul {
      margin: 0 0 14px 22px;
      padding: 0;
    }

    li {
      margin: 0 0 8px;
    }

    .footer {
      margin-top: 28px;
      padding-top: 18px;
      border-top: 1px solid var(--line);
      color: var(--muted);
      font-size: 0.94rem;
    }

    @media (max-width: 720px) {
      .nav {
        flex-direction: column;
        align-items: flex-start;
      }

      .page {
        padding: 18px 18px 64px;
      }
    }
  </style>
</head>
<body>
  <div class="page">
    <nav class="nav">
      <div class="brand">yo<span>yo</span></div>
      <div class="nav-links">
        <a href="index.html">Home</a>
        <a href="README.md">Docs</a>
        <a href="https://github.com/avirajkhare00/yoyo">GitHub</a>
      </div>
    </nav>

    <header>
      <div class="eyebrow">Essay</div>
      <h1>What yoyo shares with Recursive Language Models, and what it does not</h1>
      <p class="deck">
        The strongest idea in the RLM paper is not “give the model more tokens.” It is “stop treating the prompt as the place where all work must happen.” That is also the right direction for <code>yoyo</code>.
      </p>
      <p class="deck">
        But that does not make <code>yoyo</code> RLM-compatible. <code>yoyo</code> has grounded repo tools and bounded repair loops, but it does not yet implement the paper’s defining move: model-chosen recursive self-calls on smaller subquestions and sub-contexts.
      </p>
      <p class="meta">
        Based on <em>Recursive Language Models</em> by Omar Khattab, Tim Kraska, and Alex L. Zhang.
      </p>
    </header>

    <main>
      <section>
        <h2>The paper’s real move</h2>
        <p>
          The paper argues that very long inputs should be treated as part of an external environment instead of being stuffed directly into the model context. Their Recursive Language Model loads the input into a Python REPL, lets the model inspect pieces of it with code, and lets it recursively call itself on smaller subproblems.
        </p>
        <p>
          That is the important shift. The gain is not just “more context.” The gain is moving from prompt stuffing to environment interaction.
        </p>
        <div class="note">
          <strong>Important distinction</strong>
          The environment idea is only part of the paper. The stronger mechanism is recursive control: <code>RLM(q, context) → RLM(sub_q, sub_context)</code> when the model decides a smaller subproblem deserves its own working set.
        </div>
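        <p>
          The recursive shape can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: <code>rlm</code>, <code>toy_llm</code>, and the fixed chunking rule are hypothetical, and in the paper the model itself chooses how to split the context and when to recurse.
        </p>
        <pre><code>def rlm(question, context, llm, chunk_size=200, depth=0, max_depth=3):
    """Answer `question` over `context`, recursing on smaller sub-contexts.

    `llm` is a hypothetical callable (question, text) returning a string.
    The split rule here is fixed; in the paper the model chooses it.
    """
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    # Base case: at most one chunk, or the recursion budget is spent.
    if depth == max_depth or not chunks[1:]:
        return llm(question, context)
    # Recursive case: each chunk becomes its own sub-context.
    partials = [rlm(question, c, llm, chunk_size, depth + 1, max_depth) for c in chunks]
    # Combine the partial answers with one more call over a much smaller context.
    return rlm(question, "\n".join(partials), llm, chunk_size, depth + 1, max_depth)


def toy_llm(question, text):
    """Stand-in for a model call: reports whether the keyword appears."""
    return "yes" if "needle" in text or "yes" in text else "no"


haystack = "hay " * 200 + "needle " + "hay " * 200
answer = rlm("Is there a needle?", haystack, toy_llm)
print(answer)</code></pre>
        <p>
          No single call in this sketch sees more than <code>chunk_size</code> characters, which is the point: each subproblem gets its own small working set.
        </p>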
      </section>

      <section>
        <h2>What the REPL part actually means</h2>
        <p>
          A REPL is just an interactive programming session. In the paper, the full prompt becomes a variable in that session. The model can ask for slices, search with code, decompose the input, and recurse on smaller chunks instead of trying to hold the whole thing in token memory at once.
        </p>
        <div class="note">
          <strong>Why that matters</strong>
          The prompt stops being “all the text the model must read now” and becomes “data in an environment the model can inspect when needed.”
        </div>
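        <p>
          As a rough illustration of "the prompt becomes a variable," the following Python session keeps a large input in <code>context</code> and reads only slices and search hits. The record format and the <code>record</code> helper are made up for the example; only the pattern of slicing and searching with code comes from the paper's description.
        </p>
        <pre><code>import re


def record(i):
    """Build one line of toy data (hypothetical format)."""
    status = "failed" if i % 50 == 7 else "ok"
    return f"record {i}: owner=team-{i % 3} status={status}"


# The full input lives in a variable, not in the prompt.
context = "\n".join(record(i) for i in range(1000))

# Peek at a slice instead of reading everything.
head = context[:60]

# Search with code and keep only the lines worth reading closely.
failed = [line for line in context.splitlines() if re.search(r"status=failed", line)]

print(len(context.splitlines()), "lines in the environment")
print(len(failed), "lines selected for a closer read")
print(failed[0])</code></pre>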
      </section>

      <section>
        <h2>Why this matches yoyo</h2>
        <p>
          <code>yoyo</code> is not building a Python REPL for source code, but it is pushing toward the same abstraction. A repository should be treated as an external environment with grounded interfaces, not as a giant prompt.
        </p>
        <p>
          That is what <code>boot</code>, <code>index</code>, <code>judge_change</code>, <code>inspect</code>, and <code>change</code> are trying to do. They make the repo queryable in smaller, more reliable pieces. The model should not need to drag full files or entire subsystems into context just to answer one ownership or invariant question.
        </p>
      </section>

      <section>
        <h2>What is now actually inside yoyo</h2>
        <p>
          An earlier version of this essay overstated the connection. <code>yoyo</code> does not already contain an RLM in the paper’s sense. What it does contain is a more grounded repo workflow with bounded inspect-fail-repair steps.
        </p>
        <p>
          The clearest example is the new guarded write path. A write no longer means “edit the file and hope.” <code>yoyo</code> writes the candidate change, runs the relevant checks, and restores the original file if the new version fails. When that happens, the failure is returned as machine-readable <code>guard_failure</code> data, and <code>retry_plan</code> turns that failure into a bounded inspect-fix-retry workflow.
        </p>
        <div class="note">
          <strong>That is a related product move, not the same mechanism</strong>
          The model does not have to hold the whole repository or the whole failure state in prompt memory. The repository is the environment, the failure is structured state, and the next step is a constrained interaction over that state. But the control loop is still flatter than the paper’s recursive self-decomposition.
        </div>
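        <p>
          The write-check-restore pattern behind the guarded write path can be sketched as follows. This is not <code>yoyo</code>'s code: <code>guarded_write</code> and <code>python_syntax_check</code> are hypothetical names, the guard is a simple Python compile check, and the returned dictionary only imitates the shape of <code>guard_failure</code>.
        </p>
        <pre><code>import pathlib
import tempfile


def guarded_write(path, new_text, check):
    """Write `new_text`, run `check`, and restore the original if it fails.

    `check` is a hypothetical callable returning a list of error strings.
    """
    path = pathlib.Path(path)
    original = path.read_text()
    path.write_text(new_text)
    errors = check(path)
    if not errors:
        return {"ok": True}
    # Roll back so a failed candidate never stays on disk.
    path.write_text(original)
    return {
        "ok": False,
        "guard_failure": {
            "phase": "post_write_guard",
            "retryable": True,
            "files_restored": True,
            "files": [{"file": str(path), "errors": [{"text": e} for e in errors]}],
        },
    }


def python_syntax_check(path):
    """Example guard: does the file still compile as Python?"""
    try:
        compile(path.read_text(), str(path), "exec")
        return []
    except SyntaxError as exc:
        return [f"{exc.msg} (line {exc.lineno})"]


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "greet.py"
    target.write_text("def greet(name):\n    return 'Hello, ' + name\n")
    # The candidate has an unclosed parenthesis, so the guard rejects it.
    result = guarded_write(target, "def greet(name:\n", python_syntax_check)
    restored = target.read_text()

print(result["ok"], result["guard_failure"]["files_restored"])</code></pre>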
        <p>
          This matters even more for interpreted languages. Python, JavaScript, Ruby, PHP, and Clojure often fail after parsing, not before. So <code>yoyo</code> now uses runtime guards to catch “this file parses but breaks when it actually runs” cases, then routes those failures back into the same repair loop. That is closer to environment-mediated repair than the old “search a lot and dump source into context” approach, but it is still not the paper’s recursive inspect-and-act mechanism.
        </p>
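        <p>
          The gap between "parses" and "runs" is easy to reproduce. The Python sketch below is an analogy, not <code>yoyo</code>'s guard: the module name <code>missing_ns_for_demo</code> is invented, and a real guard would more plausibly load the file in an isolated process than call <code>exec</code> in the current one.
        </p>
        <pre><code>import ast

source = (
    "import missing_ns_for_demo\n"
    "\n"
    "def greet(name):\n"
    "    return 'Hello, ' + name\n"
)

# Static guard: the file parses, so a syntax-only check passes.
ast.parse(source)
parses = True

# Runtime guard: actually load the module and see whether it survives.
try:
    exec(compile(source, "candidate.py", "exec"), {})
    runtime_error = None
except Exception as exc:
    runtime_error = f"{type(exc).__name__}: {exc}"

print("parses:", parses)
print("runtime error:", runtime_error)</code></pre>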
      </section>

      <section>
        <h2>Why search is not the moat</h2>
        <p>
          The paper makes one thing clearer: the moat is not raw retrieval. If all you have is better grep, the model still has to do the real work in its own prompt state. That is fragile and easy to replace.
        </p>
        <p>
          The deeper value is giving the model a grounded way to ask higher-level questions about the codebase:
        </p>
        <ul>
          <li>Where should this fix live?</li>
          <li>What must remain true?</li>
          <li>What else touches this?</li>
          <li>What is the minimal safe write surface?</li>
        </ul>
        <p>
          That is why <code>judge_change</code> matters more than trying to win a search benchmark.
        </p>
      </section>

      <section>
        <h2>Read first, then write</h2>
        <p>
          The paper’s results are strongest on tasks where the model must manage dense, multi-hop information without collapsing into lossy summaries. That maps directly to the most honest product statement for <code>yoyo</code>:
        </p>
        <div class="note">
          <strong>Current product takeaway</strong>
          <code>yoyo</code> is strongest when read judgment narrows the surface first, then <code>change</code> executes the write cleanly.
        </div>
        <p>
          That is also what our recent directed evals are showing. The read side matters because the write should happen only after the correct ownership layer and invariants are grounded.
        </p>
        <p>
          What changed recently is that the write side now has a bounded repair loop too. But that loop is still mostly flat: read carefully, write through a guard, inspect the resulting failure state if needed, then retry. The paper’s stronger move is that the model itself decides when to recurse on a smaller subquestion with a transformed sub-context. <code>yoyo</code> is not doing that yet.
        </p>
      </section>

      <section>
        <h2>A concrete Clojure MCP example</h2>
        <p>
          The clean way to mimic the paper on a Clojure repo today is to keep the recursive decision-making outside <code>yoyo</code> and use <code>yoyo</code> as the grounded environment.
        </p>
        <p>
          In a tiny Clojure repo with a <code>greet</code> function, the MCP loop can look like this:
        </p>
        <pre><code>boot + index
→ inspect(name="greet")
→ ask("format a full name into a greeting")
→ judge_change(...)
→ change(...)
→ guard_failure
→ retry_plan
→ inspect(targeted lines)
→ change(...)</code></pre>
        <p>
          In the demo, <code>inspect</code> found <code>my.app/greet</code>, <code>ask</code> ranked the same function first for the intent query, and <code>judge_change</code> narrowed ownership to <code>src/my/app/core.clj</code>. Then a bad write added a require for a namespace that does not exist. <code>yoyo</code> rejected the write, restored the file, and returned structured failure state instead of leaving broken Clojure on disk.
        </p>
        <pre><code>guard_failure: {
  "phase": "post_write_guard",
  "retryable": true,
  "files_restored": true,
  "files": [
    {
      "file": "src/my/app/core.clj",
      "errors": [
        {
          "kind": "clojure-runtime",
          "text": "Could not locate missing/ns__init.class"
        }
      ]
    }
  ]
}</code></pre>
        <p>
          After that, <code>retry_plan</code> narrowed the next read surface to the namespace form at the top of the file and produced a bounded retry workflow. A corrected <code>change</code> then landed a valid multi-arity <code>greet</code> implementation.
        </p>
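        <p>
          One way to picture the step from structured failure to a bounded retry is the Python sketch below. It is a guess at the general idea, not the real <code>retry_plan</code>: <code>plan_retry</code>, the <code>focus</code> field, the <code>max_attempts</code> cap, and the "Could not locate" heuristic are all assumptions made for illustration.
        </p>
        <pre><code>import json

# The failure payload from the demo above, parsed as plain data.
guard_failure = json.loads("""
{
  "phase": "post_write_guard",
  "retryable": true,
  "files_restored": true,
  "files": [
    {
      "file": "src/my/app/core.clj",
      "errors": [
        {"kind": "clojure-runtime",
         "text": "Could not locate missing/ns__init.class"}
      ]
    }
  ]
}
""")


def plan_retry(failure, max_attempts=2):
    """Derive a bounded list of next steps from a failure payload.

    Hypothetical sketch; the real retry_plan may use different rules.
    """
    if not failure.get("retryable"):
        return {"max_attempts": 0, "steps": []}
    steps = []
    for entry in failure["files"]:
        for err in entry["errors"]:
            # Assumption: a "Could not locate" error points at the ns form.
            focus = "ns form" if "Could not locate" in err["text"] else "reported lines"
            steps.append({"tool": "inspect", "file": entry["file"], "focus": focus})
            steps.append({"tool": "change", "file": entry["file"]})
    return {"max_attempts": max_attempts, "steps": steps}


plan = plan_retry(guard_failure)
for step in plan["steps"]:
    print(step)</code></pre>
        <p>
          The plan is data rather than prose, so the next read can be limited to one file and one region, and the number of attempts is capped up front.
        </p>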
        <div class="note">
          <strong>What this proves, and what it does not</strong>
          This is the closest that current <code>yoyo</code> behavior comes to an RLM on Clojure: the repo stays outside prompt memory, failures become structured state, and the next step gets smaller. But the outer model is still the part deciding whether to recurse or spawn a subtask. <code>yoyo</code> is the environment, not the recursive controller.
        </div>
      </section>

      <section>
        <h2>Where yoyo differs from an RLM today</h2>
        <p>
          The paper uses a very flexible REPL. For general long-context reasoning, that makes sense. For software engineering, a more structured surface is often better.
        </p>
        <p>
          It is also worth being precise about the current gap. The paper is not just about tool use or keeping context outside the prompt. It is about recursive control. The model can choose to turn <code>q</code> into <code>sub_q</code>, transform the accessible context, and call itself again. <code>yoyo</code> currently offers curated repo tools and bounded retry workflows, not recursive sub-agenting of that form.
        </p>
        <p>
          A coding system needs:
        </p>
        <ul>
          <li>less hallucination, not just more freedom</li>
          <li>safe writes, not arbitrary code execution</li>
          <li>repeatable interfaces over repository truth</li>
          <li>clear boundaries between read judgment and write execution</li>
        </ul>
        <p>
          So the lesson is not “turn <code>yoyo</code> into a Python REPL.” The lesson is “keep moving toward repo-as-environment, but keep the interface opinionated and safe.”
        </p>
        <p>
          That is also why the current runtime bootstrap matters. <code>yoyo</code> can now create <code>yoyo.json</code> automatically for supported interpreted languages, but it does so with least-privilege defaults. The setup is automated for the agent; broader runtime access still has to be granted explicitly in repo policy. That is the right distinction between a software engineering environment and a general-purpose free-form REPL.
        </p>
      </section>

      <section>
        <h2>What this suggests for yoyo next</h2>
        <p>
          If the paper is directionally right for <code>yoyo</code>, then the next work should keep compressing repo understanding into grounded read surfaces.
        </p>
        <ul>
          <li>Make ownership and invariant judgment cheaper and more reliable.</li>
          <li>Keep strengthening cheap structured reads over dumping source text.</li>
          <li>Treat writes as the second step, not the first instinct.</li>
          <li>Keep making failure states executable and machine-readable so repair loops are bounded instead of improvised.</li>
          <li>Extend the same loop to more interpreted and functional languages where runtime feedback is the real truth boundary.</li>
          <li>Avoid competing on generic search when the real value is judgment plus constrained execution.</li>
        </ul>
      </section>

      <section>
        <h2>Bottom line</h2>
        <p>
          The RLM paper is a strong validation of one direction <code>yoyo</code> is already moving toward. The future is not a bigger prompt. The future is a smaller, more truthful interface to an external environment.
        </p>
        <p>
          For long documents, that environment might be a REPL over text. For codebases, it should be grounded repository tools. That is why the paper feels aligned with <code>yoyo</code>: both are trying to move work out of token memory and into structured interaction with what is actually there.
        </p>
        <p>
          The newer twist is that <code>yoyo</code> is no longer only doing that on the read side. With guarded writes, runtime checks, <code>guard_failure</code>, and <code>retry_plan</code>, parts of a bounded repair loop are now in the product itself. That is useful progress, but it is not yet RLM-style recursive subproblem delegation.
        </p>
      </section>

      <div class="footer">
        <a href="index.html">Back to yoyo</a>
      </div>
    </main>
  </div>
</body>
</html>