feat: support prime sandboxes #4

Merged
alexzhang13 merged 14 commits into alexzhang13:main from 13point5:prime-sandboxes
Jan 9, 2026

Conversation

13point5 (Contributor) commented Jan 4, 2026

Add PrimeREPL environment for Prime Intellect sandboxes

Adds a new isolated environment that runs Python code in Prime Intellect sandboxes.

Changes

  • rlm/environments/prime_repl.py - Full PrimeREPL implementation using HTTP broker pattern (same as ModalREPL)
  • examples/prime_repl_example.py - Simple rlm.completion example
  • examples/lm_in_prime_repl.py - Example with code execution inside the sandbox
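
The HTTP broker pattern named above can be sketched roughly as follows: the host process exposes an HTTP endpoint, and `llm_query()` calls made inside the sandbox are POSTed back to it. Everything here (`fake_lm`, `BrokerHandler`, the JSON shape) is an illustrative assumption, not the actual rlm implementation:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical stand-in for the host's real language-model client;
# the actual broker would forward the prompt to e.g. an OpenAI model.
def fake_lm(prompt: str) -> str:
    return f"echo: {prompt}"

class BrokerHandler(BaseHTTPRequestHandler):
    """Minimal broker: accepts {"prompt": ...}, returns {"response": ...}."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        reply = json.dumps({"response": fake_lm(body["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, fmt, *args):  # silence request logging
        pass

# Host side: start the broker on an ephemeral port.
server = HTTPServer(("127.0.0.1", 0), BrokerHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Sandbox side: llm_query() would POST to the broker URL like this.
req = Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"prompt": "What is 2 + 2?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    answer = json.loads(resp.read())["response"]
server.shutdown()
print(answer)  # echo: What is 2 + 2?
```

In the real setup the broker URL is the sandbox-reachable address of the host handler (see the `Broker URL` line in the example output below) rather than localhost.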

Notes

  • exec_script.py sometimes timed out at the 300s limit, so I increased the timeout to 600s

Example outputs

  • prime_repl_example.py
PORTKEY_API_KEY: IXlDQrsh...
Created Portkey client with model: @openai/gpt-5-nano
LM Handler started at ('127.0.0.1', 60105)

Creating Prime sandbox...
PrimeREPL created, sandbox ID: ew7axk194anajzxvauqbnqd9
Broker URL: https://8b42decd-7837-4740-a870-8d917bb39b6c.service.sandbox.pinfra.io

Executing:
response = llm_query("What is 2 + 2? Reply with just the number.")
print(response)
print(type(response))
print(context)
print("Secret from setup code: ", secret)

stdout: "4\n<class 'str'>\n\nThis is a test context. It should print out, revealing the magic number to be 4.\n\nSecret from setup code:  1424424\n"
stderr: ''
response variable: "'4'"
locals: {'context': "'\\nThis is a test context. It should print out, revealing the magic number to be 4.\\n'", 'secret': "'1424424'", 'response': "'4'"}
execution time: 79.485s
rlm_calls made: 1
  • lm_in_prime_repl.py
╭─ ◆ RLM ━ Recursive Language Model ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                    │
│    Backend                                       openai                          Environment                                   prime               │
│    Model                                         gpt-5-nano                      Max Iterations                                30                  │
│    Max Depth                                     1                                                                                                 │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

───────────────────────────────────────────────────────────────────  Iteration 1  ────────────────────────────────────────────────────────────────────
╭─ ◇ LLM Response  (158.24s) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Here’s the concrete first-step plan I recommend, expressed as REPL actions you can run to inspect the context and start extracting the actual      │
│ prompt. I’ll break it into small, testable steps you can execute in order.                                                                         │
│                                                                                                                                                    │
│ Step 1: Inspect the context basics (type, length, and a short preview)                                                                             │
│ ```repl                                                                                                                                            │
│ print("Context type:", type(context))                                                                                                              │
│ print("Context length:", len(context))                                                                                                             │
│ print("Context preview (first 200 chars):", repr(context[:200]))                                                                                   │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ Step 2: Ask the LLM to extract the user’s explicit prompt or question from the context                                                             │
│ ```repl                                                                                                                                            │
│ extracted_prompt = llm_query(                                                                                                                      │
│     f"From the following user context, extract the user's explicit prompt or question. Provide only the prompt. Context: {context}"               │
│ )                                                                                                                                                  │
│ print("Extracted prompt:", extracted_prompt)                                                                                                       │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ Step 3: If the extracted prompt is unclear or empty, try a refinement pass                                                                         │
│ ```repl                                                                                                                                            │
│ if not extracted_prompt or len(extracted_prompt.strip()) < 5:                                                                                      │
│     refined = llm_query(                                                                                                                           │
│         f"Please recheck the following context and clearly state the user's request or task. Context: {context}"                                   │
│     )                                                                                                                                              │
│     print("Refined prompt:", refined)                                                                                                              │
│ else:                                                                                                                                              │
│     print("Proceeding with the extracted prompt.")                                                                                                 │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ What I’m aiming to do with these steps:                                                                                                            │
│ - Step 1 gives us the exact shape and content of the context so we know what we’re dealing with.                                                   │
│ - Step 2 uses the LLM to identify the actual prompt or question embedded in the context.                                                           │
│ - Step 3 provides a safety net if Step 2 is inconclusive, ensuring we still get a clear user request.                                              │
│                                                                                                                                                    │
│ After you run these, please share the outputs (the type, length, preview, and the extracted_prompt or refined prompt). With that, I’ll plan the    │
│ next steps (likely chunking the prompt if needed, running batched analyses if there are multiple parts, and then producing a final answer).        │
│                                                                                                                                                    │
│ ~266 words                                                                                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (0.672s) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ print("Context type:", type(context))                                                                                                              │
│ print("Context length:", len(context))                                                                                                             │
│ print("Context preview (first 200 chars):", repr(context[:200]))                                                                                   │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ Context type: <class 'str'>
│ Context length: 63                                                                                                                                 │
│ Context preview (first 200 chars): 'Using your code, solve 2^(2^(2^(2))). Show your work in Python.'
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (132.290s) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ extracted_prompt = llm_query(                                                                                                                      │
│     f"From the following user context, extract the user's explicit prompt or question. Provide only the prompt. Context: {context}"               │
│ )                                                                                                                                                  │
│ print("Extracted prompt:", extracted_prompt)                                                                                                       │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ Extracted prompt: Using your code, solve 2^(2^(2^(2))). Show your work in Python.                                                                  │
│                                                                                                                                                    │
│                                                                                                                                                    │
│ ↳ 1 sub-call(s)                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─   ↳ Sub-call: gpt-5-nano  (2.99s) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Prompt: From the following user context, extract the user's explicit prompt or question. Provide only the prompt. Context: Using your code, solve  │
│ 2^(2^(2^(2))). Show your work in Python.                                                                                                           │
│ Response: Using your code, solve 2^(2^(2^(2))). Show your work in Python.                                                                          │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (0.975s) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ if not extracted_prompt or len(extracted_prompt.strip()) < 5:                                                                                      │
│     refined = llm_query(                                                                                                                           │
│         f"Please recheck the following context and clearly state the user's request or task. Context: {context}"                                   │
│     )                                                                                                                                              │
│     print("Refined prompt:", refined)                                                                                                              │
│ else:                                                                                                                                              │
│     print("Proceeding with the extracted prompt.")                                                                                                 │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ Proceeding with the extracted prompt.                                                                                                              │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
───────────────────────────────────────────────────────────────────  Iteration 2  ────────────────────────────────────────────────────────────────────
╭─ ◇ LLM Response  (30.91s) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ```repl                                                                                                                                            │
│ # Step-by-step computation of 2^(2^(2^(2)))                                                                                                        │
│ inner = 2                                                                                                                                          │
│ step1 = 2**inner          # 2^2 = 4                                                                                                                │
│ step2 = 2**step1          # 2^4 = 16                                                                                                               │
│ step3 = 2**step2          # 2^16 = 65536                                                                                                           │
│ print("step1:", step1)                                                                                                                             │
│ print("step2:", step2)                                                                                                                             │
│ print("step3 (final result):", step3)                                                                                                              │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ FINAL(65536)                                                                                                                                       │
│                                                                                                                                                    │
│ ~40 words                                                                                                                                          │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (0.773s) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ # Step-by-step computation of 2^(2^(2^(2)))                                                                                                        │
│ inner = 2                                                                                                                                          │
│ step1 = 2**inner          # 2^2 = 4                                                                                                                │
│ step2 = 2**step1          # 2^4 = 16                                                                                                               │
│ step3 = 2**step2          # 2^16 = 65536                                                                                                           │
│ print("step1:", step1)                                                                                                                             │
│ print("step2:", step2)                                                                                                                             │
│ print("step3 (final result):", step3)                                                                                                              │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ step1: 4                                                                                                                                           │
│ step2: 16                                                                                                                                          │
│ step3 (final result): 65536                                                                                                                        │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭─ ★ Final Answer ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                    │
│  65536                                                                                                                                             │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
                                                               Iterations       2
                                                               Total Time       360.00s
                                                               Input Tokens     4,057
                                                               Output Tokens    5,683
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════

65536

alexzhang13 (Owner) left a comment

Will take a closer look soon, but before then, can you also add an example of doing persistent code execution inside the RL sandbox with a fake LLM (a real LLM is fine too)? I'll move these to tests later, once I figure out a way to run these kinds of tests without burning through my credits.

There's an existing example called lm_in_repl.py you can use as a reference. Otherwise we can probably do some light testing and add this as a base. I just need to check that all the main components for communicating between the host process and the sandbox are fine (it looks fine at a cursory glance, though).
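
A minimal sketch of what such an example could exercise, assuming a stubbed `llm_query` and a shared exec namespace; `run_cell` and all names here are hypothetical, not the repo's actual API:

```python
# Fake LLM: a canned response stands in for a real model call.
def llm_query(prompt: str) -> str:
    return "42"

# One namespace shared across executions gives persistent state,
# the behavior the sandbox REPL needs between code blocks.
namespace = {"llm_query": llm_query}

def run_cell(code: str) -> None:
    exec(code, namespace)  # assignments land in `namespace` and persist

run_cell("x = int(llm_query('pick a number'))")
run_cell("y = x * 2")   # `x` survives from the previous cell
run_cell("print(y)")    # 84
```

The real example would route `llm_query` through the sandbox broker instead of a local stub, but the persistence mechanics are the same.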

13point5 (Contributor) commented Jan 4, 2026

will do 👍

alexzhang13 (Owner) commented

@13point5 Shoot me a comment when you want me to review!

Only other thing is to remove the uv.lock from the PR when you're ready. I'll make these adjustments on my side.

13point5 (Contributor) commented Jan 5, 2026

Fixing some bugs rn will let u know!

13point5 changed the title from "support prime sandboxes" to "feat: support prime sandboxes" on Jan 6, 2026
13point5 marked this pull request as ready for review on January 6, 2026 03:58
13point5 (Contributor) commented Jan 6, 2026

ready for review @alexzhang13

alexzhang13 (Owner) commented

Amazing! Will try to merge this in today.

alexzhang13 (Owner) commented Jan 8, 2026

@13point5 Do you notice that the Prime sandboxes are really slow on your end? I'm not able to reproduce the runtimes in your example above, but it does run correctly (just takes a long time).

They're experimental, so not sure if it's just something on their end.

Can you re-run the current version for me one more time and post the result here with verbose=True? If it works, I'll merge and open another issue just about speeding things up.

13point5 (Contributor) commented Jan 8, 2026

Yea sometimes it's really slow @alexzhang13. I'll run it again with verbose=True tonight and update the PR description.

Should I remove uv.lock again?

13point5 (Contributor) commented Jan 9, 2026

Updated the PR description with latest execution times for the examples @alexzhang13

  1. prime_repl_example.py - 79.485s
  2. lm_in_prime_repl.py - 360.00s

alexzhang13 (Owner) commented

@13point5 That's ok, let's merge for now since it's functional, but I'll add a warning to the README. We'll return to this and fix it soon; I'll open a separate issue for us to discuss how to optimize it, and I'll ping the Prime folks about it as well.

@alexzhang13 alexzhang13 merged commit 625a69e into alexzhang13:main Jan 9, 2026
13point5 (Contributor) commented Jan 9, 2026

Awesome thank you!
