feat: support prime sandboxes #4

Merged
alexzhang13 merged 14 commits into alexzhang13:main from 13point5:prime-sandboxes
Jan 9, 2026

Conversation

13point5 (Contributor) commented Jan 4, 2026

Add PrimeREPL environment for Prime Intellect sandboxes

Adds a new isolated environment that runs Python code in Prime Intellect sandboxes.

Changes

  • rlm/environments/prime_repl.py - Full PrimeREPL implementation using HTTP broker pattern (same as ModalREPL)
  • examples/prime_repl_example.py - Simple rlm.completion example
  • examples/lm_in_prime_repl.py - Example with code execution inside the sandbox
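
The HTTP broker pattern named above can be sketched roughly as follows: the host process exposes an HTTP endpoint, and `llm_query()` calls made inside the sandbox are POSTed back to it. Everything here (`fake_lm`, `BrokerHandler`, the JSON shape) is an illustrative assumption, not the actual rlm implementation:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical stand-in for the host's real language-model client;
# the actual broker would forward the prompt to e.g. an OpenAI model.
def fake_lm(prompt: str) -> str:
    return f"echo: {prompt}"

class BrokerHandler(BaseHTTPRequestHandler):
    """Minimal broker: accepts {"prompt": ...}, returns {"response": ...}."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        reply = json.dumps({"response": fake_lm(body["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, fmt, *args):  # silence request logging
        pass

# Host side: start the broker on an ephemeral port.
server = HTTPServer(("127.0.0.1", 0), BrokerHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Sandbox side: llm_query() would POST to the broker URL like this.
req = Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"prompt": "What is 2 + 2?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    answer = json.loads(resp.read())["response"]
server.shutdown()
print(answer)  # echo: What is 2 + 2?
```

In the real setup the broker URL is the sandbox-reachable address of the host handler (see the `Broker URL` line in the example output below) rather than localhost.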

Notes

  • exec_script.py sometimes timed out at the 300s limit, so I increased the timeout to 600s

Example outputs

  • prime_repl_example.py
PORTKEY_API_KEY: IXlDQrsh...
Created Portkey client with model: @openai/gpt-5-nano
LM Handler started at ('127.0.0.1', 60105)

Creating Prime sandbox...
PrimeREPL created, sandbox ID: ew7axk194anajzxvauqbnqd9
Broker URL: https://8b42decd-7837-4740-a870-8d917bb39b6c.service.sandbox.pinfra.io

Executing:
response = llm_query("What is 2 + 2? Reply with just the number.")
print(response)
print(type(response))
print(context)
print("Secret from setup code: ", secret)

stdout: "4\n<class 'str'>\n\nThis is a test context. It should print out, revealing the magic number to be 4.\n\nSecret from setup code:  1424424\n"
stderr: ''
response variable: "'4'"
locals: {'context': "'\\nThis is a test context. It should print out, revealing the magic number to be 4.\\n'", 'secret': "'1424424'", 'response': "'4'"}
execution time: 79.485s
rlm_calls made: 1
  • lm_in_prime_repl.py
╭─ ◆ RLM ━ Recursive Language Model ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                    │
│    Backend                                       openai                          Environment                                   prime               │
│    Model                                         gpt-5-nano                      Max Iterations                                30                  │
│    Max Depth                                     1                                                                                                 │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

───────────────────────────────────────────────────────────────────  Iteration 1  ────────────────────────────────────────────────────────────────────
╭─ ◇ LLM Response  (158.24s) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Here’s the concrete first-step plan I recommend, expressed as REPL actions you can run to inspect the context and start extracting the actual      │
│ prompt. I’ll break it into small, testable steps you can execute in order.                                                                         │
│                                                                                                                                                    │
│ Step 1: Inspect the context basics (type, length, and a short preview)                                                                             │
│ ```repl                                                                                                                                            │
│ print("Context type:", type(context))                                                                                                              │
│ print("Context length:", len(context))                                                                                                             │
│ print("Context preview (first 200 chars):", repr(context[:200]))                                                                                   │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ Step 2: Ask the LLM to extract the user’s explicit prompt or question from the context                                                             │
│ ```repl                                                                                                                                            │
│ extracted_prompt = llm_query(                                                                                                                      │
│     f"From the following user context, extract the user's explicit prompt or question. Provide only the prompt. Context: {context}"               │
│ )                                                                                                                                                  │
│ print("Extracted prompt:", extracted_prompt)                                                                                                       │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ Step 3: If the extracted prompt is unclear or empty, try a refinement pass                                                                         │
│ ```repl                                                                                                                                            │
│ if not extracted_prompt or len(extracted_prompt.strip()) < 5:                                                                                      │
│     refined = llm_query(                                                                                                                           │
│         f"Please recheck the following context and clearly state the user's request or task. Context: {context}"                                   │
│     )                                                                                                                                              │
│     print("Refined prompt:", refined)                                                                                                              │
│ else:                                                                                                                                              │
│     print("Proceeding with the extracted prompt.")                                                                                                 │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ What I’m aiming to do with these steps:                                                                                                            │
│ - Step 1 gives us the exact shape and content of the context so we know what we’re dealing with.                                                   │
│ - Step 2 uses the LLM to identify the actual prompt or question embedded in the context.                                                           │
│ - Step 3 provides a safety net if Step 2 is inconclusive, ensuring we still get a clear user request.                                              │
│                                                                                                                                                    │
│ After you run these, please share the outputs (the type, length, preview, and the extracted_prompt or refined prompt). With that, I’ll plan the    │
│ next steps (likely chunking the prompt if needed, running batched analyses if there are multiple parts, and then producing a final answer).        │
│                                                                                                                                                    │
│ ~266 words                                                                                                                                         │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (0.672s) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ print("Context type:", type(context))                                                                                                              │
│ print("Context length:", len(context))                                                                                                             │
│ print("Context preview (first 200 chars):", repr(context[:200]))                                                                                   │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ Context type: <class 'str'>
│ Context length: 63                                                                                                                                 │
│ Context preview (first 200 chars): 'Using your code, solve 2^(2^(2^(2))). Show your work in Python.'
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (132.290s) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ extracted_prompt = llm_query(                                                                                                                      │
│     f"From the following user context, extract the user's explicit prompt or question. Provide only the prompt. Context: {context}"               │
│ )                                                                                                                                                  │
│ print("Extracted prompt:", extracted_prompt)                                                                                                       │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ Extracted prompt: Using your code, solve 2^(2^(2^(2))). Show your work in Python.                                                                  │
│                                                                                                                                                    │
│                                                                                                                                                    │
│ ↳ 1 sub-call(s)                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─   ↳ Sub-call: gpt-5-nano  (2.99s) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Prompt: From the following user context, extract the user's explicit prompt or question. Provide only the prompt. Context: Using your code, solve  │
│ 2^(2^(2^(2))). Show your work in Python.                                                                                                           │
│ Response: Using your code, solve 2^(2^(2^(2))). Show your work in Python.                                                                          │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (0.975s) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ if not extracted_prompt or len(extracted_prompt.strip()) < 5:                                                                                      │
│     refined = llm_query(                                                                                                                           │
│         f"Please recheck the following context and clearly state the user's request or task. Context: {context}"                                   │
│     )                                                                                                                                              │
│     print("Refined prompt:", refined)                                                                                                              │
│ else:                                                                                                                                              │
│     print("Proceeding with the extracted prompt.")                                                                                                 │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ Proceeding with the extracted prompt.                                                                                                              │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
───────────────────────────────────────────────────────────────────  Iteration 2  ────────────────────────────────────────────────────────────────────
╭─ ◇ LLM Response  (30.91s) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ```repl                                                                                                                                            │
│ # Step-by-step computation of 2^(2^(2^(2)))                                                                                                        │
│ inner = 2                                                                                                                                          │
│ step1 = 2**inner          # 2^2 = 4                                                                                                                │
│ step2 = 2**step1          # 2^4 = 16                                                                                                               │
│ step3 = 2**step2          # 2^16 = 65536                                                                                                           │
│ print("step1:", step1)                                                                                                                             │
│ print("step2:", step2)                                                                                                                             │
│ print("step3 (final result):", step3)                                                                                                              │
│ ```                                                                                                                                                │
│                                                                                                                                                    │
│ FINAL(65536)                                                                                                                                       │
│                                                                                                                                                    │
│ ~40 words                                                                                                                                          │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ ▸ Code Execution  (0.773s) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Code:                                                                                                                                              │
│ # Step-by-step computation of 2^(2^(2^(2)))                                                                                                        │
│ inner = 2                                                                                                                                          │
│ step1 = 2**inner          # 2^2 = 4                                                                                                                │
│ step2 = 2**step1          # 2^4 = 16                                                                                                               │
│ step3 = 2**step2          # 2^16 = 65536                                                                                                           │
│ print("step1:", step1)                                                                                                                             │
│ print("step2:", step2)                                                                                                                             │
│ print("step3 (final result):", step3)                                                                                                              │
│                                                                                                                                                    │
│ Output:                                                                                                                                            │
│ step1: 4                                                                                                                                           │
│ step2: 16                                                                                                                                          │
│ step3 (final result): 65536                                                                                                                        │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭─ ★ Final Answer ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                    │
│  65536                                                                                                                                             │
│                                                                                                                                                    │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
                                                               Iterations       2
                                                               Total Time       360.00s
                                                               Input Tokens     4,057
                                                               Output Tokens    5,683
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════

65536

alexzhang13 (Owner) left a comment

Will take a closer look soon, but before then, can you also add an example of doing persistent code execution inside the RL sandbox with a fake LLM (a real LLM is fine too)? I'll move these to tests later, once I figure out a way to run these kinds of tests without burning through my credits.

There's an existing example called lm_in_repl.py you can use as a reference. Otherwise we can probably do some light testing and add this as a base. I just need to check that all the main components for communicating between the host process and the sandbox are fine (it looks fine at a cursory glance, though).
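
A minimal sketch of what such an example could exercise, assuming a stubbed `llm_query` and a shared exec namespace; `run_cell` and all names here are hypothetical, not the repo's actual API:

```python
# Fake LLM: a canned response stands in for a real model call.
def llm_query(prompt: str) -> str:
    return "42"

# One namespace shared across executions gives persistent state,
# the behavior the sandbox REPL needs between code blocks.
namespace = {"llm_query": llm_query}

def run_cell(code: str) -> None:
    exec(code, namespace)  # assignments land in `namespace` and persist

run_cell("x = int(llm_query('pick a number'))")
run_cell("y = x * 2")   # `x` survives from the previous cell
run_cell("print(y)")    # 84
```

The real example would route `llm_query` through the sandbox broker instead of a local stub, but the persistence mechanics are the same.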

13point5 (Contributor) commented Jan 4, 2026

will do 👍

alexzhang13 (Owner) commented

@13point5 Shoot me a comment when you want me to review!

Only other thing is to remove the uv.lock from the PR when you're ready. I'll make these adjustments on my side.

13point5 (Contributor) commented Jan 5, 2026

Fixing some bugs rn will let u know!

13point5 changed the title from "support prime sandboxes" to "feat: support prime sandboxes" on Jan 6, 2026
13point5 marked this pull request as ready for review on January 6, 2026 03:58
13point5 (Contributor) commented Jan 6, 2026

ready for review @alexzhang13

alexzhang13 (Owner) commented

Amazing! Will try to merge this in today.

alexzhang13 (Owner) commented Jan 8, 2026

@13point5 Do you notice that the Prime sandboxes are really slow on your end? I'm not able to reproduce the runtimes in your example above, but it does run correctly (just takes a long time).

They're experimental, so not sure if it's just something on their end.

Can you re-run the current version for me one more time and post the result here with verbose=True? If it works, I'll merge and open another issue just about speeding things up.

13point5 (Contributor) commented Jan 8, 2026

Yea sometimes it's really slow @alexzhang13. I'll run it again with verbose=True tonight and update the PR description.

Should I remove uv.lock again?

13point5 (Contributor) commented Jan 9, 2026

Updated the PR description with latest execution times for the examples @alexzhang13

  1. prime_repl_example.py - 79.485s
  2. lm_in_prime_repl.py - 360.00s

alexzhang13 (Owner) commented

@13point5 That's ok, let's merge for now since it's functional, but I'll add a warning to the README. We'll return to this and fix it soon; I'll open a separate issue for us to discuss how to optimize it, and I'll ping the Prime folks about it as well.

@alexzhang13 alexzhang13 merged commit 625a69e into alexzhang13:main Jan 9, 2026
13point5 (Contributor) commented Jan 9, 2026

Awesome thank you!
