20 changes: 12 additions & 8 deletions AGENTS.md
@@ -156,22 +156,25 @@ Environment implementations live in `rlm/environments/`. Choose the appropriate
- Inherit from `NonIsolatedEnv` or `IsolatedEnv` in `rlm/environments/base_env.py`
- Implement all abstract methods: `setup`, `load_context`, `execute_code`
- Return `REPLResult` from `execute_code`
-- Handle `lm_handler_address` for sub-LM calls via `llm_query()`
+- Handle `lm_handler_address` for LM calls via `llm_query()` and `rlm_query()`
- Implement `cleanup()` for resource management
- Register environment in `rlm/environments/__init__.py`

### Key Implementation Details
- `setup()`: Initialize globals, locals, and helper functions
- `load_context()`: Make context available as `context` variable
- `execute_code()`: Execute code, capture stdout/stderr, return `REPLResult`
-- Always provide `llm_query` and `llm_query_batched` functions in environment globals
+- Always provide `llm_query`, `llm_query_batched`, `rlm_query`, and `rlm_query_batched` functions in environment globals

### State Management
Environments must provide these globals to executed code:
- `context`: The loaded context payload
-- `llm_query(prompt, model=None)`: For sub-LM calls
-- `llm_query_batched(prompts, model=None)`: For batched sub-LM calls
+- `llm_query(prompt, model=None)`: Plain single LM completion (no REPL, no iteration)
+- `llm_query_batched(prompts, model=None)`: Batched plain LM completions
+- `rlm_query(prompt, model=None)`: Recursive child RLM call (own REPL + iteration). Falls back to `llm_query` at max depth.
+- `rlm_query_batched(prompts, model=None)`: Batched recursive child RLM calls
- `FINAL_VAR(variable_name)`: For returning final answers
- `SHOW_VARS()`: For listing available variables
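The globals contract can be sketched end to end. `make_globals` below is a hypothetical helper, not part of the repo, and the stub bodies stand in for the real socket-backed implementations; it only illustrates the names an environment injects into the namespace that `execute_code` runs against.

```python
# Hypothetical sketch of the environment globals contract. `make_globals`
# and the stub bodies are illustrative only; real environments route
# llm_query/rlm_query through the LM handler socket.

def make_globals(context):
    final = {}

    def llm_query(prompt, model=None):
        # Stub: the real call sends an LMRequest to the LM handler.
        return f"<completion for: {prompt!r}>"

    def llm_query_batched(prompts, model=None):
        return [llm_query(p, model) for p in prompts]

    def rlm_query(prompt, model=None):
        # Stub: the real call spawns a child RLM (or falls back to
        # llm_query at max recursion depth).
        return llm_query(prompt, model)

    def rlm_query_batched(prompts, model=None):
        return [rlm_query(p, model) for p in prompts]

    def FINAL_VAR(variable_name):
        # Record which variable holds the final answer.
        final["name"] = variable_name

    def SHOW_VARS():
        # List user-visible names in the execution namespace.
        return sorted(k for k in g if not k.startswith("_"))

    g = {
        "context": context,
        "llm_query": llm_query,
        "llm_query_batched": llm_query_batched,
        "rlm_query": rlm_query,
        "rlm_query_batched": rlm_query_batched,
        "FINAL_VAR": FINAL_VAR,
        "SHOW_VARS": SHOW_VARS,
    }
    return g, final

# Simulate one execute_code() step against the injected globals.
g, final = make_globals(context="some long document")
exec("answer = llm_query('summarize: ' + context)\nFINAL_VAR('answer')", g)
print(final["name"])  # answer
```

A real implementation would also capture stdout/stderr from the `exec` and wrap them in a `REPLResult`.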

### Example Structure
```python
# … (example environment class collapsed in the diff view)
```

@@ -204,7 +207,8 @@ class MyEnvironment(NonIsolatedEnv):
- Guidelines here are followed
- Environment works with basic RLM completion calls
- `cleanup()` properly releases all resources
-- Sub-LM calls work via `llm_query()`
+- Sub-LM calls work via `llm_query()` and `rlm_query()`
+- Reserved names (`llm_query`, `rlm_query`, `context`, `history`, `FINAL_VAR`, `SHOW_VARS`) are restored after each execution

## Architecture: Environment ↔ LM Handler Communication

@@ -223,7 +227,7 @@ Understanding how environments communicate with the LM Handler is essential for

```
│ ▼ │ │
│ ┌─────────────┐ Socket (TCP) │ │
│ │ LocalREPL │────────────────────────────────────┘ │
-│ │ (exec code) │ llm_query() → send_lm_request()
+│ │ (exec code) │ llm_query() / rlm_query() → LM calls │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
@@ -242,8 +246,8 @@ def socket_send(sock: socket.socket, data: dict) -> None:
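The `socket_send` body is collapsed in the diff, and the receive side is easy to get wrong (`recv` may return partial reads). A minimal sketch of both directions follows, assuming a 4-byte big-endian length header; the header width and `socket_recv`/`_recv_exact` names are assumptions for illustration, not confirmed by the repo.

```python
import json
import socket
import struct

def socket_send(sock: socket.socket, data: dict) -> None:
    # Serialize to JSON, then prefix with a 4-byte big-endian length header.
    payload = json.dumps(data).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def socket_recv(sock: socket.socket) -> dict:
    # Read the 4-byte length header, then exactly that many payload bytes.
    header = _recv_exact(sock, 4)
    (length,) = struct.unpack(">I", header)
    return json.loads(_recv_exact(sock, length).decode("utf-8"))

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested; loop until n arrive.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

# Round-trip over an in-process socket pair.
a, b = socket.socketpair()
socket_send(a, {"prompt": "hello", "model": None})
print(socket_recv(b))  # {'prompt': 'hello', 'model': None}
a.close(); b.close()
```

Length-prefixed framing like this is what lets the handler read exactly one JSON message per request off a TCP stream.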

**Request Flow**:
-1. Environment's `llm_query(prompt)` is called during code execution
-2. Creates `LMRequest` dataclass and calls `send_lm_request(address, request)`
+1. Environment's `llm_query(prompt)` or `rlm_query(prompt)` is called during code execution
+2. For `llm_query`: creates `LMRequest` and calls `send_lm_request(address, request)`. For `rlm_query`: invokes `subcall_fn` to spawn a child RLM (or falls back to `llm_query` at max depth).
3. Opens TCP connection to `LMHandler` at `(host, port)`
4. Sends length-prefixed JSON request
5. `LMHandler` processes via `LMRequestHandler.handle()`
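The depth-limited fallback in step 2 can be sketched as follows. `MAX_DEPTH` and `spawn_child_rlm` (standing in for `subcall_fn`) are hypothetical names for illustration; the stub bodies only show the control flow.

```python
# Hypothetical sketch of the rlm_query depth guard: each child RLM carries
# a depth counter, and at max depth the call degrades to a plain llm_query.

MAX_DEPTH = 2  # assumed limit, for illustration

def llm_query(prompt, model=None):
    # Stub: a real call is a single plain LM completion.
    return f"plain:{prompt}"

def spawn_child_rlm(prompt, depth, model=None):
    # Stub for subcall_fn: a real child runs its own REPL + iteration loop.
    return f"rlm(depth={depth}):{prompt}"

def make_rlm_query(depth):
    def rlm_query(prompt, model=None):
        if depth >= MAX_DEPTH:
            # At max recursion depth, fall back to one plain completion.
            return llm_query(prompt, model)
        return spawn_child_rlm(prompt, depth + 1, model)
    return rlm_query

print(make_rlm_query(0)("q"))  # rlm(depth=1):q
print(make_rlm_query(2)("q"))  # plain:q
```

This guard is what keeps recursive calls from nesting without bound while still returning a usable completion at the leaves.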
14 changes: 7 additions & 7 deletions README.md
@@ -77,11 +77,11 @@ make quickstart
</details>

## REPL Environments
-We support two types of REPL environments -- isolated, and non-isolated. Non-isolated environments (default) run code execution on the same machine as the RLM (e.g. through `exec`), which is pretty reasonable for some local low-risk tasks, like simple benchmarking, but can be problematic if the prompts or tool calls can interact with malicious users. Fully isolated environments used Cloud-based sandboxes (e.g. Prime Sandboxes, [Modal Sandboxes](https://modal.com/docs/guide/sandboxes)) to run code generated by the RLM, ensuring completely isolation from the host process. Environments can be added, but we natively support the following: `local` (default), `modal`, `prime`.
+We support two types of REPL environments -- isolated and non-isolated. Non-isolated environments (default) run code execution on the same machine as the RLM (e.g. through `exec`), which is reasonable for local low-risk tasks like simple benchmarking, but can be problematic if the prompts or tool calls can be influenced by malicious users. Fully isolated environments use cloud-based sandboxes (e.g. Prime Sandboxes, [Modal Sandboxes](https://modal.com/docs/guide/sandboxes)) to run code generated by the RLM, ensuring complete isolation from the host process. Environments can be added, but we natively support the following: `local` (default), `docker`, `modal`, `prime`, `daytona`, `e2b`.

```python
rlm = RLM(
-    environment="...",  # "local", "docker", "modal", "prime"
+    environment="...",  # "local", "docker", "modal", "prime", "daytona", "e2b"
    environment_kwargs={...},
)
```
@@ -124,19 +124,19 @@ We currently support most major clients (OpenAI, Anthropic), as well as the rout
If you use this code or repository in your research, please cite:

```bibtex
-@misc{zhang2025recursivelanguagemodels,
-title={Recursive Language Models},
+@misc{zhang2026recursivelanguagemodels,
+title={Recursive Language Models},
author={Alex L. Zhang and Tim Kraska and Omar Khattab},
-year={2025},
+year={2026},
eprint={2512.24601},
archivePrefix={arXiv},
primaryClass={cs.AI},
-url={https://arxiv.org/abs/2512.24601},
+url={https://arxiv.org/abs/2512.24601},
}
```

## Optional: Trajectory metadata and logging
-`RLMChatCompletion` has an optional `metadata` field (default empty) that can hold the full trajectory (run config + all iterations and sub-calls) so you can reconstruct the run. Pass an `RLMLogger` to capture it:
+`RLMChatCompletion` has an optional `metadata` field (default `None`) that holds the full trajectory (run config + all iterations and sub-calls) so you can reconstruct the run. Pass an `RLMLogger` to capture it:

- **In-memory only** (trajectory on `completion.metadata`): `logger=RLMLogger()` (no `log_dir`).
- **Also save to disk** (JSONL for the visualizer): `logger=RLMLogger(log_dir="./logs")`.