20 changes: 12 additions & 8 deletions AGENTS.md
@@ -156,22 +156,25 @@ Environment implementations live in `rlm/environments/`. Choose the appropriate
- Inherit from `NonIsolatedEnv` or `IsolatedEnv` in `rlm/environments/base_env.py`
- Implement all abstract methods: `setup`, `load_context`, `execute_code`
- Return `REPLResult` from `execute_code`
-- Handle `lm_handler_address` for sub-LM calls via `llm_query()`
+- Handle `lm_handler_address` for LM calls via `llm_query()` and `rlm_query()`
- Implement `cleanup()` for resource management
- Register environment in `rlm/environments/__init__.py`

### Key Implementation Details
- `setup()`: Initialize globals, locals, and helper functions
- `load_context()`: Make context available as `context` variable
- `execute_code()`: Execute code, capture stdout/stderr, return `REPLResult`
-- Always provide `llm_query` and `llm_query_batched` functions in environment globals
+- Always provide `llm_query`, `llm_query_batched`, `rlm_query`, and `rlm_query_batched` functions in environment globals

### State Management
Environments must provide these globals to executed code:
- `context`: The loaded context payload
-- `llm_query(prompt, model=None)`: For sub-LM calls
-- `llm_query_batched(prompts, model=None)`: For batched sub-LM calls
+- `llm_query(prompt, model=None)`: Plain single LM completion (no REPL, no iteration)
+- `llm_query_batched(prompts, model=None)`: Batched plain LM completions
+- `rlm_query(prompt, model=None)`: Recursive child RLM call (own REPL + iteration). Falls back to `llm_query` at max depth.
+- `rlm_query_batched(prompts, model=None)`: Batched recursive child RLM calls
- `FINAL_VAR(variable_name)`: For returning final answers
- `SHOW_VARS()`: For listing available variables
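The globals contract can be sketched end to end. `make_globals` below is a hypothetical helper, not part of the repo, and the stub bodies stand in for the real socket-backed implementations; it only illustrates the names an environment injects into the namespace that `execute_code` runs against.

```python
# Hypothetical sketch of the environment globals contract. `make_globals`
# and the stub bodies are illustrative only; real environments route
# llm_query/rlm_query through the LM handler socket.

def make_globals(context):
    final = {}

    def llm_query(prompt, model=None):
        # Stub: the real call sends an LMRequest to the LM handler.
        return f"<completion for: {prompt!r}>"

    def llm_query_batched(prompts, model=None):
        return [llm_query(p, model) for p in prompts]

    def rlm_query(prompt, model=None):
        # Stub: the real call spawns a child RLM (or falls back to
        # llm_query at max recursion depth).
        return llm_query(prompt, model)

    def rlm_query_batched(prompts, model=None):
        return [rlm_query(p, model) for p in prompts]

    def FINAL_VAR(variable_name):
        # Record which variable holds the final answer.
        final["name"] = variable_name

    def SHOW_VARS():
        # List user-visible names in the execution namespace.
        return sorted(k for k in g if not k.startswith("_"))

    g = {
        "context": context,
        "llm_query": llm_query,
        "llm_query_batched": llm_query_batched,
        "rlm_query": rlm_query,
        "rlm_query_batched": rlm_query_batched,
        "FINAL_VAR": FINAL_VAR,
        "SHOW_VARS": SHOW_VARS,
    }
    return g, final

# Simulate one execute_code() step against the injected globals.
g, final = make_globals(context="some long document")
exec("answer = llm_query('summarize: ' + context)\nFINAL_VAR('answer')", g)
print(final["name"])  # answer
```

A real implementation would also capture stdout/stderr from the `exec` and wrap them in a `REPLResult`.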

### Example Structure
```python
# … (example environment class collapsed in the diff view)
```

@@ -204,7 +207,8 @@ class MyEnvironment(NonIsolatedEnv):
- Guidelines here are followed
- Environment works with basic RLM completion calls
- `cleanup()` properly releases all resources
-- Sub-LM calls work via `llm_query()`
+- Sub-LM calls work via `llm_query()` and `rlm_query()`
+- Reserved names (`llm_query`, `rlm_query`, `context`, `history`, `FINAL_VAR`, `SHOW_VARS`) are restored after each execution

## Architecture: Environment ↔ LM Handler Communication

@@ -223,7 +227,7 @@ Understanding how environments communicate with the LM Handler is essential for

```
│ ▼ │ │
│ ┌─────────────┐ Socket (TCP) │ │
│ │ LocalREPL │────────────────────────────────────┘ │
-│ │ (exec code) │ llm_query() → send_lm_request()
+│ │ (exec code) │ llm_query() / rlm_query() → LM calls │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
@@ -242,8 +246,8 @@ def socket_send(sock: socket.socket, data: dict) -> None:
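The `socket_send` body is collapsed in the diff, and the receive side is easy to get wrong (`recv` may return partial reads). A minimal sketch of both directions follows, assuming a 4-byte big-endian length header; the header width and `socket_recv`/`_recv_exact` names are assumptions for illustration, not confirmed by the repo.

```python
import json
import socket
import struct

def socket_send(sock: socket.socket, data: dict) -> None:
    # Serialize to JSON, then prefix with a 4-byte big-endian length header.
    payload = json.dumps(data).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def socket_recv(sock: socket.socket) -> dict:
    # Read the 4-byte length header, then exactly that many payload bytes.
    header = _recv_exact(sock, 4)
    (length,) = struct.unpack(">I", header)
    return json.loads(_recv_exact(sock, length).decode("utf-8"))

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested; loop until n arrive.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

# Round-trip over an in-process socket pair.
a, b = socket.socketpair()
socket_send(a, {"prompt": "hello", "model": None})
print(socket_recv(b))  # {'prompt': 'hello', 'model': None}
a.close(); b.close()
```

Length-prefixed framing like this is what lets the handler read exactly one JSON message per request off a TCP stream.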

**Request Flow**:
-1. Environment's `llm_query(prompt)` is called during code execution
-2. Creates `LMRequest` dataclass and calls `send_lm_request(address, request)`
+1. Environment's `llm_query(prompt)` or `rlm_query(prompt)` is called during code execution
+2. For `llm_query`: creates `LMRequest` and calls `send_lm_request(address, request)`. For `rlm_query`: invokes `subcall_fn` to spawn a child RLM (or falls back to `llm_query` at max depth).
3. Opens TCP connection to `LMHandler` at `(host, port)`
4. Sends length-prefixed JSON request
5. `LMHandler` processes via `LMRequestHandler.handle()`
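The depth-limited fallback in step 2 can be sketched as follows. `MAX_DEPTH` and `spawn_child_rlm` (standing in for `subcall_fn`) are hypothetical names for illustration; the stub bodies only show the control flow.

```python
# Hypothetical sketch of the rlm_query depth guard: each child RLM carries
# a depth counter, and at max depth the call degrades to a plain llm_query.

MAX_DEPTH = 2  # assumed limit, for illustration

def llm_query(prompt, model=None):
    # Stub: a real call is a single plain LM completion.
    return f"plain:{prompt}"

def spawn_child_rlm(prompt, depth, model=None):
    # Stub for subcall_fn: a real child runs its own REPL + iteration loop.
    return f"rlm(depth={depth}):{prompt}"

def make_rlm_query(depth):
    def rlm_query(prompt, model=None):
        if depth >= MAX_DEPTH:
            # At max recursion depth, fall back to one plain completion.
            return llm_query(prompt, model)
        return spawn_child_rlm(prompt, depth + 1, model)
    return rlm_query

print(make_rlm_query(0)("q"))  # rlm(depth=1):q
print(make_rlm_query(2)("q"))  # plain:q
```

This guard is what keeps recursive calls from nesting without bound while still returning a usable completion at the leaves.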
14 changes: 7 additions & 7 deletions README.md
@@ -77,11 +77,11 @@ make quickstart
</details>

## REPL Environments
-We support two types of REPL environments -- isolated, and non-isolated. Non-isolated environments (default) run code execution on the same machine as the RLM (e.g. through `exec`), which is pretty reasonable for some local low-risk tasks, like simple benchmarking, but can be problematic if the prompts or tool calls can interact with malicious users. Fully isolated environments used Cloud-based sandboxes (e.g. Prime Sandboxes, [Modal Sandboxes](https://modal.com/docs/guide/sandboxes)) to run code generated by the RLM, ensuring completely isolation from the host process. Environments can be added, but we natively support the following: `local` (default), `modal`, `prime`.
+We support two types of REPL environments -- isolated and non-isolated. Non-isolated environments (default) run code execution on the same machine as the RLM (e.g. through `exec`), which is reasonable for local low-risk tasks like simple benchmarking, but can be problematic if the prompts or tool calls can be influenced by malicious users. Fully isolated environments use cloud-based sandboxes (e.g. Prime Sandboxes, [Modal Sandboxes](https://modal.com/docs/guide/sandboxes)) to run code generated by the RLM, ensuring complete isolation from the host process. Environments can be added, but we natively support the following: `local` (default), `docker`, `modal`, `prime`, `daytona`, `e2b`.

```python
rlm = RLM(
-    environment="...",  # "local", "docker", "modal", "prime"
+    environment="...",  # "local", "docker", "modal", "prime", "daytona", "e2b"
    environment_kwargs={...},
)
```
@@ -124,19 +124,19 @@ We currently support most major clients (OpenAI, Anthropic), as well as the rout
If you use this code or repository in your research, please cite:

```bibtex
-@misc{zhang2025recursivelanguagemodels,
-title={Recursive Language Models},
+@misc{zhang2026recursivelanguagemodels,
+title={Recursive Language Models},
author={Alex L. Zhang and Tim Kraska and Omar Khattab},
-year={2025},
+year={2026},
eprint={2512.24601},
archivePrefix={arXiv},
primaryClass={cs.AI},
-url={https://arxiv.org/abs/2512.24601},
+url={https://arxiv.org/abs/2512.24601},
}
```

## Optional: Trajectory metadata and logging
-`RLMChatCompletion` has an optional `metadata` field (default empty) that can hold the full trajectory (run config + all iterations and sub-calls) so you can reconstruct the run. Pass an `RLMLogger` to capture it:
+`RLMChatCompletion` has an optional `metadata` field (default `None`) that holds the full trajectory (run config + all iterations and sub-calls) so you can reconstruct the run. Pass an `RLMLogger` to capture it:

- **In-memory only** (trajectory on `completion.metadata`): `logger=RLMLogger()` (no `log_dir`).
- **Also save to disk** (JSONL for the visualizer): `logger=RLMLogger(log_dir="./logs")`.