
Make rlm subcalls parallel with thread safety and semaphore #136

Open

h4shk4t wants to merge 1 commit into alexzhang13:main from h4shk4t:parallel-rlm-subcalls

Conversation

@h4shk4t h4shk4t commented Mar 11, 2026

Summary

Adds parallel execution support for rlm_query_batched, allowing child RLM subcalls to run concurrently instead of sequentially. This provides significant speedup when the parent model fans out multiple independent queries (e.g., answering 3 independent questions simultaneously).

Changes

rlm/core/rlm.py

  • New max_concurrent_subcalls parameter (default 1 = sequential, backward-compatible)
  • New event callbacks: on_subcall_start, on_subcall_complete, on_iteration_start, on_iteration_complete for live progress tracking (Optional, I added this for better debugging - can remove this if out of scope)
  • Thread-safe _subcall: added _state_lock to protect _cumulative_cost and error tracking when children run in parallel across threads
  • Child RLMs propagate concurrency settings and callbacks to their own children

rlm/environments/local_repl.py

  • New _rlm_query_batched_parallel: dispatches subcalls via a short-lived ThreadPoolExecutor and collects results in original prompt order via as_completed
  • Two-layer concurrency control:
      • Local: ThreadPoolExecutor(max_workers=max_concurrent_subcalls) bounds children per batch call
      • Global: a process-wide threading.Semaphore(16) bounds total concurrent children across all depths, preventing thread/memory explosion with deep recursion
  • Thread-safe _pending_llm_calls: appends are guarded by _calls_lock
  • Fixed _temp_cwd(): restores to self.original_cwd (set at init) instead of os.getcwd(), which could capture another thread's temp dir that later gets deleted ([Errno 2])
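
The two-layer scheme can be sketched like this. Names such as `_GLOBAL_SUBCALL_SEMAPHORE`, `query_batched_parallel`, and `subcall_fn` are illustrative stand-ins, not the actual identifiers in rlm/environments/local_repl.py.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

# Global layer (sketch): one process-wide semaphore caps total concurrent
# children across all recursion depths, no matter how many batch calls run.
_GLOBAL_SUBCALL_SEMAPHORE = threading.Semaphore(16)

def _run_one(subcall_fn, prompt):
    # Workers acquire a global slot, do the work, then release it,
    # so the semaphore bounds work rather than threads.
    with _GLOBAL_SUBCALL_SEMAPHORE:
        return subcall_fn(prompt)

def query_batched_parallel(subcall_fn, prompts, max_concurrent_subcalls):
    results = [None] * len(prompts)
    # Local layer: a short-lived pool bounds children for this batch call.
    with ThreadPoolExecutor(max_workers=max_concurrent_subcalls) as pool:
        future_to_idx = {
            pool.submit(_run_one, subcall_fn, p): i
            for i, p in enumerate(prompts)
        }
        # Consume completions as they arrive, but slot each result back
        # into its original prompt position.
        for fut in as_completed(future_to_idx):
            results[future_to_idx[fut]] = fut.result()
    return results
```

Because each depth level creates its own short-lived pool, a parent never blocks waiting for a worker slot held by its own child, which is what makes a single shared pool deadlock-prone here.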

tests/test_rlm_query.py

  • TestRlmQueryBatchedParallel: 6 tests covering parallel speedup, result ordering, partial failure handling, pending-call tracking, sequential fallback (max_concurrent=1), and single-prompt short-circuit
  • TestGlobalSemaphoreBounding: 2 tests verifying that the global semaphore limits concurrent children and is shared across REPL instances
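
As a rough sketch of the speedup/ordering tests, the shape is something like the following. The fake subcall and helper here are illustrative; the real tests exercise rlm_query_batched and are not reproduced in this PR description.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_subcall(prompt):
    time.sleep(0.2)            # stand-in for a slow child RLM call
    return f"answer:{prompt}"

def run_parallel(prompts, max_workers):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_subcall, prompts))  # map preserves order

start = time.monotonic()
results = run_parallel(["q1", "q2", "q3"], max_workers=3)
elapsed = time.monotonic() - start

assert results == ["answer:q1", "answer:q2", "answer:q3"]  # original order kept
assert elapsed < 0.5  # ~0.2s in parallel; sequential would take ~0.6s
```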

Design decisions

  • Sequential by default: max_concurrent_subcalls=1 has zero overhead — no locks, no thread pool, no semaphore. Existing behavior is unchanged.
  • No pool-within-pool deadlock: each depth level creates its own short-lived ThreadPoolExecutor. The global semaphore bounds work, not threads — workers acquire a slot, call subcall_fn, release.
  • No os.chdir lock: I explored a Lock (which deadlocks on same-thread re-entry in sequential mode) and an RLock (which deadlocks cross-thread in parallel mode). The root cause was os.getcwd() capturing stale paths, fixed by always restoring to a stable directory.

Usage

from rlm import RLM

# Sequential (default) — identical to previous behavior
rlm = RLM(
    backend="azure_openai",
    backend_kwargs={"model_name": "gpt-4o-mini"},
    max_depth=2,
)
result = rlm.completion("Answer these 3 questions using rlm_query_batched...")
# Children run one at a time: total ~30s for 3 subcalls @ ~10s each

# Parallel — just set max_concurrent_subcalls
rlm = RLM(
    backend="azure_openai",
    backend_kwargs={"model_name": "gpt-4o-mini"},
    max_depth=2,
    max_concurrent_subcalls=4,  # <-- this is the only change
)
result = rlm.completion("Answer these 3 questions using rlm_query_batched...")

