Description
Problem
When the parent RLM fans out multiple independent queries via rlm_query_batched (e.g., "answer these 3 questions"), the child RLMs run sequentially: the second child waits for the first to fully complete before starting. For N subcalls each taking T seconds, total wall time is N*T.
This is wasteful because the children are independent: they have separate prompts, separate REPL environments, and make separate LLM API calls. The bottleneck is I/O-bound (waiting for API responses), making this an ideal candidate for thread-based parallelism.
Proposed solution
When there is more than one subcall, run the children in a ThreadPoolExecutor and collect the results in the original submission order. Use a semaphore for concurrency control, capping the maximum number of parallel threads at each level and per recursion depth.
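A minimal sketch of the idea. The names `rlm_query`, `_run_subcall`, and `MAX_PARALLEL_SUBCALLS` are placeholders, not the project's actual API; the real child call would be the I/O-bound LLM request.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

MAX_PARALLEL_SUBCALLS = 4  # assumed cap on parallel children per level

# Semaphore limits how many child RLMs run concurrently at this level.
_subcall_semaphore = threading.Semaphore(MAX_PARALLEL_SUBCALLS)

def rlm_query(prompt):
    # Placeholder for the real child-RLM call (blocking LLM API request).
    return f"answer to: {prompt}"

def _run_subcall(prompt):
    # Acquire a slot before starting the child; released when it finishes.
    with _subcall_semaphore:
        return rlm_query(prompt)

def rlm_query_batched(prompts):
    if len(prompts) <= 1:
        # Single subcall: skip threading overhead entirely.
        return [rlm_query(p) for p in prompts]
    # executor.map yields results in input order, regardless of which
    # child finishes first, so ordering is preserved for free.
    workers = min(len(prompts), MAX_PARALLEL_SUBCALLS)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(_run_subcall, prompts))
```

Since the bottleneck is waiting on API responses rather than CPU work, threads (not processes) are sufficient, and wall time for N independent subcalls drops from roughly N*T toward T, bounded below by the concurrency cap.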