Making RLM subqueries run in parallel for faster execution #135

@h4shk4t

Description

Problem

When the parent RLM fans out multiple independent queries via rlm_query_batched (e.g., "answer these 3 questions"), each child RLM runs sequentially: the second child waits for the first to fully complete before starting. For N subcalls that each take T seconds, total wall time is N*T.

This is wasteful because the children are independent: they have separate prompts, separate REPL environments, and make separate LLM API calls. The bottleneck is I/O-bound (waiting for API responses), making this an ideal candidate for thread-based parallelism.

Proposed solution

When there is more than one subcall, run the children in a ThreadPoolExecutor and collect the results in their original submission order. Use a semaphore to cap the maximum number of parallel threads at each level and per recursion, so deeply nested batches cannot spawn an unbounded number of threads.
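A minimal sketch of what this could look like. The names `run_child` and `MAX_PARALLEL` are assumptions for illustration, not part of the existing codebase; only `rlm_query_batched` comes from the issue. `executor.map` already returns results in input order, and a shared semaphore bounds total concurrency across nested batches:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Assumed cap on concurrent children (not specified in the issue).
MAX_PARALLEL = 4
# Shared semaphore so nested rlm_query_batched calls also respect the cap.
_child_sem = threading.Semaphore(MAX_PARALLEL)

def run_child(prompt):
    # Hypothetical stand-in for spawning one child RLM (prompt -> answer).
    # The semaphore gates entry, bounding concurrency across recursion levels.
    with _child_sem:
        return f"answer:{prompt}"

def rlm_query_batched(prompts):
    # Keep the existing sequential path for zero or one subcall.
    if len(prompts) <= 1:
        return [run_child(p) for p in prompts]
    # executor.map yields results in the order of the input iterable,
    # so answers line up with their prompts even if children finish
    # out of order.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        return list(pool.map(run_child, prompts))
```

Because the children are I/O-bound (waiting on LLM API responses), the GIL is released during the wait and threads are sufficient; no process pool is needed.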
