Issues Report: Swarm & Parallel Execution #754

@luxunxiansheng

AWorld Framework Issues Report: Swarm & Parallel Execution

This report documents four critical issues identified and resolved in the AWorld framework (the aworld package) while debugging the RDR Swarm Proof of Concept (PoC). They primarily affect multi-agent orchestration and state management in parallel execution environments.

1. Context Loss in Parallel Sub-Agents

  • File Path: aworld/agents/parallel_llm_agent.py
  • Issue: The async_policy method in ParallelizableAgent was not correctly passing the current task's context to the parallel workers it spawned.
  • Root Cause: The method relied on self.context, which may still be uninitialized at that point in the runner's lifecycle, instead of extracting the live context from the incoming Message object.
  • Impact: Sub-agents encountered a NoneType error when accessing context.agent_info. This prevented any parallel execution from succeeding.
  • Fix: Modified async_policy to prioritize message.context from kwargs.
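
A minimal sketch of the lookup order described in the fix, not the actual aworld code. The Context and Message classes and the resolve_context helper are hypothetical stand-ins introduced for illustration. The only behaviour taken from this report is "prefer message.context from kwargs, fall back to self.context".

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Context:
    # Hypothetical stand-in for the framework's task context.
    agent_info: dict = field(default_factory=dict)


@dataclass
class Message:
    # Hypothetical stand-in carrying only the fields this sketch needs.
    payload: Any = None
    context: Optional[Context] = None


class ParallelAgentSketch:
    def __init__(self) -> None:
        # May still be None when the runner first invokes the agent.
        self.context: Optional[Context] = None

    def resolve_context(self, **kwargs: Any) -> Context:
        """Prefer the live context on the incoming message over self.context."""
        message = kwargs.get("message")
        context = getattr(message, "context", None)
        if context is None:
            context = self.context
        if context is None:
            # Fail early with a clear message instead of a NoneType error
            # later, when a worker touches context.agent_info.
            raise RuntimeError("no task context available for parallel workers")
        return context
```

Each parallel worker would then receive the resolved context explicitly rather than reading self.context on its own.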

2. Invalid 'finished' State Logic

  • File Path: aworld/agents/parallel_llm_agent.py
  • Issue: The finished() method was defined as a standard method but accessed as a property by the task runner.
  • Root Cause: Missing @property decorator.
  • Impact: Boolean checks like if agent.finished: were evaluating the method object itself (which is always truthy) rather than its return value. This caused agents to be incorrectly flagged as finished immediately.
  • Fix: Applied the @property decorator to the finished method.
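
The truthiness problem can be reproduced in isolation. The two classes below are hypothetical and only mirror the shape of the bug. They are not the real ParallelizableAgent.

```python
class WithoutProperty:
    def __init__(self) -> None:
        self._finished = False

    def finished(self) -> bool:
        # Plain method: `obj.finished` yields a bound method object.
        return self._finished


class WithProperty:
    def __init__(self) -> None:
        self._finished = False

    @property
    def finished(self) -> bool:
        # Property: `obj.finished` yields the boolean itself.
        return self._finished


broken = WithoutProperty()
fixed = WithProperty()

# A bound method is always truthy, regardless of what it would return.
broken_flag = bool(broken.finished)
# The property evaluates to the stored value.
fixed_flag = bool(fixed.finished)
```

Here broken_flag is True even though the agent has not finished, while fixed_flag is False.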

3. Strict Handoff Validation (is_agent)

  • File Path: aworld/core/agent/base.py
  • Issue: The is_agent() utility function, used to determine if a message represents a transition between agents, had logic that failed for "completion" signals.
  • Root Cause: The function didn't correctly handle cases where tool_name was absent but the action represented an agent-to-agent handoff or a final step.
  • Impact: The runner would fail to route messages to successor agents in a Swarm graph if the transition didn't involve an explicit tool call.
  • Fix: Permitted transitions where tool_name is absent but the context implies an agent handoff.
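
One possible shape of the relaxed check, as a sketch only. The report does not say how "the context implies an agent handoff" is decided, so this example assumes a set of known agent names. The Action class, its fields, and is_agent_handoff are hypothetical and do not reproduce the real is_agent() signature.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Action:
    # Hypothetical stand-in for the action carried by a message.
    agent_name: Optional[str] = None
    tool_name: Optional[str] = None


def is_agent_handoff(action: Action, known_agents: Set[str]) -> bool:
    """Return True when the action should be routed to another agent."""
    if action.tool_name:
        # Explicit tool call: a handoff only if the "tool" is itself an agent.
        return action.tool_name in known_agents
    # No tool_name: accept the transition when the target is a known agent
    # (assumption: this is what "context implies a handoff" means here).
    return action.agent_name in known_agents
```

With this rule, a completion step that names a successor agent but carries no tool call is still routed onward in the Swarm graph.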

4. Inconsistent Agent Result Serialization

  • File Path: aworld/agents/parallel_llm_agent.py
  • Issue: The _agent_result() method differed from the base LLMAgent in both its signature and its async/sync handling.
  • Root Cause: Architectural drift between the base agent class and the parallelizable extension.
  • Impact: Errors during the construction of the final Message payload after parallel workers finished their tasks.
  • Fix: Standardized _agent_result to be synchronous and to wrap the parallel results into a single ActionModel for the next agent in the chain.
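
A rough sketch of the wrapping step, under stated assumptions. ActionModelSketch, its two fields, and merge_parallel_results are hypothetical names for illustration. The real ActionModel and _agent_result signature are not shown in this report.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ActionModelSketch:
    # Hypothetical stand-in: the target agent plus an opaque payload.
    agent_name: str
    policy_info: Any = None


def merge_parallel_results(
    results: List[ActionModelSketch], next_agent: str
) -> ActionModelSketch:
    """Synchronously fold worker results into one action for the successor."""
    # Keep worker order so the successor can correlate outputs with inputs.
    payload = [result.policy_info for result in results]
    return ActionModelSketch(agent_name=next_agent, policy_info=payload)
```

Because the helper is a plain function rather than a coroutine, the caller does not need to await it, which matches the synchronous contract described in the fix.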
