diff --git a/docs/agentic_rl.md b/docs/agentic_rl.md
index 3a3b4a7ff..244e9d078 100644
--- a/docs/agentic_rl.md
+++ b/docs/agentic_rl.md
@@ -4,9 +4,7 @@
 
 ## Architecture
 
-
- Trajectory Collect Engine Overview
-
+![Trajectory Collect Engine Overview](images/agentic_rollout_pipeline.png)
 
 ## Core Components
 
@@ -65,9 +63,7 @@ calls in parallel for efficiency.
 
 ### Agent/Environment interaction
 
-
- Batch vs Async Rollout
-
+![Batch vs Async Rollout](images/agentic_agent:env.png)
 
 --------------------------------------------------------------------------------
 
@@ -116,9 +112,7 @@ lock ensures that rollouts (`acquire_rollout`) are temporarily paused when a
 weight sync (`acquire_weight_sync`) is requested, preventing agents from
 generating trajectories with stale parameters.
 
-
- Batch vs Async Rollout
-
+![Batch vs Async Rollout](images/batch_vs_async_rollout.png)
 
 ### Trajectory Batching and Grouping
 
diff --git a/docs/design.md b/docs/design.md
index 770c695d4..0a1019ab9 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -123,9 +123,7 @@ training agents that can perform multi-turn reasoning and interact with external
 tools. The design follows a standard RL paradigm where an **Agent** interacts
 with an **Environment** over multiple steps to complete a task.
 
-
- Agentic RL Flow
-
+![Agentic RL Flow](images/agentic_rollout_pipeline.png)
 
 The core design supports agents that engage in **multi-turn conversations**,
 breaking down complex problems into sequential steps of reasoning, tool