diff --git a/docs/agentic_rl.md b/docs/agentic_rl.md
index 3a3b4a7ff..244e9d078 100644
--- a/docs/agentic_rl.md
+++ b/docs/agentic_rl.md
@@ -4,9 +4,7 @@
## Architecture
-
-
-
+
## Core Components
@@ -65,9 +63,7 @@ calls in parallel for efficiency.
### Agent/Environment interaction
-
-
-
+
--------------------------------------------------------------------------------
@@ -116,9 +112,7 @@ lock ensures that rollouts (`acquire_rollout`) are temporarily paused when a
weight sync (`acquire_weight_sync`) is requested, preventing agents from
generating trajectories with stale parameters.
-
-
-
+
### Trajectory Batching and Grouping
diff --git a/docs/design.md b/docs/design.md
index 770c695d4..0a1019ab9 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -123,9 +123,7 @@ training agents that can perform multi-turn reasoning and interact with external
tools. The design follows a standard RL paradigm where an **Agent** interacts
with an **Environment** over multiple steps to complete a task.
-
-
-
+
The core design supports agents that engage in **multi-turn conversations**,
breaking down complex problems into sequential steps of reasoning, tool