massgen · Henry-811 · Mar 13, 2026 · Mar 13, 2026 · coderabbitai · Mar 13, 2026
diff --git a/docs/modules/composition.md b/docs/modules/composition.md
@@ -131,6 +131,20 @@ Phase 2: Per-dimension parallel analysis (methodology personas per dimension)
 Phase 3: Cross-dimension synthesis (integration personas, synthesis eval criteria)
 ```
 
+### Pattern 5: Ensemble (Produce → Vote → Synthesize)
+
+Use this pattern for tasks where independent diversity matters more than iterative refinement. Each agent produces its best answer in isolation, the agents then vote, and the winner synthesizes insights from all the other answers.
+
+```
+Phase 1: Independent parallel production (disable_injection: true)
+Phase 2: Vote on best answer (defer_voting_until_all_answered: true)
+Phase 3: Winner synthesizes from all (final_answer_strategy: synthesize)
+```
+
+This is the default pattern for multi-agent subagent runs. It maximizes answer diversity by preventing agents from anchoring on each other's work, while still producing a high-quality synthesized final answer.
+
+See ``docs/source/reference/yaml_schema.rst`` and ``configs/voting/ensemble_mode.yaml`` for configuration details.
+
 ## Checklist Gate Criteria for Special Primitives
 
 The default checklist items (E1-E4) are designed for general task output. But special primitives — persona generation, task decomposition, evaluation criteria generation, and analytical tasks like prompt crafting or log analysis — have well-defined quality characteristics that don't require another level of prompt generation to specify.
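
The three phases of Pattern 5 added in this hunk (produce, vote, synthesize) can be sketched in a few lines of Python. This is only an illustration of the control flow, not MassGen's orchestrator: `run_ensemble`, the `Agent` alias, and the `vote` callback are hypothetical names introduced here, and a simple plurality count stands in for whatever tallying the real system uses.

```python
from collections import Counter
from typing import Callable

# Hypothetical stand-in: an "agent" is any callable mapping a task to an answer.
Agent = Callable[[str], str]


def run_ensemble(
    task: str,
    agents: dict[str, Agent],
    vote: Callable[[str, dict[str, str]], str],
) -> dict:
    # Phase 1: every agent answers in isolation; no peer answers are passed in.
    answers = {name: agent(task) for name, agent in agents.items()}

    # Phase 2: voting starts only once all answers exist.
    ballots = [vote(voter, answers) for voter in agents]
    winner, _ = Counter(ballots).most_common(1)[0]

    # Phase 3: the winner keeps its own answer as the basis and receives
    # every other answer as material to synthesize from.
    others = [text for name, text in answers.items() if name != winner]
    return {"winner": winner, "primary": answers[winner], "to_incorporate": others}
```

A caller would pass real agent callables and a voting function. The point of the sketch is that the comprehension in phase 1 never exposes one agent's answer to another, which is the property `disable_injection: true` is described as providing.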

diff --git a/docs/modules/coordination_workflow.md b/docs/modules/coordination_workflow.md
@@ -446,13 +446,19 @@ Checklist mode is policy, not the core coordination primitive:
   4. in that task-driven branch, the parent uses `get_task_plan` as the source of truth, may open the evaluator artifact paths for rationale, and does not call `submit_checklist` or `propose_improvements`
   5. in that task-driven branch, the parent implements, verifies, and submits via `new_answer` directly; for pure text artifacts, the final artifact body goes straight into `new_answer.content`
   6. if structured `next_tasks` are missing or invalid, the parent uses the returned critique/spec packet as the diagnostic basis for checklist submission
-  7. in that fallback branch, the parent saves or copies that packet into its workspace as the diagnostic report and calls `submit_checklist`
+  7. in that degraded fallback branch, the parent references the surfaced `critique_packet.md` path directly as `report_path` and calls `submit_checklist`
   8. if checklist returns `status=validation_error`, fix payload/report and call `submit_checklist` again
   9. if accepted iterate verdict, call `propose_improvements`
   10. implement plan (use `improvement_spec` from the evaluator packet as richer guidance when present)
-  11. write/update `memory/short_term/verification_latest.md` with replayable verification steps/artifacts
+  11. write/update `memory/short_term/verification_latest.md` using the replay contract sections
+      `Verification Contract`, `Inputs and Artifacts`, `Replay Steps`, `Latest Verification Result`,
+      and `Stale If`
   12. submit via `new_answer` (or terminal action)
 - round evaluator contract notes:
+  - support matrix:
+    - core path: orchestrator-managed stage -> synthesized evaluator packet -> valid `next_tasks.json` -> parent executes one task-driven next-round thesis
+    - degraded fallback: canonical packet exists but structured handoff is missing or invalid, so checklist submission uses `critique_packet.md` as the diagnostic basis
+    - advanced / non-default: branches such as `round_evaluator_refine` remain available for specific uses, but they are not part of the default path
   - returns a packet with `criteria_interpretation`, `criterion_findings`, `cross_answer_synthesis`, `preserve`, `improvement_spec`, `verification_plan`, and `evidence_gaps`
   - the packet is critique/spec guidance only, not a checklist payload or terminal recommendation
   - the inline `verdict_block` is intentionally minimal and carries verdict metadata (`verdict` + `scores`) rather than the full task handoff
@@ -468,6 +474,7 @@ Checklist mode is policy, not the core coordination primitive:
   - generated child YAML for `round_evaluator` always mounts the shared temp-workspace root read-only
   - `refine=false` keeps the evaluator child checklist-free; `refine=true` may inherit the parent checklist gate
   - checklist-enabled `round_evaluator` child runs use a dedicated default criteria preset for evaluator-packet quality when no child-specific criteria are configured
+  - `round_evaluator_transformation_pressure` biases how hard the evaluator searches for a larger thesis shift (`gentle`, `balanced`, `aggressive`) while still keeping correctness-first execution and one committed next-round thesis
 - checklist result contract:
   - accepted path: `status=accepted` + `verdict`
   - invalid path: `status=validation_error`, `requires_resubmission=true`, no `verdict`
@@ -569,9 +576,16 @@ orchestrator:
   checklist_first_answer: false
 ```
 
-## Self-Improvement and Evaluator Rescue Cycle
+## Round Evaluator Normal Path
 
-MassGen agents self-improve iteratively within each round, then submit their best answer. This cycle is by design — agents should push themselves to plateau before asking for external feedback. The round evaluator exists to rescue agents from plateaus they cannot break through alone.
+MassGen agents self-improve iteratively within each round, then submit their
+best answer. The round evaluator is the post-answer stage for **material self-improvement**
+that should either surface a materially better next-round thesis or declare
+local convergence for the current run.
+
+This is especially useful for open-ended self-improvement loops. The evaluator
+should keep helping the system find the next meaningful frontier of work rather
+than certifying the current answer after a few cosmetic chores.
 
 ### The plateau problem
 
@@ -648,6 +662,70 @@ Not every criterion can be driven to 10/10. Distinguish between two cases:
 
 The evaluator's `ceiling_status` signal lets the orchestrator make informed convergence decisions rather than burning rounds on diminishing returns.
 
+## Transformation Pressure
+
+`round_evaluator_transformation_pressure` is the evaluator-specific knob for
+how aggressively the managed stage should search for a larger thesis change
+before settling for incremental refinement.
+
+Current behavior:
+
+- `gentle`: exploit the current thesis longer; prefer corrective work unless
+  there is strong ceiling evidence.
+- `balanced`: the default middle ground; allow bigger restructuring once the
+  current line of improvement starts plateauing.
+- `aggressive`: treat open-ended tasks as frontier-seeking; push for
+  transformative shifts sooner when repeated rounds are not producing
+  step-change improvement.
+
+Machine-learning intuition that may be useful here:
+
+- **Exploration vs. exploitation**: `incremental_refinement` exploits the
+  current thesis; `thesis_shift` explores a different solution basin.
+- **Local minima escape**: repeated small gains can indicate that more polish
+  is not enough and that a different approach should be tried.
+- **Annealing / schedules**: effective transformation pressure could rise after
+  consecutive incremental rounds instead of remaining fixed.
+- **Validation signal**: evaluator verification evidence should ground the push
+  toward transformation so the system does not chase novelty for novelty's
+  sake.
+- **Catastrophic forgetting prevention**: the `preserve` list and final
+  preserve/regression verification protect strengths that should survive larger
+  pivots.
+- **Ensemble distillation**: multiple evaluator agents may surface several
+  promising directions, but the managed stage should still collapse them into
+  one committed next thesis for the parent to execute.
+
+This knob biases how strongly the evaluator searches for a higher frontier. It
+does not override correctness-first fixes, and it does not turn the handoff
+into an unresolved menu of incompatible strategies.
+
+## Ensemble Pattern
+
+The **ensemble pattern** is a coordination strategy that sits between iterative
+voting and decomposition. Agents produce answers independently (no peer
+visibility), then vote on the best, and the winner synthesizes insights from all
+others.
+
+Set it up with existing orchestrator parameters:
+
+```yaml
+orchestrator:
+  disable_injection: true
+  defer_voting_until_all_answered: true
+  max_new_answers_per_agent: 1
+  final_answer_strategy: "synthesize"
+```
+
+This is also the default pattern for multi-agent subagent runs (via
+`SubagentOrchestratorConfig` defaults: `disable_injection: true`,
+`defer_voting_until_all_answered: true`).
+
+When voting has occurred, the `synthesize` strategy gives the winner a
+winner-biased prompt ("your answer was selected as the best — use it as the
+primary basis and incorporate strongest elements from others"). Without voting,
+the prompt is neutral ("synthesize the strongest parts across all answers").
+
 ## Related Docs
 
 - `docs/modules/architecture.md` - core system architecture and backend hierarchy
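
The prompt selection described at the end of the Ensemble Pattern section (winner-biased after a vote, neutral otherwise) amounts to a single branch. The sketch below is an assumption-laden illustration: `synthesis_prompt` is a hypothetical helper, and the strings paraphrase the wording quoted in the docs rather than reproducing MassGen's actual prompt text.

```python
def synthesis_prompt(voting_occurred: bool) -> str:
    """Pick the synthesis framing based on whether a vote selected a winner."""
    if voting_occurred:
        # Winner-biased framing: the presenter's own answer is the primary basis.
        return (
            "Your answer was selected as the best. Use it as the primary basis "
            "and incorporate the strongest elements from the others."
        )
    # Neutral framing: no answer is privileged.
    return "Synthesize the strongest parts across all answers."
```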

diff --git a/docs/modules/subagents.md b/docs/modules/subagents.md
@@ -28,6 +28,7 @@ Behavior:
 - `inherited` runs subagents in the same runtime boundary as the parent
 - `delegated` uses the outbox pattern: the container writes a request file, a trusted host-side `SubagentLaunchWatcher` creates an isolated container per subagent (see [Delegated Mode](#delegated-mode-codex--docker) below)
 - Codex+Docker auto-detection: when the backend is Codex with `command_line_execution_mode: docker` and no explicit fallback/prefix is set, the orchestrator automatically switches to `delegated` mode if a delegation directory is available. This replaces the old `inherited` fallback with secure per-subagent container isolation.
+- Host-side Codex background MCP clients rewrite delegated subagent server configs back to `isolated` before connecting. That client already runs on the host, so it should launch the child run directly instead of pretending it is still inside the parent container.
 
 ## Delegated Mode (Codex + Docker)
 
@@ -184,30 +185,34 @@ Built-in profiles:
 ### Round Evaluator Loop
 
 `coordination.round_evaluator_before_checklist: true` enables the single-parent
-manual/prompt-guided v1 flow:
+orchestrator-managed round-evaluator stage:
 
 - round 1: parent builds and submits its first answer normally
-- round 2+: the parent launches one blocking `round_evaluator` subagent before
-  checklist submission unless the separate orchestrator-managed gate is enabled
+- round 2+: the orchestrator launches one blocking `round_evaluator` subagent
+  before checklist submission
 - the round evaluator returns a critique/spec packet with `criteria_interpretation`, `criterion_findings`, `cross_answer_synthesis`, `preserve`, `improvement_spec`, `verification_plan`, and `evidence_gaps`
-- the parent saves or copies that packet into its workspace as the diagnostic
-  report used for `submit_checklist`
+- core path: if valid `next_tasks.json` is present, the parent treats it as the
+  one committed next-round thesis, calls `get_task_plan`, implements, verifies,
+  and submits via `new_answer`
+- degraded fallback: if `next_tasks.json` is missing or invalid, the
+  parent uses `critique_packet.md` as the diagnostic basis for `submit_checklist`
 - the parent does not run a second full self-evaluation pass; additional
   verification is only for explicit `evidence_gaps`
 - the parent still owns `submit_checklist`, `propose_improvements`, `new_answer`, and `vote`
 - generated child YAML for `round_evaluator` always mounts the shared
   temp-workspace root read-only
-- with `refine: false`, `round_evaluator` remains a quick critique-only child
-  run and omits checklist-gated child settings
-- with `refine: true`, `round_evaluator` may inherit the parent checklist gate;
-  when no child-specific criteria are configured, it falls back to a built-in
-  `round_evaluator` criteria preset for judging the critique packet itself
+- if the evaluator child times out before producing `critique_packet.md`, the
+  orchestrator degrades back to the normal parent-owned checklist flow for that
+  answer set instead of terminating coordination immediately
+- `round_evaluator_refine` remains an advanced/non-default branch
 - when the child run is using presenter-stage `synthesize`/`winner_present`, it
   keeps `skip_final_presentation: false`
+- `round_evaluator_transformation_pressure` controls how aggressively the
+  evaluator seeks a larger thesis shift: `gentle`, `balanced`, or `aggressive`
 
-`coordination.orchestrator_managed_round_evaluator: true` is a separate,
-currently gated mode that lets the orchestrator launch that same blocking
-`round_evaluator` before round 2+.
+`coordination.orchestrator_managed_round_evaluator: true` remains required for
+this stage and keeps the launch reserved for the orchestrator rather than a
+manual parent prompt pattern.
 
 Validation constraints for this mode:
 

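The three outcomes listed for the round-evaluator stage (core path, degraded fallback, and the timeout case where no packet exists) can be read as a small decision function. The sketch below is hypothetical throughout: `Handoff`, `choose_handoff`, and the directory layout are invented for illustration, and "valid" is approximated as "parses as a non-empty JSON list" because this PR does not show the real `next_tasks.json` schema.

```python
import json
from enum import Enum
from pathlib import Path


class Handoff(Enum):
    CORE = "task_driven"                      # valid next_tasks.json present
    DEGRADED = "checklist_with_packet"        # packet exists, handoff missing/invalid
    PARENT_CHECKLIST = "parent_owned_checklist"  # no packet (e.g. evaluator timed out)


def choose_handoff(evaluator_dir: Path) -> Handoff:
    packet = evaluator_dir / "critique_packet.md"
    tasks = evaluator_dir / "next_tasks.json"

    # No packet at all: fall back to the normal parent-owned checklist flow.
    if not packet.exists():
        return Handoff.PARENT_CHECKLIST

    try:
        data = json.loads(tasks.read_text())
    except (OSError, ValueError):
        # Missing file or unparseable JSON: use the packet as the diagnostic basis.
        return Handoff.DEGRADED

    # ASSUMPTION: a non-empty list counts as "valid"; the real schema is not shown.
    if isinstance(data, list) and data:
        return Handoff.CORE
    return Handoff.DEGRADED
```

Ordering the checks this way keeps the degraded branches explicit, so a missing packet never gets mistaken for an invalid handoff.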
diff --git a/docs/source/reference/yaml_schema.rst b/docs/source/reference/yaml_schema.rst
@@ -1032,6 +1032,22 @@ Coordination Configuration
      - object
      - No
      - Background subagent configuration (``enabled``, ``injection_strategy``)
+   * - ``round_evaluator_before_checklist``
+     - boolean
+     - No
+     - Enable the orchestrator-managed round-evaluator stage before round-2+ checklist decisions (default: ``false``). Requires ``orchestrator_managed_round_evaluator: true`` and checklist-gated voting.
+   * - ``orchestrator_managed_round_evaluator``
+     - boolean
+     - No
+     - Treat the synthesized round-evaluator task handoff as the normal post-answer self-improvement path (default: ``false``).
+   * - ``round_evaluator_refine``
+     - boolean
+     - No
+     - Advanced/non-default option that lets the evaluator child run iterate before producing its packet (default: ``false``).
+   * - ``round_evaluator_transformation_pressure``
+     - string
+     - No
+     - Bias on how aggressively the evaluator seeks a larger thesis change. Supported values: ``gentle``, ``balanced``, ``aggressive``. Default: ``balanced``.
 
 .. note::
 
@@ -1160,6 +1176,66 @@ Decomposition mode (recommended defaults):
      fairness_lead_cap_answers: 2
      max_midstream_injections_per_round: 2
 
+Ensemble pattern (recommended defaults):
+
+.. code-block:: yaml
+
+   orchestrator:
+     # Agents work independently — no peer answer injection
+     disable_injection: true
+     # Wait for all agents to finish before voting begins
+     defer_voting_until_all_answered: true
+     # Each agent produces 1 answer (adjustable)
+     max_new_answers_per_agent: 1
+     # Winner synthesizes from all answers
+     final_answer_strategy: "synthesize"
+
+The **ensemble pattern** is a coordination strategy where agents produce answers
+independently (no peer visibility), then vote on the best answer, and the winner
+synthesizes insights from all others into a refined final answer.
+
+**When to use ensemble mode:**
+
+- You want diverse, independent perspectives without agents anchoring on each
+  other's work
+- The task benefits from competitive parallel attempts rather than iterative
+  refinement (e.g., creative writing, design proposals, solution brainstorming)
+- You want faster coordination — single round of production + vote, no
+  multi-round iteration
+
+**Subagent default:** Multi-agent subagent runs use ensemble defaults
+automatically (``disable_injection: true``,
+``defer_voting_until_all_answered: true``). Override by setting these fields
+explicitly in ``subagent_orchestrator`` config.
+
+.. list-table:: Ensemble vs Standard Voting vs Decomposition
+   :header-rows: 1
+
+   * - Aspect
+     - Standard voting
+     - Ensemble pattern
+     - Decomposition
+   * - Peer visibility
+     - Agents see each other's answers
+     - Agents work in isolation
+     - Agents see subtask assignments
+   * - Iteration
+     - Multiple refinement rounds
+     - Single round of production
+     - Multiple rounds per subtask
+   * - Voting
+     - After iterative refinement
+     - After all answers produced
+     - No voting (presenter assembles)
+   * - Final answer
+     - Winner presents
+     - Winner synthesizes from all
+     - Presenter integrates subtasks
+   * - Best for
+     - Deep quality refinement
+     - Diverse perspectives, speed
+     - Complex multi-part tasks
+
 Timeout Configuration
 ~~~~~~~~~~~~~~~~~~~~~
 

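The "Subagent default" note in the hunk above says explicit `subagent_orchestrator` fields override the ensemble defaults. A minimal way to picture that precedence, assuming a plain dictionary merge (the helper name `resolve_subagent_orchestrator` is hypothetical, and the real resolution logic is not shown in this PR):

```python
# Defaults named in the docs for multi-agent subagent runs.
ENSEMBLE_DEFAULTS = {
    "disable_injection": True,
    "defer_voting_until_all_answered": True,
}


def resolve_subagent_orchestrator(explicit: dict) -> dict:
    """Explicitly configured fields win over the ensemble defaults."""
    return {**ENSEMBLE_DEFAULTS, **explicit}
```

Under this reading, setting only `disable_injection: false` would re-enable peer injection while leaving deferred voting on.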
diff --git a/massgen/agent_config.py b/massgen/agent_config.py
@@ -212,6 +212,7 @@ class CoordinationConfig:
     orchestrator_managed_round_evaluator: bool = False  # Gate orchestrator-owned round_evaluator launch; default prompt-guidance only
     round_evaluator_skip_synthesis: bool = False  # Skip synthesis stage; pass all raw critiques to parent directly
     round_evaluator_refine: bool = False  # Allow evaluator agents to iterate (multi-round with voting)
+    round_evaluator_transformation_pressure: str = "balanced"  # "gentle" | "balanced" | "aggressive"
     enable_quality_rethink_on_iteration: bool = False  # Auto-inject quality_rethinking spawn task on iteration 2+
     enable_novelty_on_iteration: bool = False  # Auto-inject novelty/quality spawn task on iteration 2+
     novelty_injection: str = "none"  # "none" | "gentle" | "moderate" | "aggressive"
@@ -227,6 +228,7 @@ def __post_init__(self):
         self._validate_subagent_runtime_config()
         self._validate_drift_conflict_policy()
         self._validate_novelty_injection()
+        self._validate_round_evaluator_transformation_pressure()
         self._validate_learning_capture_mode()
         self._validate_pre_collab_voting_threshold()
         self._validate_improvements()
@@ -294,6 +296,14 @@ def _validate_novelty_injection(self):
                 f"Invalid novelty_injection: '{self.novelty_injection}'. " f"Must be one of: {sorted(valid_values)}",
             )
 
+    def _validate_round_evaluator_transformation_pressure(self):
+        """Validate round_evaluator_transformation_pressure setting."""
+        valid_values = {"gentle", "balanced", "aggressive"}
+        if self.round_evaluator_transformation_pressure not in valid_values:
+            raise ValueError(
+                "Invalid round_evaluator_transformation_pressure: " f"'{self.round_evaluator_transformation_pressure}'. " f"Must be one of: {sorted(valid_values)}",
+            )
+
     def _validate_learning_capture_mode(self):
         """Validate learning_capture_mode setting."""
         valid_values = {"round", "verification_and_final_only", "final_only"}
@@ -1151,6 +1161,7 @@ def to_dict(self) -> dict[str, Any]:
             "planning_mode_instruction": self.coordination_config.planning_mode_instruction,
             "max_orchestration_restarts": self.coordination_config.max_orchestration_restarts,
             "drift_conflict_policy": self.coordination_config.drift_conflict_policy,
+            "round_evaluator_transformation_pressure": self.coordination_config.round_evaluator_transformation_pressure,
         }
 
         # Handle debug fields
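
To see what the new validator does at construction time, here is a self-contained stand-in that mirrors the check added above. `CoordinationSketch` is a hypothetical dataclass written for this example; it is not `massgen.agent_config.CoordinationConfig` and carries only the one field.

```python
from dataclasses import dataclass


@dataclass
class CoordinationSketch:
    # Same default and allowed values as the field added in this diff.
    round_evaluator_transformation_pressure: str = "balanced"

    def __post_init__(self):
        valid_values = {"gentle", "balanced", "aggressive"}
        if self.round_evaluator_transformation_pressure not in valid_values:
            raise ValueError(
                "Invalid round_evaluator_transformation_pressure: "
                f"'{self.round_evaluator_transformation_pressure}'. "
                f"Must be one of: {sorted(valid_values)}"
            )
```

Constructing `CoordinationSketch()` succeeds with the `balanced` default, while a value outside the three supported strings raises `ValueError` before the object can be used, so a typo in YAML would surface at load time rather than mid-run.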

diff --git a/massgen/api_params_handler/_api_params_handler_base.py b/massgen/api_params_handler/_api_params_handler_base.py
@@ -115,6 +115,7 @@ def get_base_excluded_params(self) -> set[str]:
             "orchestrator_managed_round_evaluator",  # Gate for orchestrator-owned round_evaluator launch
             "round_evaluator_skip_synthesis",  # Skip synthesis; pass raw critiques to parent directly
             "round_evaluator_refine",  # Allow evaluator agents to iterate (multi-round with voting)
+            "round_evaluator_transformation_pressure",  # Coordination-only bias for evaluator thesis boldness
             "enable_quality_rethink_on_iteration",  # Coordination-only quality task injection toggle
             "enable_novelty_on_iteration",  # Coordination-only novelty task injection toggle
             "novelty_injection",  # Novelty pressure level (none/gentle/moderate/aggressive)

diff --git a/massgen/backend/base.py b/massgen/backend/base.py
@@ -343,6 +343,7 @@ def get_base_excluded_config_params(cls) -> set:
             "orchestrator_managed_round_evaluator",  # Gate for orchestrator-owned round_evaluator launch
             "round_evaluator_skip_synthesis",  # Skip synthesis; pass raw critiques to parent directly
             "round_evaluator_refine",  # Allow evaluator agents to iterate (multi-round with voting)
+            "round_evaluator_transformation_pressure",  # Coordination-only bias for evaluator thesis boldness
             "enable_quality_rethink_on_iteration",  # Coordination-only quality task injection toggle
             "enable_novelty_on_iteration",  # Coordination-only novelty task injection toggle
             "novelty_injection",  # Novelty pressure level (none/gentle/moderate/aggressive)
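
Both exclusion lists exist so that coordination-only settings are not forwarded to a model backend. The effect can be sketched as a simple filter. `strip_coordination_params` and the sample payload keys (`model`, `temperature`) are hypothetical, and the set below is only a subset of the names visible in this diff.

```python
# Subset of the coordination-only keys listed in the two hunks above.
COORDINATION_ONLY = {
    "round_evaluator_refine",
    "round_evaluator_transformation_pressure",
    "novelty_injection",
}


def strip_coordination_params(params: dict) -> dict:
    """Drop coordination-only keys before building a backend request."""
    return {k: v for k, v in params.items() if k not in COORDINATION_ONLY}
```

Registering `round_evaluator_transformation_pressure` in both lists matters for the same reason: without it, the new key could leak into API parameters where it has no meaning.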