feat: Better subagent contracts; lighter refinement for subagents too #996
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,6 +28,7 @@ Behavior: | |
| - `inherited` runs subagents in the same runtime boundary as the parent | ||
| - `delegated` uses the outbox pattern: the container writes a request file, a trusted host-side `SubagentLaunchWatcher` creates an isolated container per subagent (see [Delegated Mode](#delegated-mode-codex--docker) below) | ||
| - Codex+Docker auto-detection: when the backend is Codex with `command_line_execution_mode: docker` and no explicit fallback/prefix is set, the orchestrator automatically switches to `delegated` mode if a delegation directory is available. This replaces the old `inherited` fallback with secure per-subagent container isolation. | ||
| - Host-side Codex background MCP clients rewrite delegated subagent server configs back to `isolated` before connecting. That client already runs on the host, so it should launch the child run directly instead of pretending it is still inside the parent container. | ||
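The auto-detection rule and the host-side rewrite described in the bullets above can be sketched roughly as follows. This is an illustration only: `RuntimeSettings`, `auto_detect_mode`, and `rewrite_for_host_client` are hypothetical names chosen for the example, not identifiers taken from the codebase, and the field names simply mirror the wording of the bullets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeSettings:
    """Hypothetical bundle of the settings the bullets mention."""
    backend: str                              # e.g. "codex"
    command_line_execution_mode: str          # e.g. "docker"
    explicit_fallback: Optional[str] = None   # user-set fallback mode, if any
    explicit_prefix: Optional[str] = None     # user-set command prefix, if any
    delegation_dir_available: bool = False


def auto_detect_mode(settings: RuntimeSettings, current_mode: str) -> str:
    """Switch to 'delegated' only when every condition in the text holds."""
    is_codex_docker = (
        settings.backend.lower() == "codex"
        and settings.command_line_execution_mode == "docker"
    )
    no_override = settings.explicit_fallback is None and settings.explicit_prefix is None
    if is_codex_docker and no_override and settings.delegation_dir_available:
        return "delegated"
    return current_mode


def rewrite_for_host_client(mode: str) -> str:
    """A host-side client launches the child run directly, so 'delegated' becomes 'isolated'."""
    return "isolated" if mode == "delegated" else mode
```

Keeping the two steps as separate functions reflects that they run in different places: detection happens in the orchestrator, while the rewrite applies only to a client that is already on the host.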
|
|
||
| ## Delegated Mode (Codex + Docker) | ||
|
|
||
|
|
@@ -184,30 +185,34 @@ Built-in profiles: | |
| ### Round Evaluator Loop | ||
|
|
||
| `coordination.round_evaluator_before_checklist: true` enables the single-parent | ||
| manual/prompt-guided v1 flow: | ||
| orchestrator-managed round-evaluator stage: | ||
|
|
||
| - round 1: parent builds and submits its first answer normally | ||
| - round 2+: the parent launches one blocking `round_evaluator` subagent before | ||
| checklist submission unless the separate orchestrator-managed gate is enabled | ||
| - round 2+: the orchestrator launches one blocking `round_evaluator` subagent | ||
| before checklist submission | ||
| - the round evaluator returns a critique/spec packet with `criteria_interpretation`, `criterion_findings`, `cross_answer_synthesis`, `preserve`, `improvement_spec`, `verification_plan`, and `evidence_gaps` | ||
| - the parent saves or copies that packet into its workspace as the diagnostic | ||
| report used for `submit_checklist` | ||
| - core path: if valid `next_tasks.json` is present, the parent treats it as the | ||
| one committed next-round thesis, calls `get_task_plan`, implements, verifies, | ||
| and submits via `new_answer` | ||
| - degraded fallback: if valid `next_tasks.json` is missing or invalid, the | ||
| parent uses `critique_packet.md` as the diagnostic basis for `submit_checklist` | ||
| - the parent does not run a second full self-evaluation pass; additional | ||
| verification is only for explicit `evidence_gaps` | ||
| - the parent still owns `submit_checklist`, `propose_improvements`, `new_answer`, and `vote` | ||
| - generated child YAML for `round_evaluator` always mounts the shared | ||
| temp-workspace root read-only | ||
| - with `refine: false`, `round_evaluator` remains a quick critique-only child | ||
| run and omits checklist-gated child settings | ||
| - with `refine: true`, `round_evaluator` may inherit the parent checklist gate; | ||
| when no child-specific criteria are configured, it falls back to a built-in | ||
| `round_evaluator` criteria preset for judging the critique packet itself | ||
| - if the evaluator child times out before producing `critique_packet.md`, the | ||
| orchestrator degrades back to the normal parent-owned checklist flow for that | ||
| answer set instead of terminating coordination immediately | ||
| - `round_evaluator_refine` remains an advanced/non-default branch | ||
| - when the child run is using presenter-stage `synthesize`/`winner_present`, it | ||
| keeps `skip_final_presentation: false` | ||
| - `round_evaluator_transformation_pressure` controls how aggressively the | ||
| evaluator seeks a larger thesis shift: `gentle`, `balanced`, or `aggressive` | ||
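The core-path, degraded-fallback, and timeout bullets above amount to a three-way decision on which files the evaluator left behind. A minimal sketch follows, assuming both files sit in one workspace directory and that "valid" means "parses as non-empty JSON". Both are assumptions of this example; the real validation is not shown in the diff, and `choose_round_path` is a hypothetical helper.

```python
import json
from pathlib import Path


def choose_round_path(workspace: Path) -> str:
    """Return 'core', 'degraded', or 'parent_checklist' for one answer set."""
    tasks_file = workspace / "next_tasks.json"
    packet_file = workspace / "critique_packet.md"

    if tasks_file.is_file():
        try:
            tasks = json.loads(tasks_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            tasks = None
        # Assumption: any non-empty parsed value counts as a valid task handoff.
        if tasks:
            return "core"

    # No usable next_tasks.json: fall back to the critique packet if it exists.
    if packet_file.is_file():
        return "degraded"

    # Nothing was produced (for example, the evaluator timed out):
    # use the normal parent-owned checklist flow.
    return "parent_checklist"
```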
|
|
||
| `coordination.orchestrator_managed_round_evaluator: true` is a separate, | ||
| currently gated mode that lets the orchestrator launch that same blocking | ||
| `round_evaluator` before round 2+. | ||
| `coordination.orchestrator_managed_round_evaluator: true` remains required for | ||
| this stage and keeps the launch reserved for the orchestrator rather than a | ||
| manual parent prompt pattern. | ||
|
Comment on lines +188 to +215

Make the dual-flag prerequisite explicit at the top of this section. This now reads as if [...] Based on learnings, module documentation in [...]
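The dual-flag prerequisite this comment refers to could be expressed as a small validation check. This is a sketch under assumptions: `check_round_evaluator_flags` is a hypothetical helper, and `coordination` is taken to be a plain dict holding the two keys named in the diff. The checklist-gated voting requirement mentioned in the schema table is not modeled, because its config key is not shown here.

```python
from typing import Any, Dict, List


def check_round_evaluator_flags(coordination: Dict[str, Any]) -> List[str]:
    """Return human-readable errors for an inconsistent flag combination."""
    errors: List[str] = []
    before_checklist = bool(coordination.get("round_evaluator_before_checklist", False))
    orchestrator_managed = bool(coordination.get("orchestrator_managed_round_evaluator", False))

    # The stage is orchestrator-managed, so the first flag needs the second.
    if before_checklist and not orchestrator_managed:
        errors.append(
            "round_evaluator_before_checklist: true requires "
            "orchestrator_managed_round_evaluator: true"
        )
    return errors
```

Returning a list instead of raising lets a caller collect several configuration problems and report them together.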
||
|
|
||
| Validation constraints for this mode: | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1032,6 +1032,22 @@ Coordination Configuration | |||||||||||||||||||||
| - object | ||||||||||||||||||||||
| - No | ||||||||||||||||||||||
| - Background subagent configuration (``enabled``, ``injection_strategy``) | ||||||||||||||||||||||
| * - ``round_evaluator_before_checklist`` | ||||||||||||||||||||||
| - boolean | ||||||||||||||||||||||
| - No | ||||||||||||||||||||||
| - Enable the orchestrator-managed round-evaluator stage before round-2+ checklist decisions (default: ``false``). Requires ``orchestrator_managed_round_evaluator: true`` and checklist-gated voting. | ||||||||||||||||||||||
| * - ``orchestrator_managed_round_evaluator`` | ||||||||||||||||||||||
| - boolean | ||||||||||||||||||||||
| - No | ||||||||||||||||||||||
| - Treat the synthesized round-evaluator task handoff as the normal post-answer self-improvement path (default: ``false``). | ||||||||||||||||||||||
| * - ``round_evaluator_refine`` | ||||||||||||||||||||||
| - boolean | ||||||||||||||||||||||
| - No | ||||||||||||||||||||||
| - Advanced/non-default option that lets the evaluator child run iterate before producing its packet (default: ``false``). | ||||||||||||||||||||||
| * - ``round_evaluator_transformation_pressure`` | ||||||||||||||||||||||
|
Comment on lines +1043 to +1047

Fix wording in [...]. The phrase “run iterate” reads as a typo and is confusing in the schema docs.

✏️ Suggested wording fix
- - Advanced/non-default option that lets the evaluator child run iterate before producing its packet (default: ``false``).
+ - Advanced/non-default option that lets the evaluator child iterate before producing its packet (default: ``false``).
||||||||||||||||||||||
| - string | ||||||||||||||||||||||
| - No | ||||||||||||||||||||||
| - Bias on how aggressively the evaluator seeks a larger thesis change. Supported values: ``gentle``, ``balanced``, ``aggressive``. Default: ``balanced``. | ||||||||||||||||||||||
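The supported values and default listed for ``round_evaluator_transformation_pressure`` suggest a simple normalization step. A minimal sketch, assuming a hypothetical helper named `normalize_transformation_pressure`; only the three values and the ``balanced`` default come from the table.

```python
from typing import Optional

ALLOWED_PRESSURES = ("gentle", "balanced", "aggressive")
DEFAULT_PRESSURE = "balanced"


def normalize_transformation_pressure(value: Optional[str] = None) -> str:
    """Apply the documented default and reject unsupported values."""
    if value is None:
        return DEFAULT_PRESSURE
    if value not in ALLOWED_PRESSURES:
        raise ValueError(
            f"round_evaluator_transformation_pressure must be one of "
            f"{ALLOWED_PRESSURES}, got {value!r}"
        )
    return value
```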
|
|
||||||||||||||||||||||
| .. note:: | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
@@ -1160,6 +1176,66 @@ Decomposition mode (recommended defaults): | |||||||||||||||||||||
| fairness_lead_cap_answers: 2 | ||||||||||||||||||||||
| max_midstream_injections_per_round: 2 | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Ensemble pattern (recommended defaults): | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| .. code-block:: yaml | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| orchestrator: | ||||||||||||||||||||||
| # Agents work independently — no peer answer injection | ||||||||||||||||||||||
| disable_injection: true | ||||||||||||||||||||||
| # Wait for all agents to finish before voting begins | ||||||||||||||||||||||
| defer_voting_until_all_answered: true | ||||||||||||||||||||||
| # Each agent produces 1 answer (adjustable) | ||||||||||||||||||||||
| max_new_answers_per_agent: 1 | ||||||||||||||||||||||
| # Winner synthesizes from all answers | ||||||||||||||||||||||
| final_answer_strategy: "synthesize" | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| The **ensemble pattern** is a coordination strategy where agents produce answers | ||||||||||||||||||||||
| independently (no peer visibility), then vote on the best answer, and the winner | ||||||||||||||||||||||
| synthesizes insights from all others into a refined final answer. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| **When to use ensemble mode:** | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| - You want diverse, independent perspectives without agents anchoring on each | ||||||||||||||||||||||
| other's work | ||||||||||||||||||||||
| - The task benefits from competitive parallel attempts rather than iterative | ||||||||||||||||||||||
| refinement (e.g., creative writing, design proposals, solution brainstorming) | ||||||||||||||||||||||
| - You want faster coordination — single round of production + vote, no | ||||||||||||||||||||||
| multi-round iteration | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| **Subagent default:** Multi-agent subagent runs use ensemble defaults | ||||||||||||||||||||||
| automatically (``disable_injection: true``, ``defer_voting_until_all_answered: | ||||||||||||||||||||||
| true``). Override by setting these fields explicitly in | ||||||||||||||||||||||
| ``subagent_orchestrator`` config. | ||||||||||||||||||||||
|
Comment on lines +1206 to +1209

Avoid splitting inline literals across lines in RST. The inline literal around [...]

🧩 Suggested RST-safe formatting
-**Subagent default:** Multi-agent subagent runs use ensemble defaults
-automatically (``disable_injection: true``, ``defer_voting_until_all_answered:
-true``). Override by setting these fields explicitly in
-``subagent_orchestrator`` config.
+**Subagent default:** Multi-agent subagent runs use ensemble defaults
+automatically (``disable_injection: true``,
+``defer_voting_until_all_answered: true``). Override by setting these fields
+explicitly in ``subagent_orchestrator`` config.
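Inline literals that wrap across a line break, the pattern this comment asks to avoid, can be located with a rough scan. The sketch below is a heuristic rather than an RST parser: it pairs double backticks naively, so literal blocks or unbalanced backticks may produce false matches. `find_wrapped_inline_literals` is a hypothetical name.

```python
import re
from typing import List

# Naive match for an RST inline literal: ``content`` (content may span lines).
INLINE_LITERAL = re.compile(r"``(.+?)``", re.DOTALL)


def find_wrapped_inline_literals(rst_text: str) -> List[str]:
    """Return the contents of inline literals that contain a newline."""
    return [
        match.group(1)
        for match in INLINE_LITERAL.finditer(rst_text)
        if "\n" in match.group(1)
    ]
```

Run over the paragraph quoted in the suggestion, this reports only the ``defer_voting_until_all_answered`` literal, which is the one the proposed reflow moves onto a single line.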
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| .. list-table:: Ensemble vs Standard Voting vs Decomposition | ||||||||||||||||||||||
| :header-rows: 1 | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| * - Aspect | ||||||||||||||||||||||
| - Standard voting | ||||||||||||||||||||||
| - Ensemble pattern | ||||||||||||||||||||||
| - Decomposition | ||||||||||||||||||||||
| * - Peer visibility | ||||||||||||||||||||||
| - Agents see each other's answers | ||||||||||||||||||||||
| - Agents work in isolation | ||||||||||||||||||||||
| - Agents see subtask assignments | ||||||||||||||||||||||
| * - Iteration | ||||||||||||||||||||||
| - Multiple refinement rounds | ||||||||||||||||||||||
| - Single round of production | ||||||||||||||||||||||
| - Multiple rounds per subtask | ||||||||||||||||||||||
| * - Voting | ||||||||||||||||||||||
| - After iterative refinement | ||||||||||||||||||||||
| - After all answers produced | ||||||||||||||||||||||
| - No voting (presenter assembles) | ||||||||||||||||||||||
| * - Final answer | ||||||||||||||||||||||
| - Winner presents | ||||||||||||||||||||||
| - Winner synthesizes from all | ||||||||||||||||||||||
| - Presenter integrates subtasks | ||||||||||||||||||||||
| * - Best for | ||||||||||||||||||||||
| - Deep quality refinement | ||||||||||||||||||||||
| - Diverse perspectives, speed | ||||||||||||||||||||||
| - Complex multi-part tasks | ||||||||||||||||||||||
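The subagent default described earlier (ensemble settings applied automatically, with explicit fields taking precedence) behaves like a defaults-then-overrides merge. A minimal sketch, assuming the ``subagent_orchestrator`` config is available as a plain dict; `resolve_subagent_coordination` is a hypothetical helper, and only the two default keys come from the documentation.

```python
from typing import Any, Dict, Optional

# Ensemble defaults that the docs say multi-agent subagent runs use automatically.
ENSEMBLE_SUBAGENT_DEFAULTS: Dict[str, Any] = {
    "disable_injection": True,
    "defer_voting_until_all_answered": True,
}


def resolve_subagent_coordination(
    subagent_orchestrator: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Start from the ensemble defaults, then let explicit fields override them."""
    resolved = dict(ENSEMBLE_SUBAGENT_DEFAULTS)
    resolved.update(subagent_orchestrator or {})
    return resolved
```

For example, passing `{"disable_injection": False}` would re-enable peer answer injection while leaving deferred voting at its ensemble default.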
|
|
||||||||||||||||||||||
|
Comment on lines +1179 to +1238

🧩 Analysis chain

🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify Recent Releases section exists and contains feature keywords.
rg -n "Recent Releases|round_evaluator_transformation_pressure|Ensemble" docs/source/index.rst docs/source/reference/yaml_schema.rst

Repository: massgen/MassGen
Length of output: 449

🏁 Script executed:
cat -n docs/source/index.rst | sed -n '195,250p'

Repository: massgen/MassGen
Length of output: 2605

Update [...] The ensemble pattern coordination strategy with [...]
||||||||||||||||||||||
| Timeout Configuration | ||||||||||||||||||||||
| ~~~~~~~~~~~~~~~~~~~~~ | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
🧩 Analysis chain

🏁 Script executed:
sed -n '130,160p' docs/modules/composition.md

Repository: massgen/MassGen
Length of output: 2020

The referenced config file `ensemble_mode.yaml` does not exist. `massgen/configs/voting/ensemble_mode.yaml` is not in the repository. The `voting` directory contains only `gemini_gpt_voting_sensitivity.yaml`. Either create this example config file or correct the documentation reference to point to an existing config.

Additionally, the path is missing the `massgen/` prefix: it should be `massgen/configs/voting/`, not `configs/voting/`.