massgen
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 24 additions & 3 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 4 additions & 4 deletions
diff --git a/README.md b/README.md
Lines changed: 30 additions & 35 deletions
diff --git a/README_PYPI.md b/README_PYPI.md
Lines changed: 30 additions & 35 deletions
@@ -9,6 +9,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## Recent Releases
 
+**v0.1.61 (March 9, 2026)** - Round Evaluator Paradigm
+New round evaluator subagent type that delegates evaluation to specialized evaluator subagents for deeper quality assessment. Major orchestrator refactoring with improved evaluation prompts, task plan injection, and subagent fixes.
+
 **v0.1.60 (March 6, 2026)** - Multimodal Tools, Subagent Enhancements & GPT-5.4
 Rewritten read_media with clearer schema and MediaCallLedgerHook for media call tracking. Subagent enhancements: inherit_spawning_agent_backend, final_answer_strategy, per-agent subagent_agents. GPT-5.4 as default OpenAI flagship. Decomp mode cooperates with checklist workflow. Codex prompt caching calculation fix for pricing accuracy.
 
@@ -18,11 +21,29 @@ Planning improvements with auto-added improvements to task plan and plan review
 **v0.1.58 (March 2, 2026)** - Multimodal Revamp, Nvidia NIM Backend & Quality Rethinking
 Comprehensive multimodal revamp with ElevenLabs TTS/STT, Nano Banana 2 image generation, and Grok multimedia. Nvidia NIM backend for NVIDIA Inference Microservices. Quality rethinking subagent for per-element craft improvements. Smarter checklists with improve/preserve listings. Logging architecture refactor and CLI mode flags.
 
-**v0.1.57 (February 27, 2026)** - Delegated Subagent Protocol & Builder Subagent
-File-based delegation protocol for container-to-host subagent spawning. New builder subagent type for large artifact generation with fresh context. Claude Code reasoning parameters for updated SDK. Smarter convergence with substantiveness tracking and diagnostic report gating.
-
 ---
 
+## [0.1.61] - 2026-03-09
+
+### Added
+- **Round Evaluator Subagent Type** ([#986](https://github.com/massgen/MassGen/pull/986)): New `round_evaluator` subagent type that delegates evaluation to specialized evaluator subagents for deeper quality assessment
+- **`round_evaluator_example.yaml` Config** ([#986](https://github.com/massgen/MassGen/pull/986)): New example config for the round evaluator paradigm
+
+### Changed
+- **Orchestrator Refactoring** ([#986](https://github.com/massgen/MassGen/pull/986)): Major orchestrator refactoring (+1,189 lines) to support the round evaluation workflow
+- **Evaluation Prompts** ([#986](https://github.com/massgen/MassGen/pull/986)): Improved evaluation prompts for clearer, more actionable feedback with task plan injection
+- **Simplified Config** ([#986](https://github.com/massgen/MassGen/pull/986)): Simplified config handling for evaluation parameters
+- **SUBAGENT.md Generality** ([#986](https://github.com/massgen/MassGen/pull/986)): Improved SUBAGENT.md for broader subagent compatibility
+
+### Fixed
+- **Session Resumption** ([#986](https://github.com/massgen/MassGen/pull/986)): Fixed resumption from already-resumed logs
+- **Round Evaluation Prompts** ([#986](https://github.com/massgen/MassGen/pull/986)): Improved round evaluation prompt clarity
+
+### Technical Details
+- **Major Focus**: Round evaluator paradigm — delegated evaluation to specialized subagents
+- **PRs Merged**: [#986](https://github.com/massgen/MassGen/pull/986) (improve_verification_time)
+- **Contributors**: @ncrispino (8 commits), @HenryQi (1 commit)
+
 ## [0.1.60] - 2026-03-06
 
 ### Added
 
@@ -359,7 +359,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README.
 
 ## 🔧 Development Workflow
 
-> **Important**: Our next version is v0.1.61. If you want to contribute, please contribute to the `dev/v0.1.61` branch (or `main` if dev/v0.1.61 doesn't exist yet).
+> **Important**: Our next version is v0.1.62. If you want to contribute, please contribute to the `dev/v0.1.62` branch (or `main` if dev/v0.1.62 doesn't exist yet).
 
 ### 1. Create Feature Branch
 
@@ -368,7 +368,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README.
 git fetch upstream
 
 # Create feature branch from dev/v0.1.60 (or main if dev branch doesn't exist yet)
-git checkout -b feature/your-feature-name upstream/dev/v0.1.61
+git checkout -b feature/your-feature-name upstream/dev/v0.1.62
 ```
 
 ### 2. Make Your Changes
@@ -507,7 +507,7 @@ git push origin feature/your-feature-name
 ```
 
 Then create a pull request on GitHub:
-- Base branch: `dev/v0.1.61` (or `main` if dev branch doesn't exist yet)
+- Base branch: `dev/v0.1.62` (or `main` if dev branch doesn't exist yet)
 - Compare branch: `feature/your-feature-name`
 - Add clear description of changes
 - Link any related issues
@@ -617,7 +617,7 @@ Have a significant feature idea not covered by existing tracks?
 - [ ] Tests pass locally
 - [ ] Documentation is updated if needed
 - [ ] Commit messages follow convention
-- [ ] PR targets `dev/v0.1.61` branch (or `main` if dev branch doesn't exist yet)
+- [ ] PR targets `dev/v0.1.62` branch (or `main` if dev branch doesn't exist yet)
 
 ### PR Description Should Include
 
 
@@ -69,7 +69,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🆕 Latest Features</h3></summary>
 
-- [v0.1.59 Features](#-latest-features-v0159)
+- [v0.1.61 Features](#-latest-features-v0161)
 </details>
 
 <details open>
@@ -122,15 +122,15 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🗺️ Roadmap</h3></summary>
 
-- [Recent Achievements (v0.1.59)](#recent-achievements-v0159)
-- [Previous Achievements (v0.0.3 - v0.1.58)](#previous-achievements-v003---v0158)
+- [Recent Achievements (v0.1.61)](#recent-achievements-v0161)
+- [Previous Achievements (v0.0.3 - v0.1.60)](#previous-achievements-v003---v0160)
 - [Key Future Enhancements](#key-future-enhancements)
   - Bug Fixes & Backend Improvements
   - Advanced Agent Collaboration
   - Expanded Model, Tool & Agent Integrations
   - Improved Performance & Scalability
   - Enhanced Developer Experience
-- [v0.1.60 Roadmap](#v0160-roadmap)
+- [v0.1.62 Roadmap](#v0162-roadmap)
 </details>
 
 <details open>
@@ -155,23 +155,22 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 ---
 
-## 🆕 Latest Features (v0.1.60)
+## 🆕 Latest Features (v0.1.61)
 
-**🎉 Released: March 6, 2026**
+**🎉 Released: March 9, 2026**
 
-**What's New in v0.1.60:**
-- **🛠️ Multimodal Tool Improvements** - Rewritten `read_media` with clearer schema and `MediaCallLedgerHook` for tracking media calls.
-- **🤖 Subagent Enhancements** - `inherit_spawning_agent_backend` for automatic backend inheritance, `final_answer_strategy` for child orchestrator policy, per-agent `subagent_agents` override.
-- **🧠 GPT-5.4** - New default OpenAI flagship model across all coordination modes.
-- **🔄 Decomp + Checklist Cooperation** - Decomp mode works with checklist workflow for quality-gated subtask iteration.
+**What's New in v0.1.61:**
+- **🔄 Round Evaluator Paradigm** - New `round_evaluator` subagent type that delegates evaluation to specialized evaluator subagents for deeper quality assessment.
+- **📝 Evaluation Improvements** - Improved evaluation prompts with task plan injection for context-aware assessment.
+- **🔧 Orchestrator Refactoring** - Major orchestrator refactoring (+1,189 lines) to support the round evaluation workflow.
 
-**Try v0.1.60 Features:**
+**Try v0.1.61 Features:**
 ```bash
 # Install or upgrade
 pip install --upgrade massgen
 
-# Choose backend 'openai' with model 'gpt-5.4' in the setup wizard to start using GPT-5.4
-uv run massgen --quickstart
+# Try the round evaluator paradigm
+uv run massgen --config @examples/features/round_evaluator_example.yaml "Create a website for an AI startup with polished visuals and interactive elements"
 ```
 
 → [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1233,31 +1232,27 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.
 
-### Recent Achievements (v0.1.60)
-
-**🎉 Released: March 6, 2026**
+### Recent Achievements (v0.1.61)
 
-#### Multimodal Tools
-- **Rewritten `read_media` Tool** ([#978](https://github.com/massgen/MassGen/pull/978)): Clearer schema, better error handling, and improved naming
-- **`MediaCallLedgerHook`**: New hook for tracking `read_media` and `generate_media` tool calls
+**🎉 Released: March 9, 2026**
 
-#### Subagent Enhancements
-- **`inherit_spawning_agent_backend`** ([#978](https://github.com/massgen/MassGen/pull/978)): Subagents automatically inherit the spawning agent's backend configuration
-- **`final_answer_strategy`**: Configurable child orchestrator final-answer policy (winner_reuse, winner_present, synthesize)
-- **Per-Agent `subagent_agents`**: Per-agent override for subagent agent configs; orchestrator config file support with robust JSON parsing
+#### Round Evaluator Paradigm
+- **Round Evaluator Subagent Type** ([#986](https://github.com/massgen/MassGen/pull/986)): New `round_evaluator` subagent type that delegates evaluation to specialized evaluator subagents for deeper quality assessment
+- **Orchestrator Refactoring**: Major orchestrator refactoring (+1,189 lines) to support the round evaluation workflow
+- **New Config**: `round_evaluator_example.yaml` for easy adoption
 
-#### Model & Coordination
-- **GPT-5.4 Support** ([#978](https://github.com/massgen/MassGen/pull/978)): New default OpenAI flagship model added to the model registry
-- **Decomp + Checklist Cooperation**: Decomposition mode works with the checklist workflow for quality-gated subtask iteration
-- **Improved Verification Round Time**: Better `verification_latest` prompts for faster verification rounds
+#### Evaluation Improvements
+- **Improved Evaluation Prompts** ([#986](https://github.com/massgen/MassGen/pull/986)): Clearer, more actionable feedback with task plan injection
+- **Simplified Config**: Simplified config handling for evaluation parameters
+- **SUBAGENT.md Generality**: Improved SUBAGENT.md for broader subagent compatibility
 
 #### Fixes
-- **Checklist & Proposal Injections**: More reliable checklist behavior with improved proposal injection
-- **Codex Prompt Caching**: Fixed prompt caching calculation for pricing accuracy
-- **Task Plan Refresh**: Fixed task plan refresh during quality rounds
-- **Skill Prefix Handling**: Fixed edge cases in skill prefix resolution
+- **Session Resumption** ([#986](https://github.com/massgen/MassGen/pull/986)): Fixed resumption from already-resumed logs
+- **Round Evaluation Prompts**: Improved round evaluation prompt clarity
+
+### Previous Achievements (v0.0.3 - v0.1.60)
 
-### Previous Achievements (v0.0.3 - v0.1.59)
+✅ **Multimodal Tools, Subagent Enhancements & GPT-5.4 (v0.1.60)**: Rewritten read_media with clearer schema and MediaCallLedgerHook. Subagent enhancements with inherit_spawning_agent_backend, final_answer_strategy, per-agent subagent_agents. GPT-5.4 as default OpenAI flagship. Decomp mode cooperates with checklist workflow. Codex prompt caching fix.
 
 ✅ **Quality Round Improvements (v0.1.59)**: Auto-add improvements to task plan, plan review enhancements. Better eval gen config, checklist fixes, Gemini tool name normalization for MCP. Subagent behavior adjustments, Docker skill write access fixes. Video gen skill adjustments and impact metric restoration.
 
@@ -1522,9 +1517,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 We welcome community contributions to achieve these goals.
 
-### v0.1.60 Roadmap
+### v0.1.62 Roadmap
 
-Version 0.1.60 focuses on improving skill use and exploration:
+Version 0.1.62 focuses on improving skill use and exploration:
 
 #### Planned Features
 - **Improve Skill Use and Exploration** ([#873](https://github.com/massgen/MassGen/issues/873)): Local skill execution, skill registry with hierarchical organization, and skill consolidation workflow
 
@@ -68,7 +68,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🆕 Latest Features</h3></summary>
 
-- [v0.1.59 Features](#-latest-features-v0159)
+- [v0.1.61 Features](#-latest-features-v0161)
 </details>
 
 <details open>
@@ -121,15 +121,15 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🗺️ Roadmap</h3></summary>
 
-- [Recent Achievements (v0.1.59)](#recent-achievements-v0159)
-- [Previous Achievements (v0.0.3 - v0.1.58)](#previous-achievements-v003---v0158)
+- [Recent Achievements (v0.1.61)](#recent-achievements-v0161)
+- [Previous Achievements (v0.0.3 - v0.1.60)](#previous-achievements-v003---v0160)
 - [Key Future Enhancements](#key-future-enhancements)
   - Bug Fixes & Backend Improvements
   - Advanced Agent Collaboration
   - Expanded Model, Tool & Agent Integrations
   - Improved Performance & Scalability
   - Enhanced Developer Experience
-- [v0.1.60 Roadmap](#v0160-roadmap)
+- [v0.1.62 Roadmap](#v0162-roadmap)
 </details>
 
 <details open>
@@ -154,23 +154,22 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 ---
 
-## 🆕 Latest Features (v0.1.60)
+## 🆕 Latest Features (v0.1.61)
 
-**🎉 Released: March 6, 2026**
+**🎉 Released: March 9, 2026**
 
-**What's New in v0.1.60:**
-- **🛠️ Multimodal Tool Improvements** - Rewritten `read_media` with clearer schema and `MediaCallLedgerHook` for tracking media calls.
-- **🤖 Subagent Enhancements** - `inherit_spawning_agent_backend` for automatic backend inheritance, `final_answer_strategy` for child orchestrator policy, per-agent `subagent_agents` override.
-- **🧠 GPT-5.4** - New default OpenAI flagship model across all coordination modes.
-- **🔄 Decomp + Checklist Cooperation** - Decomp mode works with checklist workflow for quality-gated subtask iteration.
+**What's New in v0.1.61:**
+- **🔄 Round Evaluator Paradigm** - New `round_evaluator` subagent type that delegates evaluation to specialized evaluator subagents for deeper quality assessment.
+- **📝 Evaluation Improvements** - Improved evaluation prompts with task plan injection for context-aware assessment.
+- **🔧 Orchestrator Refactoring** - Major orchestrator refactoring (+1,189 lines) to support the round evaluation workflow.
 
-**Try v0.1.60 Features:**
+**Try v0.1.61 Features:**
 ```bash
 # Install or upgrade
 pip install --upgrade massgen
 
-# Choose backend 'openai' with model 'gpt-5.4' in the setup wizard to start using GPT-5.4
-uv run massgen --quickstart
+# Try the round evaluator paradigm
+uv run massgen --config @examples/features/round_evaluator_example.yaml "Create a website for an AI startup with polished visuals and interactive elements"
 ```
 
 → [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1232,31 +1231,27 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.
 
-### Recent Achievements (v0.1.60)
-
-**🎉 Released: March 6, 2026**
+### Recent Achievements (v0.1.61)
 
-#### Multimodal Tools
-- **Rewritten `read_media` Tool** ([#978](https://github.com/massgen/MassGen/pull/978)): Clearer schema, better error handling, and improved naming
-- **`MediaCallLedgerHook`**: New hook for tracking `read_media` and `generate_media` tool calls
+**🎉 Released: March 9, 2026**
 
-#### Subagent Enhancements
-- **`inherit_spawning_agent_backend`** ([#978](https://github.com/massgen/MassGen/pull/978)): Subagents automatically inherit the spawning agent's backend configuration
-- **`final_answer_strategy`**: Configurable child orchestrator final-answer policy (winner_reuse, winner_present, synthesize)
-- **Per-Agent `subagent_agents`**: Per-agent override for subagent agent configs; orchestrator config file support with robust JSON parsing
+#### Round Evaluator Paradigm
+- **Round Evaluator Subagent Type** ([#986](https://github.com/massgen/MassGen/pull/986)): New `round_evaluator` subagent type that delegates evaluation to specialized evaluator subagents for deeper quality assessment
+- **Orchestrator Refactoring**: Major orchestrator refactoring (+1,189 lines) to support the round evaluation workflow
+- **New Config**: `round_evaluator_example.yaml` for easy adoption
 
-#### Model & Coordination
-- **GPT-5.4 Support** ([#978](https://github.com/massgen/MassGen/pull/978)): New default OpenAI flagship model added to the model registry
-- **Decomp + Checklist Cooperation**: Decomposition mode works with the checklist workflow for quality-gated subtask iteration
-- **Improved Verification Round Time**: Better `verification_latest` prompts for faster verification rounds
+#### Evaluation Improvements
+- **Improved Evaluation Prompts** ([#986](https://github.com/massgen/MassGen/pull/986)): Clearer, more actionable feedback with task plan injection
+- **Simplified Config**: Simplified config handling for evaluation parameters
+- **SUBAGENT.md Generality**: Improved SUBAGENT.md for broader subagent compatibility
 
 #### Fixes
-- **Checklist & Proposal Injections**: More reliable checklist behavior with improved proposal injection
-- **Codex Prompt Caching**: Fixed prompt caching calculation for pricing accuracy
-- **Task Plan Refresh**: Fixed task plan refresh during quality rounds
-- **Skill Prefix Handling**: Fixed edge cases in skill prefix resolution
+- **Session Resumption** ([#986](https://github.com/massgen/MassGen/pull/986)): Fixed resumption from already-resumed logs
+- **Round Evaluation Prompts**: Improved round evaluation prompt clarity
+
+### Previous Achievements (v0.0.3 - v0.1.60)
 
-### Previous Achievements (v0.0.3 - v0.1.59)
+✅ **Multimodal Tools, Subagent Enhancements & GPT-5.4 (v0.1.60)**: Rewritten read_media with clearer schema and MediaCallLedgerHook. Subagent enhancements with inherit_spawning_agent_backend, final_answer_strategy, per-agent subagent_agents. GPT-5.4 as default OpenAI flagship. Decomp mode cooperates with checklist workflow. Codex prompt caching fix.
 
 ✅ **Quality Round Improvements (v0.1.59)**: Auto-add improvements to task plan, plan review enhancements. Better eval gen config, checklist fixes, Gemini tool name normalization for MCP. Subagent behavior adjustments, Docker skill write access fixes. Video gen skill adjustments and impact metric restoration.
 
@@ -1521,9 +1516,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 We welcome community contributions to achieve these goals.
 
-### v0.1.60 Roadmap
+### v0.1.62 Roadmap
 
-Version 0.1.60 focuses on improving skill use and exploration:
+Version 0.1.62 focuses on improving skill use and exploration:
 
 #### Planned Features
 - **Improve Skill Use and Exploration** ([#873](https://github.com/massgen/MassGen/issues/873)): Local skill execution, skill registry with hierarchical organization, and skill consolidation workflow