23 changes: 21 additions & 2 deletions CHANGELOG.md
@@ -9,14 +9,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Recent Releases

**v0.1.72 (April 3, 2026)** - Grok Backend Update & Circuit Breaker Phase 2
Grok backend update with latest improvements. LLM API circuit breaker extended to ChatCompletions, Response API, and Gemini backends (was Claude-only). Config plumbing smoke tests for all backends.

**v0.1.71 (April 1, 2026)** - Trace Memory & Evaluation Polish
Trace analyzer subagents now launch in the background after each round to write insights from execution traces into memory. Improved evaluation criteria generation and system prompt tuning. Fixes for final injection, eval criteria GPT pre-collab, trace analyzer launch, and trace memory.

**v0.1.70 (March 30, 2026)** - Evaluation Criteria Redesign
Redesigned three-tier evaluation criteria with anti-pattern definitions and aspiration statements. Improved checklist-gated evaluation with tighter iterative submission cycles. Fast iteration mode, WebUI review modal, and background trace analysis from round 2.

**v0.1.69 (March 27, 2026)** - WebUI Automation & Improved Skill
WebUI automation now auto-starts without browser interaction — open the URL at any point mid-run to monitor progress. MassGen skill redesign for increased usability and WebUI integration. Quickstart Wizard rework, Workspace Browser expansion, and flexible evaluation criteria field names.
---

## [0.1.72] - 2026-04-03

### Changed
- **Grok Backend Update** ([#1044](https://github.com/massgen/MassGen/pull/1044)): Updated Grok backend with latest improvements

### Added
- **Circuit Breaker Phase 2** ([#1038](https://github.com/massgen/MassGen/pull/1038)): LLM API circuit breaker extended to ChatCompletions, Response API, and Gemini backends (was Claude-only in v0.1.68); Gemini also handles 503 errors
- **Config Plumbing Smoke Tests** ([#1038](https://github.com/massgen/MassGen/pull/1038)): Smoke tests verify circuit breaker wiring and API call timing for all backends
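
The circuit breaker pattern described above can be sketched as follows. This is a minimal illustration, not MassGen's implementation: the names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical, and treating every exception as a countable failure is an assumption (a real backend would likely count only rate-limit or 503-style errors).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: closed -> open -> half_open -> closed."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a probe call
        self._clock = clock               # injectable so tests need not sleep
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after:
            return "half_open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        """Run fn unless the circuit is open; track failures and successes."""
        if self.state == "open":
            raise CircuitOpenError("circuit open; skipping API call")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed half-open probe, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A smoke test in the spirit of the entry above could construct one breaker per backend with an injected clock, force failures, and assert that the next call is rejected without reaching the API.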

### Fixed
- **Response API Timing** ([#1038](https://github.com/massgen/MassGen/pull/1038)): Added start/end API call timing to ResponseBackend non-MCP path
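
As a rough illustration of start/end API call timing, the sketch below wraps a call in a context manager that records both timestamps even when the call raises. The names `ApiCallTiming` and `timed_api_call` are hypothetical and are not taken from `ResponseBackend`.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ApiCallTiming:
    """Hypothetical record of one API call's start and end times."""

    label: str
    start: float
    end: Optional[float] = None

    @property
    def duration(self) -> Optional[float]:
        # None until the call has finished.
        return None if self.end is None else self.end - self.start


@contextmanager
def timed_api_call(label: str) -> Iterator[ApiCallTiming]:
    """Record the start on entry and the end on exit, including on errors."""
    timing = ApiCallTiming(label=label, start=time.perf_counter())
    try:
        yield timing
    finally:
        timing.end = time.perf_counter()
```

Setting the end time in a `finally` clause means a failed request still produces a complete timing record, which matters if timings feed into failure handling.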

### Technical Details
- **Major Focus**: Circuit Breaker Phase 2 — rate limit protection across all major backends
- **PRs Merged**: [#1038](https://github.com/massgen/MassGen/pull/1038), [#1044](https://github.com/massgen/MassGen/pull/1044)
- **Contributors**: @amabito, @HenryQi, @ncrispino and the MassGen team

---

8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -359,7 +359,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README.

## 🔧 Development Workflow

> **Important**: Our next version is v0.1.72. If you want to contribute, please contribute to the `dev/v0.1.72` branch (or `main` if dev/v0.1.72 doesn't exist yet).
> **Important**: Our next version is v0.1.73. If you want to contribute, please contribute to the `dev/v0.1.73` branch (or `main` if dev/v0.1.73 doesn't exist yet).

### 1. Create Feature Branch

@@ -368,7 +368,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README.
git fetch upstream

# Create feature branch from the current dev branch (or main if dev branch doesn't exist yet)
git checkout -b feature/your-feature-name upstream/dev/v0.1.72
git checkout -b feature/your-feature-name upstream/dev/v0.1.73
```

### 2. Make Your Changes
@@ -507,7 +507,7 @@ git push origin feature/your-feature-name
```

Then create a pull request on GitHub:
- Base branch: `dev/v0.1.72` (or `main` if dev branch doesn't exist yet)
- Base branch: `dev/v0.1.73` (or `main` if dev branch doesn't exist yet)
- Compare branch: `feature/your-feature-name`
- Add clear description of changes
- Link any related issues
@@ -617,7 +617,7 @@ Have a significant feature idea not covered by existing tracks?
- [ ] Tests pass locally
- [ ] Documentation is updated if needed
- [ ] Commit messages follow convention
- [ ] PR targets `dev/v0.1.72` branch (or `main` if dev branch doesn't exist yet)
- [ ] PR targets `dev/v0.1.73` branch (or `main` if dev branch doesn't exist yet)

### PR Description Should Include

48 changes: 24 additions & 24 deletions README.md
@@ -69,7 +69,7 @@ This project started with the "threads of thought" and "iterative refinement" id
<details open>
<summary><h3>🆕 Latest Features</h3></summary>

- [v0.1.71 Features](#-latest-features-v0171)
- [v0.1.72 Features](#-latest-features-v0172)
</details>

<details open>
@@ -122,15 +122,15 @@ This project started with the "threads of thought" and "iterative refinement" id
<details open>
<summary><h3>🗺️ Roadmap</h3></summary>

- [Recent Achievements (v0.1.71)](#recent-achievements-v0171)
- [Previous Achievements (v0.0.3 - v0.1.70)](#previous-achievements-v003---v0170)
- [Recent Achievements (v0.1.72)](#recent-achievements-v0172)
- [Previous Achievements (v0.0.3 - v0.1.71)](#previous-achievements-v003---v0171)
- [Key Future Enhancements](#key-future-enhancements)
- Bug Fixes & Backend Improvements
- Advanced Agent Collaboration
- Expanded Model, Tool & Agent Integrations
- Improved Performance & Scalability
- Enhanced Developer Experience
- [v0.1.72 Roadmap](#v0172-roadmap)
- [v0.1.73 Roadmap](#v0173-roadmap)
</details>

<details open>
@@ -155,20 +155,19 @@ This project started with the "threads of thought" and "iterative refinement" id

---

## 🆕 Latest Features (v0.1.71)
## 🆕 Latest Features (v0.1.72)

**🎉 Released: April 1, 2026**
**🎉 Released: April 3, 2026**

**What's New in v0.1.71:**
- **🔍 Trace Analyzer Subagents** - Launch in the background after each round to write insights from execution traces into memory.
- **📋 Better Evaluation Criteria** - Improved criteria generation for higher-quality, more opinionated output.
- **🧠 System Prompt Tuning** - Adjusted system prompts for better agent performance across coordination rounds.
- **🔧 Stability Fixes** - Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, and memory handling.
**What's New in v0.1.72:**
- **🦎 Grok Backend Update** - Updated Grok backend with latest improvements.
- **⚡ Circuit Breaker Phase 2** - LLM API circuit breaker extended to ChatCompletions, Response API, and Gemini backends (was Claude-only).
- **🧪 Config Plumbing Smoke Tests** - Verify circuit breaker wiring for all backends.

**Try v0.1.71 Features:**
**Try v0.1.72 Features:**
```bash
pip install massgen==0.1.71
uv run massgen --config @examples/features/trace_analyzer_background.yaml "Create an svg of an AI agent coding."
pip install massgen==0.1.72
uv run massgen --config @examples/providers/others/grok_x_search.yaml "Research the latest posts and news about AI agents in the last week, and summarize the key trends and insights."
```

→ [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1240,17 +1239,18 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch

⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.

### Recent Achievements (v0.1.71)
### Recent Achievements (v0.1.72)

**🎉 Released: April 1, 2026**
**🎉 Released: April 3, 2026**

#### Trace Memory & Evaluation Polish
- **Trace Analyzer Subagents**: Background trace analysis after each round — writes insights from execution traces into memory for next-round continuity
- **Better Evaluation Criteria**: Improved criteria generation for higher-quality, more opinionated output
- **System Prompt Tuning**: Adjusted system prompts for better agent performance across coordination rounds
- **Stability Fixes**: Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, trace memory, and auto round memory
#### Grok Backend Update & Circuit Breaker Phase 2
- **Grok Backend Update** ([#1044](https://github.com/massgen/MassGen/pull/1044)): Updated Grok backend with latest improvements
- **Circuit Breaker Phase 2** ([#1038](https://github.com/massgen/MassGen/pull/1038)): LLM API circuit breaker extended to ChatCompletions, Response API, and Gemini backends (was Claude-only); Gemini also handles 503 errors
- **Config Plumbing Smoke Tests** ([#1038](https://github.com/massgen/MassGen/pull/1038)): Verify circuit breaker wiring for all backends

### Previous Achievements (v0.0.3 - v0.1.70)
### Previous Achievements (v0.0.3 - v0.1.71)

✅ **Trace Memory & Evaluation Polish (v0.1.71)**: Trace analyzer subagents launch in background after each round to write insights from execution traces into memory. Improved evaluation criteria generation and system prompt tuning.

✅ **Evaluation Criteria Redesign (v0.1.70)**: Redesigned three-tier evaluation criteria with anti-pattern definitions and aspiration statements. Improved checklist-gated evaluation. Fast iteration mode, WebUI review modal, and background trace analysis.

@@ -1537,9 +1537,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch

We welcome community contributions to achieve these goals.

### v0.1.72 Roadmap
### v0.1.73 Roadmap

Version 0.1.72 focuses on cloud execution:
Version 0.1.73 focuses on cloud execution:

#### Planned Features
- **Cloud Modal MVP** ([#982](https://github.com/massgen/MassGen/issues/982)): Run MassGen as a cloud job on Modal — progress streams to terminal, results saved locally under `.massgen/cloud_jobs/`
48 changes: 24 additions & 24 deletions README_PYPI.md
@@ -68,7 +68,7 @@ This project started with the "threads of thought" and "iterative refinement" id
<details open>
<summary><h3>🆕 Latest Features</h3></summary>

- [v0.1.71 Features](#-latest-features-v0171)
- [v0.1.72 Features](#-latest-features-v0172)
</details>

<details open>
@@ -121,15 +121,15 @@ This project started with the "threads of thought" and "iterative refinement" id
<details open>
<summary><h3>🗺️ Roadmap</h3></summary>

- [Recent Achievements (v0.1.71)](#recent-achievements-v0171)
- [Previous Achievements (v0.0.3 - v0.1.70)](#previous-achievements-v003---v0170)
- [Recent Achievements (v0.1.72)](#recent-achievements-v0172)
- [Previous Achievements (v0.0.3 - v0.1.71)](#previous-achievements-v003---v0171)
- [Key Future Enhancements](#key-future-enhancements)
- Bug Fixes & Backend Improvements
- Advanced Agent Collaboration
- Expanded Model, Tool & Agent Integrations
- Improved Performance & Scalability
- Enhanced Developer Experience
- [v0.1.72 Roadmap](#v0172-roadmap)
- [v0.1.73 Roadmap](#v0173-roadmap)
</details>

<details open>
@@ -154,20 +154,19 @@ This project started with the "threads of thought" and "iterative refinement" id

---

## 🆕 Latest Features (v0.1.71)
## 🆕 Latest Features (v0.1.72)

**🎉 Released: April 1, 2026**
**🎉 Released: April 3, 2026**

**What's New in v0.1.71:**
- **🔍 Trace Analyzer Subagents** - Launch in the background after each round to write insights from execution traces into memory.
- **📋 Better Evaluation Criteria** - Improved criteria generation for higher-quality, more opinionated output.
- **🧠 System Prompt Tuning** - Adjusted system prompts for better agent performance across coordination rounds.
- **🔧 Stability Fixes** - Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, and memory handling.
**What's New in v0.1.72:**
- **🦎 Grok Backend Update** - Updated Grok backend with latest improvements.
- **⚡ Circuit Breaker Phase 2** - LLM API circuit breaker extended to ChatCompletions, Response API, and Gemini backends (was Claude-only).
- **🧪 Config Plumbing Smoke Tests** - Verify circuit breaker wiring for all backends.

**Try v0.1.71 Features:**
**Try v0.1.72 Features:**
```bash
pip install massgen==0.1.71
uv run massgen --config @examples/features/trace_analyzer_background.yaml "Create an svg of an AI agent coding."
pip install massgen==0.1.72
uv run massgen --config @examples/providers/others/grok_x_search.yaml "Research the latest posts and news about AI agents in the last week, and summarize the key trends and insights."
```

→ [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1239,17 +1238,18 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch

⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.

### Recent Achievements (v0.1.71)
### Recent Achievements (v0.1.72)

**🎉 Released: April 1, 2026**
**🎉 Released: April 3, 2026**

#### Trace Memory & Evaluation Polish
- **Trace Analyzer Subagents**: Background trace analysis after each round — writes insights from execution traces into memory for next-round continuity
- **Better Evaluation Criteria**: Improved criteria generation for higher-quality, more opinionated output
- **System Prompt Tuning**: Adjusted system prompts for better agent performance across coordination rounds
- **Stability Fixes**: Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, trace memory, and auto round memory
#### Grok Backend Update & Circuit Breaker Phase 2
- **Grok Backend Update** ([#1044](https://github.com/massgen/MassGen/pull/1044)): Updated Grok backend with latest improvements
- **Circuit Breaker Phase 2** ([#1038](https://github.com/massgen/MassGen/pull/1038)): LLM API circuit breaker extended to ChatCompletions, Response API, and Gemini backends (was Claude-only); Gemini also handles 503 errors
- **Config Plumbing Smoke Tests** ([#1038](https://github.com/massgen/MassGen/pull/1038)): Verify circuit breaker wiring for all backends

### Previous Achievements (v0.0.3 - v0.1.70)
### Previous Achievements (v0.0.3 - v0.1.71)

✅ **Trace Memory & Evaluation Polish (v0.1.71)**: Trace analyzer subagents launch in background after each round to write insights from execution traces into memory. Improved evaluation criteria generation and system prompt tuning.

✅ **Evaluation Criteria Redesign (v0.1.70)**: Redesigned three-tier evaluation criteria with anti-pattern definitions and aspiration statements. Improved checklist-gated evaluation. Fast iteration mode, WebUI review modal, and background trace analysis.

@@ -1536,9 +1536,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch

We welcome community contributions to achieve these goals.

### v0.1.72 Roadmap
### v0.1.73 Roadmap

Version 0.1.72 focuses on cloud execution:
Version 0.1.73 focuses on cloud execution:

#### Planned Features
- **Cloud Modal MVP** ([#982](https://github.com/massgen/MassGen/issues/982)): Run MassGen as a cloud job on Modal — progress streams to terminal, results saved locally under `.massgen/cloud_jobs/`
40 changes: 13 additions & 27 deletions ROADMAP.md
@@ -1,10 +1,10 @@
# MassGen Roadmap

**Current Version:** v0.1.71
**Current Version:** v0.1.72

**Release Schedule:** Mondays, Wednesdays, Fridays @ 9am PT

**Last Updated:** April 1, 2026
**Last Updated:** April 3, 2026

This roadmap outlines MassGen's development priorities for upcoming releases. Each release focuses on specific capabilities with real-world use cases.

@@ -42,40 +42,26 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow

| Release | Target | Feature | Owner | Use Case |
|---------|--------|---------|-------|----------|
| **v0.1.72** | 04/04/26 | Cloud Modal MVP | @ncrispino | Run MassGen as a cloud job on Modal ([#982](https://github.com/massgen/MassGen/issues/982)) |
| **v0.1.73** | 04/07/26 | OpenAI Audio API | @ncrispino | Support OpenAI audio API for audio understanding ([#960](https://github.com/massgen/MassGen/issues/960)) |
| **v0.1.74** | 04/09/26 | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities ([#959](https://github.com/massgen/MassGen/issues/959)) |
| **v0.1.73** | 04/07/26 | Cloud Modal MVP | @ncrispino | Run MassGen as a cloud job on Modal ([#982](https://github.com/massgen/MassGen/issues/982)) |
| **v0.1.74** | 04/09/26 | OpenAI Audio API | @ncrispino | Support OpenAI audio API for audio understanding ([#960](https://github.com/massgen/MassGen/issues/960)) |
| **v0.1.75** | 04/11/26 | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities ([#959](https://github.com/massgen/MassGen/issues/959)) |

*All releases ship on MWF @ 9am PT when ready*

---

## ✅ v0.1.71 - Trace Memory & Evaluation Polish (Completed)
## ✅ v0.1.72 - Grok Backend Update & Circuit Breaker Phase 2 (Completed)

**Released:** April 1, 2026
**Released:** April 3, 2026 | PRs: [#1038](https://github.com/massgen/MassGen/pull/1038), [#1044](https://github.com/massgen/MassGen/pull/1044)

### Features
- **Trace Analyzer Subagents**: Background trace analysis after each round — writes insights from execution traces into memory for next-round continuity
- **Better Evaluation Criteria**: Improved criteria generation for higher-quality, more opinionated output
- **System Prompt Tuning**: Adjusted system prompts for better agent performance across coordination rounds
- **Stability Fixes**: Fixed final injection, eval criteria GPT pre-collab, trace analyzer launch, trace memory, and auto round memory
- **Grok Backend Update**: Updated Grok backend with latest improvements
- **Circuit Breaker Phase 2**: LLM API circuit breaker extended to ChatCompletions, Response API, and Gemini backends (was Claude-only); Gemini also handles 503 errors
- **Config Plumbing Smoke Tests**: Verify circuit breaker wiring for all backends

---

## ✅ v0.1.70 - Evaluation Criteria Redesign (Completed)

**Released:** March 30, 2026 | PRs: [#1035](https://github.com/massgen/MassGen/pull/1035)

### Features
- **Evaluation Criteria Redesign**: Three-tier categorization (`primary`, `standard`, `stretch`) with anti-pattern definitions and aspiration statements
- **Improved Checklist-Gated Evaluation**: Tighter iterative submission cycles with improved scoring and improvement proposals before final voting
- **Fast Iteration Mode**: Streamlined multi-round submission phases via `fast_iteration.yaml`
- **WebUI Review Modal**: Approve and comment on outputs in the browser when working in git
- **Background Trace Analysis**: Execution trace analyzer starts automatically from round 2

---

## 📋 v0.1.72 - Cloud Modal MVP
## 📋 v0.1.73 - Cloud Modal MVP

### Features

@@ -91,7 +77,7 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow

---

## 📋 v0.1.73 - OpenAI Audio API
## 📋 v0.1.74 - OpenAI Audio API

### Features

@@ -107,7 +93,7 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow

---

## 📋 v0.1.74 - Image/Video Edit Capabilities
## 📋 v0.1.75 - Image/Video Edit Capabilities

### Features

10 changes: 5 additions & 5 deletions ROADMAP_v0.1.72.md → ROADMAP_v0.1.73.md
@@ -1,10 +1,10 @@
# MassGen v0.1.72 Roadmap
# MassGen v0.1.73 Roadmap

**Target Release:** April 4, 2026
**Target Release:** April 7, 2026

## Overview

Version 0.1.72 focuses on running MassGen as a cloud job on Modal.
Version 0.1.73 focuses on running MassGen as a cloud job on Modal.

---

@@ -27,5 +27,5 @@ Version 0.1.72 focuses on running MassGen as a cloud job on Modal.

## Related Tracks

- **v0.1.71**: Trace Memory & Evaluation Polish — better eval criteria, system prompt tuning, stability fixes
- **v0.1.73**: OpenAI Audio API ([#960](https://github.com/massgen/MassGen/issues/960))
- **v0.1.72**: Grok Backend Update & Circuit Breaker Phase 2 — circuit breaker across all backends, Grok improvements ([#1038](https://github.com/massgen/MassGen/pull/1038), [#1044](https://github.com/massgen/MassGen/pull/1044))
- **v0.1.74**: OpenAI Audio API ([#960](https://github.com/massgen/MassGen/issues/960))