fixed reflexion agent death spiral #1266

Steve-Dusty · 2025-12-23T01:21:48Z

Problem

ReflexionAgent had the same death spiral issue as the IRE agent, causing excessive iterations and API calls:

- Fragile score extraction: failed to parse scores from LLM responses that used markdown or other formatting
- Early termination never triggered: when parsing failed, the score defaulted to 0.5, which can never reach the 0.9 threshold
- Full iterations every time: even simple tasks used all max_loops iterations
- Timeout thresholds exceeded: simple tasks took 61s, complex tasks 203s
- Wasted API calls: 9-15 LLM calls per task instead of 3

Root Cause

Identical to the IRE agent issue:

- Early termination logic existed but depended on score extraction
- Score parsing used a single fragile regex pattern: r"(?:final|overall)\s+score:?\s*(\d+(?:\.\d+)?)"
- LLM responses with markdown (**Score**: 8/10) or different formats (Rating: 8/10) failed to parse
- When extraction failed → defaulted to 0.5 → never met the 0.9 threshold → ran all iterations
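The failure mode above can be reproduced in a few lines. This is a minimal sketch, not the original ReflexionAgent code: it assumes the old pattern was matched case-insensitively and that a 0-10 score was divided by 10; the names OLD_PATTERN and old_extract_score are hypothetical.

```python
import re

# The single pattern quoted above; case-insensitive matching is an assumption
OLD_PATTERN = r"(?:final|overall)\s+score:?\s*(\d+(?:\.\d+)?)"
OLD_DEFAULT = 0.5


def old_extract_score(evaluation: str) -> float:
    """Hypothetical reconstruction of the old, single-pattern extraction."""
    match = re.search(OLD_PATTERN, evaluation, re.IGNORECASE)
    if match:
        # Assumed: scores arrive on a 0-10 scale and are normalized to 0-1
        return float(match.group(1)) / 10
    return OLD_DEFAULT


# "**" after "Score" breaks the match, and "Rating" is not a recognized keyword
for text in ["**Final Score**: 8/10", "Rating: 8/10", "Final score: 8"]:
    print(f"{text!r} -> {old_extract_score(text)}")
```

Any response whose wording misses the pattern falls back to 0.5, and 0.5 can never satisfy a 0.9 threshold, so every such task runs to max_loops.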

Solution

Applied the same fix pattern used for the IRE agent: robust score extraction plus improved termination logic.

1. Robust Score Extraction

Added _extract_score_robust() method with multiple fallback strategies:

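import re

def _extract_score_robust(self, evaluation: str) -> float:
    # Strip markdown emphasis so "**Final Score**: 8/10" reads "final score: 8/10"
    text = re.sub(r"[*_`]", "", evaluation.lower())
    # Strategy 1: Multiple regex patterns (handles markdown, different formats)
    score_patterns = [
        r"(?:final|overall)\s+score:?\s*(\d+(?:\.\d+)?)",
        r"score:?\s*(\d+(?:\.\d+)?)\s*/\s*10",
        r"(?:rating|grade):?\s*(\d+(?:\.\d+)?)\s*/\s*10",
        r"(?:rating|grade):?\s*(\d+(?:\.\d+)?)",
        # Strategy 2: Context-aware patterns (X/10, X out of 10)
        r"(\d+(?:\.\d+)?)\s*(?:/|out of)\s*10",
    ]
    for pattern in score_patterns:
        match = re.search(pattern, text)
        if match:
            value = float(match.group(1))
            # Values above 1 are treated as 0-10 scores and normalized to 0-1
            return min(value / 10 if value > 1 else value, 1.0)
    # Strategy 3: Sentiment analysis fallback (not shown in this excerpt)
    # Default: 0.6 (better than old 0.5)
    return DEFAULT_SCORE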

Now handles:
- **Final Score**: 8/10 ✅
- Rating: 8.6/10 ✅
- Grade: 8 out of 10 ✅
- Markdown formatting ✅
- Sentiment-based scoring ✅
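The sentiment fallback (Strategy 3) is not shown in the PR. A minimal sketch, assuming a simple keyword count stands in for "sentiment analysis": the word lists, the 0.3 spread around the default, and the name sentiment_fallback_score are all hypothetical.

```python
import re

# Hypothetical keyword lists; the PR does not show the real ones
POSITIVE_WORDS = ("excellent", "great", "good", "clear", "accurate", "correct")
NEGATIVE_WORDS = ("poor", "incorrect", "unclear", "missing", "wrong", "incomplete")
DEFAULT_SCORE = 0.6


def sentiment_fallback_score(evaluation: str) -> float:
    """Estimate a 0-1 score from keyword counts when no numeric score is found."""
    words = re.findall(r"[a-z]+", evaluation.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive + negative == 0:
        return DEFAULT_SCORE  # no signal either way
    # Shift the default by the net sentiment ratio (-1..1), kept within [0.0, 1.0]
    ratio = (positive - negative) / (positive + negative)
    return max(0.0, min(1.0, DEFAULT_SCORE + 0.3 * ratio))


print(sentiment_fallback_score("The answer is clear and accurate."))
print(sentiment_fallback_score("Good structure but unclear wording."))
```

A keyword heuristic like this is crude, but it only runs after every numeric pattern has failed, so it mainly separates clearly positive from clearly negative critiques instead of collapsing both to the default.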

2. Configuration Constants

EARLY_TERMINATION_THRESHOLD = 0.8  # Lower than 0.9 for realistic termination
DEFAULT_SCORE = 0.6  # Higher than 0.5 to increase termination chance
SCORE_IMPROVEMENT_THRESHOLD = 0.05  # Minimum improvement to continue

3. Dual Termination Conditions

# Condition 1: Score is high enough
if current_score >= EARLY_TERMINATION_THRESHOLD:  # 0.8 instead of 0.9
    logger.info(f"✓ High score achieved ({current_score:.2f} >= {EARLY_TERMINATION_THRESHOLD}). Stopping early.")
    break

# Condition 2: Score not improving
if iteration > 0 and (current_score - prev_score) < SCORE_IMPROVEMENT_THRESHOLD:
    logger.info("✓ Score improvement minimal. Stopping early.")
    break

# Carry this iteration's score forward for the next improvement check
prev_score = current_score
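The two conditions can be exercised without any LLM calls. This is a minimal sketch, assuming the per-iteration evaluation scores are known in advance; the function iterations_used is hypothetical and only mirrors the stopping logic, not the real ReflexionAgent loop.

```python
# Values copied from the configuration constants above
EARLY_TERMINATION_THRESHOLD = 0.8
SCORE_IMPROVEMENT_THRESHOLD = 0.05


def iterations_used(scores: list[float], max_loops: int = 3) -> int:
    """Return how many iterations run before a stop condition fires.

    scores[i] stands in for the evaluation score of iteration i.
    """
    prev_score = 0.0
    for iteration, current_score in enumerate(scores[:max_loops]):
        # Condition 1: score is high enough
        if current_score >= EARLY_TERMINATION_THRESHOLD:
            return iteration + 1
        # Condition 2: score did not improve enough over the previous iteration
        if iteration > 0 and (current_score - prev_score) < SCORE_IMPROVEMENT_THRESHOLD:
            return iteration + 1
        prev_score = current_score
    return min(len(scores), max_loops)


print(iterations_used([0.8, 0.9, 0.95]))  # stops on condition 1
print(iterations_used([0.6, 0.6, 0.6]))   # stops on condition 2
```

Condition 2 also covers repeated parse failures: two consecutive DEFAULT_SCORE values differ by 0, so the loop stops after the second iteration instead of running to max_loops.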

4. Progress Logging

============================================================
Processing task 1/1
============================================================
Task: Explain photosynthesis in one sentence...

--- Iteration 1/3 ---
Evaluation complete. Score: 0.80
Iteration 1 complete | Score: 0.80 | Best: 0.80
✓ High score achieved (0.80 >= 0.8). Stopping early.

============================================================
Task complete | Iterations used: 1/3 | Best score: 0.80
============================================================
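The summary lines above follow a fixed format. A minimal sketch of producing them with Python's standard logging module, assuming a 60-character separator; the helper names and the logger name are hypothetical, and the PR's actual logger may be a different library.

```python
import logging

logger = logging.getLogger("reflexion_progress")  # hypothetical logger name
SEPARATOR = "=" * 60  # separator width is an assumption


def iteration_line(iteration: int, score: float, best: float) -> str:
    """Format the per-iteration progress line."""
    return f"Iteration {iteration} complete | Score: {score:.2f} | Best: {best:.2f}"


def task_complete_lines(used: int, max_loops: int, best: float) -> list[str]:
    """Format the end-of-task banner."""
    return [
        SEPARATOR,
        f"Task complete | Iterations used: {used}/{max_loops} | Best score: {best:.2f}",
        SEPARATOR,
    ]


def log_task_complete(used: int, max_loops: int, best: float) -> None:
    for line in task_complete_lines(used, max_loops, best):
        logger.info(line)


logging.basicConfig(level=logging.INFO, format="%(message)s")
logger.info(iteration_line(1, 0.80, 0.80))
log_task_complete(1, 3, 0.80)
```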

Changes

Modified Files:
- swarms/agents/flexion_agent.py - Core ReflexionAgent implementation

Key Changes:
- Line 2: Added import re
- Lines 11-14: Added configuration constants
- Lines 306-371: Added _extract_score_robust() method
- Lines 439-443: Updated evaluate() to use robust extraction
- Lines 603-673: Enhanced termination logic and progress logging


📚 Documentation preview 📚: https://swarms--1266.org.readthedocs.build/en/1266/

Commit 748037a: fixed reflexion agent death spiral

github-actions bot added the agents label Dec 23, 2025
