The AI-CoScientist chatbot has been significantly enhanced in two major areas: the terminal interface and the quality of paper evaluation.
- Rich Terminal UI: Colored output, tables, progress bars, and panels
- Conversation History: Save and load sessions to pick up work where you left off
- Claude AI Integration: LLM-based paper evaluation instead of heuristics alone
- Detailed Feedback: Strengths, weaknesses, and per-dimension score justifications
- Higher Confidence: More consistent scoring, with confidence of ~0.92 versus ~0.65 for heuristics
```bash
# Using Poetry (recommended)
cd AI-CoScientist
poetry install
# The following packages are now included:
# - rich: Terminal UI enhancement
# - python-docx: Document processing
# - anthropic: Claude AI integration
```

Make sure your .env file contains:

```
ANTHROPIC_API_KEY=your_anthropic_api_key_here
```

```bash
# Run the enhanced chatbot with Rich UI and LLM evaluation
python scripts/chat_reviewer_enhanced.py
```

Score Display:
- Color-coded overall score panel (green for high scores, red for low)
- Organized dimensional scores table
- Model contributions breakdown
- Visual progress indicators during evaluation
Rich Formatting:
- Markdown rendering for bot responses
- Syntax-highlighted code examples
- Bordered panels for different sections
- Emoji indicators for visual clarity
Save Session:

```
💬 You: save conversation
✅ Session saved! ID: 20241007_143022
```

Load Previous Session:

```
💬 You: load conversation
[Displays list of saved sessions]
Enter session ID to load: 20241007_143022
✅ Session 20241007_143022 loaded!
[Displays previous paper scores]
```

List All Sessions:

```
💬 You: show history
[Displays table of recent sessions with dates and paper names]
```
Auto-Save:
- Sessions are automatically saved after each paper evaluation
- Session data includes:
  - Conversation messages
  - Paper scores
  - Enhanced versions
  - Timestamps

Storage Location:
- Sessions are saved in `~/.ai-coscientist/chat_history/`
- JSON format for easy inspection and backup
Real AI Analysis:

```
💬 You: Review my paper: paper.docx
[Rich progress indicator]
Analyzing paper with LLM-based analysis...
📊 Overall Score: 8.34/10 (Very Good)
Confidence: 0.92
[Color-coded dimensional scores table]
💪 Strengths:
✓ Novel integration of ensemble methods
✓ Comprehensive experimental validation
✓ Clear methodology description
⚠️ Areas for Improvement:
• Limited discussion of computational complexity
• Could benefit from additional real-world case studies
📊 Score Justifications:
Novelty: The ensemble approach is innovative but builds on existing frameworks...
Methodology: Experimental design is rigorous with proper validation...
Clarity: Writing is generally clear but technical sections could be simplified...
Significance: Addresses important problem with practical implications...
📝 Overall Assessment:
This paper presents a solid contribution to the field with strong methodology
and clear presentation. The ensemble approach shows promise for practical
applications.
```
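The four dimensional scores feed an overall score through fixed weights. The configuration section of `scripts/paper_evaluator_llm.py` lists them as methodology 0.35, novelty 0.25, clarity 0.20, and significance 0.20. The sketch below reproduces only that weighted sum and adds a check that the weights sum to 1, so edited weights keep the result on the 0-10 scale. `weighted_overall` and the validation are hypothetical additions, not the script's code. The displayed overall score may also reflect other model contributions that this formula does not cover.

```python
import math
from typing import Dict

# Weights as listed in the configuration section of scripts/paper_evaluator_llm.py.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "methodology": 0.35,
    "novelty": 0.25,
    "clarity": 0.20,
    "significance": 0.20,
}


def weighted_overall(scores: Dict[str, float],
                     weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine 0-10 dimensional scores into one overall score using fixed weights."""
    # Weights must sum to 1 so the result stays on the same 0-10 scale.
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in weights.items())


if __name__ == "__main__":
    example = {"methodology": 8.0, "novelty": 7.0, "clarity": 9.0, "significance": 6.0}
    print(f"{weighted_overall(example):.2f}")  # 7.55
```

For the example scores the terms are 2.80 + 1.75 + 1.80 + 1.20 = 7.55.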
Fallback to Heuristics:
- If LLM evaluation fails (API errors, exhausted quota), the chatbot automatically falls back to heuristic evaluation
- The output clearly indicates which evaluation method was used
- Heuristic evaluations report a lower confidence score
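The fallback behavior can be sketched as a try-then-fallback wrapper. This is a minimal illustration, not the project's implementation. `evaluate_with_fallback`, the evaluator callables, and the result keys are hypothetical names. Only the two confidence values (0.92 for LLM, 0.65 for heuristic) come from this guide.

```python
from typing import Callable, Dict

# Hypothetical signature: an evaluator takes paper text and returns dimension scores.
Evaluator = Callable[[str], Dict[str, float]]

# Confidence values reported in this guide for each mode.
LLM_CONFIDENCE = 0.92
HEURISTIC_CONFIDENCE = 0.65


def evaluate_with_fallback(text: str, llm_eval: Evaluator,
                           heuristic_eval: Evaluator) -> dict:
    """Run the LLM evaluator; if it raises, fall back to the heuristic evaluator."""
    try:
        return {
            "scores": llm_eval(text),
            "method": "llm",
            "confidence": LLM_CONFIDENCE,
            "warning": None,
        }
    except Exception as exc:  # e.g. missing API key, network error, exhausted quota
        return {
            "scores": heuristic_eval(text),
            "method": "heuristic",
            "confidence": HEURISTIC_CONFIDENCE,
            # Keep the reason so the UI can display it (in yellow, per this guide).
            "warning": f"LLM evaluation failed ({exc}); using heuristic evaluation.",
        }


if __name__ == "__main__":
    def unavailable_llm(text: str) -> Dict[str, float]:
        raise RuntimeError("ANTHROPIC_API_KEY not found")

    def word_count_heuristic(text: str) -> Dict[str, float]:
        # Toy heuristic for the demo only: more words, slightly higher score (max 10).
        return {"overall": min(10.0, 5.0 + len(text.split()) / 1000)}

    result = evaluate_with_fallback("Abstract. Methods. Results.",
                                    unavailable_llm, word_count_heuristic)
    print(result["method"], result["confidence"])  # heuristic 0.65
```

The sketch catches any exception so that any LLM failure degrades to heuristics. A real implementation might narrow this to API and network errors so that genuine bugs still surface.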
| Feature | Original | Enhanced |
|---|---|---|
| UI | Plain text | Rich colored tables, panels, progress bars |
| Evaluation | Heuristics only | Real Claude AI + heuristic fallback |
| Feedback | Basic scores | Detailed strengths, weaknesses, justifications |
| History | None | Save/load sessions with full context |
| Confidence | ~0.65 | ~0.92 with LLM |
| Auto-save | No | Yes, after each evaluation |
| Progress | No indicators | Spinners and progress bars |
```
$ python scripts/chat_reviewer_enhanced.py
[Beautiful welcome banner displayed]
💬 You: Review my paper: ~/research/breakthrough-paper.docx
[Rich progress: "Analyzing paper with LLM-based analysis..."]
📊 Overall Score: 7.96/10 (Good - Respectable journals)
Confidence: 0.92
[Colored dimensional scores table]
[Strengths and weaknesses panels]
[Detailed justifications]
Session auto-saved: 20241007_151030
🤖 Assistant: Great work! Your paper shows strong methodology (7.89)
and clear writing (7.45). The main areas for improvement are novelty
(7.46) and significance (7.40).
Would you like specific suggestions to reach 8.5+?
💬 You: Yes, get me to 8.5+
[Rich table of improvement suggestions displayed]
💡 Improvement Suggestions
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# | Suggestion | Time | Gain | Difficulty
──┼─────────────────────────────────────┼──────────┼────────┼────────────
1 | Transform Title with Crisis Framing | 30 min | +0.30 | Easy
2 | Add Theoretical Justification | 2 hours | +0.30 | Medium
3 | Quantify All Impact Statements | 1-2 hours| +0.20 | Easy
[Continue conversation...]
💬 You: save conversation
✅ Session saved! ID: 20241007_151030
💬 You: quit
👋 Goodbye! Good luck with your paper!
```

```
$ python scripts/chat_reviewer_enhanced.py
💬 You: load conversation
💾 Saved Sessions
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# | Session ID | Date | Paper | Messages
──┼─────────────────┼────────────┼────────────────────┼──────────
1 | 20241007_151030 | 2024-10-07 | breakthrough-paper | 12
2 | 20241006_093015 | 2024-10-06 | pilot-study | 8
3 | 20241005_140022 | 2024-10-05 | review-article | 15
Enter session ID to load: 20241007_151030
✅ Session 20241007_151030 loaded!
[Previous paper scores displayed]
🤖 Assistant: Session restored. You can continue from where you left off.
💬 You: Apply the theoretical justification enhancement
[Continue from previous session context...]
```

```
# LLM Evaluation (Default)
$ python scripts/paper_evaluator_llm.py paper.docx
📊 Overall Score: 8.34/10 (Very Good - Strong specialty journals)
Confidence: 0.92
LLM Evaluation: Yes
✅ Strengths:
• Novel ensemble approach
• Comprehensive validation
• Clear methodology
⚠️ Areas for Improvement:
• Limited computational complexity analysis
• Could expand real-world applications

# Heuristic Evaluation
$ python scripts/paper_evaluator_llm.py paper.docx --no-llm
📊 Overall Score: 7.80/10 (Good - Respectable journals)
Confidence: 0.65
LLM Evaluation: No (Heuristic)
[Basic scores only, no detailed feedback]
```

Session Data Structure:

```json
{
  "timestamp": "2024-10-07T15:10:30",
  "paper_path": "/path/to/paper.docx",
  "scores": {
    "overall": 7.96,
    "novelty": 7.46,
    ...
  },
  "messages": [
    {"role": "user", "content": "Review my paper..."},
    {"role": "assistant", "content": "..."}
  ],
  "enhanced_versions": []
}
```

Session Operations:
- `save conversation`: Manual save
- `load conversation`: Restore a previous session
- `show history`: List all saved sessions
- Auto-save: Triggered after each evaluation
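A minimal sketch of writing and reading sessions with this structure. It assumes two things: the file naming `session_<ID>.json` that appears in the troubleshooting commands, and an ID built from the save time in `YYYYMMDD_HHMMSS` form (as in `20241007_151030`). `save_session`, `load_session`, and `list_sessions` are hypothetical names. The chatbot's own code may differ.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Storage location described in this guide.
HISTORY_DIR = Path.home() / ".ai-coscientist" / "chat_history"


def save_session(session: dict, history_dir: Path = HISTORY_DIR,
                 now: Optional[datetime] = None) -> str:
    """Write a session to session_<ID>.json and return the ID (YYYYMMDD_HHMMSS)."""
    now = now or datetime.now()
    session_id = now.strftime("%Y%m%d_%H%M%S")
    history_dir.mkdir(parents=True, exist_ok=True)
    record = {**session, "timestamp": now.isoformat(timespec="seconds")}
    path = history_dir / f"session_{session_id}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return session_id


def load_session(session_id: str, history_dir: Path = HISTORY_DIR) -> dict:
    """Read a saved session, raising FileNotFoundError if the file is missing."""
    path = history_dir / f"session_{session_id}.json"
    if not path.exists():
        raise FileNotFoundError(f"Session {session_id} not found.")
    return json.loads(path.read_text(encoding="utf-8"))


def list_sessions(history_dir: Path = HISTORY_DIR) -> List[str]:
    """Return saved session IDs, newest first (the ID format sorts chronologically)."""
    prefix = "session_"
    ids = [p.stem[len(prefix):] for p in history_dir.glob(f"{prefix}*.json")]
    return sorted(ids, reverse=True)


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real history folder.
    with tempfile.TemporaryDirectory() as tmp:
        sid = save_session(
            {"paper_path": "/path/to/paper.docx", "scores": {"overall": 7.96},
             "messages": [], "enhanced_versions": []},
            history_dir=Path(tmp),
            now=datetime(2024, 10, 7, 15, 10, 30),
        )
        print(sid)                                        # 20241007_151030
        print(load_session(sid, Path(tmp))["timestamp"])  # 2024-10-07T15:10:30
```

Because the ID has one-second resolution, two saves within the same second would overwrite each other in this sketch.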
LLM Mode (Default):
- Uses Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`) for analysis
- Provides detailed justifications
- Higher confidence (0.92)
- API costs apply (~$0.01-0.05 per evaluation)
Heuristic Mode (Fallback):
- Structure-based scoring
- Word count analysis
- No API costs
- Lower confidence (0.65)
- Fast evaluation
Switching Modes:

```python
from paper_evaluator_llm import evaluate_paper_file

scores = evaluate_paper_file("paper.docx", use_llm=True)   # LLM mode
scores = evaluate_paper_file("paper.docx", use_llm=False)  # Heuristic mode
```

Available Components:
- `Panel`: Bordered sections with titles
- `Table`: Organized data display with headers
- `Progress`: Spinners and progress bars
- `Markdown`: Formatted text rendering
- `Prompt`: Enhanced user input
Color Scheme:
- Green: High scores, strengths, success
- Yellow: Medium scores, warnings
- Red: Low scores, errors
- Cyan: Titles, labels
- Magenta: Highlights
- Blue: Information
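The score colors can be expressed as a small lookup from score to Rich color name. The thresholds here (8.0 for green, 6.5 for yellow) are assumptions for illustration, since this guide does not state the cut-offs the chatbot uses. `score_color` and `styled_score` are hypothetical helpers.

```python
def score_color(score: float, high: float = 8.0, medium: float = 6.5) -> str:
    """Map a 0-10 score to a Rich color name following the scheme above."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= high:
        return "green"   # high scores
    if score >= medium:
        return "yellow"  # medium scores
    return "red"         # low scores


def styled_score(score: float) -> str:
    """Wrap a score in Rich console markup, e.g. '[green]8.34/10[/green]'."""
    color = score_color(score)
    return f"[{color}]{score:.2f}/10[/{color}]"


if __name__ == "__main__":
    for s in (8.34, 7.96, 5.10):
        print(styled_score(s))
```

Passing the returned string to `rich.console.Console().print()` renders it in color, because Rich interprets `[green]...[/green]` as console markup.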
Edit `scripts/chat_reviewer_enhanced.py`:

```python
# Evaluation settings
self.use_llm_by_default = True  # Change to False for heuristic default

# History settings
history_dir = Path.home() / ".ai-coscientist" / "chat_history"

# Claude model
model="claude-sonnet-4-5-20250929"  # Claude Sonnet 4.5
temperature=0.3  # Lower = more consistent
max_tokens=2048
```

Edit `scripts/paper_evaluator_llm.py`:

```python
# Text truncation (to manage costs)
max_chars = 50000  # Increase for longer papers

# Model settings
model="claude-sonnet-4-5-20250929"  # Claude Sonnet 4.5
temperature=0.3  # Adjust for more/less variability
max_tokens=2048

# Dimensional weights
overall = (
    methodology * 0.35 +  # Adjust weights
    novelty * 0.25 +
    clarity * 0.20 +
    significance * 0.20
)
```

Per Evaluation:
- Input tokens: ~5,000-15,000 (paper content)
- Output tokens: ~1,000-2,000 (evaluation)
- Cost: ~$0.01-0.05 per paper
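Per-paper cost is the token counts multiplied by per-million-token rates. The default rates below ($3 per million input tokens, $15 per million output tokens) are an assumption based on Anthropic's list pricing for Sonnet models. Check current pricing before relying on them. `estimate_cost` is a hypothetical helper, not part of the scripts.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_usd_per_mtok: float = 3.0,
                  output_usd_per_mtok: float = 15.0) -> float:
    """Estimate the USD cost of one evaluation from token counts.

    The default rates are assumptions (USD per million tokens); check current pricing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_usd_per_mtok
            + output_tokens * output_usd_per_mtok) / 1_000_000


if __name__ == "__main__":
    # The token ranges quoted above: 5,000-15,000 input and 1,000-2,000 output.
    low = estimate_cost(5_000, 1_000)
    high = estimate_cost(15_000, 2_000)
    print(f"${low:.3f} - ${high:.3f}")  # $0.030 - $0.075
```

Under these assumed rates, the quoted token ranges work out to about $0.03-0.075 per paper. Pass different rates if yours differ.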
Cost Optimization:
- Heuristic mode: Free, instant
- Session history: Reuse previous evaluations
- Text truncation: Limit to 50,000 chars
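Truncation can cut at a paragraph boundary so the model never receives a half-finished sentence. The helper below is a hypothetical variant. Only the 50,000-character default comes from this guide, and the guide does not say how `paper_evaluator_llm.py` actually cuts the text.

```python
def truncate_text(text: str, max_chars: int = 50_000) -> str:
    """Trim text to at most max_chars, cutting at the last paragraph break if possible."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text
    kept = text[:max_chars]
    boundary = kept.rfind("\n\n")
    # Fall back to a hard cut when the kept portion has no paragraph break.
    return kept[:boundary] if boundary > 0 else kept


if __name__ == "__main__":
    paper = "Abstract text.\n\nMethods text.\n\nResults text that runs long."
    print(repr(truncate_text(paper, max_chars=40)))
```

Cutting at a paragraph break can drop up to one paragraph more than a hard cut. This trades a little content for cleaner input.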
```
❌ Error: ANTHROPIC_API_KEY not found in environment variables.
```

Solution:

```bash
# Add to .env file
echo "ANTHROPIC_API_KEY=your_key_here" >> .env

# Or export directly
export ANTHROPIC_API_KEY=your_key_here
```

```bash
# Rich library not found
poetry add rich

# python-docx not found
poetry add python-docx

# Anthropic not found
poetry add anthropic
```

```
❌ Session 20241007_151030 not found.
```

Check:

```bash
# List session files
ls ~/.ai-coscientist/chat_history/

# Verify file exists
cat ~/.ai-coscientist/chat_history/session_20241007_151030.json
```

Automatic Fallback:
- System automatically falls back to heuristic evaluation
- Warning displayed in yellow
- Lower confidence score indicated

Manual Fallback:

```bash
# Use heuristic mode directly
python scripts/paper_evaluator_llm.py paper.docx --no-llm
```

| Mode | Speed | Confidence | Cost |
|---|---|---|---|
| LLM | 10-30s | High (0.92) | $0.01-0.05 |
| Heuristic | <1s | Medium (0.65) | Free |
- Chat session: ~5-10 MB
- Saved sessions: ~10-50 KB each
- History directory: ~1-5 MB for 100 sessions
Best Results:
- Use LLM mode for important papers
- Provide complete papers (abstract, methods, results, discussion)
- Save sessions before major changes
- Review justifications to understand scores

Cost Efficiency:
- Use heuristic mode for quick checks
- Reuse sessions instead of re-evaluating
- Truncate long papers if needed
- Batch evaluations instead of repeated single evals

Workflow:
- Start with "show history" to see previous work
- Save after important evaluations
- Use conversation context: the bot remembers your goals
- Ask follow-up questions to take advantage of Claude's understanding
The following features are planned but not yet implemented:
- Multi-Paper Comparison
  - Compare multiple papers side-by-side
  - Ranking and recommendation system
  - Batch evaluation mode
- Voice Input Support
  - Speech-to-text integration
  - Voice mode toggle
  - Hands-free operation
```bash
# Old command
python scripts/chat_reviewer.py

# New command (enhanced version)
python scripts/chat_reviewer_enhanced.py
```

What's New:
- All original features retained
- Rich UI automatically enabled
- LLM evaluation automatically used
- Session auto-save enabled
- No configuration changes needed
Backward Compatibility:
- Original `chat_reviewer.py` still works
- `paper_evaluator.py` still available for heuristic-only evaluation
- No breaking changes to commands
Before (Heuristic):

```python
from paper_evaluator import evaluate_paper_file

scores = evaluate_paper_file("paper.docx")
# Heuristic scores only
```

After (LLM):

```python
from paper_evaluator_llm import evaluate_paper_file

scores = evaluate_paper_file("paper.docx", use_llm=True)
# LLM evaluation with detailed feedback
```

For issues or questions:
- GitHub Issues: https://github.com/Transconnectome/AI-CoScientist/issues
- Documentation: See README.md and PAPER_ENHANCEMENT_GUIDE.md
Built on the AI-CoScientist paper enhancement system. Uses Claude AI for natural language understanding and evaluation.