Integrate research-backed psychological prompting techniques into SpecBeads to improve agent performance by 45%+ on complex tasks.
Based on:
- EmotionPrompt (Li et al., 2023): +115% on hard tasks
- OPRO "Deep Breath" (Google DeepMind, 2023): 34% → 80% accuracy
- Incentive Prompting (Bsharat et al., 2023): +45% quality
- ExpertPrompting (Xu et al., 2023): 24% → 84% accuracy
File: .specify/prompts/personas.md
## Expert Software Architect Persona
You are a senior software architect with 15+ years implementing spec-driven systems.
**Your Expertise**:
- Spec Kit workflows and constitutional compliance
- Beads issue tracking and dependency management
- Test-Driven Development (TDD) and SOLID principles
- Bidirectional synchronization patterns
- JavaScript/Node.js/Go implementation
**Your Approach**:
- Always read spec.md and plan.md FIRST for context
- Write tests BEFORE implementation (TDD)
- Keep functions under 50 lines
- Follow constitutional principles strictly
- Validate inputs and provide detailed error messages
- Think step-by-step through complex problems
**Your Standards**:
- 80%+ test coverage on new code (critical paths + edge cases)
- Commits: 300-600 lines (atomic changes)
- Documentation: Clear, concise, actionable
- Edge cases: Always consider and test

File: .specify/prompts/framing.md
## Critical Operation Framing
**CRITICAL**: This operation affects the entire feature workflow.
Errors could create orphaned issues, broken dependencies, or data loss.
**CHALLENGE**: Create outputs so detailed and accurate that an agent
reading them 6 months later can understand and execute perfectly
without consulting external documentation.
## Success Criteria Framing
**Stakes**: If this fails, it will:
- Block ${dependentTaskCount} downstream tasks
- Require manual reconciliation of task/issue mappings
- Potentially corrupt the dependency graph
**Reward**: If this succeeds perfectly:
- All downstream work proceeds smoothly
- Team saves 20+ hours of debugging
- Constitutional compliance is maintained

File: .specify/prompts/self-eval.md
## Self-Evaluation Checkpoint
After completing your work, rate your confidence (0.0 to 1.0):
**Ratings**:
- Task completion: [0.0-1.0]
- Test coverage: [0.0-1.0]
- Constitutional compliance: [0.0-1.0]
- Code quality: [0.0-1.0]
- Documentation: [0.0-1.0]
**Overall confidence**: [average of above]
If ANY rating is below 0.9, explain what's missing and refine your work.

Location: After line 35 in speckit.taskstobeads.md
Addition:
## Agent Role & Stakes
**Your Role**: Expert in spec-driven development and issue tracking systems.
**CRITICAL OPERATION**: This sync maintains bidirectional integrity between
tasks.md (planning) and Beads (execution). Errors could orphan issues or
break the dependency graph.
**CHALLENGE**: Create Beads issues so rich with context that an agent
reading them 6 months later can implement the task perfectly without
reading spec.md or plan.md.
**Take a deep breath and execute this synchronization systematically.**

Location: Lines 220-252 in current version
Enhancement: Add to buildEnhancedDescription():
```javascript
function buildEnhancedDescription(task, spec, plan, userStories) {
  // Find relevant context (specSections, planSections, and featureName are
  // assumed to be in the surrounding command scope)
  const relevantSpec = findRelevantContext(spec, specSections, task.story, task.phase);
  const relevantPlan = findRelevantContext(plan, planSections, task.story, task.phase);
  const userStoryText = userStories[task.story] || '';

  // NEW: Calculate stakes from the dependency graph
  const dependentCount = calculateDependentTasks(task.id);
  const blockingCount = calculateBlockingTasks(task.id);

  return `
**CRITICAL**: This task ${dependentCount > 0 ? `blocks ${dependentCount} downstream task(s)` : 'has no downstream dependents'}

**Challenge**: Implement with 80%+ test coverage and < 300 lines of code while
maintaining SOLID principles.

**Description**:
${task.description}

**Feature Context**:
${relevantSpec}

${task.story ? `**User Story (${task.story})**:\n${userStoryText}\n` : ''}
${relevantPlan ? `**Technical Approach**:\n${relevantPlan}\n` : ''}
${task.filePaths.length > 0 ? `**Files to Create/Modify**:\n${task.filePaths.map(f => `- ${f}`).join('\n')}\n` : ''}
${task.acceptanceCriteria.length > 0 ? `**Acceptance Criteria** (must all pass):\n${task.acceptanceCriteria.map(c => `- [ ] ${c}`).join('\n')}\n` : ''}
**Dependencies**:
${blockingCount > 0 ? `- Blocked by: ${blockingCount} task(s) (must complete first)` : '- No blockers'}
${dependentCount > 0 ? `- Blocks: ${dependentCount} task(s) (critical path)` : '- No dependents'}

**Metadata**:
- **Task ID**: ${task.id}
- **Phase**: ${task.phase}
- **Priority**: ${determinePriority(task.phase, task.story)} (1=highest, 4=lowest)
- **Parallel**: ${task.parallel ? 'Yes - can work alongside other [P] tasks' : 'No'}
- **Source**: tasks.md from feature ${featureName}

---
*Take a deep breath and approach this step-by-step.*
*Created by /speckit.taskstobeads*
`.trim();
}
```
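The helpers (`findRelevantContext`, `calculateDependentTasks`, `calculateBlockingTasks`, `determinePriority`) are assumed to exist in the command. As one possible shape, here is a minimal sketch of the dependent count, assuming the dependency graph is available as an in-memory map; the two-argument signature and map format are illustrative, not the actual implementation:

```javascript
// Illustrative sketch only: counts transitive dependents with a BFS over a
// map of taskId -> direct dependent task IDs. The real command may derive
// this map from tasks.md or from Beads instead.
function calculateDependentTasks(taskId, dependentsByTask) {
  const seen = new Set();
  const queue = [taskId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const dependent of dependentsByTask[current] || []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return seen.size;
}
```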
Location: Before line 137 (duplicate detection)

Addition:
**CRITICAL VALIDATION**: Detecting duplicates and TDD violations.
This validation prevents:
- Duplicate Beads issues (data corruption)
- Implementation before tests (constitutional violation)
- Broken dependency chains
Take a deep breath and verify each task systematically.
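As an illustration of what the duplicate half of this validation could look like, a hedged sketch that assumes each Beads issue records its source task ID in metadata; the `metadata.taskId` field and the `existingIssues` input are assumptions, not the real API:

```javascript
// Hypothetical sketch: flag tasks whose IDs already appear in existing Beads
// issues. `existingIssues` would come from however the command lists issues.
function findDuplicateTasks(tasks, existingIssues) {
  const knownTaskIds = new Set(
    existingIssues
      .map(issue => issue.metadata && issue.metadata.taskId)
      .filter(Boolean)
  );
  return tasks.filter(task => knownTaskIds.has(task.id));
}
```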
Location: After line 21 in speckit.implementwithbeads.md

Addition:
## Your Role & Expertise
You are a **senior software engineer** implementing SpecBeads features with:
**15+ years of experience in**:
- Spec-driven development and TDD workflows
- Beads issue tracking and dependency management
- Constitutional compliance enforcement
- Go/JavaScript implementation with SOLID principles
**Your Standards**:
- Tests FIRST, then implementation (TDD)
- Functions < 50 lines, commits 300-600 lines
- 80%+ test coverage on new code (critical paths + edge cases)
- Clear error messages with line numbers
- Constitution compliance is MANDATORY
**Your Approach**:
- Read all context documents BEFORE coding
- Think step-by-step through complex problems
- Validate assumptions, test edge cases
- Document as you go

Location: Before line 393 (Implementation with Constitutional Compliance)
Addition:
## Stakes & Challenge
**CRITICAL**: This implementation must pass constitutional compliance or the
entire feature fails and blocks ${dependentCount} downstream tasks.
**CHALLENGE**: I bet you can't implement this with:
- 80%+ test coverage (critical paths + edge cases)
- < 300 lines of code
- Zero constitutional violations
- First-time test pass rate
**REWARD**: Perfect implementation saves the team 20+ hours of debugging and
demonstrates mastery of TDD and SOLID principles.
**Take a deep breath. Work through this step-by-step.**

Location: After line 546 (before Post-Implementation)
Addition:
## Implementation Self-Evaluation
Before proceeding, rate your confidence (0.0 to 1.0):
**Ratings**:
- Test coverage (all edge cases tested): ____ / 1.0
- TDD compliance (tests written FIRST): ____ / 1.0
- SOLID principles followed: ____ / 1.0
- Code quality (< 50 lines/function): ____ / 1.0
- Constitutional compliance: ____ / 1.0
**Overall confidence**: [calculate average]
⚠️ **MANDATORY**: If ANY rating < 0.9, you MUST refine your implementation
before marking as complete.
**If confidence < 0.9, what needs improvement?**
[Agent must explain what's missing]
**Action**: [fix/proceed]
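If this gate is ever enforced mechanically rather than by prompt alone, the check is small. A minimal sketch, assuming ratings arrive as a plain dimension-to-score object (names are illustrative):

```javascript
// Minimal sketch of the 0.9 gate. Any single dimension below 0.9 forces
// refinement, even if the overall average passes.
function evaluateConfidence(ratings) {
  const scores = Object.values(ratings);
  const overall = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const weakDimensions = Object.keys(ratings).filter(d => ratings[d] < 0.9);
  return { overall, mustRefine: weakDimensions.length > 0, weakDimensions };
}

// Example: testCoverage 0.85 triggers refinement despite a passing average.
evaluateConfidence({ testCoverage: 0.85, tddCompliance: 1.0, solid: 0.95 });
```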
Location: Line 874 (auto-continue decision)

Enhancement:
**Progress Check**: You've completed ${closedTasks}/${totalTasks} tasks (${progressPercent}%).
**Next Challenge**: Task ${nextTask.taskId} - Can you maintain the same
quality standards while increasing velocity?
**Stakes**: Continuing the momentum saves context-switching time and
maintains flow state. But only if quality remains high.
**Take a deep breath. Ready to continue with excellence?**
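The interpolated progress values are simple arithmetic; an illustrative derivation with sample counts (not real data):

```javascript
// Illustrative only: sample counts showing how the values interpolated into
// the progress check above could be derived.
const totalTasks = 12;
const closedTasks = 9;
const progressPercent = Math.round((closedTasks / totalTasks) * 100); // 75
```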
Current Implementation: Prompts are inlined directly into slash command files.

Why this approach?
- ✅ Simpler: No dynamic loading required
- ✅ Faster: No file I/O during command execution
- ✅ Easier to debug: All prompt text visible in one file
- ✅ Version controlled: Changes tracked with command changes
Template files (.specify/prompts/) serve as:
- Reference documentation
- Source templates for copying into commands
- Customization starting points for projects
If you prefer dynamic loading over inlined prompts, you could create:
File: .specify/scripts/bash/load-prompt-enhancements.sh
```bash
#!/usr/bin/env bash
# OPTIONAL: Dynamic prompt loading (not currently used)
# Current implementation uses inlined prompts instead

REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
PROMPTS_DIR="$REPO_ROOT/.specify/prompts"

# Print the persona prompt, if the template exists
get_persona_prompt() {
    local persona_file="$PROMPTS_DIR/personas.md"
    if [[ -f "$persona_file" ]]; then
        cat "$persona_file"
    fi
}

# Print the stakes framing with ${dependentTaskCount} substituted
get_stakes_prompt() {
    local task_count="${1:-0}"
    local framing_file="$PROMPTS_DIR/framing.md"
    if [[ -f "$framing_file" ]]; then
        sed "s/\${dependentTaskCount}/$task_count/g" "$framing_file"
    fi
}

# Export functions for use by calling scripts
export -f get_persona_prompt
export -f get_stakes_prompt
```

Trade-offs:
- ➕ Easier to update prompts without editing command files
- ➖ Adds complexity and potential points of failure
- ➖ Harder to see full prompt context when reading commands
Recommendation: Stick with inlined prompts unless you frequently update prompt templates across multiple projects.
Create: tests/prompting-comparison.md
# Prompting Enhancement A/B Test
## Test Methodology
1. Select 10 representative tasks from tasks.md
2. Run each with:
- **Control**: Current prompts (no enhancements)
- **Treatment**: Enhanced prompts (psychological techniques)
3. Measure:
- Test coverage achieved
- Constitutional compliance rate
- Code quality score
- First-pass success rate
- Time to completion
## Metrics
| Metric | Control | Treatment | Improvement |
|--------|---------|-----------|-------------|
| Test coverage | _% | _% | _% |
| Compliance rate | _% | _% | _% |
| Code quality | _/10 | _/10 | _% |
| First-pass success | _% | _% | _% |
| Avg time (min) | _ | _ | _% |
## Success Criteria
- Test coverage: +15% or higher
- Compliance rate: +20% or higher
- Code quality: +1 point or higher
- First-pass success: +25% or higher
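To make the Improvement column and the criteria above concrete, a hedged sketch of the comparison math; field names and the percentage-points interpretation of the thresholds are assumptions:

```javascript
// Illustrative only: relative improvement for the table's Improvement column,
// and a criteria check treating the "+N%" thresholds as percentage points
// (except code quality, which is an absolute 1-10 score).
const improvementPct = (control, treatment) =>
  ((treatment - control) / control) * 100;

function meetsSuccessCriteria(control, treatment) {
  return (
    treatment.coverage - control.coverage >= 15 &&     // test coverage: +15 or higher
    treatment.compliance - control.compliance >= 20 && // compliance rate: +20 or higher
    treatment.quality - control.quality >= 1 &&        // code quality: +1 point or higher
    treatment.firstPass - control.firstPass >= 25      // first-pass success: +25 or higher
  );
}
```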
File: .specify/scripts/bash/score-implementation-quality.sh

```bash
#!/usr/bin/env bash
# Score implementation quality
# Usage: ./score-implementation-quality.sh <task-id>

TASK_ID="$1"
SCORE=0
MAX_SCORE=100

# Files touched by the current change set (assumption: scoring uncommitted work)
MODIFIED_FILES=$(git diff --name-only HEAD)

# Check test coverage (crude heuristic: a test file was modified)
if echo "$MODIFIED_FILES" | grep -q "test"; then
    SCORE=$((SCORE + 25))
fi

# Check file sizes: every modified file stays under 500 lines
all_small=true
for file in $MODIFIED_FILES; do
    [[ -f "$file" ]] || continue
    lines=$(wc -l < "$file")
    if [[ $lines -ge 500 ]]; then
        all_small=false
    fi
done
if [[ "$all_small" == true ]]; then
    SCORE=$((SCORE + 15))
fi

# Check commit size (300-600 changed lines; insertions only, a rough proxy)
total_changes=$(git diff --shortstat HEAD | awk '{print $4}')
total_changes=${total_changes:-0}
if [[ $total_changes -ge 300 && $total_changes -le 600 ]]; then
    SCORE=$((SCORE + 20))
fi

# Check constitutional compliance
if .specify/scripts/bash/check-constitutional-compliance.sh; then
    SCORE=$((SCORE + 40))
fi

echo "Quality Score: $SCORE/$MAX_SCORE"
```

Add section:
## Psychological Prompting Enhancements
SpecBeads uses research-backed prompting techniques to improve agent performance:
- **Detailed Personas**: Agents operate as senior engineers with specific expertise
- **Stakes Framing**: Critical operations are clearly marked with consequences
- **Challenge Prompts**: Competitive framing encourages higher-quality output
- **Self-Evaluation**: Agents validate their own work before proceeding
- **Step-by-Step**: "Take a deep breath" triggers deliberate reasoning
**Research Sources**:
- EmotionPrompt (Li et al., 2023): +115% on complex tasks
- OPRO (Google DeepMind, 2023): 34% → 80% accuracy
- Incentive Prompting (Bsharat et al., 2023): +45% quality
See `PROMPTING_ENHANCEMENT_PLAN.md` for details.

Implementation Checklist:
- Create `.specify/prompts/` directory
- Write persona templates
- Write stakes/challenge templates
- Write self-evaluation templates
- Create helper scripts
- Add opening persona/stakes
- Enhance Beads description builder
- Add validation framing
- Test with 5 sample tasks
- Add opening persona
- Enhance implementation section
- Add self-evaluation checkpoints
- Enhance auto-continue prompts
- Test with 5 sample tasks
- Run A/B comparison tests
- Measure quality improvements
- Refine prompts based on results
- Update documentation
Target Improvements:
- Test coverage: +20%
- Constitutional compliance: +30%
- First-pass success rate: +40%
- Code quality score: +25%
- Agent confidence ratings: Avg >0.9
Red Flags (rollback triggers):
- Quality score decrease >5%
- Increased token usage >50%
- Agent confusion/errors increase
- Slower execution time >25%
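These triggers are mechanical enough to check in code. An illustrative sketch that mirrors the thresholds above; the metric field names and object shapes are assumptions, not an existing API:

```javascript
// Illustrative only: compares an enhanced run against a baseline using the
// red-flag thresholds above.
function shouldRollback(baseline, current) {
  const qualityDrop = (baseline.qualityScore - current.qualityScore) / baseline.qualityScore;
  const tokenGrowth = (current.tokens - baseline.tokens) / baseline.tokens;
  const slowdown = (current.durationMs - baseline.durationMs) / baseline.durationMs;
  return (
    qualityDrop > 0.05 ||                        // quality score decrease > 5%
    tokenGrowth > 0.5 ||                         // token usage increase > 50%
    current.errorCount > baseline.errorCount ||  // agent confusion/errors increase
    slowdown > 0.25                              // execution time > 25% slower
  );
}
```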
If enhancements cause issues:
- Immediate: Comment out enhanced sections
- Restore: Original prompts from git history
- Analyze: Review failure logs
- Refine: Adjust prompts based on failures
- Retest: Gradual re-introduction
Rollout tips:
- Start with taskstobeads (simpler, fewer touch points)
- Test incrementally (one technique at a time)
- Measure everything (before/after comparisons)
- Be prepared to rollback quickly
- Document what works and what doesn't
Next Steps: Begin with Phase 1 (Core Framework).