
SpecBeads Psychological Prompting Enhancement Plan

Overview

Integrate research-backed psychological prompting techniques into SpecBeads, targeting a 45%+ improvement in agent performance on complex tasks.

Research Foundation

Based on:

  • EmotionPrompt (Li et al., 2023): +115% on hard tasks
  • OPRO "Deep Breath" (Google DeepMind, 2023): 34% → 80% accuracy
  • Incentive Prompting (Bsharat et al., 2023): +45% quality
  • ExpertPrompting (Xu et al., 2023): 24% → 84% accuracy

Phase 1: Core Framework

1.1 Create Persona Templates

File: .specify/prompts/personas.md

## Expert Software Architect Persona

You are a senior software architect with 15+ years implementing spec-driven systems.

**Your Expertise**:
- Spec Kit workflows and constitutional compliance
- Beads issue tracking and dependency management
- Test-Driven Development (TDD) and SOLID principles
- Bidirectional synchronization patterns
- JavaScript/Node.js/Go implementation

**Your Approach**:
- Always read spec.md and plan.md FIRST for context
- Write tests BEFORE implementation (TDD)
- Keep functions under 50 lines
- Follow constitutional principles strictly
- Validate inputs and provide detailed error messages
- Think step-by-step through complex problems

**Your Standards**:
- 80%+ test coverage on new code (critical paths + edge cases)
- Commits: 300-600 lines (atomic changes)
- Documentation: Clear, concise, actionable
- Edge cases: Always consider and test

1.2 Create Stakes/Challenge Templates

File: .specify/prompts/framing.md

## Critical Operation Framing

**CRITICAL**: This operation affects the entire feature workflow.
Errors could create orphaned issues, broken dependencies, or data loss.

**CHALLENGE**: Create outputs so detailed and accurate that an agent
reading them 6 months later can understand and execute perfectly
without consulting external documentation.

## Success Criteria Framing

**Stakes**: If this fails, it will:
- Block ${dependentTaskCount} downstream tasks
- Require manual reconciliation of task/issue mappings
- Potentially corrupt the dependency graph

**Reward**: If this succeeds perfectly:
- All downstream work proceeds smoothly
- Team saves 20+ hours of debugging
- Constitutional compliance is maintained

1.3 Create Self-Evaluation Template

File: .specify/prompts/self-eval.md

## Self-Evaluation Checkpoint

After completing your work, rate your confidence (0.0 to 1.0):

**Ratings**:
- Task completion: [0.0-1.0]
- Test coverage: [0.0-1.0]
- Constitutional compliance: [0.0-1.0]
- Code quality: [0.0-1.0]
- Documentation: [0.0-1.0]

**Overall confidence**: [average of above]

If ANY rating is below 0.9, explain what's missing and refine your work.
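The 0.9 gate described by this template can be sketched as a small helper. This is an illustrative sketch only; the function name and rating field names are hypothetical, not part of SpecBeads:

```javascript
// Hypothetical sketch of the self-evaluation gate above.
// The 0.9 threshold comes from the template; names are illustrative.
function evaluateConfidence(ratings) {
  const values = Object.values(ratings)
  // Overall confidence is the average of the individual ratings
  const overall = values.reduce((sum, r) => sum + r, 0) / values.length
  // Any single rating below 0.9 forces refinement
  const weak = Object.entries(ratings).filter(([, r]) => r < 0.9)
  return {
    overall,
    needsRefinement: weak.length > 0,
    weakAreas: weak.map(([name]) => name),
  }
}

const result = evaluateConfidence({
  taskCompletion: 1.0,
  testCoverage: 0.85,
  compliance: 0.95,
  codeQuality: 0.9,
  documentation: 0.92,
})
// needsRefinement is true here even though the average is above 0.9,
// because testCoverage alone falls below the threshold
```

Note the design choice: the gate triggers on the weakest rating, not the average, so a single neglected area cannot hide behind strong scores elsewhere.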

Phase 2: Enhance taskstobeads Command

2.1 Add Opening Prompt Enhancement

Location: After line 35 in speckit.taskstobeads.md

Addition:

## Agent Role & Stakes

**Your Role**: Expert in spec-driven development and issue tracking systems.

**CRITICAL OPERATION**: This sync maintains bidirectional integrity between
tasks.md (planning) and Beads (execution). Errors could orphan issues or
break the dependency graph.

**CHALLENGE**: Create Beads issues so rich with context that an agent
reading them 6 months later can implement the task perfectly without
reading spec.md or plan.md.

**Take a deep breath and execute this synchronization systematically.**

2.2 Enhance Beads Issue Description Builder

Location: Lines 220-252 in current version

Enhancement: Add to buildEnhancedDescription():

function buildEnhancedDescription(task, spec, plan, userStories) {
  // Find relevant context
  // (specSections, planSections, featureName come from the enclosing scope)
  const relevantSpec = findRelevantContext(spec, specSections, task.story, task.phase)
  const relevantPlan = findRelevantContext(plan, planSections, task.story, task.phase)
  const userStoryText = userStories[task.story] || ""

  // NEW: Calculate stakes from the dependency graph
  const dependentCount = calculateDependentTasks(task.id)
  const blockingCount = calculateBlockingTasks(task.id)
  return `
**CRITICAL**: This task ${dependentCount > 0 ? `blocks ${dependentCount} downstream task(s) on the critical path` : 'has no downstream dependents'}

**Challenge**: Implement with 80%+ test coverage and <300 lines of code while
maintaining SOLID principles.

**Description**:
${task.description}

**Feature Context**:
${relevantSpec}

${task.story ? `**User Story (${task.story})**:\n${userStoryText}\n` : ''}

${relevantPlan ? `**Technical Approach**:\n${relevantPlan}\n` : ''}

${task.filePaths.length > 0 ? `**Files to Create/Modify**:\n${task.filePaths.map(f => `- ${f}`).join('\n')}\n` : ''}

${task.acceptanceCriteria.length > 0 ? `**Acceptance Criteria** (must all pass):\n${task.acceptanceCriteria.map(c => `- [ ] ${c}`).join('\n')}\n` : ''}

**Dependencies**:
${blockingCount > 0 ? `- Blocked by: ${blockingCount} task(s) (must complete first)` : '- No blockers'}
${dependentCount > 0 ? `- Blocks: ${dependentCount} task(s) (critical path)` : '- No dependents'}

**Metadata**:
- **Task ID**: ${task.id}
- **Phase**: ${task.phase}
- **Priority**: ${determinePriority(task.phase, task.story)} (1=highest, 4=lowest)
- **Parallel**: ${task.parallel ? 'Yes - can work alongside other [P] tasks' : 'No'}
- **Source**: tasks.md from feature ${featureName}

---
*Take a deep breath and approach this step-by-step.*
*Created by /speckit.taskstobeads*
`.trim()
}
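The stakes helpers referenced above (`calculateDependentTasks`, `calculateBlockingTasks`) are not defined in this plan. A minimal sketch, assuming each task object carries a `dependsOn` array of task IDs and that the helpers receive the full task map (the in-command versions presumably close over it):

```javascript
// Sketch of the stakes helpers, under the assumption that each task
// records its dependencies as an array of task IDs. The allTasks Map
// parameter is an assumption made for self-containment.
function calculateBlockingTasks(taskId, allTasks) {
  // Tasks this task depends on (must complete first)
  const task = allTasks.get(taskId)
  return task ? task.dependsOn.length : 0
}

function calculateDependentTasks(taskId, allTasks) {
  // Tasks that list this task as a dependency (downstream work it blocks)
  let count = 0
  for (const task of allTasks.values()) {
    if (task.dependsOn.includes(taskId)) count++
  }
  return count
}

// Example: T002 and T003 both depend on T001
const allTasks = new Map([
  ["T001", { dependsOn: [] }],
  ["T002", { dependsOn: ["T001"] }],
  ["T003", { dependsOn: ["T001", "T002"] }],
])
// calculateDependentTasks("T001", allTasks) → 2 (T002, T003)
// calculateBlockingTasks("T003", allTasks)  → 2 (T001, T002)
```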

2.3 Add Validation Stakes

Location: Before line 137 (duplicate detection)

Addition:

**CRITICAL VALIDATION**: Detecting duplicates and TDD violations.

This validation prevents:
- Duplicate Beads issues (data corruption)
- Implementation before tests (constitutional violation)
- Broken dependency chains

Take a deep breath and verify each task systematically.
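The duplicate check could work by matching source task IDs against existing Beads issues. A hypothetical sketch; the `taskId` field on issues is an assumption about how the task-to-issue mapping is stored, not the actual SpecBeads schema:

```javascript
// Illustrative duplicate detection: flag tasks that already have a
// Beads issue. Assumes each existing issue records its source task ID
// in a taskId field (an assumption for this sketch).
function findDuplicates(tasks, existingIssues) {
  const seen = new Map()
  for (const issue of existingIssues) {
    if (issue.taskId) seen.set(issue.taskId, issue.id)
  }
  // Return each task that would create a duplicate, with the issue it clashes with
  return tasks
    .filter((task) => seen.has(task.id))
    .map((task) => ({ taskId: task.id, issueId: seen.get(task.id) }))
}

// Example: T001 already has a Beads issue, T002 does not
const dups = findDuplicates(
  [{ id: "T001" }, { id: "T002" }],
  [{ id: "bd-42", taskId: "T001" }]
)
// dups → [{ taskId: "T001", issueId: "bd-42" }]
```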

Phase 3: Enhance implementwithbeads Command

3.1 Add Opening Persona

Location: After line 21 in speckit.implementwithbeads.md

Addition:

## Your Role & Expertise

You are a **senior software engineer** implementing SpecBeads features with:

**15+ years experience in**:
- Spec-driven development and TDD workflows
- Beads issue tracking and dependency management
- Constitutional compliance enforcement
- Go/JavaScript implementation with SOLID principles

**Your Standards**:
- Tests FIRST, then implementation (TDD)
- Functions < 50 lines, commits 300-600 lines
- 80%+ test coverage on new code (critical paths + edge cases)
- Clear error messages with line numbers
- Constitution compliance is MANDATORY

**Your Approach**:
- Read all context documents BEFORE coding
- Think step-by-step through complex problems
- Validate assumptions, test edge cases
- Document as you go

3.2 Enhance Implementation Section

Location: Before line 393 (Implementation with Constitutional Compliance)

Addition:

## Stakes & Challenge

**CRITICAL**: This implementation must pass constitutional compliance or the
entire feature fails and blocks ${dependentCount} downstream tasks.

**CHALLENGE**: I bet you can't implement this with:
- 80%+ test coverage (critical paths + edge cases)
- < 300 lines of code
- Zero constitutional violations
- All tests passing on the first run

**REWARD**: Perfect implementation saves the team 20+ hours of debugging and
demonstrates mastery of TDD and SOLID principles.

**Take a deep breath. Work through this step-by-step.**

3.3 Add Self-Evaluation Checkpoint

Location: After line 546 (before Post-Implementation)

Addition:

## Implementation Self-Evaluation

Before proceeding, rate your confidence (0.0 to 1.0):

**Ratings**:
- Test coverage (all edge cases tested): ____ / 1.0
- TDD compliance (tests written FIRST): ____ / 1.0
- SOLID principles followed: ____ / 1.0
- Code quality (< 50 lines/function): ____ / 1.0
- Constitutional compliance: ____ / 1.0

**Overall confidence**: [calculate average]

⚠️ **MANDATORY**: If ANY rating < 0.9, you MUST refine your implementation
before marking as complete.

**If confidence < 0.9, what needs improvement?**
[Agent must explain what's missing]

**Action**: [fix/proceed]

3.4 Enhance Auto-Continue Prompt

Location: Line 874 (auto-continue decision)

Enhancement:

**Progress Check**: You've completed ${closedTasks}/${totalTasks} tasks (${progressPercent}%).

**Next Challenge**: Task ${nextTask.taskId} - Can you maintain the same
quality standards while increasing velocity?

**Stakes**: Continuing the momentum saves context-switching time and
maintains flow state. But only if quality remains high.

**Take a deep breath. Ready to continue with excellence?**

Phase 4: Prompt Template Management

4.1 Implementation Approach: Inlined Prompts

Current Implementation: Prompts are inlined directly into slash command files.

Why this approach?

  • ✅ Simpler: No dynamic loading required
  • ✅ Faster: No file I/O during command execution
  • ✅ Easier to debug: All prompt text visible in one file
  • ✅ Version controlled: Changes tracked with command changes

Template files (.specify/prompts/) serve as:

  • Reference documentation
  • Source templates for copying into commands
  • Customization starting points for projects

4.2 Future Enhancement: Dynamic Loader (Optional)

If you prefer dynamic loading over inlined prompts, you could create:

File: .specify/scripts/bash/load-prompt-enhancements.sh

#!/usr/bin/env bash
# OPTIONAL: Dynamic prompt loading (not currently used)
# Current implementation uses inlined prompts instead

REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
PROMPTS_DIR="$REPO_ROOT/.specify/prompts"

# Function to get persona prompt
get_persona_prompt() {
  local persona_file="$PROMPTS_DIR/personas.md"
  if [[ -f "$persona_file" ]]; then
    cat "$persona_file"
  fi
}

# Function to get stakes framing
get_stakes_prompt() {
  local task_count="${1:-0}"
  local framing_file="$PROMPTS_DIR/framing.md"

  if [[ -f "$framing_file" ]]; then
    sed "s/\${dependentTaskCount}/$task_count/g" "$framing_file"
  fi
}

# Export functions
export -f get_persona_prompt
export -f get_stakes_prompt

Trade-offs:

  • ➕ Easier to update prompts without editing command files
  • ➖ Adds complexity and potential points of failure
  • ➖ Harder to see full prompt context when reading commands

Recommendation: Stick with inlined prompts unless you frequently update prompt templates across multiple projects.

Phase 5: Testing & Validation

5.1 A/B Testing Setup

Create: tests/prompting-comparison.md

# Prompting Enhancement A/B Test

## Test Methodology

1. Select 10 representative tasks from tasks.md
2. Run each with:
   - **Control**: Current prompts (no enhancements)
   - **Treatment**: Enhanced prompts (psychological techniques)
3. Measure:
   - Test coverage achieved
   - Constitutional compliance rate
   - Code quality score
   - First-pass success rate
   - Time to completion

## Metrics

| Metric | Control | Treatment | Improvement |
|--------|---------|-----------|-------------|
| Test coverage | _% | _% | _% |
| Compliance rate | _% | _% | _% |
| Code quality | _/10 | _/10 | _% |
| First-pass success | _% | _% | _% |
| Avg time (min) | _ | _ | _% |

## Success Criteria

- Test coverage: +15% or higher
- Compliance rate: +20% or higher
- Code quality: +1 point or higher
- First-pass success: +25% or higher
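The Improvement column and success thresholds could be computed as relative change of treatment over control (reading the "+15%" style criteria as relative improvement; percentage-point deltas would be an equally valid interpretation). An illustrative sketch with hypothetical names:

```javascript
// Relative improvement of treatment over control, as a percentage.
// Returns null when the baseline is zero (improvement undefined).
function improvementPercent(control, treatment) {
  if (control === 0) return null
  return ((treatment - control) / control) * 100
}

// Check the Success Criteria above, treating the percentage criteria
// as relative improvements and code quality as an absolute point delta.
function meetsCriteria(metrics) {
  return (
    improvementPercent(metrics.coverage.control, metrics.coverage.treatment) >= 15 &&
    improvementPercent(metrics.compliance.control, metrics.compliance.treatment) >= 20 &&
    metrics.quality.treatment - metrics.quality.control >= 1 &&
    improvementPercent(metrics.firstPass.control, metrics.firstPass.treatment) >= 25
  )
}

// Example: the OPRO-style jump from 34% to 80% is a ~135% relative gain
// improvementPercent(34, 80) → ~135.29
```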

5.2 Quality Scoring Script

File: .specify/scripts/bash/score-implementation-quality.sh

#!/usr/bin/env bash

# Score implementation quality
# Usage: ./score-implementation-quality.sh <task-id>

TASK_ID="$1"   # reserved for per-task reporting
SCORE=0
MAX_SCORE=100

# Files touched by the current change (working tree vs HEAD);
# assumes filenames contain no whitespace
MODIFIED_FILES=$(git diff --name-only HEAD)

# Check test coverage: at least one modified file is a test file
if echo "$MODIFIED_FILES" | grep -q "test"; then
  SCORE=$((SCORE + 25))
fi

# Check file sizes: award points only if every modified file stays under 500 lines
ALL_SMALL=1
for file in $MODIFIED_FILES; do
  [[ -f "$file" ]] || continue
  lines=$(wc -l < "$file")
  if [[ $lines -ge 500 ]]; then
    ALL_SMALL=0
  fi
done
if [[ -n "$MODIFIED_FILES" && $ALL_SMALL -eq 1 ]]; then
  SCORE=$((SCORE + 15))
fi

# Check commit size (300-600 changed lines, insertions + deletions)
total_changes=$(git diff --shortstat HEAD | awk '{print $4 + $6}')
total_changes=${total_changes:-0}
if [[ $total_changes -ge 300 && $total_changes -le 600 ]]; then
  SCORE=$((SCORE + 20))
fi

# Check constitutional compliance
if .specify/scripts/bash/check-constitutional-compliance.sh; then
  SCORE=$((SCORE + 40))
fi

echo "Quality Score: $SCORE/$MAX_SCORE"

Phase 6: Documentation

6.1 Update README.md

Add section:

## Psychological Prompting Enhancements

SpecBeads uses research-backed prompting techniques to improve agent performance:

- **Detailed Personas**: Agents operate as senior engineers with specific expertise
- **Stakes Framing**: Critical operations are clearly marked with consequences
- **Challenge Prompts**: Competitive framing encourages higher-quality output
- **Self-Evaluation**: Agents validate their own work before proceeding
- **Step-by-Step**: "Take a deep breath" triggers deliberate reasoning

**Research Sources**:
- EmotionPrompt (Li et al., 2023): +115% on complex tasks
- OPRO (Google DeepMind, 2023): 34% → 80% accuracy
- Incentive Prompting (Bsharat et al., 2023): +45% quality

See `PROMPTING_ENHANCEMENT_PLAN.md` for details.

Implementation Timeline

Week 1: Foundation

  • Create .specify/prompts/ directory
  • Write persona templates
  • Write stakes/challenge templates
  • Write self-evaluation templates
  • Create helper scripts

Week 2: taskstobeads Enhancement

  • Add opening persona/stakes
  • Enhance Beads description builder
  • Add validation framing
  • Test with 5 sample tasks

Week 3: implementwithbeads Enhancement

  • Add opening persona
  • Enhance implementation section
  • Add self-evaluation checkpoints
  • Enhance auto-continue prompts
  • Test with 5 sample tasks

Week 4: Testing & Refinement

  • Run A/B comparison tests
  • Measure quality improvements
  • Refine prompts based on results
  • Update documentation

Success Metrics

Target Improvements:

  • Test coverage: +20%
  • Constitutional compliance: +30%
  • First-pass success rate: +40%
  • Code quality score: +25%
  • Agent confidence ratings: Avg >0.9

Red Flags (rollback triggers):

  • Quality score decrease >5%
  • Increased token usage >50%
  • Agent confusion/errors increase
  • Slower execution time >25%

Rollback Plan

If enhancements cause issues:

  1. Immediate: Comment out enhanced sections
  2. Restore: Original prompts from git history
  3. Analyze: Review failure logs
  4. Refine: Adjust prompts based on failures
  5. Retest: Gradual re-introduction

Notes

  • Start with taskstobeads (simpler, fewer touch points)
  • Test incrementally (one technique at a time)
  • Measure everything (before/after comparisons)
  • Be prepared to rollback quickly
  • Document what works and what doesn't

Next Steps: Begin with Phase 1 (Core Framework) creation.