Testing Guide

Comprehensive guide to testing Ralph CLI components

Test Organization
Running Tests
UI Testing with agent-browser
Common Test Scenarios
Writing New Tests
Best Practices

Test Organization

Test Folder Structure

All test files MUST be in the /tests directory. This is a strict requirement.

tests/
├── *.mjs                              # Integration and E2E tests
│   ├── cli-smoke.mjs                  # CLI smoke tests
│   ├── agent-loops.mjs                # Agent loop behavior
│   ├── agent-ping.mjs                 # Agent health checks
│   ├── integration.mjs                # Main integration suite
│   ├── integration-actions.mjs        # Actions integration
│   ├── integration-checkpoint.mjs     # Checkpoint system
│   ├── integration-doctor.mjs         # Doctor command
│   ├── integration-metrics.mjs        # Metrics collection
│   ├── integration-notify.mjs         # Notification system
│   ├── integration-risk.mjs           # Risk analysis
│   ├── integration-switcher.mjs       # Agent switcher
│   ├── integration-ui-api.mjs         # UI API integration
│   ├── integration-watch.mjs          # File watching
│   ├── e2e-workflow.mjs               # End-to-end workflows
│   ├── real-agents.mjs                # Real agent execution
│   └── lib-python.mjs                 # Python library tests
│
├── test-*.js                          # Unit tests
│   ├── test-analyzer.js               # Code analyzer
│   ├── test-committer.js              # Git committer
│   ├── test-complexity.js             # Complexity analysis
│   ├── test-context-budget.js         # Context budget
│   ├── test-context-directives.js     # Context directives
│   ├── test-context-scorer.js         # Context scoring
│   ├── test-context-selector.js       # Context selection
│   ├── test-context-visualization.js  # Context visualization
│   ├── test-error-handling.js         # Error handling
│   ├── test-executor.js               # Story executor
│   ├── test-executor-us003.js         # Specific user stories
│   ├── test-git-fallback.js           # Git fallback
│   ├── test-merger.js                 # Branch merger
│   ├── test-parallel-index.js         # Parallel execution
│   ├── test-realistic-scenarios.js    # Realistic workflows
│   ├── test-risk-analyzer.js          # Risk analyzer
│   ├── test-token-usage.js            # Token usage tracking
│   └── test-with-anthropic-api.js     # Anthropic API integration
│
├── fixtures/                          # Test fixtures and sample data
├── helpers/                           # Test utility functions
└── mocks/                             # Mock implementations

File Organization Rules

✅ DO:

Place ALL test files in /tests directory
Use .mjs extension for integration and E2E tests
Use test-*.js naming pattern for unit tests
Use subdirectories (fixtures/, helpers/, mocks/) for supporting files
Keep test file names descriptive and consistent

❌ DON'T:

Place test files in /lib, /bin, or any source directory
Mix test files with production code
Use inconsistent naming conventions
Create test files in the project root
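
The rules above can be checked mechanically. The following is a minimal sketch, not part of Ralph CLI: the function name checkTestPath and its return shape are hypothetical, and it assumes repo-relative paths with forward slashes.

```javascript
// Hypothetical checker for the file organization rules; not part of Ralph CLI.
// Paths are repo-relative and use forward slashes.
const SUPPORT_DIRS = new Set(['fixtures', 'helpers', 'mocks']);

function checkTestPath(relPath) {
  const parts = relPath.split('/');
  const file = parts[parts.length - 1];

  if (parts[0] !== 'tests') {
    return { ok: false, reason: 'test files must live under tests/' };
  }
  // Supporting files may sit in the documented subdirectories.
  if (parts.length > 2) {
    return SUPPORT_DIRS.has(parts[1])
      ? { ok: true, reason: 'supporting file' }
      : { ok: false, reason: `unexpected subdirectory: ${parts[1]}` };
  }
  if (file.endsWith('.mjs')) return { ok: true, reason: 'integration or E2E test' };
  if (/^test-.+\.js$/.test(file)) return { ok: true, reason: 'unit test' };
  return { ok: false, reason: 'expected *.mjs or test-*.js' };
}

console.log(checkTestPath('tests/test-merger.js'));
console.log(checkTestPath('lib/test-merger.js'));
```

A check like this could run in a pre-commit hook or CI step over the list of changed files.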

Running Tests

Quick Tests (No Agent Required)

# Smoke tests - fast validation
npm test

# Agent health check
npm run test:ping

Integration Tests (Requires Agents)

# All integration tests
npm run test:all

# Specific integration tests
npm run test:checkpoint      # Checkpoint system
npm run test:switcher        # Agent switching
npm run test:risk           # Risk analysis
npm run test:actions        # Actions workflow
npm run test:notify         # Notifications
npm run test:metrics        # Metrics collection
npm run test:doctor         # Doctor diagnostics
npm run test:watch          # File watching
npm run test:ui-api         # UI API

Advanced Tests

# End-to-end workflow
npm run test:e2e

# Real agent execution (requires configured agents)
npm run test:real

# With coverage reporting
npm run test:coverage

# Integration tests with environment flag
RALPH_INTEGRATION=1 npm test
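
How a test file reads the RALPH_INTEGRATION flag is not shown in this guide. As a sketch only, a test could gate itself as follows; the helper name integrationEnabled and the accepted values ("1" and "true") are assumptions.

```javascript
// Sketch: interpret RALPH_INTEGRATION as an opt-in switch.
// Treating "1" and "true" as enabled is an assumption, not documented behavior.
function integrationEnabled(env = process.env) {
  const value = String(env.RALPH_INTEGRATION ?? '').trim().toLowerCase();
  return value === '1' || value === 'true';
}

if (!integrationEnabled()) {
  console.log('Skipping integration tests (set RALPH_INTEGRATION=1 to enable)');
}
```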

Test Categories

Smoke Tests (cli-smoke.mjs) - Quick validation, no real agent needed
Integration Tests (integration-*.mjs) - Multiple components, may require mock/real agents
Unit Tests (test-*.js) - Isolated module tests
E2E Tests (e2e-*.mjs) - Full workflow simulations
Real Agent Tests - Execute against actual Claude/Codex/Droid agents (requires API keys)
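
The categories map roughly onto file-name patterns. The sketch below illustrates that mapping; the patterns are inferred from the file names listed in this guide and the function name categorize is hypothetical. Files such as agent-ping.mjs match none of the patterns and are reported as uncategorized.

```javascript
// Patterns inferred from the file names in this guide; adjust as needed.
const CATEGORY_PATTERNS = [
  ['smoke', /-smoke\.mjs$/],
  ['integration', /^integration(-.+)?\.mjs$/],
  ['e2e', /^e2e-.+\.mjs$/],
  ['real-agent', /^real-.+\.mjs$/],
  ['unit', /^test-.+\.js$/],
];

// Returns the first matching category name, or 'uncategorized'.
function categorize(fileName) {
  const match = CATEGORY_PATTERNS.find(([, pattern]) => pattern.test(fileName));
  return match ? match[0] : 'uncategorized';
}

console.log(categorize('integration-risk.mjs'));
console.log(categorize('test-executor.js'));
```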

UI Testing with agent-browser

What is agent-browser?

agent-browser is Vercel's fast, Rust-based CLI for browser automation, optimized for AI agents.

Why agent-browser?

✅ Fast & reliable: Rust CLI with Node.js fallback
✅ AI-optimized: Snapshot + ref workflow (@e1, @e2) for deterministic selection
✅ No MCP server needed: Standalone CLI tool
✅ Persistent sessions: Isolated browser instances with cookies/storage
✅ JSON output: Machine-readable results
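
To illustrate the snapshot + ref workflow, the sketch below looks up a ref by role and accessible name instead of hard-coding @eN. It assumes snapshot lines of the form `- button "Start Build" [ref=e2]`; verify that shape against your installed version. The function names are hypothetical.

```javascript
// Assumed line shape: - button "Start Build" [ref=e2]
// Returns one { role, name, ref } entry per line that carries a ref marker.
function parseSnapshotRefs(snapshotText) {
  const refs = [];
  const linePattern = /-\s+(\w+)(?:\s+"([^"]*)")?.*?\[ref=(e\d+)\]/;
  for (const line of snapshotText.split('\n')) {
    const match = linePattern.exec(line);
    if (match) {
      refs.push({ role: match[1], name: match[2] ?? '', ref: `@${match[3]}` });
    }
  }
  return refs;
}

// Finds the ref for an element by role and exact name, or null if absent.
function findRef(refs, role, name) {
  const hit = refs.find((r) => r.role === role && r.name === name);
  return hit ? hit.ref : null;
}

const sample = '- link "Streams" [ref=e1]\n- button "Start Build" [ref=e2]';
console.log(findRef(parseSnapshotRefs(sample), 'button', 'Start Build'));
```

Resolving refs at run time keeps a script working when the numbering shifts between snapshots.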

Setup

# Install agent-browser
npm install -g agent-browser
agent-browser install  # Downloads Chromium

# Start UI server
cd ui && npm run dev

# Initialize browser session
agent-browser open http://localhost:3000

Essential Commands

Navigation & Inspection

# Open a URL
agent-browser open http://localhost:3000

# Go back/forward
agent-browser back
agent-browser forward

# Reload page
agent-browser reload

# Take snapshot (see all interactive elements)
agent-browser snapshot -i          # Interactive elements only
agent-browser snapshot -c          # Compact format
agent-browser snapshot             # Full snapshot

# Take screenshot
agent-browser screenshot page.png
agent-browser screenshot --full page.png  # Full page scroll

# Get page title
agent-browser eval "document.title"

# Get current URL
agent-browser eval "window.location.href"

Element Interaction

# Click elements
agent-browser click @e1                          # Use @eN ref from snapshot
agent-browser click "button:has-text('Start')"   # Text selector (:has-text is a Playwright extension, not standard CSS)
agent-browser click "[role=button]"              # Attribute selector

# Type text
agent-browser type @e17 "PRD-67"
agent-browser fill @e17 "PRD-67"  # Clears the field first, then enters the text

# Press keys
agent-browser press Enter
agent-browser press Escape
agent-browser press "Control+a"

# Select dropdown
agent-browser select @e12 "Codex"  # Select by value/text

Verification

# Get text content
agent-browser get text @e1
agent-browser get text "h1"

# Get attribute value
agent-browser get attribute @e1 "href"
agent-browser get attribute "button" "disabled"

# Check visibility
agent-browser is visible @e1
agent-browser is visible "button:has-text('Start Build')"

# Check if element exists
agent-browser find "button:has-text('Start Build')"

# Get all matching elements
agent-browser find-all ".stream-card"

Advanced

# Run JavaScript
agent-browser eval "document.querySelectorAll('.stream-card').length"
agent-browser eval "localStorage.getItem('theme')"

# Wait for element
agent-browser wait-for "text=Build completed"
agent-browser wait-for "[data-status='running']"

# Check console messages
agent-browser console
agent-browser errors

# Network activity
agent-browser network requests
agent-browser network responses
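
When these commands are driven from a Node.js test file, a thin wrapper keeps the calls readable. The following is a minimal sketch using execFileSync; the names agentBrowser and ui are hypothetical, and only subcommands listed in this guide are wrapped.

```javascript
const { execFileSync } = require('node:child_process');

// Runs one agent-browser command and returns trimmed stdout.
// `exec` is injectable so the helper can be exercised without a browser.
function agentBrowser(args, exec = execFileSync) {
  const output = exec('agent-browser', args, { encoding: 'utf8' });
  return String(output).trim();
}

// Convenience wrappers around commands listed in this guide.
const ui = (exec) => ({
  open: (url) => agentBrowser(['open', url], exec),
  click: (target) => agentBrowser(['click', target], exec),
  getText: (target) => agentBrowser(['get', 'text', target], exec),
  isVisible: (target) => agentBrowser(['is', 'visible', target], exec),
});

// Example usage (requires agent-browser and a running UI server):
//   const page = ui();
//   page.open('http://localhost:3000');
//   console.log(page.getText('h1'));
```

Passing arguments as an array to execFileSync avoids shell quoting problems with selectors such as button:has-text('Start').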

Common Test Scenarios

1. Test Dashboard Load

# Navigate to dashboard
agent-browser open http://localhost:3000
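agent-browser snapshot -i  # Get @eN refs before clicking
agent-browser click @e1  # Click "Press Enter"
agent-browser snapshot -i  # Refs change after navigation; refresh them
agent-browser click @e1  # Click "Back to Dashboard"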

# Verify elements are visible
agent-browser snapshot -i
agent-browser is visible "button:has-text('Start Build')"
agent-browser is visible "[data-testid='stream-select']"

# Check for errors
agent-browser console
agent-browser errors

2. Test Stream Selection

# Get available streams
agent-browser get text @e18  # Stream listbox

# Select a specific stream
agent-browser click @e18     # Open dropdown
agent-browser type @e18 "PRD-67"  # Type to search
agent-browser press Enter    # Select

# Verify selection
agent-browser get text "[data-testid='selected-stream']"

3. Test Build Configuration

# Set iterations
agent-browser fill @e11 "5"  # Iterations spinbutton

# Select agent
agent-browser click @e12     # Open agent dropdown
agent-browser click @e14     # Select "Codex"

# Toggle dry run
agent-browser click @e20     # Dry run checkbox

# Verify form state
agent-browser get attribute @e20 "checked"
agent-browser get text @e12  # Selected agent

4. Test Navigation

# Navigate to Streams page
agent-browser click @e3      # Streams link
agent-browser snapshot -i    # See stream cards

# Navigate to Logs page
agent-browser click @e5      # Logs link
agent-browser snapshot -i    # See log viewer

# Navigate to Documentation
agent-browser click @e4      # Documentation link
agent-browser snapshot -i    # See docs

5. Test Stream Actions

# Navigate to Streams page
agent-browser click @e3

# Take snapshot to find buttons
agent-browser snapshot -i

# Click "Monitor" for first stream
agent-browser click @e13

# Verify modal/page opened
agent-browser snapshot -i

# Close modal (if applicable)
agent-browser press Escape

6. Test Search Functionality

# Go to Streams page
agent-browser click @e3

# Find search input
agent-browser snapshot -i

# Search for specific PRD
agent-browser type @e17 "PRD-67"

# Verify filtered results
agent-browser eval "document.querySelectorAll('.stream-card').length"

7. Test Real-time Updates

# Open dashboard
agent-browser open http://localhost:3000

# Start a build in another terminal:
# ralph build 1 --prd=67

# Watch for status updates
agent-browser wait-for "text=running"
agent-browser wait-for "[data-status='running']"

# Check real-time progress
agent-browser get text "[data-testid='build-status']"

# Take screenshot
agent-browser screenshot build-running.png

8. Test Error Handling

# Try to build without selecting stream
agent-browser click "button:has-text('Start Build')"

# Check for error message
agent-browser wait-for "text=Please select"
agent-browser snapshot -i

# Check console for errors
agent-browser errors

9. Test Logs Page

# Navigate to Logs
agent-browser click @e5

# Wait for logs to load
agent-browser wait-for "[data-testid='log-entries']"

# Get log count
agent-browser eval "document.querySelectorAll('[data-log-entry]').length"

# Filter logs
agent-browser fill "[data-testid='log-filter']" "ERROR"

# Verify filtered results
agent-browser snapshot -i

10. Test Token/Cost Tracking

# Navigate to Tokens page
agent-browser click @e6

# Verify cost data loads
agent-browser wait-for 'text=$'  # Wait for cost to appear (single quotes keep $ literal)
agent-browser snapshot -i

# Get total cost
agent-browser get text "[data-testid='total-cost']"

# Check chart renders
agent-browser is visible "canvas"  # Chart.js renders to canvas
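
The text returned for the total cost can be turned into a number before asserting on it. This is a sketch that assumes the UI renders costs with a "$" prefix (for example "$1,234.50"); the function name parseCost is hypothetical.

```javascript
// Converts text such as "$1,234.50" into a number; returns null when no
// dollar amount is found. The "$" prefix format is an assumption about the UI.
function parseCost(text) {
  const match = /\$\s*(\d[\d,]*(?:\.\d+)?)/.exec(text);
  return match ? Number(match[1].replace(/,/g, '')) : null;
}

const total = parseCost('Total: $1,234.50');
console.log(total !== null && total >= 0);
```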

Debugging Tips

# Open headed browser for visual debugging
BROWSER_HEADLESS=false agent-browser open http://localhost:3000

# Slow down actions for observation
BROWSER_SLOW_MO=500 agent-browser click @e1

# Keep browser open after script
BROWSER_KEEP_ALIVE=true agent-browser open http://localhost:3000

# Verbose output
DEBUG=* agent-browser open http://localhost:3000

# Save HTML for inspection
agent-browser eval "document.documentElement.outerHTML" > page.html

UI Testing Helper Script

# Quick snapshot of homepage
.agents/ralph/test-ui.sh snapshot

# Test PRD list page (automated)
.agents/ralph/test-ui.sh test-list

# Test logs page (automated)
.agents/ralph/test-ui.sh test-logs

# Interactive mode (opens headed browser)
.agents/ralph/test-ui.sh interactive

# Clean up browser session
.agents/ralph/test-ui.sh cleanup

# Custom UI URL
UI_URL=http://localhost:8080 .agents/ralph/test-ui.sh snapshot

UI Server Configuration

The Ralph UI server uses the RALPH_ROOT environment variable to locate the .ralph directory:

Production mode (default):

# Uses parent directory's .ralph/ (ralph-cli/.ralph)
cd ui && npm run dev

Test mode:

# Uses ui/.ralph/ for isolated testing
cd ui && npm run dev:test

Custom RALPH_ROOT:

# Point to any .ralph directory
RALPH_ROOT=/path/to/.ralph npm run dev
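
The lookup order described above can be sketched as follows. This is an illustration only, and the real server logic may differ: it assumes an explicit RALPH_ROOT always wins and that test mode works by setting RALPH_ROOT to ui/.ralph. The function name resolveRalphRoot is hypothetical.

```javascript
const path = require('node:path');

// Sketch of the lookup described in this section (the real server may differ):
// an explicit RALPH_ROOT wins, otherwise fall back to the parent
// directory's .ralph/ relative to the ui/ folder.
function resolveRalphRoot(uiDir, env = process.env) {
  if (env.RALPH_ROOT) return path.resolve(env.RALPH_ROOT);
  return path.resolve(uiDir, '..', '.ralph');
}

console.log(resolveRalphRoot(path.join(process.cwd(), 'ui'), {}));
```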

Writing New Tests

Choosing the Right Location

Always use the /tests directory.

Choosing the Right Extension

.mjs for integration/E2E tests
.js for unit tests

Use Descriptive Names

Integration: integration-feature-name.mjs
Unit: test-component-name.js
E2E: e2e-workflow-name.mjs

Update package.json

If a new test needs its own npm script, add it under "scripts" in package.json:

{
  "scripts": {
    "test:my-feature": "node tests/integration-my-feature.mjs"
  }
}
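
A small check can confirm that every test:* script still points into tests/. The sketch below only inspects scripts of the form "node <file>"; the function name scriptsOutsideTests is hypothetical.

```javascript
// Returns the names of test:* scripts whose `node <file>` target is not under tests/.
// Scripts that do not start with `node` are ignored by this sketch.
function scriptsOutsideTests(scripts) {
  return Object.entries(scripts)
    .filter(([name]) => name.startsWith('test:'))
    .filter(([, command]) => {
      const match = /^node\s+(\S+)/.exec(command);
      return match !== null && !match[1].startsWith('tests/');
    })
    .map(([name]) => name);
}

console.log(scriptsOutsideTests({
  'test:my-feature': 'node tests/integration-my-feature.mjs',
}));
```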

Document Complex Scenarios

Add comments for complex test logic:

// Test PRD status detection with direct-to-main workflow
// This verifies git commits are used as source of truth, not checkboxes
test('detects completed PRDs via git log', async () => {
  // ... test logic
});

Best Practices

General Testing Principles

Isolation - Tests should not depend on each other
Cleanup - Clean up any created files/state after tests
Fast - Keep unit tests fast; use mocks when possible
Descriptive - Use clear test names and assertions
Maintainable - Keep tests simple and focused
Documented - Add comments for complex test logic
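
The isolation and cleanup principles can be combined in one helper. This is a minimal sketch; the helper name withTempDir and the ralph-test- prefix are illustrative, and the helper is synchronous, so an async callback would need an awaited variant.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Runs `fn` with a fresh temporary directory and removes it afterwards,
// even if `fn` throws. The helper name and prefix are illustrative.
function withTempDir(fn) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ralph-test-'));
  try {
    return fn(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const leftover = withTempDir((dir) => {
  fs.writeFileSync(path.join(dir, 'prd.md'), '# PRD');
  return dir;
});
console.log(fs.existsSync(leftover)); // false: the directory was cleaned up
```

Because each call gets its own directory, tests that write files cannot interfere with each other or with the repository.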

UI Testing Best Practices

Always snapshot first - Use agent-browser snapshot -i to see page state
Use semantic selectors - Prefer button:has-text('Start') over brittle @eN refs
Wait for dynamic content - Prefer wait-for over fixed delays such as sleep 2
Check console errors - Run agent-browser errors after interactions
Take screenshots - Visual evidence: agent-browser screenshot test.png
Test unhappy paths - Try invalid inputs, missing data, error states
Verify state changes - Check text/attributes after actions
Clean up - Close browser sessions when done
Use scripts - Automate repetitive tests
Document findings - Save screenshots and error logs
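
Several of these practices (snapshot first, check errors after interactions, use scripts) can be combined in a small scenario runner. The sketch below is hypothetical: `run` stands for any function that executes one agent-browser command from an argument array and throws on failure.

```javascript
// Runs each step through `run` (a function that executes one agent-browser
// command and throws on failure) and stops at the first failing step.
function runScenario(steps, run) {
  const results = [];
  for (const args of steps) {
    try {
      results.push({ args, ok: true, output: run(args) });
    } catch (error) {
      results.push({ args, ok: false, output: String(error.message) });
      break;
    }
  }
  return results;
}

// Steps mirror commands from this guide: snapshot first, act, then check errors.
const startBuildScenario = [
  ['snapshot', '-i'],
  ['click', "button:has-text('Start Build')"],
  ['errors'],
];

// Dry run with a stand-in runner that only echoes the command.
const dryRun = runScenario(startBuildScenario, (args) => args.join(' '));
console.log(dryRun.every((step) => step.ok));
```

Stopping at the first failure keeps the recorded results aligned with the page state at the moment something went wrong.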

Common Pitfalls