This document contains all environment variable configuration options for VC.
VC uses a tiered AI model strategy to optimize cost and performance:
- Sonnet 4.5 (claude-sonnet-4-5-20250929): Complex reasoning tasks
- Haiku (claude-3-5-haiku-20241022): Simple, deterministic tasks
Haiku is roughly 73% cheaper than Sonnet for both input and output tokens:
- Sonnet: $3/$15 per million tokens (input/output)
- Haiku: $0.80/$4 per million tokens (input/output)
By using Haiku for simple operations (30-40% of total AI calls), VC achieves 25%+ overall cost savings while maintaining quality.
Operations using Haiku (simple tasks):
- Cruft detection
- File size monitoring
- Gitignore pattern recommendations
- Commit message generation
Operations using Sonnet (complex/medium tasks):
- Pre-execution assessment
- Post-execution analysis
- Code review and test coverage
- Deduplication detection (medium complexity - requires semantic understanding)
- Discovered issue translation
- Complexity monitoring
Why deduplication uses Sonnet: Deduplication requires nuanced semantic understanding and has a high cost of failure (false negatives create duplicate issues, false positives lose work). While it's not as complex as assessment/analysis, it's more complex than simple pattern matching. Additionally, vc-159 already optimized dedup cost by 80% through batching, making model switching less impactful.
Override model selection with environment variables:
# Override default model (used for complex tasks)
# Default: claude-sonnet-4-5-20250929
export VC_MODEL_DEFAULT="claude-sonnet-4-5-20250929"
# Override simple task model (used for cruft detection, file size, etc.)
# Default: claude-3-5-haiku-20241022
export VC_MODEL_SIMPLE="claude-3-5-haiku-20241022"
Use cases for overrides:
- Testing with cheaper models:
  # Use Haiku for everything (save cost during development)
  export VC_MODEL_DEFAULT="claude-3-5-haiku-20241022"
  export VC_MODEL_SIMPLE="claude-3-5-haiku-20241022"
- Testing with premium models:
  # Use Opus for everything (maximum quality)
  export VC_MODEL_DEFAULT="claude-opus-4-20250514"
  export VC_MODEL_SIMPLE="claude-opus-4-20250514"
- A/B testing quality:
  # Compare Sonnet vs Haiku for simple tasks
  export VC_MODEL_SIMPLE="claude-sonnet-4-5-20250929"
Phase 2 validation tests (vc-lf8j) verify:
- <5% quality degradation when using Haiku vs Sonnet for simple tasks
- >50% cost savings for simple operations
- Overall 25%+ cost reduction across all AI calls
Run validation tests:
# Quality comparison tests
go test -v ./internal/health -run TestModelQuality
# Cost measurement tests
go test -v ./internal/health -run TestModelCost
VC uses AI-powered deduplication to prevent filing duplicate issues. This feature can be tuned via environment variables to balance between avoiding duplicates and avoiding false positives.
The default settings are optimized for performance while maintaining accuracy:
- Confidence threshold: 0.85 (85%) - High confidence required to mark as duplicate
- Lookback window: 7 days - Only compare against issues from the past week
- Max candidates: 25 - Compare against up to 25 recent issues (reduced from 50 for speed)
- Batch size: 50 - Process 50 comparisons per AI call (increased from 10 for efficiency)
- Within-batch dedup: Enabled - Deduplicate within the same batch of discovered issues
- Fail-open: Enabled - File the issue if deduplication fails (prefer duplicates over lost work)
- Include closed issues: Disabled - Only compare against open issues
- Min title length: 10 characters - Skip dedup for very short titles
- Max retries: 2 - Retry AI calls twice on failure
- Request timeout: 30 seconds - Timeout for AI API calls
Performance Impact (vc-159): With 3 discovered issues and default config:
- Old (BatchSize=10, MaxCandidates=50): ~15 AI calls, ~90 seconds
- New (BatchSize=50, MaxCandidates=25): ~3 AI calls, ~18 seconds
- Result: 80% reduction in API calls and deduplication time!
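The call counts above can be reproduced with a little arithmetic. The sketch below assumes each discovered issue is compared against the full candidate list and that comparisons are grouped into batches of BatchSize; the function name estimateCalls is hypothetical and not part of VC.

```go
package main

import "fmt"

// estimateCalls is a hypothetical helper: it returns the number of AI calls
// needed when every discovered issue is compared against maxCandidates
// existing issues, batchSize comparisons per call.
func estimateCalls(issues, maxCandidates, batchSize int) int {
	if issues <= 0 || maxCandidates <= 0 || batchSize <= 0 {
		return 0
	}
	perIssue := (maxCandidates + batchSize - 1) / batchSize // ceiling division
	return issues * perIssue
}

func main() {
	before := estimateCalls(3, 50, 10) // old settings
	after := estimateCalls(3, 25, 50)  // new settings
	fmt.Printf("before=%d after=%d reduction=%d%%\n",
		before, after, 100*(before-after)/before)
}
```

Under these assumptions the old settings need 15 calls and the new settings need 3, which matches the 80% figure quoted above.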
All deduplication settings can be customized via environment variables:
# Confidence threshold (0.0 to 1.0, default: 0.85)
# Higher = more conservative (fewer false positives, more false negatives)
# Lower = more aggressive (more false positives, fewer false negatives)
export VC_DEDUP_CONFIDENCE_THRESHOLD=0.85
# Lookback period in days (default: 7)
# How many days of recent issues to compare against
export VC_DEDUP_LOOKBACK_DAYS=7
# Maximum number of issues to compare against (default: 25)
# Limits AI API costs and processing time
export VC_DEDUP_MAX_CANDIDATES=25
# Batch size for AI calls (default: 50)
# Number of comparisons to send in a single AI API call
export VC_DEDUP_BATCH_SIZE=50
# Enable within-batch deduplication (default: true)
# If multiple discovered issues are duplicates of each other, only keep the first
export VC_DEDUP_WITHIN_BATCH=true
# Fail-open behavior (default: true)
# If true: file the issue anyway when deduplication fails
# If false: return error and block issue creation
export VC_DEDUP_FAIL_OPEN=true
# Include closed issues in comparison (default: false)
# Useful for preventing re-filing of recently closed issues
export VC_DEDUP_INCLUDE_CLOSED=false
# Minimum title length for deduplication (default: 10)
# Very short titles lack semantic meaning for comparison
export VC_DEDUP_MIN_TITLE_LENGTH=10
# Maximum retry attempts (default: 2)
# Number of times to retry AI API calls on failure
export VC_DEDUP_MAX_RETRIES=2
# Request timeout in seconds (default: 30)
# Timeout for individual AI API calls
export VC_DEDUP_TIMEOUT_SECS=30
To reduce false positives (issues incorrectly marked as duplicates):
- Increase VC_DEDUP_CONFIDENCE_THRESHOLD to 0.90 or 0.95
- Decrease VC_DEDUP_MAX_CANDIDATES to compare against fewer issues
- Decrease VC_DEDUP_LOOKBACK_DAYS to only compare against very recent issues
To reduce false negatives (actual duplicates not caught):
- Decrease VC_DEDUP_CONFIDENCE_THRESHOLD to 0.75 or 0.80 (use with caution)
- Increase VC_DEDUP_MAX_CANDIDATES to compare against more issues
- Increase VC_DEDUP_LOOKBACK_DAYS to compare against older issues
- Enable VC_DEDUP_INCLUDE_CLOSED=true to catch recently closed duplicates
To reduce costs:
- Decrease VC_DEDUP_MAX_CANDIDATES to limit API calls
- Decrease VC_DEDUP_LOOKBACK_DAYS to narrow the search window
- Increase VC_DEDUP_BATCH_SIZE to make fewer API calls (up to 100)
For debugging:
- Set VC_DEDUP_CONFIDENCE_THRESHOLD=1.0 to effectively disable deduplication
- Set VC_DEDUP_MAX_CANDIDATES=0 to skip deduplication entirely
- Check logs for [DEDUP] messages showing comparison results
Conservative Configuration (for critical projects where missing work is worse than having duplicates):
export VC_DEDUP_CONFIDENCE_THRESHOLD=0.95 # Very high confidence required
export VC_DEDUP_FAIL_OPEN=true # File on error
export VC_DEDUP_MAX_CANDIDATES=30 # Limited comparisons
Aggressive Configuration (for projects with lots of duplicate work being filed):
export VC_DEDUP_CONFIDENCE_THRESHOLD=0.75 # Lower threshold
export VC_DEDUP_LOOKBACK_DAYS=14 # Longer lookback
export VC_DEDUP_MAX_CANDIDATES=100 # More candidates
export VC_DEDUP_INCLUDE_CLOSED=true # Include closed issues
The executor validates all deduplication settings on startup. Invalid values (out of range, wrong type, etc.) will cause the executor to exit with a clear error message.
Validation checks:
- VC_DEDUP_CONFIDENCE_THRESHOLD must be between 0.0 and 1.0
- VC_DEDUP_LOOKBACK_DAYS must be between 1 and 90 days
- VC_DEDUP_MAX_CANDIDATES must be between 0 and 500
- VC_DEDUP_BATCH_SIZE must be between 1 and 100
- VC_DEDUP_MIN_TITLE_LENGTH must be between 0 and 500
- VC_DEDUP_MAX_RETRIES must be between 0 and 10
- VC_DEDUP_TIMEOUT_SECS must be between 1 and 300 seconds
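As an illustration of the range checks listed above, here is a minimal sketch of startup validation. The ranges come from this document; the type and function names (intRange, validateInt, validateThreshold) and the error wording are assumptions, not VC's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// intRange is a hypothetical pair of inclusive bounds for one setting.
type intRange struct{ min, max int }

// Bounds copied from the validation list in this document.
var dedupIntRanges = map[string]intRange{
	"VC_DEDUP_LOOKBACK_DAYS":    {1, 90},
	"VC_DEDUP_MAX_CANDIDATES":   {0, 500},
	"VC_DEDUP_BATCH_SIZE":       {1, 100},
	"VC_DEDUP_MIN_TITLE_LENGTH": {0, 500},
	"VC_DEDUP_MAX_RETRIES":      {0, 10},
	"VC_DEDUP_TIMEOUT_SECS":     {1, 300},
}

// validateInt checks that raw parses as an integer inside the range for name.
func validateInt(name, raw string) error {
	r, ok := dedupIntRanges[name]
	if !ok {
		return fmt.Errorf("%s: unknown setting", name)
	}
	v, err := strconv.Atoi(raw)
	if err != nil {
		return fmt.Errorf("%s: %q is not an integer", name, raw)
	}
	if v < r.min || v > r.max {
		return fmt.Errorf("%s: %d is outside [%d, %d]", name, v, r.min, r.max)
	}
	return nil
}

// validateThreshold checks VC_DEDUP_CONFIDENCE_THRESHOLD (0.0 to 1.0).
func validateThreshold(raw string) error {
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return fmt.Errorf("VC_DEDUP_CONFIDENCE_THRESHOLD: %q is not a number", raw)
	}
	// Written as a negated conjunction so that NaN is rejected too.
	if !(v >= 0 && v <= 1) {
		return fmt.Errorf("VC_DEDUP_CONFIDENCE_THRESHOLD: %v is outside [0.0, 1.0]", v)
	}
	return nil
}

func main() {
	fmt.Println(validateInt("VC_DEDUP_BATCH_SIZE", "50"))
	fmt.Println(validateInt("VC_DEDUP_BATCH_SIZE", "500"))
	fmt.Println(validateThreshold("0.85"))
}
```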
See docs/QUERIES.md for queries to monitor deduplication metrics.
VC runs quality gates (test/lint/build) after successful agent execution to ensure code quality. Gate execution has a configurable timeout to prevent indefinite hangs.
VC_QUALITY_GATES_TIMEOUT - Maximum time allowed for all quality gates to complete (default: 5 minutes)
# Default timeout (5 minutes)
export VC_QUALITY_GATES_TIMEOUT=5m
# Longer timeout for large codebases
export VC_QUALITY_GATES_TIMEOUT=10m
# Shorter timeout for fast feedback in tests
export VC_QUALITY_GATES_TIMEOUT=2m
Format: Duration string (e.g., "5m", "300s", "2m30s")
Valid range: 1 second to 60 minutes
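The duration format and range can be checked with Go's standard time.ParseDuration. The sketch below is an assumption about how such a check might look; parseGateTimeout is a hypothetical name, and falling back to 5 minutes for an empty value mirrors the documented default.

```go
package main

import (
	"fmt"
	"time"
)

// parseGateTimeout is a hypothetical helper that parses a
// VC_QUALITY_GATES_TIMEOUT value and enforces the documented range.
func parseGateTimeout(raw string) (time.Duration, error) {
	if raw == "" {
		return 5 * time.Minute, nil // documented default
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("VC_QUALITY_GATES_TIMEOUT: %w", err)
	}
	if d < time.Second || d > 60*time.Minute {
		return 0, fmt.Errorf("VC_QUALITY_GATES_TIMEOUT: %s is outside 1s to 60m", d)
	}
	return d, nil
}

func main() {
	for _, raw := range []string{"5m", "300s", "2m30s", "90m", "soon"} {
		d, err := parseGateTimeout(raw)
		fmt.Println(raw, d, err)
	}
}
```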
When to adjust:
- Increase if gates are timing out on large codebases
- Decrease for faster feedback during development/testing
- Default (5m) is appropriate for most projects
What happens on timeout:
- Gate execution is canceled
- Issue is marked as blocked with timeout error
- Partial gate results are logged for debugging
- Agent work is not committed
For large codebases with slow tests:
export VC_QUALITY_GATES_TIMEOUT=15m
For fast iteration during development:
export VC_QUALITY_GATES_TIMEOUT=3m
For tests (faster feedback):
export VC_QUALITY_GATES_TIMEOUT=1m
VC's planning system uses multiple validators to check mission plans. Each validator runs with panic recovery and timeouts to prevent one bad validator from killing the entire validation pipeline.
Default: 30 seconds per validator
# Override per-validator timeout
export VC_VALIDATOR_TIMEOUT=60s # Allow 1 minute per validator
Valid values: Any Go duration string (e.g., 10s, 1m, 90s)
Each validator runs with the following protections:
- Panic Recovery:
  - Panics are caught and logged
  - Validation continues with other validators
  - Panic is reported as a validation error
- Timeout Protection:
  - Each validator gets its own timeout context
  - Prevents infinite loops or hangs
  - Timeout is reported as a validation error
- Fault Isolation:
  - One validator failure doesn't block others
  - All validators run to completion
  - Combined error report shows all failures
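The three protections above can be sketched in a few lines of Go. This is an illustrative pattern, not VC's implementation: the Validator type, runOne, runAll, and demo are hypothetical names, and the error strings are modeled on the examples in this document.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Validator is a hypothetical stand-in for one plan validator.
type Validator struct {
	Name string
	Run  func(ctx context.Context) error
}

// runOne runs a single validator with panic recovery and its own timeout.
func runOne(parent context.Context, v Validator, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	// Buffered so the goroutine can always send, even after a timeout.
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("validator panic: %v", r)
			}
		}()
		done <- v.Run(ctx)
	}()

	select {
	case err := <-done:
		if err != nil {
			return fmt.Errorf("%s: %w", v.Name, err)
		}
		return nil
	case <-ctx.Done():
		return fmt.Errorf("%s: validator timeout after %s", v.Name, timeout)
	}
}

// runAll runs every validator and collects all failures (fault isolation).
func runAll(ctx context.Context, vs []Validator, timeout time.Duration) []error {
	var errs []error
	for _, v := range vs {
		if err := runOne(ctx, v, timeout); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

// demo runs one healthy, one panicking, and one hanging validator.
func demo() []error {
	vs := []Validator{
		{Name: "ok", Run: func(context.Context) error { return nil }},
		{Name: "panics", Run: func(context.Context) error { panic("boom") }},
		// Simulates a hang by ignoring the context for two seconds.
		{Name: "hangs", Run: func(context.Context) error { time.Sleep(2 * time.Second); return nil }},
	}
	return runAll(context.Background(), vs, 50*time.Millisecond)
}

func main() {
	for _, err := range demo() {
		fmt.Println("validation failed:", err)
	}
}
```

A goroutine that ignores its context keeps running after the timeout; the buffered channel only ensures it does not block forever when it eventually finishes.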
The following validators run on every mission plan:
- phase_count: Checks phase count is within acceptable range (1-15 phases)
- plan_size: Enforces plan size limits to prevent timeouts (configurable)
- circular_dependencies: Detects circular dependencies in phases
- dependency_references: Validates all dependency IDs reference existing phases
- task_counts: Validates each phase has reasonable task count (1-50 tasks)
- phase_structure_ai: AI-driven validation of phase dependencies and ordering (advisory only)
Note: The AI validator is advisory only and will log warnings but not block validation on failure (e.g., network issues, API errors).
To prevent timeouts during refinement, validation, and approval, VC enforces configurable limits on plan size:
Default Limits:
- Max phases per plan: 20
- Max tasks per phase: 30
- Max total tasks: 600 (computed as max_phases × max_tasks_per_phase)
- Max dependency depth: 10 levels
Environment Variables:
# Maximum number of phases in a mission plan
# Default: 20
export VC_MAX_PLAN_PHASES=20
# Maximum number of tasks per phase
# Default: 30
export VC_MAX_PHASE_TASKS=30
# Maximum dependency depth (longest dependency chain)
# Default: 10
export VC_MAX_DEPENDENCY_DEPTH=10
Why These Limits?
- Refinement Timeout Risk: Plans with >30 tasks per phase may exceed the 5-minute refinement timeout
- Validation Hang Risk: Plans with >50 phases may cause cycle detector to hang
- Approval Timeout Risk: Plans with >600 total tasks may exceed database transaction timeout
- Pathological Graphs: Dependency depth >10 suggests overly complex dependency chains
Validation Errors:
# Too many phases
Error: validation failed: plan_size: plan has too many phases (25 > 20 limit); risk of timeout during validation
# Phase with too many tasks
Error: validation failed: plan_size: phase 1 (Setup) has too many tasks (35 > 30 limit); risk of timeout during refinement
# Excessive dependency depth
Error: validation failed: plan_size: plan has excessive dependency depth (12 > 10 limit); risk of pathological dependency graph
Custom Limits:
For larger missions, you can increase limits:
# Allow larger plans
export VC_MAX_PLAN_PHASES=30
export VC_MAX_PHASE_TASKS=50
export VC_MAX_DEPENDENCY_DEPTH=15
For stricter validation during testing:
# Enforce smaller plans
export VC_MAX_PLAN_PHASES=10
export VC_MAX_PHASE_TASKS=15
export VC_MAX_DEPENDENCY_DEPTH=5
Dependency Depth Calculation:
Dependency depth is the longest path from a phase with no dependencies to any phase. For example:
- Linear chain (1 → 2 → 3): depth = 3
- Diamond (1 → 2,3 → 4): depth = 3
- Complex graph (1 → 2 → 3 → 4 → 5): depth = 5
This prevents pathological dependency graphs that could cause performance issues in cycle detection and topological sorting.
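The depth definition can be made concrete with a memoized depth-first search. The sketch below is one possible way to compute it, assuming phases are identified by integers and the graph is already known to be acyclic; dependencyDepth is a hypothetical name.

```go
package main

import "fmt"

// dependencyDepth returns the number of phases on the longest dependency
// chain. deps maps a phase ID to the IDs it depends on. The graph is assumed
// to be acyclic (cycle detection is a separate validator).
func dependencyDepth(deps map[int][]int) int {
	memo := make(map[int]int, len(deps))
	var depth func(id int) int
	depth = func(id int) int {
		if d, ok := memo[id]; ok {
			return d
		}
		best := 0
		for _, dep := range deps[id] {
			if d := depth(dep); d > best {
				best = d
			}
		}
		memo[id] = best + 1 // a phase with no dependencies has depth 1
		return best + 1
	}

	longest := 0
	for id := range deps {
		if d := depth(id); d > longest {
			longest = d
		}
	}
	return longest
}

func main() {
	linear := map[int][]int{1: nil, 2: {1}, 3: {2}}
	diamond := map[int][]int{1: nil, 2: {1}, 3: {1}, 4: {2, 3}}
	fmt.Println(dependencyDepth(linear), dependencyDepth(diamond))
}
```

Memoization keeps the cost linear in the number of phases and dependency edges, which matters for the performance concern mentioned above.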
# Multiple validator failures
Error: validation failed: phase_count: plan has too many phases (20); consider breaking into multiple missions; task_counts: phase 1 (Setup) has too many tasks (60); break it down further
# Validator timeout
Error: validation failed: circular_dependencies: validator timeout after 30s
# Validator panic
Error: validation failed: phase_structure_ai: validator panic: runtime error: invalid memory address or nil pointer dereference
Debug Prompts:
# Log full prompts sent to agents (useful for debugging agent behavior)
export VC_DEBUG_PROMPTS=1
Debug Events:
# Log JSON event parsing details (tool_use events from Amp --stream-json)
export VC_DEBUG_EVENTS=1
Debug Status Changes (vc-n4lx):
# Log all issue status changes with old/new status and actor
# Useful for debugging unexpected status changes (e.g., baseline issues becoming blocked)
export VC_DEBUG_STATUS=1
Example output:
[VC_DEBUG_STATUS] 2025-11-06T21:15:32Z: Status change for vc-baseline-test: open → blocked (actor: preflight)
[VC_DEBUG_STATUS] 2025-11-06T21:20:45Z: Status change for vc-baseline-test: blocked → open (actor: preflight-self-healing, reason: Self-healing reopened)
[VC_DEBUG_STATUS] 2025-11-06T21:25:10Z: Status change for vc-abc: in_progress → closed (actor: executor, reason: Completed: gates passed)
ANTHROPIC_API_KEY (Required for AI supervision):
# Required for AI supervision (assessment and analysis)
export ANTHROPIC_API_KEY=your-key-here
Without this key, the executor will run without AI supervision (warnings will be logged).
AI supervision can be explicitly disabled via config: EnableAISupervision: false
VC intelligently handles Anthropic API quota/rate limit errors (429 responses) by respecting the retry-after duration instead of immediately retrying with exponential backoff.
When the Anthropic API quota is exceeded, the response includes:
- HTTP 429 status code
- Retry-After header or error message like "try again in 12 minutes"
Without intelligent handling, repeated retries burn through the available retry attempts before the quota resets.
VC classifies errors into types and handles each appropriately:
Error Types:
- QUOTA (429): Wait for the retry-after duration, then retry
- TRANSIENT (5xx): Use exponential backoff (retry with increasing delays)
- AUTH (401/403): Don't retry (auth failures won't succeed)
- INVALID (400/404): Don't retry (malformed requests won't succeed)
- UNKNOWN: Use exponential backoff (conservative approach)
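The classification table above maps directly onto a switch over the HTTP status code. The following is a minimal sketch; the ErrorType constants, classifyStatus, and shouldRetry are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// ErrorType is a hypothetical label for the error classes listed above.
type ErrorType string

const (
	Quota     ErrorType = "QUOTA"
	Transient ErrorType = "TRANSIENT"
	Auth      ErrorType = "AUTH"
	Invalid   ErrorType = "INVALID"
	Unknown   ErrorType = "UNKNOWN"
)

// classifyStatus maps an HTTP status code to an error class.
func classifyStatus(code int) ErrorType {
	switch {
	case code == 429:
		return Quota
	case code >= 500 && code <= 599:
		return Transient
	case code == 401 || code == 403:
		return Auth
	case code == 400 || code == 404:
		return Invalid
	default:
		return Unknown
	}
}

// shouldRetry reports whether a retry can possibly succeed for this class.
func shouldRetry(t ErrorType) bool {
	return t != Auth && t != Invalid
}

func main() {
	for _, code := range []int{429, 503, 401, 404, 418} {
		t := classifyStatus(code)
		fmt.Println(code, t, shouldRetry(t))
	}
}
```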
# Maximum time to wait for quota reset (default: 15 minutes)
# If retry-after exceeds this, fail fast instead of waiting indefinitely
export VC_MAX_QUOTA_WAIT=15m
# Examples:
export VC_MAX_QUOTA_WAIT=30m # Wait up to 30 minutes
export VC_MAX_QUOTA_WAIT=5m # Wait maximum 5 minutes
export VC_MAX_QUOTA_WAIT=1h # Wait up to 1 hour
When quota is exceeded (429 error):
- Parse retry-after duration from:
  - Retry-After HTTP header (seconds or HTTP-date)
  - X-RateLimit-Reset header (Unix timestamp)
  - Error message patterns: "try again in 12 minutes"
- Check against MaxQuotaWait:
  - If retry-after <= MaxQuotaWait: Wait intelligently, then retry
  - If retry-after > MaxQuotaWait: Fail fast with clear error message
- During wait:
  - Log clear message showing wait time and reset time
  - Respect context cancellation (allow graceful shutdown)
  - Don't burn through retry attempts
- After wait completes:
  - Retry immediately (quota should be reset)
  - If still failing, use normal retry logic
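The wait-or-fail-fast decision and the cancellation-aware wait can be sketched as follows. This is an illustration under assumptions: decideQuotaWait, waitForQuota, and the minutes helper are hypothetical names, and the error text is invented for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// minutes is a small convenience helper for building durations.
func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

// decideQuotaWait returns how long to wait, or an error when the reset is
// further away than maxWait and the caller should fail fast.
func decideQuotaWait(retryAfter, maxWait time.Duration) (time.Duration, error) {
	if retryAfter > maxWait {
		return 0, fmt.Errorf("quota resets in %s, which exceeds the maximum wait of %s", retryAfter, maxWait)
	}
	if retryAfter < 0 {
		retryAfter = 0
	}
	return retryAfter, nil
}

// waitForQuota sleeps for d but returns early if ctx is canceled, so a
// shutdown is never blocked by a long quota wait.
func waitForQuota(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	wait, err := decideQuotaWait(minutes(12), minutes(15))
	fmt.Println(wait, err)

	_, err = decideQuotaWait(minutes(12), minutes(5))
	fmt.Println(err)

	// A canceled context ends the wait immediately.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	fmt.Println(waitForQuota(ctx, time.Hour))
}
```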
Example output:
⚠️ Quota exceeded: API rate limit hit
Retry after: 12m0s (at 14:30:00 UTC)
Attempt: 1/3
Waiting for quota reset...
Quota wait completed, retrying assessment
Quota errors are weighted more heavily in the circuit breaker:
- Regular errors: Count as 1 failure
- Quota errors: Count as 3 failures (trip circuit faster)
This prevents repeatedly hitting rate limits and gives the system time to recover.
VC handles multiple retry-after formats:
HTTP Headers:
Retry-After: 720 # 720 seconds
X-RateLimit-Reset: 1736348400 # Unix timestamp
Error Messages:
"rate limit exceeded, try again in 12 minutes"
"quota exceeded, wait 720 seconds"
"retry_after": 600
Default Fallback: If no retry-after information is found, VC conservatively waits 1 hour (the typical quota reset window).
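Parsing these formats might look like the sketch below. It is an assumption-laden illustration rather than VC's parser: parseRetryAfter is a hypothetical name, the regular expressions are guesses that cover only the sample messages shown above, and the current time is passed in as a Unix timestamp to keep the function deterministic.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"strconv"
	"strings"
	"time"
)

var (
	// Matches phrases such as "12 minutes" or "720 seconds".
	unitPattern = regexp.MustCompile(`(?i)(\d+)\s*(second|minute|hour)`)
	// Matches a JSON-style field such as "retry_after": 600 (seconds assumed).
	fieldPattern = regexp.MustCompile(`retry_after"?\s*:\s*(\d+)`)
)

const fallbackWait = time.Hour // documented default when nothing is found

// parseRetryAfter checks the Retry-After header, then X-RateLimit-Reset,
// then the error message, and finally falls back to one hour.
func parseRetryAfter(retryAfter, rateLimitReset, message string, nowUnix int64) time.Duration {
	now := time.Unix(nowUnix, 0)
	if s := strings.TrimSpace(retryAfter); s != "" {
		if secs, err := strconv.Atoi(s); err == nil && secs >= 0 {
			return time.Duration(secs) * time.Second
		}
		if t, err := http.ParseTime(s); err == nil && t.After(now) {
			return t.Sub(now)
		}
	}
	if s := strings.TrimSpace(rateLimitReset); s != "" {
		if ts, err := strconv.ParseInt(s, 10, 64); err == nil && ts > nowUnix {
			return time.Duration(ts-nowUnix) * time.Second
		}
	}
	if m := unitPattern.FindStringSubmatch(message); m != nil {
		if n, err := strconv.Atoi(m[1]); err == nil {
			switch strings.ToLower(m[2]) {
			case "second":
				return time.Duration(n) * time.Second
			case "minute":
				return time.Duration(n) * time.Minute
			case "hour":
				return time.Duration(n) * time.Hour
			}
		}
	}
	if m := fieldPattern.FindStringSubmatch(message); m != nil {
		if n, err := strconv.Atoi(m[1]); err == nil {
			return time.Duration(n) * time.Second
		}
	}
	return fallbackWait
}

func main() {
	now := time.Now().Unix()
	fmt.Println(parseRetryAfter("720", "", "", now))
	fmt.Println(parseRetryAfter("", "", "rate limit exceeded, try again in 12 minutes", now))
	fmt.Println(parseRetryAfter("", "", "", now))
}
```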
For overnight/unattended execution:
# Allow longer waits for quota resets
export VC_MAX_QUOTA_WAIT=1h
For interactive development:
# Fail fast on quota errors, don't wait
export VC_MAX_QUOTA_WAIT=1m
For production (recommended):
# Default 15 minutes balances patience with responsiveness
export VC_MAX_QUOTA_WAIT=15m
Quota retry works alongside cost budgeting (vc-e3s7):
- Cost budgeting: Proactive limit before hitting API quota
- Quota retry: Reactive handling when quota is actually exceeded
- Together: Complete quota management solution
When both features are enabled:
- Cost budgeting prevents most quota errors (stay under limit)
- Quota retry handles edge cases (concurrent executions, budget estimation errors)
- System stays operational even under quota pressure
- Cost Budgeting (vc-e3s7): Proactive quota management via token limits
- Bootstrap Mode (vc-b027): Fallback mode for quota crisis issues
- Quota Monitoring (vc-7e21): Real-time burn rate tracking
Unit tests:
go test -v ./internal/ai -run "TestClassifyError|TestParseRetryAfter"
Manual testing:
# Simulate quota error by reducing MaxQuotaWait to 1 second
export VC_MAX_QUOTA_WAIT=1s
# Run executor until quota is hit
# Should fail fast with clear error message
Verification:
- Quota errors show clear wait time and reset time
- Wait duration respects VC_MAX_QUOTA_WAIT
- Circuit breaker trips faster on quota errors
- No wasted retry attempts during quota wait
VC tracks quota usage in real-time and predicts quota exhaustion before it happens, allowing preventive action instead of reactive crisis management.
Without monitoring, quota exhaustion happens unexpectedly:
- No visibility into burn rate trends
- Can't predict when limits will be hit
- Emergency response when quota is already exhausted
- Lost productivity during quota outages
VC captures usage snapshots every 5 minutes and uses them to:
- Calculate burn rate (tokens/min, cost/min)
- Predict time-to-limit with confidence scoring
- Emit pre-emptive alerts at escalating levels (YELLOW → ORANGE → RED)
- Auto-create crisis issues when exhaustion is imminent
# Enable/disable quota monitoring (default: true)
export VC_ENABLE_QUOTA_MONITORING=true
# How often to capture usage snapshots (default: 5 minutes)
export VC_QUOTA_SNAPSHOT_INTERVAL=5m
# Alert thresholds (time-to-limit that triggers alerts)
export VC_QUOTA_ALERT_YELLOW=30m # Warning: 15-30min to limit
export VC_QUOTA_ALERT_ORANGE=15m # Urgent: 5-15min to limit
export VC_QUOTA_ALERT_RED=5m # Critical: <5min to limit
# Historical data retention (default: 30 days)
export VC_QUOTA_RETENTION_DAYS=30
# Auto-create P0 quota-crisis issues on RED alerts (default: true)
export VC_QUOTA_AUTO_CREATE_CRISIS_ISSUE=true
GREEN (Healthy):
- >30 minutes until limit at current burn rate
- No alerts emitted (normal operation)
YELLOW (Warning):
- 15-30 minutes until limit
- Alert: "Monitor usage, consider reducing AI operations"
- Console warning logged
- Event logged to activity feed
ORANGE (Urgent):
- 5-15 minutes until limit
- Alert: "Urgent - reduce AI operations or risk hitting limit"
- Escalated console warning
- Event logged with URGENT severity
RED (Critical):
- <5 minutes until limit
- Alert: "CRITICAL - quota exhaustion imminent"
- P0 quota-crisis issue auto-created (if enabled)
- Enables Bootstrap Mode for minimal-AI fixes (vc-b027)
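The four levels reduce to a comparison of the predicted time-to-limit against the three configured thresholds. The sketch below is illustrative only: alertLevel and the minutes helper are hypothetical names, and the treatment of exact boundary values (5, 15, and 30 minutes) is an assumption, since the ranges above overlap at their endpoints.

```go
package main

import (
	"fmt"
	"time"
)

// minutes is a small convenience helper for building durations.
func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

// alertLevel maps a predicted time-to-limit onto an alert level.
// Assumption: exactly 5m counts as ORANGE, and exactly 15m or 30m as YELLOW.
func alertLevel(timeToLimit, yellow, orange, red time.Duration) string {
	switch {
	case timeToLimit < red:
		return "RED"
	case timeToLimit < orange:
		return "ORANGE"
	case timeToLimit <= yellow:
		return "YELLOW"
	default:
		return "GREEN"
	}
}

func main() {
	// Default thresholds: VC_QUOTA_ALERT_YELLOW=30m, ORANGE=15m, RED=5m.
	for _, m := range []int{45, 25, 10, 4} {
		fmt.Println(m, alertLevel(minutes(m), minutes(30), minutes(15), minutes(5)))
	}
}
```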
VC uses linear regression over the last 15 minutes of snapshots:
Algorithm:
- Collect snapshots from the last 15 minutes (3 snapshots at 5-min intervals)
- Calculate tokens_per_minute and cost_per_minute from oldest to newest
- Project when each limit (tokens, cost) will be reached
- Report whichever limit will be hit first
- Include confidence score (based on sample size, 0.0-1.0)
Confidence scoring:
- 3 snapshots = 0.6 confidence
- 5+ snapshots = 1.0 confidence
- Only alert if confidence >0.5
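A simplified version of this projection is sketched below for the token limit only. It computes the rate from the oldest and newest snapshot, as the algorithm steps describe, rather than fitting a full regression. The confidence formula min(1, n/5) is an assumption chosen because it matches the two documented points (3 snapshots give 0.6, 5 or more give 1.0); all names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// snapshot is a hypothetical usage sample: minutes since the first sample
// and tokens used so far in the current budget window.
type snapshot struct {
	Minute     float64
	TokensUsed float64
}

// prediction holds the projected burn rate and time until the limit.
type prediction struct {
	TokensPerMinute float64
	MinutesToLimit  float64
	Confidence      float64
}

// predict projects when tokenLimit will be reached. It returns false when
// there is too little data or usage is not growing.
func predict(snaps []snapshot, tokenLimit float64) (prediction, bool) {
	if len(snaps) < 2 {
		return prediction{}, false
	}
	oldest, newest := snaps[0], snaps[len(snaps)-1]
	elapsed := newest.Minute - oldest.Minute
	if elapsed <= 0 {
		return prediction{}, false
	}
	rate := (newest.TokensUsed - oldest.TokensUsed) / elapsed
	if rate <= 0 {
		return prediction{}, false // flat or falling usage: no exhaustion projected
	}
	remaining := math.Max(0, tokenLimit-newest.TokensUsed)
	return prediction{
		TokensPerMinute: rate,
		MinutesToLimit:  remaining / rate,
		Confidence:      math.Min(1.0, float64(len(snaps))/5.0), // assumed formula
	}, true
}

// shouldAlert applies the documented confidence cutoff.
func shouldAlert(p prediction) bool { return p.Confidence > 0.5 }

func main() {
	snaps := []snapshot{{0, 40000}, {5, 50000}, {10, 60000}}
	if p, ok := predict(snaps, 100000); ok {
		fmt.Printf("%.0f tokens/min, %.0f min to limit, confidence %.1f, alert=%v\n",
			p.TokensPerMinute, p.MinutesToLimit, p.Confidence, shouldAlert(p))
	}
}
```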
Every AI operation is logged with full attribution:
- Operation type: assessment, analysis, deduplication, code_review, discovery
- Model used: sonnet, haiku, opus
- Tokens consumed: input + output
- Cost: calculated from token counts
- Duration: milliseconds taken
- Issue: which issue the operation was for
This enables queries like:
- "Which operation types cost the most?"
- "Which issues burn through quota fastest?"
- "Is sonnet or haiku more cost-effective for assessments?"
See docs/QUERIES.md for cost attribution queries.
Quota Retry (vc-5b22):
- Monitoring = proactive (prevent hitting limits)
- Retry = reactive (handle limits gracefully when hit)
- Together = comprehensive quota management
Bootstrap Mode (vc-b027):
- RED alert auto-creates quota-crisis issue
- Bootstrap mode activates (minimal AI usage)
- Crisis can be fixed without exhausting remaining quota
Cost Budgeting (vc-e3s7):
- Budgeting = hard limits (stop at threshold)
- Monitoring = predictive alerts (warn before threshold)
- Together = stay informed while staying under budget
Normal operation:
✓ Quota healthy (45min to limit, 85% confidence)
Approaching limit:
⚠️ Quota approaching limit: ~25 minutes remaining at current burn rate
Burn rate: 3,200 tokens/min ($0.12/min)
Current usage: 75,000/100,000 tokens ($3.75/$5.00)
Recommended: Monitor usage. Consider reducing AI operations or increasing quota limits.
Imminent exhaustion:
🚨 CRITICAL: Quota exhaustion in ~4 minutes at current burn rate
Burn rate: 5,000 tokens/min ($0.18/min)
Current usage: 95,000/100,000 tokens ($4.80/$5.00)
Recommended: IMMEDIATE ACTION REQUIRED: Stop non-essential AI operations. Quota crisis issue will be auto-created.
[Auto-created vc-abc: "Quota crisis imminent: <5min until exhaustion"]
vc_quota_snapshots - Point-in-time usage (every 5 minutes):
- Hourly tokens/cost used
- Total tokens/cost (all-time)
- Budget status (HEALTHY/WARNING/EXCEEDED)
- Issues worked in this window
vc_quota_operations - Individual AI calls:
- Operation type, model, tokens, cost
- Issue attribution
- Duration (for performance analysis)
For high-volume production:
# More frequent snapshots for better predictions
export VC_QUOTA_SNAPSHOT_INTERVAL=2m
# Earlier warnings to allow more reaction time
export VC_QUOTA_ALERT_YELLOW=45m
export VC_QUOTA_ALERT_ORANGE=20m
export VC_QUOTA_ALERT_RED=10m
For development/testing:
# Less frequent snapshots (reduce noise)
export VC_QUOTA_SNAPSHOT_INTERVAL=10m
# Shorter retention (save disk space)
export VC_QUOTA_RETENTION_DAYS=7
# Disable auto-issue creation (manual review)
export VC_QUOTA_AUTO_CREATE_CRISIS_ISSUE=false
For cost-sensitive environments:
# Aggressive early warnings
export VC_QUOTA_ALERT_YELLOW=50m
export VC_QUOTA_ALERT_ORANGE=30m
export VC_QUOTA_ALERT_RED=15m
# Auto-create crisis issues earlier
export VC_QUOTA_AUTO_CREATE_CRISIS_ISSUE=true
See docs/QUERIES.md for comprehensive queries including:
- Current burn rate calculation
- Time-to-limit prediction
- Top quota consumers by operation/issue/model
- Budget window analysis
- Cost efficiency metrics
Minimal overhead:
- Snapshot collection: <1ms every 5 minutes
- Burn rate calculation: <5ms (only on snapshots)
- Database writes: Batched, non-blocking
- No impact on AI operations
Old data is automatically cleaned up:
# Default retention: 30 days
# Runs daily as background goroutine
# Cleanup is transactional and batched
Manual cleanup (future):
vc cleanup quotas --dry-run # Preview
vc cleanup quotas # Execute
- vc-5b22: Intelligent quota retry (reactive handling)
- vc-b027: Bootstrap mode (minimal AI for crisis fixes)
- vc-e3s7: Cost budgeting (proactive limits)
Bootstrap mode enables VC to fix quota-related issues even when AI budget is exhausted, breaking the circular dependency where quota issues need AI supervision but no quota is available.
Without bootstrap mode, quota exhaustion creates a deadlock:
- Quota issues need to be fixed to restore AI budget
- Fixing issues requires AI supervision (assessment, analysis)
- But AI supervision requires available quota
- Result: VC is stuck and cannot self-heal
Bootstrap mode is a degraded execution mode that activates automatically when:
- AI budget is exceeded (cost tracker status = BudgetExceeded) AND
- Issue has quota-crisis label OR title contains quota-related keywords
When active, bootstrap mode:
- ✅ Still runs: Agent execution, quality gates (test/lint/build)
- ❌ Skips: AI assessment, AI analysis, discovered issue creation, deduplication
This allows VC to work on quota fixes with minimal AI usage.
# Enable bootstrap mode (default: false, opt-in for safety)
# IMPORTANT: Only enable if you trust VC to work without AI supervision
export VC_ENABLE_BOOTSTRAP_MODE=true
# Labels that trigger bootstrap mode (default: quota-crisis)
# Comma-separated list of labels
export VC_BOOTSTRAP_MODE_LABELS="quota-crisis,budget-fix"
# Title keywords that trigger bootstrap mode (default: quota,budget,cost,API limit)
# Comma-separated list (case-insensitive)
export VC_BOOTSTRAP_MODE_TITLE_KEYWORDS="quota,budget,cost,API limit"
Bootstrap mode activates when ALL conditions are met:
- VC_ENABLE_BOOTSTRAP_MODE=true
- Cost tracker reports BudgetExceeded
- Either:
  - Issue has a label matching VC_BOOTSTRAP_MODE_LABELS OR
  - Issue title contains any keyword from VC_BOOTSTRAP_MODE_TITLE_KEYWORDS
Example scenarios:
✅ Activates:
- Issue: "Fix quota exhaustion in cost tracker" + budget exceeded
- Issue labeled quota-crisis + budget exceeded
❌ Doesn't activate:
- Issue: "Fix authentication bug" (not quota-related)
- Issue: "Fix quota exhaustion" but budget not exceeded (not a crisis yet)
- Budget exceeded but bootstrap mode disabled in config
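The activation rule and the scenarios above can be expressed as a single predicate. The sketch below is an assumption about how such a check might be written: bootstrapConfig, parseList, defaultBootstrapConfig, and shouldBootstrap are hypothetical names, labels are matched exactly, and keywords are matched as case-insensitive substrings of the title.

```go
package main

import (
	"fmt"
	"strings"
)

// bootstrapConfig is a hypothetical mirror of the three environment variables.
type bootstrapConfig struct {
	Enabled       bool
	Labels        []string
	TitleKeywords []string
}

// parseList splits a comma-separated value and drops empty entries.
func parseList(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// defaultBootstrapConfig uses the documented default labels and keywords.
func defaultBootstrapConfig(enabled bool) bootstrapConfig {
	return bootstrapConfig{
		Enabled:       enabled,
		Labels:        parseList("quota-crisis"),
		TitleKeywords: parseList("quota,budget,cost,API limit"),
	}
}

// shouldBootstrap requires the feature flag, an exceeded budget, and either
// a matching label or a matching title keyword.
func shouldBootstrap(cfg bootstrapConfig, budgetExceeded bool, labels []string, title string) bool {
	if !cfg.Enabled || !budgetExceeded {
		return false
	}
	for _, have := range labels {
		for _, want := range cfg.Labels {
			if have == want {
				return true
			}
		}
	}
	lower := strings.ToLower(title)
	for _, kw := range cfg.TitleKeywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

func main() {
	cfg := defaultBootstrapConfig(true)
	fmt.Println(shouldBootstrap(cfg, true, nil, "Fix quota exhaustion in cost tracker"))
	fmt.Println(shouldBootstrap(cfg, true, nil, "Fix authentication bug"))
	fmt.Println(shouldBootstrap(cfg, false, []string{"quota-crisis"}, "Fix quota exhaustion"))
}
```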
Assessment Phase (Skipped):
- No AI assessment call
- No risk analysis
- No pre-flight checks
- Logs: "Skipping AI assessment (bootstrap mode active)"
Analysis Phase (Skipped):
- No AI analysis call
- No quality validation
- No discovered issue creation
- Logs: "Skipping AI analysis (bootstrap mode active)"
Deduplication (Skipped):
- No AI deduplication calls
- All discovered issues treated as unique
- Risk: May create duplicate issues
- Logs: "Bootstrap mode active - skipping deduplication (risk of duplicates)"
Quality Gates (Still Run):
- Tests must still pass
- Linting must still pass
- Build must still succeed
- No degradation in code quality enforcement
Agent Execution (Still Runs):
- Coding agent executes normally
- Uses separate API key via Amp CLI
- Not affected by VC's AI budget
When bootstrap mode activates, VC emits:
Console Warning:
⚠️ BOOTSTRAP MODE ACTIVATED for vc-123 (reason: budget_exceeded + label:quota-crisis)
Budget status: EXCEEDED (hourly: 105000/100000 tokens, $5.25/$5.00)
⚠️ LIMITED AI SUPERVISION: No assessment, no analysis, no discovered issues
Activity Feed Event:
{
"type": "bootstrap_mode_activated",
"severity": "WARNING",
"issue_id": "vc-123",
"reason": "budget_exceeded + label:quota-crisis",
"budget_status": "EXCEEDED",
"hourly_tokens_used": 105000,
"hourly_tokens_limit": 100000
}
Issue Comment:
⚠️ **BOOTSTRAP MODE ACTIVE**
This issue is being executed in bootstrap mode due to quota exhaustion.
**Limitations:**
- No AI assessment (pre-flight checks)
- No AI analysis (quality validation)
- No discovered issue creation (follow-on work)
- No deduplication (risk of duplicates)
**Quality gates still enforce:**
- Tests must pass
- Linting must pass
- Build must succeed
Reason: budget_exceeded + label:quota-crisis
Budget: 105000/100000 tokens used ($5.25/$5.00)
Opt-in Required:
- Bootstrap mode disabled by default
- Requires explicit VC_ENABLE_BOOTSTRAP_MODE=true
- Must be consciously enabled to use
Limited Scope:
- Only affects issues with specific labels/keywords
- Normal issues wait for budget reset
- No system-wide AI supervision bypass
Quality Gates Still Apply:
- Tests must pass
- Linting must pass
- Build must succeed
- Code quality is not degraded
Clear Visibility:
- Prominent warnings when activated
- Activity feed event logged
- Issue comment added for audit trail
What You Lose:
- No AI Assessment:
  - No risk analysis
  - No pre-flight validation
  - No strategic planning
  - May miss complex issues
- No AI Analysis:
  - No completion validation
  - No quality issue detection
  - No punted items tracking
  - May mark incomplete work as done
- No Discovered Issues:
  - Follow-on work not automatically filed
  - Must manually track remaining tasks
  - May lose context for future work
- No Deduplication:
  - May create duplicate issues
  - Increases tracker noise
  - Requires manual cleanup later
When NOT to Use Bootstrap Mode:
- ❌ Complex architectural changes (need assessment)
- ❌ Production incidents (need full analysis)
- ❌ Issues that typically spawn many discovered issues
- ❌ Any work where AI supervision is critical
Mitigation Strategies:
- Manual review: Review bootstrap mode executions more carefully
- Post-budget sweep: Run deduplication sweep after budget resets
- Follow-up issues: Manually file discovered work after execution
- Label tracking: Add bootstrap-mode-used label for audit
Quota Monitoring (vc-7e21):
- Monitoring predicts quota exhaustion
- Auto-creates quota-crisis issue on RED alert
- Bootstrap mode activates for auto-created issue
- Crisis can be fixed before complete exhaustion
Cost Budgeting (vc-e3s7):
- Budgeting blocks AI calls when budget exceeded
- Bootstrap mode works around this for quota issues
- Non-quota issues still respect budget limits
Quota Retry (vc-5b22):
- Retry handles temporary quota errors
- Bootstrap mode handles exhausted budgets
- Together: comprehensive quota crisis handling
Normal operation:
1. Quota monitoring detects high burn rate
2. RED alert issued (<5min to limit)
3. Auto-creates vc-abc: "Quota crisis: reduce burn rate"
4. Label: quota-crisis
5. Executor claims vc-abc
6. Budget now exceeded (from continued use)
7. Bootstrap mode activates (quota-crisis label + budget exceeded)
8. Agent runs with minimal AI supervision
9. Quality gates enforce correctness
10. Issue closed, burn rate reduced
11. Budget resets in new hour
12. Normal operation resumes
Manual quota fix:
# Create quota crisis issue
bd create "Fix quota exhaustion in deduplication" \
-t bug \
-p 0 \
--label quota-crisis
# Enable bootstrap mode
export VC_ENABLE_BOOTSTRAP_MODE=true
# Start executor - will use bootstrap mode for this issue
vc run
Unit tests:
# Test bootstrap mode detection
go test -v ./internal/executor -run TestBootstrapMode
# Test AI skipping in bootstrap mode
go test -v ./internal/executor -run TestBootstrapModeSkipsAI
Manual testing:
# 1. Exhaust AI budget
export VC_COST_MAX_TOKENS_PER_HOUR=100
export VC_COST_MAX_COST_PER_HOUR=0.01
# 2. Enable bootstrap mode
export VC_ENABLE_BOOTSTRAP_MODE=true
# 3. Create quota issue
bd create "Test bootstrap mode" --label quota-crisis -p 0
# 4. Run executor
vc run
# 5. Verify in logs:
# - "BOOTSTRAP MODE ACTIVATED" warning
# - "Skipping AI assessment (bootstrap mode active)"
# - "Skipping AI analysis (bootstrap mode active)"
# - Quality gates still run
Verification checklist:
- ✅ Bootstrap mode only activates when budget exceeded + label/keyword match
- ✅ Assessment phase skipped
- ✅ Analysis phase skipped
- ✅ Deduplication skipped
- ✅ Quality gates still run
- ✅ Clear warnings logged
- ✅ Activity feed event emitted
- ✅ Issue comment added
Conservative (recommended):
# Only enable for emergencies
export VC_ENABLE_BOOTSTRAP_MODE=false # Disabled by default
# Only manual quota issues
export VC_BOOTSTRAP_MODE_LABELS="quota-crisis"
# Strict keyword matching
export VC_BOOTSTRAP_MODE_TITLE_KEYWORDS="quota,budget"
Aggressive (for self-hosting):
# Always enabled
export VC_ENABLE_BOOTSTRAP_MODE=true
# Broader label matching
export VC_BOOTSTRAP_MODE_LABELS="quota-crisis,budget-fix,cost-emergency"
# More lenient keyword matching
export VC_BOOTSTRAP_MODE_TITLE_KEYWORDS="quota,budget,cost,API,rate limit,exhaustion"
- vc-7e21: Quota monitoring (creates quota-crisis issues)
- vc-e3s7: Cost budgeting (enforces limits that trigger bootstrap)
- vc-5b22: Quota retry (handles transient quota errors)
EnableBlockerPriority (Default: true):
VC uses blocker-first prioritization to ensure missions run to completion. Discovered blockers are ALWAYS selected before regular ready work, regardless of priority numbers.
Default behavior (EnableBlockerPriority: true):
- Discovered blockers (label=discovered:blocker) have absolute priority
- A P3 blocker will be selected over a P0 regular task
- Regular work may wait indefinitely if blockers continuously appear
- This is intentional for mission convergence
Disabling blocker priority (EnableBlockerPriority: false):
- All work is prioritized by priority number only
- Blockers and regular work compete equally
- Use this if work starvation becomes a problem
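The difference between the two modes can be illustrated with a small ordering function. This is a sketch under stated assumptions, not the executor's real selection code: `WorkItem`, `isBlocker`, and `OrderReadyWork` are hypothetical names, and lower priority numbers are assumed to be more urgent (P0 before P3), as the P3-versus-P0 example above implies.

```go
package main

import (
	"fmt"
	"sort"
)

// WorkItem is a hypothetical, minimal view of a ready issue.
type WorkItem struct {
	ID       string
	Priority int // lower number = more urgent (P0 before P3)
	Labels   []string
}

const blockerLabel = "discovered:blocker"

// isBlocker reports whether the item carries the discovered:blocker label.
func isBlocker(w WorkItem) bool {
	for _, l := range w.Labels {
		if l == blockerLabel {
			return true
		}
	}
	return false
}

// OrderReadyWork sorts items in place into selection order.
// With blocker priority enabled, every blocker sorts before every
// non-blocker; priority numbers only break ties within each group.
func OrderReadyWork(items []WorkItem, enableBlockerPriority bool) {
	sort.SliceStable(items, func(i, j int) bool {
		if enableBlockerPriority {
			bi, bj := isBlocker(items[i]), isBlocker(items[j])
			if bi != bj {
				return bi
			}
		}
		return items[i].Priority < items[j].Priority
	})
}

func main() {
	items := []WorkItem{
		{ID: "regular-p0", Priority: 0},
		{ID: "blocker-p3", Priority: 3, Labels: []string{blockerLabel}},
	}
	OrderReadyWork(items, true)
	fmt.Println("blocker-first:", items[0].ID)
	OrderReadyWork(items, false)
	fmt.Println("priority-only:", items[0].ID)
}
```

Because the blocker check comes before the priority comparison, a steady stream of blockers keeps regular work at the back of the queue, which is the starvation risk described above.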
Configuration:
cfg := executor.DefaultConfig()
cfg.EnableBlockerPriority = false // Disable blocker-first prioritization
Monitoring:
- Check blocker discovery rate: bd list --status open | grep discovered:blocker
- Monitor work starvation metrics (see vc-160)
- See CLAUDE.md Workflow section for full prioritization policy
Related issues:
- vc-161: Documentation for blocker prioritization policy
- vc-160: Monitoring work starvation
VC uses a self-healing state machine to recover from baseline quality gate failures (test/lint/build). The self-healing system attempts to fix baseline issues automatically, escalating to humans when thresholds are exceeded.
# Maximum attempts before escalating baseline issues (default: 5)
# After this many failed attempts, the baseline issue is marked no-auto-claim
# and an escalation issue is created for human intervention
export VC_SELF_HEALING_MAX_ATTEMPTS=5
# Maximum duration in self-healing mode before escalating (default: 24h)
# If a baseline issue remains unresolved for this long, it gets escalated
# Format: duration string (e.g., "24h", "48h", "2h30m")
export VC_SELF_HEALING_MAX_DURATION=24h
# How often to recheck baseline in degraded mode (default: 5m)
# When in degraded mode (no baseline work found), this controls how often
# to recheck if the baseline has been fixed by other means
# Format: duration string (e.g., "5m", "10m", "1m")
export VC_SELF_HEALING_RECHECK_INTERVAL=5m
# Enable verbose logging for self-healing decisions (default: true)
# Logs every decision in the fallback chain for observability
# Useful for debugging self-healing behavior
export VC_SELF_HEALING_VERBOSE_LOGGING=true
The executor uses three states to manage baseline failures:
- HEALTHY: Normal operation, all quality gates passing
- SELF_HEALING: Baseline failed, actively trying to fix it with smart work selection
- ESCALATED: Thresholds exceeded, human intervention needed
When in SELF_HEALING mode, the executor uses this fallback chain:
- Find baseline-failure labeled issues (ready to execute)
- Investigate blocked baseline → claim ready dependents
- Find discovered:blocker issues (ready to execute)
- Log diagnostics if no work found
- Check escalation thresholds
- Fall through to regular work
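One way to picture the fallback chain is as an ordered list of lookup steps tried until one yields work. The sketch below is illustrative only: `Finder` and `SelectWork` are hypothetical names, the lookup functions are stubs, and the escalation check is reduced to a comment.

```go
package main

import "fmt"

// Issue is a hypothetical, minimal view of a tracked issue.
type Issue struct{ ID string }

// Finder is one step of the chain; Find returns nil when nothing is ready.
type Finder struct {
	Name string
	Find func() *Issue
}

// SelectWork tries each step in order and returns the first issue found,
// together with the name of the step that produced it. When every step
// comes up empty it logs diagnostics and falls through to regular work.
func SelectWork(chain []Finder, regular func() *Issue, logf func(string, ...interface{})) (*Issue, string) {
	for _, step := range chain {
		if issue := step.Find(); issue != nil {
			logf("self-healing: %s selected %s", step.Name, issue.ID)
			return issue, step.Name
		}
		logf("self-healing: %s found nothing", step.Name)
	}
	// The escalation threshold check would run here (omitted in this sketch).
	logf("self-healing: no baseline work found, falling through to regular work")
	return regular(), "regular work"
}

func main() {
	none := func() *Issue { return nil }
	chain := []Finder{
		{Name: "baseline-failure issues", Find: none},
		{Name: "ready dependents of blocked baseline", Find: none},
		{Name: "discovered:blocker issues", Find: func() *Issue { return &Issue{ID: "vc-example"} }},
	}
	logf := func(format string, args ...interface{}) { fmt.Printf(format+"\n", args...) }
	issue, step := SelectWork(chain, func() *Issue { return nil }, logf)
	fmt.Println(issue.ID, "via", step)
}
```

Logging every step, including the misses, matches the intent of VC_SELF_HEALING_VERBOSE_LOGGING: each decision in the chain stays observable.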
Escalation happens when EITHER threshold is exceeded:
- Attempt threshold: VC_SELF_HEALING_MAX_ATTEMPTS (default: 5)
- Duration threshold: VC_SELF_HEALING_MAX_DURATION (default: 24h)
When escalated:
- Baseline issue gets no-auto-claim label (executor stops working on it)
- Escalation issue created (P0, urgent, no-auto-claim)
- Executor transitions to ESCALATED mode
- Regular work continues normally
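The three states and the either-threshold rule can be sketched as a small transition function. This is an assumption-laden illustration rather than VC's code: `Mode`, `Thresholds`, `LoadThresholds`, and `NextMode` are hypothetical names, falling back to the documented defaults on unset or unparsable values is an assumption, and so is returning to HEALTHY as soon as the baseline passes again.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Mode is the self-healing state of the executor.
type Mode string

const (
	Healthy     Mode = "HEALTHY"
	SelfHealing Mode = "SELF_HEALING"
	Escalated   Mode = "ESCALATED"
)

// Thresholds holds the two escalation limits.
type Thresholds struct {
	MaxAttempts int
	MaxDuration time.Duration
}

// LoadThresholds reads the limits through getenv (for example os.Getenv).
// Assumption: unset or unparsable values fall back to the documented defaults.
func LoadThresholds(getenv func(string) string) Thresholds {
	t := Thresholds{MaxAttempts: 5, MaxDuration: 24 * time.Hour}
	if n, err := strconv.Atoi(getenv("VC_SELF_HEALING_MAX_ATTEMPTS")); err == nil && n > 0 {
		t.MaxAttempts = n
	}
	// time.ParseDuration accepts strings such as "24h", "48h", and "2h30m".
	if d, err := time.ParseDuration(getenv("VC_SELF_HEALING_MAX_DURATION")); err == nil && d > 0 {
		t.MaxDuration = d
	}
	return t
}

// NextMode computes the next state. Escalation triggers when EITHER the
// attempt count or the time spent self-healing reaches its limit.
func NextMode(current Mode, baselinePassing bool, attempts int, elapsed time.Duration, t Thresholds) Mode {
	if baselinePassing {
		return Healthy // assumption: a passing baseline always restores HEALTHY
	}
	if current == Escalated {
		return Escalated // stay escalated until the baseline is fixed
	}
	if attempts >= t.MaxAttempts || elapsed >= t.MaxDuration {
		return Escalated
	}
	return SelfHealing
}

func main() {
	t := LoadThresholds(os.Getenv)
	fmt.Println(NextMode(Healthy, false, 0, 0, t))                         // baseline just failed
	fmt.Println(NextMode(SelfHealing, false, t.MaxAttempts, time.Hour, t)) // attempt limit reached
	fmt.Println(NextMode(SelfHealing, false, 1, t.MaxDuration, t))         // duration limit reached
}
```

Passing `getenv` as a parameter keeps the loader testable without touching the process environment.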
For aggressive self-healing (attempt fixes more times before giving up):
export VC_SELF_HEALING_MAX_ATTEMPTS=10 # Try more times
export VC_SELF_HEALING_MAX_DURATION=48h # Allow more time
For conservative self-healing (escalate to humans sooner):
export VC_SELF_HEALING_MAX_ATTEMPTS=3 # Escalate sooner
export VC_SELF_HEALING_MAX_DURATION=12h # Shorter time window
For debugging self-healing behavior:
export VC_SELF_HEALING_VERBOSE_LOGGING=true # Enable detailed logs
export VC_SELF_HEALING_RECHECK_INTERVAL=1m # Check more frequently
- vc-210: Self-Healing Baseline Failures (original implementation)
- vc-wlk2: Robust Self-Healing: Graceful Degradation and Smart Fallback (epic)
- vc-23t0: Implement SelfHealingMode state machine
- vc-h8b8: Implement escalation mechanism with thresholds
- vc-tn9c: Add configuration for self-healing thresholds
VC detects when an agent succeeds technically (exit code 0, quality gates pass) but fails to fully complete the work according to acceptance criteria. This can happen when the agent reads files but doesn't make required edits, or only partially completes the task.
# Maximum retries for incomplete work before escalation (default: 1)
# After this many incomplete attempts, the issue is marked needs-human-review
# and blocked to prevent infinite retry loops
export VC_MAX_INCOMPLETE_RETRIES=1
When AI analysis reports completed: false but the agent succeeded:
- First attempt: Issue gets a retry comment and stays open for another attempt
- Second attempt (default threshold): Issue is escalated with needs-human-review label and marked as blocked
The retry logic counts "Incomplete Work Detected" comments in the event history to track attempts across executions.
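A minimal sketch of that counting logic follows. The function names are hypothetical, and two details are assumptions: the marker is matched as a substring of each comment, and the decision is made before the comment for the current attempt is written.

```go
package main

import (
	"fmt"
	"strings"
)

// incompleteMarker is the comment text counted in the event history.
const incompleteMarker = "Incomplete Work Detected"

// CountIncompleteAttempts counts prior comments that carry the marker.
func CountIncompleteAttempts(comments []string) int {
	n := 0
	for _, c := range comments {
		if strings.Contains(c, incompleteMarker) {
			n++
		}
	}
	return n
}

// ShouldEscalate is called when the current attempt was incomplete.
// priorComments must not yet include a comment for the current attempt.
// With maxRetries=1, the first incomplete attempt retries (0 prior markers)
// and the second escalates (1 prior marker).
func ShouldEscalate(priorComments []string, maxRetries int) bool {
	return CountIncompleteAttempts(priorComments) >= maxRetries
}

func main() {
	history := []string{"Agent started"}
	fmt.Println(ShouldEscalate(history, 1)) // first incomplete attempt

	history = append(history, "Incomplete Work Detected: required edits missing")
	fmt.Println(ShouldEscalate(history, 1)) // second incomplete attempt
}
```

Deriving the count from the history instead of an in-memory counter is what lets the limit hold across separate executor runs.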
For more aggressive retries (give the agent more chances):
export VC_MAX_INCOMPLETE_RETRIES=2 # Allow 2 retries before escalation
For a conservative approach (escalate immediately):
export VC_MAX_INCOMPLETE_RETRIES=0 # Escalate on first incomplete attempt
Default recommendation: Keep at 1 retry. Most incomplete work stems from a fundamental misunderstanding of the requirements rather than a transient problem, so additional retries rarely help.
- vc-1ows: Handle incomplete work with retry mechanism
- vc-hsfz: Make maxIncompleteRetries configurable
Status: Not yet implemented. Punted until database size becomes a real issue (vc-184, vc-198).
Following the lesson learned from deduplication metrics (vc-151), we're deferring event retention infrastructure until we have real production data showing it's needed. This avoids building observability for theoretical future problems.
Implement event retention when:
- .beads/vc.db exceeds 100MB
- Query performance degrades noticeably
- Developers complain about database size
- Event table has >100k rows
Until then: YAGNI (You Aren't Gonna Need It).
When we do implement this, here's the plan:
Retention Policy Tiers:
- Regular events (progress, file_modified, etc.): 30 days
- Critical events (error, watchdog_alert): 180 days
- Per-issue limit: 1000 events max per issue
- Global limit: Configurable, default 50k events
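Since none of this is implemented yet, the following is only a sketch of how the tiers might be expressed in code. `RetentionPolicy`, its methods, and the `criticalTypes` set are hypothetical, and classifying only `error` and `watchdog_alert` as critical follows the two examples listed above rather than any confirmed event taxonomy.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// RetentionPolicy holds the proposed (not yet implemented) settings.
type RetentionPolicy struct {
	RegularDays   int // proposed VC_EVENT_RETENTION_DAYS
	CriticalDays  int // proposed VC_EVENT_CRITICAL_RETENTION_DAYS
	PerIssueLimit int // proposed VC_EVENT_PER_ISSUE_LIMIT, 0 = unlimited
}

// DefaultPolicy returns the defaults from the design.
func DefaultPolicy() RetentionPolicy {
	return RetentionPolicy{RegularDays: 30, CriticalDays: 180, PerIssueLimit: 1000}
}

// criticalTypes lists event types kept for the longer window (assumption).
var criticalTypes = map[string]bool{"error": true, "watchdog_alert": true}

// Window returns how long an event of the given type is retained.
func (p RetentionPolicy) Window(eventType string) time.Duration {
	if criticalTypes[eventType] {
		return time.Duration(p.CriticalDays) * day
	}
	return time.Duration(p.RegularDays) * day
}

// Expired reports whether an event of the given age is past its window.
func (p RetentionPolicy) Expired(eventType string, age time.Duration) bool {
	return age > p.Window(eventType)
}

// Excess returns how many events an issue holds beyond the per-issue limit.
func (p RetentionPolicy) Excess(eventCount int) int {
	if p.PerIssueLimit == 0 || eventCount <= p.PerIssueLimit {
		return 0
	}
	return eventCount - p.PerIssueLimit
}

func main() {
	p := DefaultPolicy()
	fmt.Println(p.Expired("progress", 45*day)) // regular event, 45 days old
	fmt.Println(p.Expired("error", 45*day))    // critical event, 45 days old
	fmt.Println(p.Excess(1200))                // events over the per-issue limit
}
```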
Proposed Environment Variables:
# Event retention in days (default: 30)
export VC_EVENT_RETENTION_DAYS=30
# Critical event retention in days (default: 180)
export VC_EVENT_CRITICAL_RETENTION_DAYS=180
# Per-issue event limit (default: 1000, 0 = unlimited)
export VC_EVENT_PER_ISSUE_LIMIT=1000
# Global event limit (default: 50000, 0 = unlimited)
export VC_EVENT_GLOBAL_LIMIT=50000
# Cleanup frequency in hours (default: 24)
export VC_EVENT_CLEANUP_INTERVAL_HOURS=24
# Batch size for cleanup (default: 1000)
export VC_EVENT_CLEANUP_BATCH_SIZE=1000
Cleanup Strategy:
- Run as background goroutine in executor
- Execute every 24 hours (configurable)
- Transaction-based deletion in batches of 1000
- Log cleanup metrics (events deleted, time taken)
CLI Command (Not Yet Implemented):
# Manual cleanup trigger
vc cleanup events --dry-run # Preview what would be deleted
vc cleanup events # Execute cleanup
vc cleanup events --force # Bypass safety checks
- vc-183: Agent Events Retention and Cleanup [OPEN - Low Priority]
- vc-184: Design event retention policy [CLOSED - Design complete]
- vc-193 through vc-197: Implementation tasks [OPEN - Punted]
- vc-199: Tests for event retention [OPEN - Punted]
Remember: Build this when you need it, not before. Let real usage drive the requirements.
See docs/QUERIES.md for event retention monitoring queries (for future use).