This roadmap outlines the implementation plan for the memory-enhanced multi-agent fundamental analysis system. The system will be built iteratively across 5 phases, with each phase delivering incremental value while building toward full autonomous operation with institutional memory.
The implementation follows an agile methodology with continuous integration of memory features, learning systems, and human validation mechanisms to ensure the system learns correctly and avoids confirmation bias.
- Establish core data infrastructure
- Deploy initial screening capabilities
- Implement central knowledge graph foundation
- Create basic memory storage and retrieval
- Set up data infrastructure
- PostgreSQL for structured data
- S3 (MinIO/AWS) for raw documents (10-15TB)
- Elasticsearch 8+ for Unified Hybrid Search (Text + Vectors) [DD-027]
- Elasticsearch Index Mapping Standard [DD-029]
- Implement data collector agent
- SEC EDGAR integration
- Financial data API connections
- Document parsing pipeline
- Data validation and quality checks
- Build screening agent
- Quantitative filter engine (10Y CAGR, margins, debt ratios)
- Company summary generation
- Candidate prioritization logic
- Initial screening patterns
- Deploy central knowledge graph (Neo4j)
- Node schema (Company, Analysis, Pattern, Decision, Agent)
- Relationship definitions
- Query optimization
- Backup and versioning
- Implement basic memory storage
- Tiered Redis Architecture (L1 Working, L2 Cache, Checkpoint) [DD-028]
- Knowledge graph persistence
- Memory versioning system
- Initial data population
- Create basic dashboard with memory views
- Pipeline overview
- Pending decisions queue
- Memory statistics dashboard
- Historical context viewer
- Data collection operational for initial company set
- Screening agent identifying viable candidates consistently
- Knowledge graph storing basic company relationships
- Dashboard displays current pipeline state
- Deploy specialist analysis agents with memory capabilities
- Implement knowledge base management
- Enable memory-enhanced collaboration
- Establish human oversight gates with historical context
- Deploy financial analyst agent with local memory
- Financial statement analysis
- Ratio calculation and benchmarking
- Red flag detection
- Local L2 cache (Redis L2) for ratio patterns
- Prediction tracking structure
- Deploy business research agent with pattern matching
- SEC filing analysis (10-K, 10-Q, 8-K)
- SWOT framework
- Competitive advantage assessment
- Business model pattern recognition
- Management track record database
-
Deploy Macro Analyst Agent
- Reuse DD-008 regime detection (6 market regimes)
- Economic indicator analysis (FRED: GDP, CPI, rates, VIX)
- Sector favorability scoring (11 GICS sectors, 0-100 scale)
- Monthly report generation (8-12 pages PDF + interactive dashboard)
- Integration with Screening Agent (sector scores)
- Integration with Valuation Agent (discount rates)
- See DD-022
-
Macro Data Infrastructure
- FRED API setup (free, register for key at fred.stlouisfed.org)
- IMF/OECD API integration (free, no keys required)
- CBOE VIX data fetching
- Neo4j schema for macro memory (regime history, forecast accuracy)
- Monthly batch processing (2-4 hours report generation)
- Dashboard deployment (Plotly Dash / Streamlit / Tableau - tool TBD)
-
Macro Report System
- PDF report generation (8-12 pages, 6 sections)
- Interactive dashboard (4 quadrants: regime gauge, indicator heatmap, sector rotation, yield curve)
- API endpoints (5 RESTful endpoints for agent consumption)
- Gate integration (embed dashboards in Gate 1/2/5 UIs)
- See DD-026
- Implement Knowledge Base Agent
- Pattern discovery engine
- Agent track record calculation
- Precedent search and matching
- Lessons learned indexing
- Post-mortem analysis automation
- Add memory synchronization layer
- L1 ↔ L2 sync protocol
- L2 ↔ L3 sync protocol
- Conflict resolution
- Importance-based propagation
- Broadcast mechanism for urgent insights
- Implement pause/resume infrastructure (DD-012 design complete, 4 weeks implementation)
- Components: PauseManager, DependencyResolver, BatchManager
- Database: paused_analyses, batch_pause_operations, resume_plans tables
- Integration: Orchestrator adapter, checkpoint system (DD-011), alert system, L1 memory TTL extension (24h → 14d during pause)
- Dependencies:
- DD-011 checkpoint system (prerequisite for pause/resume)
- Alert system (Flaw #24) for pause notifications
- Implement L1 Memory Durability (DD-016)
- Objective: Enable reliable multi-day pauses without work loss
- Dual-Layer System: Redis secondary snapshots (active) + PostgreSQL checkpoints (on-pause)
- Components: L1TTLManager (dynamic TTL 24h/14d), L1CacheSnapshotter, L1CacheRestorer
- Verification: ConsistencyVerifier (hash-based) ensures restore integrity
- Integration: Triggered by PauseManager and CheckpointSystem
- Implement Failure Correlation System (DD-017)
- Objective: Automate detection of shared root causes (e.g., API quotas) to prevent duplicate alerts
- Components: FailureCorrelator, ErrorSignatureGenerator, RootCauseInference
- Logic: Temporal clustering (5min window), signature matching, auto-batch pause trigger
- Integration: Connects with AlertManager (DD-015) and BatchManager (DD-012)
- Implement memory-enhanced debate facilitator with tiered escalation
- Structured debate protocols (5-level escalation)
- Historical context loading
- Credibility-weighted auto-resolution (Level 2, dynamic >0.25 threshold)
- Phase 2 Simple Credibility (DD-008): Overall accuracy + temporal decay (2yr half-life) + Wilson confidence intervals
- Min 15 historical datapoints per agent
- Dynamic threshold:
max(0.25, CI_A + CI_B)adjusts for statistical uncertainty - No regime/trend/override/context adjustments (Phase 4 enhancements)
- Conservative default fallback logic (Level 4)
- Provisional resolution tracking and override mechanisms
- Precedent-based arguments
- Challenge-response tracking
- Workload-aware debate routing (max 3 concurrent per expert)
- Priority-based queue management (critical-path, valuation, supporting)
- Timeout enforcement at each escalation level
- Pause/resume integration: Provisional resolutions tracked for Gate review; Gate timeout triggers pause via PauseManager
- Add human gate system (Gates 1-5) with historical context
- Gate 1: Screening validation with past outcomes
- Gate 2: Research direction with precedents
- Gate 3: Assumption validation with calibration
- Gate 4: Debate arbitration with track records
- Gate 5: Final decision with full context
- Notification system
- Timeout and default actions
- Decision logging
- Implement Gate 6: Learning Validation with anti-bias mechanisms
- Pattern validation queue
- Hold-out test framework
- Blind testing infrastructure
- Control group A/B testing
- Statistical significance testing
- Human review interface
- Pattern approval workflow
To prevent deadlocks and ensure proper integration, Phase 2 components implement in this order:
Early Phase 2:
- Week 1-2: Alert system foundation (Flaw #24) - prerequisite for both pause/resume and debate escalation
- Week 3-4: Begin pause/resume infrastructure (PauseManager, database tables)
Late Phase 2:
- Week 1-2: Complete pause/resume (DependencyResolver, BatchManager, orchestrator integration)
- Week 3-4: Debate resolution system with pause/resume integration
- Provisional resolutions integrate with pause/resume for Gate review
- Gate timeout triggers pause via PauseManager
- Conservative defaults work with pause state (provisional review at Gates 3/5)
- 3 specialist agents operational with memory
- Knowledge Base Agent indexing initial analysis set
- Memory sync maintaining <100ms L1, <1s L3 latency
- Human gates processing analyses with historical context
- Gate 6 preventing >90% of false patterns from propagation
- Debate resolution system preventing pipeline deadlocks (zero blocking debates)
- Fallback resolutions tested in all scenarios (human unavailable, concurrent overload, queue full)
- Conservative defaults maintaining >85% accuracy in testing
- Pause/resume + debate integration tested (provisional review at gates, gate timeout triggers, paused state with conservative defaults)
Critical testing required for debate deadlock resolution system:
-
Human Unavailability Testing:
- Simulate off-hours debate escalation (Friday 6pm)
- Verify conservative default applied after 6hr timeout
- Confirm pipeline continues with provisional resolution
- Test override at next gate (Gate 3 or Gate 5)
- Measure: Zero pipeline blocks, provisional review completion
-
Concurrent Debate Overload:
- Simulate 5+ debates escalating simultaneously
- Verify queue limit enforcement (max 3 concurrent)
- Test overflow handling (auto-resolve or defer based on priority)
- Confirm critical-path debates force-resolved, supporting debates deferred
- Measure: Queue never exceeds 3, appropriate routing
-
Queue Priority Management:
- Test priority classification (critical-path, valuation, supporting)
- Verify critical-path debates bypass queue when full
- Test valuation debates queue correctly
- Confirm supporting debates defer to next gate
- Measure: >85% priority classification accuracy
-
Credibility-Weighted Auto-Resolution:
- Test with credibility differentials: 0.20, 0.25, 0.30, 0.35, 0.40
- Verify auto-resolution triggers at >0.25 threshold (dynamic: max(0.25, CI_A + CI_B))
- Confirm escalation to human when <0.25
- Test minimum data requirement (15 historical points)
- Measure: Auto-resolution accuracy >80%
-
Conservative Default Logic:
- Test identification of most cautious position
- Verify provisional marking and override window setup
- Test downstream impact analysis on override
- Confirm learning capture (fallback accuracy tracking)
- Measure: Conservative defaults >85% correct (simulated outcomes)
-
Provisional Resolution Override:
- Test review display at Gates 3 and 5
- Verify override triggers re-calculation
- Test downstream dependency propagation
- Confirm override rationale capture
- Measure: All provisional decisions reviewed before final
-
Timeout Enforcement:
- Test 15min agent acknowledgment timeout
- Test 1hr evidence provision timeout
- Test 6hr human arbitration timeout
- Test 24hr fallback timeout (until next gate)
- Measure: All timeouts enforced within ±5 seconds
-
Workload Management:
- Test load indicator display (0/3, 2/3, 3/3)
- Verify queue position tracking
- Test notification on debate assignment
- Confirm overflow prevention mechanisms
- Measure: Expert workload never exceeds 3 concurrent
Testing Timeline: Phase 2 (parallel with agent deployment) Testing Coverage Target: 100% of failure scenarios Success Gate: Zero pipeline deadlocks in 50 test cycles
- Implement sophisticated pattern learning with validation
- Enable outcome tracking and accuracy measurement
- Build agent self-improvement loops
- Deploy remaining specialist agents
- Implement pattern learning system with hold-out validation
- 70/15/15 train/validation/test split
- Pattern mining on training set only
- Hold-out validation before queuing for Gate 6
- Statistical significance testing (p < 0.05)
- Causal mechanism validation
- Pattern lifecycle management
- Add outcome tracking mechanisms
- Checkpoint scheduling (30/90/180/365 days)
- Actual vs. predicted comparison
- Accuracy scoring by agent and domain
- Surprise event logging
- Lesson extraction pipeline
- Build agent self-improvement loops
- Systematic error identification
- Bias correction factor generation
- Historical backtesting of corrections
- Learning commitment and rollback
- Cross-agent learning propagation
- Implement blind testing & control group frameworks
- Pattern performance tracking without agent awareness
- A/B test infrastructure (pattern vs. baseline)
- Statistical power calculation
- Result analysis and reporting
- Automatic pattern retirement on failure
- Enhance Regime Detection with LLM Layer
- Add LLM analysis to DD-008 rule-based regime detection
- Hybrid: Rule-based (60%) + LLM narrative analysis (40%)
- LLM outputs: Classification + confidence + narrative rationale
- Human validation via Gate 6 (learning validation)
- Target: 90% accuracy (vs 85% rule-based)
- Fallback: LLM confidence <0.7 → use rule-based
- Benchmark baseline memory performance
- Graph query latency at beta scale (150-500 analyses)
- Pattern matching time with 500-1K patterns
- Agent credibility calculation performance
- Cache effectiveness measurement
- Document actual vs projected bottlenecks
- Implement tiered caching infrastructure
- System-wide L1 cache (hot, <10ms) - tech research required
- Query result caching with TTL management
- Cache warming before analysis start
- Cache invalidation strategy
- Monitor cache hit rate (target >80%)
- Deploy query optimization & indexing
- Pre-computed similarity indexes (offline batch)
- Nightly/weekly index rebuild pipeline
- Materialized views (patterns, credibility, peers)
- Incremental index updates during day
- Validate 10-15× query speedup
- Implement query budget enforcement
- 500ms hard timeout on memory queries
- Fallback strategies (approximate, cached, sampled results)
- Timeout monitoring and alerting (target <5%)
- Slow query logging for investigation
- Implement query fallback & recursion protection
MAX_FALLBACK_DEPTHenforcement (prevent infinite recursion)- Timeout reduction logic (500ms → 200ms → 100ms)
- Ultimate fallback safety (empty result + warning alert)
- Implement sync backpressure system
- Queue depth monitoring with exponential backoff
- Priority-based eviction (drop normal/high before critical)
- Protection against queue overflow during high-load debates
- Implement regime detection sequencing
- Hybrid parallelism (parallel across regimes, serial within)
- Dependency flags to prevent stale credibility updates
- Staleness SLA enforcement (<5min)
- Implement Fine-Grained RBAC
- 5 Roles: Agent, Knowledge Base Agent, Debate Facilitator, Learning Engine, Human Admin
- Permission Matrix: L1/L2/L3 read/write granularity (e.g.,
read_write_ownvsread_all) - API Authorization Gateway (<5ms overhead)
- Implement Memory Audit Logging
- PostgreSQL-based immutable log
- Track all actor actions, resource modifications, and access denials
- 5-year retention policy for regulatory compliance
- Enforce Pattern Lifecycle & Credibility Protection
- Only Knowledge Base Agent/Humans can validate patterns
- Only Learning Engine can update credibility scores
- Add strategy analyst with track record system
- Capital allocation analysis
- Management track record evaluation
- M&A success rate tracking
- Strategic pivot pattern recognition
- Guidance accuracy scoring
- Build valuation agent with model calibration
- DCF modeling with scenarios
- Relative valuation (multiples)
- Historical model accuracy tracking
- Calibration factor application
- Sector-specific discount rates
- Terminal value assumption validation
- Create memory-aware report writer
- Synthesis of multi-agent findings
- Confidence score integration
- Historical precedent inclusion
- Pattern match highlighting
- Track record display
- Investment memo generation
- Pattern discovery identifying validated patterns consistently
- Outcome tracking active for prediction set
- Agent accuracy improving measurably per quarter
- Strategy and valuation agents operational with memory
- Report writer generating memos with historical context
- <10% false pattern rate (spurious correlations rejected)
- Memory retrieval <200ms cached, <500ms uncached (p95) at operational scale
- Cache hit rate >80%
- Query timeout rate <5%
- Optimize memory retrieval and storage performance
- Implement advanced pattern mining techniques
- Add predictive confidence scoring
- Prepare system for production deployment
- Advanced memory scalability optimizations (DD-005)
- Implement incremental credibility updates (800ms → <10ms)
- Phase 4 Comprehensive Credibility (DD-008): Adds regime, trend, override, context matching to Phase 2 simple credibility
- 6 market regimes (BULL_LOW_RATES, BULL_HIGH_RATES, BEAR_HIGH_RATES, BEAR_LOW_RATES, HIGH_VOLATILITY, NORMAL)
- 52-week trend detection (R² > 0.3), override tracking (>20%/40% penalties)
- Multi-dimensional context matching (5 dimensions)
- Deploy parallel query execution (5× speedup)
- Build memory pruning strategy (>2yr archival)
- Implement similarity index rebuild pipeline
- Cold storage archival with summarization
- Active graph size management (<50K nodes)
- Re-benchmark at production scale (600-2K analyses)
- Implement incremental credibility updates (800ms → <10ms)
- Tune memory retrieval algorithms
- Query optimization for graph database
- Index strategy refinement (nightly vs weekly)
- Cache warming strategies optimization
- Similarity search optimization
- Batch query consolidation
- Memory access pattern analysis
- Implement sophisticated pattern mining
- AutoML for pattern discovery
- Multi-variate pattern recognition
- Cross-domain pattern linking
- Temporal pattern evolution tracking
- Regime-dependent pattern activation
- Network analysis for relationships
- Deploy Neo4j Causal Cluster (3 core + 2 replica)
- Week 1-2: Provision infrastructure, configure Raft clustering
- Week 3: Data migration from single instance (maintenance window)
- Week 4: Failover testing, load testing, DNS cutover
- Configure cross-region backup strategy
- Primary: AWS S3 us-west-2
- Secondary: GCP Cloud Storage us-central1 (DR)
- Hourly incremental + daily/weekly/monthly full backups
- Implement cluster monitoring & alerts
- Prometheus metrics for cluster health
- Grafana dashboards (transaction throughput, replication lag)
- Alert routing (core node down, quorum lost, backup failures)
- Test failover scenarios
- Leader failure: <10s election, zero data loss
- Follower failure: writes continue (2/3 quorum)
- Network partition: majority partition continues
- Replica lag: alerts trigger correctly
- Backup restore: <1hr RTO validation
Estimated Effort: 4 weeks Target: Complete before production deployment Success Criteria:
- Cluster operational (3 core + 2 replica)
- Automatic failover <10s (measured)
- 99.95% availability (30-day measurement)
- Zero data loss in all test scenarios
- Backup restoration <1hr RTO
- Implement tiered storage infrastructure (DD-009)
- Set up Hot/Warm/Cold storage tiers (S3 Standard/IA/Glacier or equivalent)
- Build automated tier migration service (age-based triggers)
- Configure storage class transitions
- Test access latency requirements (<10ms Hot, <100ms Warm, <3s Cold)
- Implement cost tracking and monitoring
- Implement Data Tier Management Operations (DD-019)
- Access-Based Re-Promotion: Track file access frequency; auto-promote Warm→Hot on spike
- Graph Integrity Monitoring: Hybrid real-time alerts + hourly checks + daily deep scan
- Automated Repair: Fix orphaned relationships and missing properties automatically
- Disaster Recovery: Point-in-time recovery (PITR) for Neo4j (<1hr RTO)
- Build pattern-aware retention system
- PatternAwareRetention service (checks pattern dependencies before deletion)
- File→Pattern dependency tracking in knowledge graph
- Retention policy enforcement (7yr standard, 10yr critical)
- Dry-run mode for validation before production
- Implement pattern archive system
- PatternArchiveManager service
- Tier 1 archive creation (lightweight, at validation)
- Tier 2 archive creation (full, at investment decision)
- Archive directory structure and index.json management
- Multi-criteria critical pattern scoring (2-of-4 logic)
- Integrate with validation pipeline
- Tier 1 archive trigger on pattern validation (DD-007)
- Tier 2 archive trigger on investment decision
- Knowledge graph schema updates (archive_tier, evidence_refs fields)
- Archive metadata tracking (creation dates, sizes, retention)
- Testing and validation
- Test tier migrations at all thresholds
- Validate pattern-aware retention (don't delete files with dependencies)
- Test archive creation workflows (Tier 1 and Tier 2)
- Verify cost estimates (~$5/mo tiered + $0.22/mo archives)
- Test evidence retrieval from archives (post-mortem scenarios)
- Validate 7-10yr retention periods before first expirations
Estimated Effort: 2 weeks Target: Complete before first 3-year retention expiry Success Criteria:
- All validated patterns have Tier 1 archives
- Critical patterns have Tier 2 archives
- No evidence deletion for files supporting active patterns
- Pattern re-validation succeeds with archived evidence
- Add predictive confidence scoring
- Bayesian confidence intervals
- Historical accuracy-based calibration
- Uncertainty quantification
- Ensemble prediction aggregation
- Monte Carlo simulation for scenarios
- Confidence decay over time
- Optimize human-memory interaction workflows
- Streamlined Gate 6 review interface
- Context-aware notification prioritization
- Quick decision templates
- Historical search optimization
- Human insight capture refinement
- Feedback loop efficiency
- Full system testing with historical backtesting
- 5-year backtest on 100+ stocks
- Out-of-sample validation
- Regime change testing
- Stress scenario testing
- Performance benchmark comparison
- Memory accuracy validation
- Production deployment with learning enabled
- Staged rollout plan
- Monitoring and alerting
- Rollback procedures
- Learning rate controls
- Production-safe defaults
- Incident response procedures
- Memory retrieval <500ms uncached, <200ms cached (p95) at production scale
- Cache hit rate >80%
- Query timeout rate <5%
- Active graph size <50K nodes (with >2yr memories archived)
- Incremental credibility updates <10ms
- Pattern accuracy >70% on validation set
- Confidence scores calibrated within 5% of actual
- System processing production workload reliably
- <24hr analysis turnaround
- Zero critical memory corruption incidents
- Scale to 1000+ stock coverage
- Achieve autonomous operation with minimal human intervention
- Implement automated pattern discovery and validation
- Enable cross-market learning
- Monitor and refine pattern accuracy
- Continuous pattern performance monitoring
- Automated pattern degradation detection
- Regime-change adaptation
- Pattern refresh automation
- Validity condition enforcement
- Sunset policies for outdated patterns
- Implement automated pattern discovery
- Unsupervised pattern mining
- Automatic statistical validation
- Self-service Gate 6 reporting
- Pattern A/B testing automation
- Winner selection and deployment
- Pattern portfolio management
- Add cross-market pattern recognition
- Multi-asset class correlation
- Macro regime pattern detection
- Sector rotation pattern mining
- Global market linkages
- Alternative data integration
- Sentiment pattern extraction
- Scale to 1000+ stocks with full memory
- Horizontal scaling architecture
- Memory partitioning strategy
- Query load balancing
- Archive optimization
- Cost optimization
- Performance monitoring at scale
- Achieve 90%+ prediction accuracy on key metrics
- Model ensemble optimization
- Bias-variance tradeoff tuning
- Feature importance analysis
- Error attribution framework
- Continuous calibration
- Quality control automation
- Large-scale analysis operational continuously
- Human intervention <10% of decisions
- Memory utilization >80% of analyses
- Pattern self-validation >90% accuracy
- Learning rate sustained at 5%+ per quarter
- System autonomously discovers validated patterns consistently
| Milestone | Phase | Success Criteria |
|---|---|---|
| MVP Launch | Phase 2 | • Process initial stocks end-to-end • Memory baseline functional • Gate 6 operational • 3 specialist agents with memory • Knowledge graph storing analyses |
| Beta Release | Phase 3 | • Beta workload analyzed with memory • 80% overall accuracy • Pattern accuracy >70% • <10% false pattern rate • Outcome tracking active • Agent learning loops operational |
| Production Launch | Phase 4 | • Production workload operational • <24hr turnaround time • Learning rate sustained • Validated patterns only in use • Full backtest validation • Human override rate <20% |
| Scale Phase | Phase 5 | • Large-scale coverage operational • Minimal human input required • Memory utilization 80%+ • Pattern self-validation 90%+ • Sustained learning rate 5%+/quarter • Cross-market patterns operational |
- Memory Coverage: Initial company set in knowledge graph
- Pattern Count: Manually seeded patterns
- Retrieval Speed: <1s for L3 queries
- Validation: Manual pattern validation only
- Memory Coverage: Expanded company set with analyses
- Pattern Count: Discovered and validated patterns
- Retrieval Speed: <500ms for L3, <100ms for L2
- Validation: Automated hold-out + blind testing + Gate 6
- Memory Coverage: Large-scale company and analysis coverage
- Pattern Count: Active validated pattern library
- Retrieval Speed: <200ms for L3, <50ms for L2
- Validation: 90%+ self-validation rate
- Data quality issues → Validation layers, manual spot checks
- API rate limits → Caching, multiple provider setup
- Memory corruption → Versioning system, rollback capability
- False pattern propagation → Gate 6 implementation, statistical validation
- Agent overfitting → Hold-out validation, blind testing
- Confirmation bias → Control groups, human expert review
- Scaling issues → Load testing, optimization sprints
- Production incidents → Monitoring, rollback procedures
- Pattern decay → Continuous monitoring, auto-refresh
- Over-reliance on memory → Human override monitoring, quality checks
- Agile sprints: 2-week iterations
- Continuous integration: Daily merges to main branch
- Test-driven development: Unit tests before implementation
- Memory validation first: Statistical rigor before human review
- Code review required for all changes
- 80%+ test coverage requirement
- Performance regression testing
- Memory quality audits monthly
- Human validation at Gate 6 for all learnings
- Architecture decision records (ADRs)
- API documentation (OpenAPI/Swagger)
- Agent behavior specifications
- Pattern validation reports
- Learning system audit trails
- System Design: See
multi_agent_fundamental_analysis_v2.0.mdfor complete architecture - Technical Requirements: See
02-tech-requirements.mdfor infrastructure details - Risk Assessment: See
03-risks-compliance.mdfor comprehensive risk analysis - Terminology: See
04-glossary.mdfor system concepts and definitions - Memory Architecture: See main design doc Section 3 for detailed memory system design
- Agent Specifications: See main design doc Section 4 for individual agent capabilities
- Learning Systems: See main design doc Section 9 for learning loop details
Document Version: 3.0 | Last Updated: 2025-11-19