
[FEATURE] #880 - Unit and integration tests for batch-level learning and strategy memory #915

@stephanj

Description


Parent Issue

Part of #880 — Introduce Batch-Level Learning and Strategy Memory

Task

Add comprehensive test coverage for the batch-level learning and strategy memory feature.

Test Cases

TaskExecutionMetrics Tests

  • Metrics captured for successful task (timing, tokens, cost)
  • Metrics captured for skipped task (reason, zero tokens)
  • Metrics captured for failed task (error summary, exit code)
  • Timer accurate within reasonable tolerance (±100ms)
  • CLI mode captures execution time from process lifecycle
  • LLM mode extracts token usage from ChatMessageContext
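A minimal JUnit 5 sketch of the first case (successful task, timing/tokens/cost). The builder and accessor names (builder(), getDurationMs(), getInputTokens(), getOutputTokens(), getCost()) are assumptions about the TaskExecutionMetrics API; the real shape is defined by the #880 implementation.

```java
package com.devoxx.genie.model.spec;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

class TaskExecutionMetricsTest {

    @Test
    void successfulTaskCapturesTimingTokensAndCost() {
        // Hypothetical builder API; the real class may use setters or a constructor instead.
        TaskExecutionMetrics metrics = TaskExecutionMetrics.builder()
                .taskId("task-1")
                .durationMs(1_250L)
                .inputTokens(420)
                .outputTokens(180)
                .cost(0.0031)
                .build();

        assertEquals(1_250L, metrics.getDurationMs());
        assertEquals(420, metrics.getInputTokens());
        assertEquals(180, metrics.getOutputTokens());
        assertEquals(0.0031, metrics.getCost(), 1e-9);
    }
}
```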

BatchExecutionMemory Tests

  • Record and retrieve single task learning entry
  • Retrieve entries for specific dependency task IDs
  • getSummarizedContext() respects max character limit
  • Older entries compressed to one-liners
  • Recent entries include full details
  • Failed tasks always included regardless of age
  • clear() empties all entries
  • Thread-safe concurrent record/read
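A rough sketch of the character-budget and clear() cases. Only getSummarizedContext() and clear() appear in the list above; the record(taskId, summary) signature and the int budget parameter are assumptions.

```java
package com.devoxx.genie.service.spec;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertTrue;

class BatchExecutionMemoryTest {

    @Test
    void summarizedContextRespectsMaxCharacterLimitAndClearEmptiesMemory() {
        BatchExecutionMemory memory = new BatchExecutionMemory();

        // Hypothetical record(taskId, summary) call; the real entry type may carry more fields.
        for (int i = 0; i < 50; i++) {
            memory.record("task-" + i, "Implemented slice " + i + " and updated its tests.");
        }

        int maxChars = 2_000;
        String context = memory.getSummarizedContext(maxChars);
        assertTrue(context.length() <= maxChars,
                "Summarized context must stay within the character budget");

        memory.clear();
        assertTrue(memory.getSummarizedContext(maxChars).isEmpty(),
                "clear() should leave no entries behind");
    }
}
```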

Context Injection Tests

  • Learning context injected when LearningMode is BATCH
  • No learning context when LearningMode is OFF
  • Context within character budget
  • Pattern observations included in context
  • Both CLI and LLM modes receive learning context
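A hedged sketch of the BATCH vs. OFF cases. TaskPromptComposer and buildPrompt(...) are placeholders for whichever component actually assembles the task prompt; only the LearningMode values BATCH and OFF come from the feature itself.

```java
package com.devoxx.genie.service.spec;

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

class LearningContextInjectionTest {

    @Test
    void batchModeInjectsLearningContextWhileOffModeDoesNot() {
        BatchExecutionMemory memory = new BatchExecutionMemory();
        memory.record("task-1", "Refactored UserService; all unit tests green.");

        // Placeholder for the real prompt-assembly component.
        TaskPromptComposer composer = new TaskPromptComposer(memory);

        String batchPrompt = composer.buildPrompt("task-2", LearningMode.BATCH);
        String offPrompt = composer.buildPrompt("task-2", LearningMode.OFF);

        assertTrue(batchPrompt.contains("UserService"),
                "BATCH mode should surface learnings from completed tasks");
        assertFalse(offPrompt.contains("UserService"),
                "OFF mode must not leak cross-task context");
    }
}
```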

BatchRunSummary Tests

  • Summary aggregates all per-task metrics correctly
  • Status breakdown counts match individual task statuses
  • Total tokens = sum of per-task tokens
  • Total cost = sum of per-task costs
  • Console output formatted correctly
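A small sketch of the token and cost aggregation checks; the from(List&lt;TaskExecutionMetrics&gt;) factory, the getter names, and the totalTokens builder field are assumptions about the summary and metrics APIs.

```java
package com.devoxx.genie.service.spec;

import com.devoxx.genie.model.spec.TaskExecutionMetrics;
import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;

class BatchRunSummaryTest {

    @Test
    void totalsAreSumsOfPerTaskMetrics() {
        TaskExecutionMetrics first = metricsWith(100, 0.001);
        TaskExecutionMetrics second = metricsWith(250, 0.004);

        // Hypothetical factory; the real summary may instead be built incrementally by the runner.
        BatchRunSummary summary = BatchRunSummary.from(List.of(first, second));

        assertEquals(350, summary.getTotalTokens());
        assertEquals(0.005, summary.getTotalCost(), 1e-9);
    }

    private TaskExecutionMetrics metricsWith(int totalTokens, double cost) {
        // Placeholder construction; depends on the final metrics API.
        return TaskExecutionMetrics.builder().totalTokens(totalTokens).cost(cost).build();
    }
}
```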

ExecutionPatternExtractor Tests

  • COMMON_ERROR detected when 2+ tasks fail with same error
  • FILE_HOTSPOT detected when 3+ tasks modify same file
  • PERFORMANCE_TREND detected when tasks progressively slow down
  • No patterns emitted for single-task run
  • Patterns are concise and under 200 chars each
  • Confidence values are reasonable (0.0-1.0)
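A hedged sketch of the COMMON_ERROR rule plus the length and confidence bounds. The pattern names, the 200-character limit, and the 0.0-1.0 range come from the list above; extract(...), PatternType, and the ExecutionPattern accessors are assumptions.

```java
package com.devoxx.genie.service.spec;

import com.devoxx.genie.model.spec.TaskExecutionMetrics;
import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertTrue;

class ExecutionPatternExtractorTest {

    @Test
    void commonErrorDetectedWhenTwoTasksFailWithSameError() {
        ExecutionPatternExtractor extractor = new ExecutionPatternExtractor();

        // Hypothetical failed-task records; the real input is likely the per-task metrics.
        List<ExecutionPattern> patterns = extractor.extract(List.of(
                failedTask("task-1", "NullPointerException in OrderMapper"),
                failedTask("task-2", "NullPointerException in OrderMapper")));

        assertTrue(patterns.stream().anyMatch(p -> p.getType() == PatternType.COMMON_ERROR),
                "Two identical failures should surface a COMMON_ERROR pattern");

        for (ExecutionPattern pattern : patterns) {
            assertTrue(pattern.getDescription().length() < 200);
            assertTrue(pattern.getConfidence() >= 0.0 && pattern.getConfidence() <= 1.0);
        }
    }

    private TaskExecutionMetrics failedTask(String id, String errorSummary) {
        // Placeholder construction; depends on the final metrics API.
        return TaskExecutionMetrics.builder().taskId(id).errorSummary(errorSummary).build();
    }
}
```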

Integration Tests

  • Full batch run with LearningMode=BATCH: each task receives prior context
  • Full batch run with LearningMode=OFF: no cross-task context (regression)
  • 5-task batch: verify growing learning context per task
  • Learning context does not exceed max limit even with many tasks
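A very rough outline of the 5-task case. FakeBatchRunner is an invented stub standing in for whatever harness drives the batch against canned task results and records the prompt each task received.

```java
package com.devoxx.genie.service.spec;

import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertTrue;

class BatchLearningIntegrationTest {

    @Test
    void learningContextGrowsAcrossFiveTasksInBatchMode() {
        // Invented test harness: executes five stubbed tasks and captures each task's prompt.
        FakeBatchRunner runner = new FakeBatchRunner(LearningMode.BATCH);
        List<String> prompts = runner.runTasks(5);

        // Each later task should carry at least as much learning context as the previous one,
        // until the configured character cap is reached.
        for (int i = 1; i < prompts.size(); i++) {
            assertTrue(prompts.get(i).length() >= prompts.get(i - 1).length(),
                    "Learning context should accumulate from task to task");
        }
    }
}
```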

Files to Create

  • src/test/java/com/devoxx/genie/model/spec/TaskExecutionMetricsTest.java (new)
  • src/test/java/com/devoxx/genie/service/spec/BatchExecutionMemoryTest.java (new)
  • src/test/java/com/devoxx/genie/service/spec/ExecutionPatternExtractorTest.java (new)
  • src/test/java/com/devoxx/genie/service/spec/BatchLearningIntegrationTest.java (new)

Dependencies

Acceptance Criteria

  • All test cases above pass
  • No flaky tests (mock timers, use deterministic data; see the clock sketch after this list)
  • Existing tests unaffected
  • OFF mode regression tests confirm zero behavior change
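To address the "no flaky tests" criterion, one option is to drive the metrics timer from an injectable java.time.Clock rather than wall-clock time. The startTimer(Clock) hook is an assumption about the metrics API; MutableClock is just a local test helper.

```java
package com.devoxx.genie.model.spec;

import org.junit.jupiter.api.Test;

import java.time.Clock;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

import static org.junit.jupiter.api.Assertions.assertEquals;

class DeterministicTimerSketchTest {

    @Test
    void timerMeasuresExactlyTheAdvancedDuration() {
        MutableClock clock = new MutableClock(Instant.parse("2024-01-01T00:00:00Z"));

        // Hypothetical: the metrics timer reads time from the injected clock.
        TaskExecutionMetrics.Timer timer = TaskExecutionMetrics.startTimer(clock);
        clock.advanceMillis(1_250);

        assertEquals(1_250L, timer.stop().getDurationMs());
    }

    /** Minimal test clock that only advances when the test says so. */
    static final class MutableClock extends Clock {
        private Instant now;

        MutableClock(Instant start) { this.now = start; }

        void advanceMillis(long ms) { now = now.plusMillis(ms); }

        @Override public Instant instant() { return now; }
        @Override public ZoneId getZone() { return ZoneOffset.UTC; }
        @Override public Clock withZone(ZoneId zone) { return this; }
    }
}
```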
