Implement test suite with proper Claude API test separation

# Test Suite with Smart Claude API Testing Strategy

## Overview
Implement a comprehensive test suite that properly separates tests by Claude API dependency, ensuring fast CI/CD execution while maintaining thorough coverage of AI-powered analysis tools.

## Core Challenge: Claude API Testing

### Problems with Naive Approach
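- ❌ **GitHub Actions**: `ANTHROPIC_API_KEY` is unavailable unless stored as a repository secret, and secrets are never passed to workflows triggered by pull requests from forks
- ❌ **Token costs**: Real Claude calls are expensive (~$0.01-0.10 per test)
- ❌ **Speed**: Each Claude API call adds 5-30 s to a test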
- ❌ **Non-determinism**: API responses vary, causing flaky tests
- ❌ **Rate limits**: Frequent API calls may hit quotas

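### Smart Testing Architecture Solution

Split the suite into three tiers by how much each test depends on the real Claude API: fast tests never call it, mocked tests replace it with canned fixture responses, and end-to-end tests call it for real on minimal data.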

## Test Categories by Claude Dependency

### 1. **Fast Tests** (No Claude API) - `tests/fast/`
```bash
# Run on every commit, GitHub Actions friendly
tests/fast/
├── unit/
│   ├── test_ks_env_functions.sh      # ks_validate_days, ks_collect_files
│   ├── test_input_validation.sh      # Parameter sanitization  
│   └── test_file_processing.sh       # JSONL parsing, file collection
├── integration/
│   ├── test_capture_tools.sh         # events, query (no analysis)
│   ├── test_process_tools.sh         # rotate-logs, validate-jsonl
│   └── test_error_handling.sh        # Malformed data, permissions
└── security/
    ├── test_injection_prevention.sh  # Input sanitization
    └── test_path_validation.sh       # File path security

# Execution time: <30 seconds total
# Coverage: ~70% of codebase (all non-AI functionality)
```
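Nothing in this layout stops a fast test from silently reaching Claude. One way to enforce the tier boundary is to make `ks_claude` fail loudly in every fast test. This is a sketch that assumes all Claude access goes through `ks_claude()` and that tools do not redefine it after startup; the file name `tests/fast/guard.sh` and the exit code 99 are hypothetical choices.

```bash
# tests/fast/guard.sh (hypothetical): source this at the top of every fast test.
# Assumes all Claude access goes through ks_claude(), so any call here is a
# tier violation: fail loudly instead of reaching the API.
ks_claude() {
    echo "FAST TIER VIOLATION: ks_claude called with: $*" >&2
    return 99
}
# Export the guard so tools started as child processes inherit it
export -f ks_claude

# Also drop the key so nothing started by a fast test can use it
unset ANTHROPIC_API_KEY
```

A distinctive exit code makes an accidental API dependency easy to spot in test output, instead of surfacing as a slow or flaky failure.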

### 2. **Mocked Tests** (Fake Claude API) - `tests/mocked/`
```bash
# GitHub Actions friendly, consistent results
tests/mocked/
├── fixtures/
│   ├── claude_responses/
│   │   ├── themes_response_sample1.json
│   │   ├── connections_response_sample1.json
│   │   └── malformed_response.json
│   └── test_events/
│       ├── minimal_dataset.jsonl     # 5 events, predictable
│       ├── theme_dataset.jsonl       # Events → known themes  
│       └── connection_dataset.jsonl  # Events → known connections
├── test_extract_themes_mocked.sh     # Mock ks_claude() function
├── test_find_connections_mocked.sh   # Predictable responses
└── test_error_scenarios_mocked.sh    # API failures, timeouts

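# Mock implementation (define after sourcing .ks-env so the real ks_claude
# does not overwrite it; fixture paths assume the repo root as working directory):
ks_claude() {
    local prompt="$*"
    local fixtures="tests/mocked/fixtures/claude_responses"
    case "$prompt" in
        *"extract themes"*) cat "$fixtures/themes_response_sample1.json" ;;
        *"find connections"*) cat "$fixtures/connections_response_sample1.json" ;;
        # Fail loudly so an unmocked prompt cannot pass as a valid response
        *) echo '{"error": "unmocked prompt"}' >&2; return 1 ;;
    esac
}
export -f ks_claude  # visible to tools started as child processes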

# Execution time: <60 seconds total  
# Coverage: 95% of analysis tool functionality with consistent results
```
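The mock above only returns canned text. A variant that also records each prompt lets a test assert how often, and with what, a tool called Claude. This is a sketch: `KS_MOCK_FIXTURES`, `KS_MOCK_CALL_LOG`, and `mock_call_count` are hypothetical names, and the prompt patterns are copied from the mock above.

```bash
# Hypothetical recording mock: returns a fixture and logs every prompt.
# KS_MOCK_FIXTURES: directory holding the claude_responses/*.json fixtures
# KS_MOCK_CALL_LOG: file that receives one line per ks_claude() call
ks_claude() {
    local prompt="$*"
    local fixtures="${KS_MOCK_FIXTURES:?KS_MOCK_FIXTURES not set}"
    # Record the call, flattening newlines so each call stays on one line
    printf '%s\n' "${prompt//$'\n'/ }" >> "${KS_MOCK_CALL_LOG:?KS_MOCK_CALL_LOG not set}"
    case "$prompt" in
        *"extract themes"*)   cat "$fixtures/themes_response_sample1.json" ;;
        *"find connections"*) cat "$fixtures/connections_response_sample1.json" ;;
        *)
            echo "ks_claude mock: unmocked prompt: $prompt" >&2
            return 1
            ;;
    esac
}
export -f ks_claude

# Number of recorded calls (0 if the log does not exist yet)
mock_call_count() {
    if [ -f "$KS_MOCK_CALL_LOG" ]; then
        wc -l < "$KS_MOCK_CALL_LOG" | tr -d ' '
    else
        echo 0
    fi
}
```

Because the log is a file rather than a shell variable, calls made from child processes (such as a tool started with `run`) are counted too.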

### 3. **Real API Tests** (Actual Claude) - `tests/e2e/`
```bash
# Local development only, requires ANTHROPIC_API_KEY
tests/e2e/
├── test_analysis_integration.sh      # Real Claude API calls
├── test_large_dataset_analysis.sh    # Performance with real data
└── test_api_error_handling.sh        # Real API failure scenarios

# Smart optimizations:
# - Minimal datasets (5-10 events max)
# - Cached results to avoid repeated calls
# - Optional --use-cached flag for development

# Execution time: 2-5 minutes (limited Claude calls)
# Coverage: End-to-end validation with real AI
```
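The success criteria cap E2E runs at fewer than 5 real API calls. A hard budget turns that cap into something a test run enforces. A minimal sketch, assuming tests run serially (the counter file is not locked); `budgeted_claude`, `KS_E2E_MAX_CALLS`, `KS_E2E_CALL_COUNTER`, and `KS_CLAUDE_CMD` are hypothetical names, and the default command is the same `claude` CLI that `cached_claude` calls in the Token Cost Optimization section.

```bash
# Hypothetical call-budget wrapper for E2E tests.
# Assumes tests run serially; the counter file is not locked.
budgeted_claude() {
    # Success criteria: fewer than 5 real calls per E2E run
    local max="${KS_E2E_MAX_CALLS:-4}"
    local counter="${KS_E2E_CALL_COUNTER:?set KS_E2E_CALL_COUNTER to a writable file}"
    local used=0
    if [ -s "$counter" ]; then
        used="$(cat "$counter")"
    fi
    if [ "$used" -ge "$max" ]; then
        echo "E2E budget exhausted: $used/$max real Claude calls used" >&2
        return 1
    fi
    # Count the call before making it, so failed calls also use budget
    echo "$((used + 1))" > "$counter"
    "${KS_CLAUDE_CMD:-claude}" "$@"
}
```

Point `KS_E2E_CALL_COUNTER` at a fresh file (for example under `mktemp -d`) at the start of each run so the count resets between runs. Cache hits never reach the wrapper if it sits behind the cache, so only real calls consume budget.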

## Optimized Test Data Strategy

### Minimal, Predictable Datasets
```bash
# tests/fixtures/minimal_theme_dataset.jsonl (5 events)
{"ts":"2025-01-01T10:00:00Z","type":"thought","topic":"memory","content":"Human memory is associative, not indexed"}
{"ts":"2025-01-01T10:01:00Z","type":"thought","topic":"memory","content":"Computer memory is linear and addressable"}  
{"ts":"2025-01-01T10:02:00Z","type":"connection","topic":"memory-systems","content":"Biological vs digital memory architectures"}
{"ts":"2025-01-01T10:03:00Z","type":"insight","topic":"knowledge-systems","content":"Event sourcing mirrors episodic memory"}
{"ts":"2025-01-01T10:04:00Z","type":"thought","topic":"temporal-meaning","content":"Time shapes knowledge, not just stores it"}

# Expected themes: Memory Systems, Knowledge Architecture, Temporal Meaning
# Designed to produce predictable, testable analysis results
```

### Pre-Generated Claude Responses
`tests/fixtures/claude_responses/themes_minimal_dataset.json`:
```json
{
  "themes": [
    {
      "name": "Memory System Architecture", 
      "description": "Comparison of biological vs computational memory models",
      "frequency": 3,
      "supporting_quotes": ["Human memory is associative", "Computer memory is linear"]
    },
    {
      "name": "Temporal Knowledge Dynamics",
      "description": "Time as constitutive element of knowledge formation", 
      "frequency": 2,
      "supporting_quotes": ["Event sourcing mirrors episodic memory", "Time shapes knowledge"]
    }
  ]
}
```
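The fixtures themselves have to come from somewhere. One option is to capture a real response once, review it, and commit it. The sketch below writes to a temporary file in the same directory and only moves it into place if the call succeeds and produced output, so a failed or interrupted call never leaves a truncated fixture behind. `record_fixture` and `KS_CLAUDE_CMD` are hypothetical; the default command is the `claude` CLI that `cached_claude` also uses.

```bash
# Hypothetical helper: record one real Claude response as a fixture.
# Usage: record_fixture OUTPUT_FILE PROMPT_ARGS...
record_fixture() {
    local out="${1:?usage: record_fixture OUTPUT_FILE PROMPT_ARGS...}"
    shift
    mkdir -p "$(dirname "$out")"
    local tmp
    tmp="$(mktemp "$out.XXXXXX")"
    # Only replace the fixture when the command succeeds and produced output
    if "${KS_CLAUDE_CMD:-claude}" "$@" > "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        echo "record_fixture: no fixture written to $out" >&2
        return 1
    fi
}
```

Since real responses vary between calls, a recorded fixture is a snapshot: the mocked tests stay deterministic because they always replay the same committed file.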

## CI/CD Integration Strategy

### GitHub Actions Workflow
```yaml
# .github/workflows/test.yml
name: Knowledge System Tests
on: [push, pull_request]

jobs:
  fast-tests:
    name: Fast Tests (No Claude API)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup system
        run: ./setup.sh
      - name: Run fast test suite
        run: ./tests/run_fast_tests.sh

  mocked-tests:
    name: Mocked Tests (Fake Claude API)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup system  
        run: ./setup.sh
      - name: Run mocked analysis tests
        run: ./tests/run_mocked_tests.sh
        
  # Note: No real Claude API tests in CI
  # Those run manually or in nightly builds with secrets
```

### Local Development Workflow
```bash
# Quick development cycle
./tests/run_fast_tests.sh              # 30s, no API calls

# Full validation (requires API key)
export ANTHROPIC_API_KEY="your-key"
./tests/run_all_tests.sh               # 5m, includes real Claude calls

# CI simulation (what GitHub Actions runs)
./tests/run_ci_tests.sh                # 90s, fast + mocked only
```
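The three runner scripts differ mainly in which tiers they include. A minimal sketch of that selection logic, assuming a hypothetical shared helper `ks_select_tiers`: E2E runs only when `ANTHROPIC_API_KEY` is set and the run is not in CI. GitHub Actions sets `CI=true` for every job, which keeps real Claude calls out of the push/PR workflow even if a key is configured.

```bash
# Hypothetical helper shared by the run_*_tests.sh scripts.
# Prints the test tiers to run, one per line; explains skips on stderr.
ks_select_tiers() {
    echo fast
    echo mocked
    if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
        echo "Skipping e2e tests: ANTHROPIC_API_KEY is not set" >&2
    elif [ "${CI:-}" = "true" ]; then
        echo "Skipping e2e tests: running in CI" >&2
    else
        echo e2e
    fi
}
```

A runner could loop over this output and invoke each tier's script in turn; printing skip reasons to stderr keeps the tier list on stdout clean while still telling developers why E2E did not run.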

## Test Framework: bats-core with Smart Mocking

### Mock Function Override
```bash
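#!/usr/bin/env bats

# tests/mocked/test_extract_themes_mocked.sh

setup() {
    export TEST_KS_ROOT="$(mktemp -d)"
    export KS_HOT_LOG="$TEST_KS_ROOT/hot.jsonl"

    # Source the environment first: sourcing it after defining the mock
    # would replace the mock with the real ks_claude
    source "$BATS_TEST_DIRNAME/../../.ks-env"

    # Override Claude function with mock; export it so tools started by `run`
    # (a child process) can see it, as long as they do not re-source .ks-env
    export KS_MOCK_RESPONSE="$BATS_TEST_DIRNAME/../fixtures/claude_responses/themes_minimal_dataset.json"
    ks_claude() {
        cat "$KS_MOCK_RESPONSE"
    }
    export -f ks_claude

    # Copy test data
    cp "$BATS_TEST_DIRNAME/../fixtures/minimal_theme_dataset.jsonl" "$KS_HOT_LOG"
}

teardown() {
    rm -rf "$TEST_KS_ROOT"
}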

@test "extract-themes produces expected theme count with mocked Claude" {
    run ./tools/analyze/extract-themes --days 1 --format json
    [ "$status" -eq 0 ]
    
    # Parse and validate expected themes
    theme_count=$(echo "$output" | jq '.themes | length')
    [ "$theme_count" -eq 2 ]
    
    # Validate specific theme names
    [[ "$output" == *"Memory System Architecture"* ]]
    [[ "$output" == *"Temporal Knowledge Dynamics"* ]]
}

@test "extract-themes handles mocked API errors gracefully" {
    # Override with error response
    ks_claude() {
        echo '{"error": "API temporarily unavailable"}'
        return 1
    }
    export -f ks_claude
    
    run ./tools/analyze/extract-themes --days 1
    [ "$status" -ne 0 ]
    [[ "$output" == *"Error"* ]]
}
```
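Function overrides have a limit: `run` starts the tool as a separate process, so the mock reaches it only if it is exported and the tool does not re-source `.ks-env` (which would redefine `ks_claude`). A sturdier alternative, assuming `ks_claude()` ultimately shells out to the `claude` CLI (as `cached_claude` does in the Token Cost Optimization section), is to put a stub executable named `claude` first on `PATH`. `install_claude_stub` is a hypothetical helper.

```bash
# Hypothetical helper: create a fake `claude` executable that prints a fixture.
# Usage: install_claude_stub STUB_DIR FIXTURE_FILE
# Then prepend STUB_DIR to PATH so the stub shadows the real CLI, e.g. in setup():
#   install_claude_stub "$TEST_KS_ROOT/bin" "$KS_MOCK_RESPONSE"
#   export PATH="$TEST_KS_ROOT/bin:$PATH"
install_claude_stub() {
    local dir="${1:?usage: install_claude_stub STUB_DIR FIXTURE_FILE}"
    local fixture="${2:?usage: install_claude_stub STUB_DIR FIXTURE_FILE}"
    mkdir -p "$dir"
    # %q quotes the fixture path safely for embedding in the generated script
    printf '#!/usr/bin/env bash\ncat %q\n' "$fixture" > "$dir/claude"
    chmod +x "$dir/claude"
}
```

The stub works no matter how many times `.ks-env` is sourced, because it replaces the external command rather than the shell function.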

## Performance Testing with Mocked Claude

### Benchmark Infrastructure
```bash
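# tests/performance/benchmark_with_mocks.sh

# Test jq optimization performance without Claude API overhead
benchmark_file_processing() {
    local event_count="$1"
    TEST_KS_ROOT="${TEST_KS_ROOT:-$(mktemp -d)}"

    # Generate test dataset and point the tool at it
    # (event timestamps must fall inside the --days 1 window)
    generate_test_events "$event_count" > "$TEST_KS_ROOT/large.jsonl"
    export KS_HOT_LOG="$TEST_KS_ROOT/large.jsonl"

    # Mock Claude to return instantly; export it for the tool's child process
    ks_claude() { echo '{"themes":[]}'; }
    export -f ks_claude

    # Measure pure file processing performance (discard output, keep timing)
    time ./tools/analyze/extract-themes --days 1 --format json > /dev/null
}

# Timings reflect file-processing cost alone, with API latency removed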
```
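The benchmark relies on a `generate_test_events` helper that this plan does not define. A minimal pure-bash sketch, assuming the event schema used by the minimal fixtures (`ts`, `type`, `topic`, `content`): it stamps every event with the current UTC time so all of them fall inside a `--days 1` window.

```bash
# Hypothetical generator for the benchmark: emits COUNT synthetic JSONL events.
generate_test_events() {
    local count="${1:?usage: generate_test_events COUNT}"
    local types=(thought connection insight)
    # One current UTC timestamp keeps every event inside a --days 1 window
    local ts
    ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    local i
    for ((i = 0; i < count; i++)); do
        printf '{"ts":"%s","type":"%s","topic":"topic-%d","content":"Synthetic benchmark event %d"}\n' \
            "$ts" "${types[i % 3]}" "$((i % 10))" "$i"
    done
}
```

Cycling through a fixed set of types and ten topics gives the filtering and grouping code realistic work to do without any randomness, so repeated benchmark runs process identical data apart from the timestamp.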

## Token Cost Optimization

### Smart E2E Testing
```bash
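# tests/e2e/test_with_minimal_claude_usage.sh

# Cache Claude responses to avoid repeated calls
CLAUDE_CACHE_DIR="$HOME/.ks-test-cache"

cached_claude() {
    local cache_key cache_file tmp
    # printf avoids echo's special handling of prompts such as "-n"
    cache_key=$(printf '%s' "$*" | sha256sum | cut -d' ' -f1)
    cache_file="$CLAUDE_CACHE_DIR/$cache_key"

    # KS_REFRESH_CACHE=1 (hypothetical variable a --refresh-cache flag would set)
    # skips the cache and forces a real call
    if [ -f "$cache_file" ] && [ "${KS_REFRESH_CACHE:-0}" != 1 ]; then
        cat "$cache_file"
        return 0
    fi

    # Real Claude call: cache the result only if the call succeeds, so an
    # error or truncated response is never replayed as a valid one
    mkdir -p "$CLAUDE_CACHE_DIR"
    tmp=$(mktemp "$cache_file.XXXXXX")
    if claude "$@" > "$tmp"; then
        mv "$tmp" "$cache_file"
        cat "$cache_file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Development workflow:
# 1. First run uses real Claude API (builds cache)
# 2. Subsequent runs use cached responses (free + fast)
# 3. --refresh-cache flag forces real API calls when needed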
```

## Implementation Phases

### Phase 1: Fast Test Foundation (1 day)
- [x] Set up bats-core testing framework
- [x] Implement all fast tests (no Claude API)
- [x] GitHub Actions integration for fast tests
- [x] Test data fixtures and generators

### Phase 2: Mocked Analysis Tests (1 day)  
- [x] Create Claude response fixtures
- [x] Implement `ks_claude()` mocking system
- [x] Mocked tests for extract-themes, find-connections
- [x] Error scenario testing with mocked failures

### Phase 3: Smart E2E Testing (1 day)
- [x] Caching system for real Claude responses  
- [x] Minimal dataset E2E tests
- [x] Local-only test runner with API key checks
- [x] Performance testing with cache optimization

## Success Criteria

### Coverage Targets
- **Fast tests**: 70% coverage, <30s execution, GitHub Actions ready
- **Mocked tests**: 95% analysis functionality, <60s execution  
- **E2E tests**: Real Claude validation, <5 API calls total

### Cost Management
- **Zero tokens spent in CI/CD** (fast + mocked tests only)
- **<$0.50 per full E2E test run** (minimal, cached Claude usage)
- **Cached responses** prevent repeated token costs in development

### Developer Experience
- **Fast feedback loop**: 30s for quick validation
- **Complete validation**: 5m for full test suite with real Claude
- **CI/CD friendly**: No secrets required for most tests

This architecture ensures robust testing without breaking the bank on API costs or slowing down development velocity.

Implement test suite with proper Claude API test separation #15