Implement configuration-driven conversation harness for testing and experiments #24

@durapensa

Configuration-Driven Conversation Harness

Overview

Create a programmable conversation harness that orchestrates automated conversations through the ks command. This infrastructure enables reproducible testing scenarios and experimental dialogues through declarative configuration files.

Motivation

Currently, testing the knowledge system requires manual interaction, and systematic experiments (such as Claude-Claude dialogues) lack proper tooling. A configuration-driven harness provides:

  • Reproducibility: Share and replay exact conversation scenarios
  • Automation: Run test suites and experiments unattended
  • Control: Precise conversation orchestration with assertions
  • Flexibility: Support diverse use cases from testing to research

Design Principles

  1. Configuration as Code: All conversation logic in versionable YAML/JSON files
  2. Minimal Magic: Transparent operation, easy to debug
  3. Composability: Small configs that can be combined
  4. Progressive Enhancement: Start simple, add features as needed

Core Functionality

Configuration Schema

# Minimal example
conversation:
  id: "test-001"
  messages:
    - from: "user"
      content: "What is recursion?"

# Full example with all features
conversation:
  id: "experiment-memory-models"
  description: "Explore memory system comparisons"
  
  participants:
    human:
      id: "researcher"
      type: "human"
    assistant:
      id: "claude-default"
      type: "claude"
      context: "You are helping explore knowledge systems"
  
  setup:
    working_directory: "./experiments/memory"
    environment:
      KS_MODEL: "sonnet"
  
  messages:
    - from: "human"
      content: "How do biological and digital memory systems differ?"
      wait_after_ms: 2000
      
    - from: "assistant"
      capture:
        event_type: "insight"
        topic: "memory-systems"
      expect:
        contains: ["associative", "biological", "digital"]
        event_created: true
        
    - from: "human"
      content: "Can you relate this to event sourcing?"
      
  export:
    transcript: "./results/memory-dialogue.md"
    events: "./results/memory-events.jsonl"
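
How the configuration is parsed is an implementation detail, but as a rough sketch, lib/config_parser.sh could wrap yq (https://github.com/mikefarah/yq) for YAML access. The helper names and field paths below are illustrative, not a committed interface.

#!/usr/bin/env bash
# config_parser.sh (sketch) -- wraps yq for reading the conversation config.
set -euo pipefail

config_get() {
  # Read one field, e.g.: config_get '.conversation.id' config.yaml
  local path="$1" file="$2"
  yq eval "$path" "$file"
}

config_message_count() {
  # Number of scripted messages, so the orchestrator knows how many turns to run
  yq eval '.conversation.messages | length' "$1"
}

config_validate_required() {
  # Fail early if the required conversation id is missing
  local id
  id="$(yq eval '.conversation.id // ""' "$1")"
  [[ -n "$id" ]] || { echo "config error: conversation.id is required" >&2; return 1; }
}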

Multi-Party Dialogue Support

# Claude-Claude dialogue configuration
conversation:
  id: "claude-dialogue-emergence-001"
  description: "Two Claudes discuss emergence"
  
  participants:
    scientist:
      id: "claude-scientist"
      type: "claude"
      context: "You study emergence in complex systems from a reductionist perspective"
    philosopher:
      id: "claude-philosopher"  
      type: "claude"
      context: "You explore emergence from a holistic, phenomenological perspective"
  
  dialogue:
    rounds: 10
    starter: "scientist"
    topic: "How does consciousness emerge from neural activity?"
    
    capture_all:
      event_type: "dialogue"
      topic_prefix: "emergence-dialogue"
      
  rate_limit:
    message_delay_ms: 3000  # Prevent API rate issues
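
One possible shape for the round loop behind the dialogue block above: each round, one participant answers and the reply becomes the other participant's next prompt, with a delay honoring rate_limit.message_delay_ms. This is only a sketch; ks_send is a hypothetical wrapper (see Integration Points) and capture_all handling is omitted.

#!/usr/bin/env bash
# conversation.sh (sketch) -- round-robin dialogue loop. ks_send is a hypothetical
# wrapper that sends one message as the named participant and prints the reply.
run_dialogue() {
  local rounds="$1" starter="$2" other="$3" topic="$4" delay_ms="$5"
  local speaker="$starter" listener="$other" message="$topic" reply tmp

  for ((round = 1; round <= rounds; round++)); do
    reply="$(ks_send "$speaker" "$message")"    # current speaker answers
    message="$reply"                            # the reply becomes the next prompt
    tmp="$speaker"; speaker="$listener"; listener="$tmp"   # swap roles for the next round
    sleep "$(awk -v ms="$delay_ms" 'BEGIN {print ms / 1000}')"  # rate_limit.message_delay_ms
  done
}

# Example: run_dialogue 10 scientist philosopher \
#   "How does consciousness emerge from neural activity?" 3000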

Test Assertions

# Testing example with assertions
conversation:
  id: "test-event-capture"
  
  messages:
    - from: "user"
      content: "I think LLMs are like compressed libraries"
      
    - from: "assistant"
      expect:
        # Flexible assertion types
        event:
          type: "thought"
          topic: 
            contains: "llm"
        response:
          contains: ["interesting", "insightful", "thought"]
          not_contains: ["error", "unclear"]
        success: true
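
As a rough sketch, the response-level checks above (contains / not_contains) could be implemented in lib/assertions.sh as substring tests over the assistant's reply; event assertions would inspect the captured event log in a similar way. The function names, and the "all listed terms must appear" semantics, are assumptions.

#!/usr/bin/env bash
# assertions.sh (sketch) -- simple substring checks over the assistant's response text.
assert_response_contains() {
  local response="$1"; shift
  local term
  for term in "$@"; do
    if ! grep -qiF -- "$term" <<<"$response"; then
      echo "ASSERTION FAILED: response does not contain '$term'" >&2
      return 1
    fi
  done
}

assert_response_not_contains() {
  local response="$1"; shift
  local term
  for term in "$@"; do
    if grep -qiF -- "$term" <<<"$response"; then
      echo "ASSERTION FAILED: response unexpectedly contains '$term'" >&2
      return 1
    fi
  done
}

# Example: assert_response_contains "$reply" interesting insightful thought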

Implementation Architecture

tools/conversation/
├── harness                    # Main CLI entry point
├── lib/
│   ├── config_parser.sh    # YAML/JSON parsing and validation
│   ├── conversation.sh     # Core conversation orchestration
│   ├── ks_wrapper.sh       # Interface with ks command
│   ├── assertions.sh       # Test assertion framework
│   ├── export.sh           # Output formatting (MD, JSONL)
│   └── mock_responses.sh   # Mock response system for testing
├── schemas/
│   └── conversation.schema.json  # JSON Schema for validation
└── examples/
    ├── testing/
    │   ├── basic_capture.yaml
    │   ├── error_handling.yaml
    │   └── mock_responses.yaml
    └── experiments/
        ├── claude_dialogue.yaml
        └── concept_exploration.yaml
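
The harness entry point could be a thin dispatcher over the subcommands shown in the next section. This is only an illustrative skeleton; the lib functions it calls (run_conversation, run_batch, config_validate_required) are the hypothetical helpers sketched in this issue, not a finalized interface.

#!/usr/bin/env bash
# harness (sketch) -- dispatches run / validate / batch to the lib/ scripts.
set -euo pipefail
HARNESS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$HARNESS_DIR/lib/config_parser.sh"   # hypothetical helpers sketched above
source "$HARNESS_DIR/lib/conversation.sh"

case "${1:-}" in
  run)      shift; run_conversation "$@" ;;         # execute a single config
  validate) shift; config_validate_required "$@" ;; # check config without executing
  batch)    shift; run_batch "$@" ;;                # execute many configs
  *)        echo "usage: harness {run|validate|batch} [options] <config.yaml>..." >&2; exit 64 ;;
esac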

CLI Interface

# Run single conversation
tools/conversation/harness run config.yaml

# Validate without executing  
tools/conversation/harness validate config.yaml

# Batch execution
tools/conversation/harness batch ./tests/*.yaml --parallel 3

# Run with mock responses (no API calls)
tools/conversation/harness run --mock test.yaml

# Debug mode with verbose output
tools/conversation/harness run --debug dialogue.yaml
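
Batch mode could fan out over configs with xargs, capping concurrency at the --parallel value by re-invoking the harness once per file. This is an assumption about the implementation, not something the design mandates.

# run_batch (sketch): re-invoke `harness run` once per config, at most N in flight.
# Assumes xargs supports -0 and -P; $0 is the harness path when this is sourced by it.
run_batch() {
  local parallel=1 configs=()
  while (($#)); do
    case "$1" in
      --parallel) parallel="$2"; shift 2 ;;
      *)          configs+=("$1"); shift ;;
    esac
  done
  printf '%s\0' "${configs[@]}" | xargs -0 -n 1 -P "$parallel" "$0" run
}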

Integration Points

1. Knowledge System Integration

  • Executes through the ks command for consistent behavior (see the wrapper sketch below)
  • Automatically captures events via existing infrastructure
  • Respects KS_MODEL and other environment settings
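
The exact ks invocation is not specified in this issue, so the lib/ks_wrapper.sh sketch below makes explicit assumptions: that ks reads the message on stdin, and that the working directory and environment from the setup block (including KS_MODEL) are applied around the call.

# ks_wrapper.sh (sketch) -- every ks invocation detail here is an assumption; the
# issue only states that the harness goes through ks and respects KS_MODEL.
ks_send() {
  local participant="$1" message="$2"     # participant context handling omitted here
  (
    cd "${KS_WORKING_DIR:-.}"                                     # setup.working_directory
    printf '%s' "$message" | KS_MODEL="${KS_MODEL:-sonnet}" ks    # assumes ks reads stdin
  )
}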

2. Testing Framework (#1)

  • Provides foundation for automated test scenarios
  • Assertion system validates expected behaviors
  • Mock responses enable fast, deterministic tests

3. Experimental Framework (#20)

  • Orchestrates Claude-Claude dialogues
  • Captures all exchanges as events
  • Enables systematic exploration of concept spaces

Development Phases

Phase 1: Core Infrastructure (Week 1)

  • Configuration parser with schema validation
  • Basic conversation orchestration via ks
  • Simple assertions (contains, event creation)
  • Transcript and event export
  • 3-5 working examples

Phase 2: Testing Features (Week 2)

Phase 3: Experimental Features (Week 3)

  • Multi-party dialogue orchestration
  • Conversation templates and inheritance
  • Batch execution with parallelism
  • Analysis hooks for experiments

Success Criteria

  1. Functional Requirements

    • Execute YAML-defined conversations via ks
    • Support both human→AI and AI→AI patterns
    • Export clean transcripts and event logs
    • Run assertions for testing scenarios
  2. Quality Attributes

    • Configurations are human-readable and shareable
    • Failures provide clear, actionable error messages
    • Examples demonstrate all major features
    • Documentation explains configuration options
  3. Integration Success

    • Testing team can write automated test cases
    • Researchers can run reproducible experiments
    • No modifications needed to existing ks or event capture

Example: Complete Test Case

# tests/capture_validation.yaml
conversation:
  id: "test-capture-validation"
  description: "Validate event capture with different types"
  
  messages:
    - from: "user"
      content: "Neural networks are universal function approximators"
      
    - from: "assistant"
      expect:
        event:
          type: "thought"
          content_includes: "neural"
          
    - from: "user"
      content: "How does this relate to the Church-Turing thesis?"
      
    - from: "assistant"
      expect:
        any_of:
          - event: {type: "connection"}
          - event: {type: "insight"}
        response_length_gt: 100
        
    - from: "user"
      content: "What questions does this raise?"
      
    - from: "assistant"
      expect:
        event:
          type: "question"
          count_gte: 1
          
  assertions:
    total_events_created: 3
    conversation_complete: true
    
  export:
    transcript: "./test-results/capture-validation.md"
    events: "./test-results/capture-validation.jsonl"
    results: "./test-results/capture-validation-results.json"
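
Once a run completes, the exported files let the top-level assertions be spot-checked with standard tools; for example, since events are written as JSONL (one JSON object per line), total_events_created can be approximated by a line count. The .type field name below is assumed, not defined by this issue.

# Run the test, then spot-check the exported artifacts.
tools/conversation/harness run tests/capture_validation.yaml

# One JSONL line per captured event; the count should match total_events_created.
wc -l < ./test-results/capture-validation.jsonl     # expect: 3

# With jq installed, list event types for a quick sanity check (field name assumed).
jq -r '.type' ./test-results/capture-validation.jsonl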

Future Considerations (Not in initial scope)

  • WebSocket/streaming support for real-time monitoring
  • Integration with CI/CD pipelines
  • Performance profiling and metrics
  • Alternative model support (Ollama)
  • Conversation state persistence/resume
  • Visual conversation flow editor

This harness provides essential infrastructure for both quality assurance and knowledge discovery while maintaining simplicity and extensibility.
