Implement configuration-driven conversation harness for testing and experiments #24

@durapensa

Configuration-Driven Conversation Harness

Overview

Create a programmable conversation harness that orchestrates automated conversations through the ks command. This infrastructure enables reproducible testing scenarios and experimental dialogues through declarative configuration files.

Motivation

Currently, testing the knowledge system requires manual interaction, and systematic experiments (such as Claude-Claude dialogues) lack proper tooling. A configuration-driven harness provides:

  • Reproducibility: Share and replay exact conversation scenarios
  • Automation: Run test suites and experiments unattended
  • Control: Precise conversation orchestration with assertions
  • Flexibility: Support diverse use cases from testing to research

Design Principles

  1. Configuration as Code: All conversation logic in versionable YAML/JSON files
  2. Minimal Magic: Transparent operation, easy to debug
  3. Composability: Small configs that can be combined
  4. Progressive Enhancement: Start simple, add features as needed

Core Functionality

Configuration Schema

# Minimal example
conversation:
  id: "test-001"
  messages:
    - from: "user"
      content: "What is recursion?"

# Full example with all features
conversation:
  id: "experiment-memory-models"
  description: "Explore memory system comparisons"
  
  participants:
    human:
      id: "researcher"
      type: "human"
    assistant:
      id: "claude-default"
      type: "claude"
      context: "You are helping explore knowledge systems"
  
  setup:
    working_directory: "./experiments/memory"
    environment:
      KS_MODEL: "sonnet"
  
  messages:
    - from: "human"
      content: "How do biological and digital memory systems differ?"
      wait_after_ms: 2000
      
    - from: "assistant"
      capture:
        event_type: "insight"
        topic: "memory-systems"
      expect:
        contains: ["associative", "biological", "digital"]
        event_created: true
        
    - from: "human"
      content: "Can you relate this to event sourcing?"
      
  export:
    transcript: "./results/memory-dialogue.md"
    events: "./results/memory-events.jsonl"
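
How the configuration is parsed is an implementation detail, but as a rough sketch, lib/config_parser.sh could wrap yq (https://github.com/mikefarah/yq) for YAML access. The helper names and field paths below are illustrative, not a committed interface.

#!/usr/bin/env bash
# config_parser.sh (sketch) -- wraps yq for reading the conversation config.
set -euo pipefail

config_get() {
  # Read one field, e.g.: config_get '.conversation.id' config.yaml
  local path="$1" file="$2"
  yq eval "$path" "$file"
}

config_message_count() {
  # Number of scripted messages, so the orchestrator knows how many turns to run
  yq eval '.conversation.messages | length' "$1"
}

config_validate_required() {
  # Fail early if the required conversation id is missing
  local id
  id="$(yq eval '.conversation.id // ""' "$1")"
  [[ -n "$id" ]] || { echo "config error: conversation.id is required" >&2; return 1; }
}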

Multi-Party Dialogue Support

# Claude-Claude dialogue configuration
conversation:
  id: "claude-dialogue-emergence-001"
  description: "Two Claudes discuss emergence"
  
  participants:
    scientist:
      id: "claude-scientist"
      type: "claude"
      context: "You study emergence in complex systems from a reductionist perspective"
    philosopher:
      id: "claude-philosopher"  
      type: "claude"
      context: "You explore emergence from a holistic, phenomenological perspective"
  
  dialogue:
    rounds: 10
    starter: "scientist"
    topic: "How does consciousness emerge from neural activity?"
    
    capture_all:
      event_type: "dialogue"
      topic_prefix: "emergence-dialogue"
      
  rate_limit:
    message_delay_ms: 3000  # Prevent API rate issues
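
One possible shape for the round loop behind the dialogue block above: each round, one participant answers and the reply becomes the other participant's next prompt, with a delay honoring rate_limit.message_delay_ms. This is only a sketch; ks_send is a hypothetical wrapper (see Integration Points) and capture_all handling is omitted.

#!/usr/bin/env bash
# conversation.sh (sketch) -- round-robin dialogue loop. ks_send is a hypothetical
# wrapper that sends one message as the named participant and prints the reply.
run_dialogue() {
  local rounds="$1" starter="$2" other="$3" topic="$4" delay_ms="$5"
  local speaker="$starter" listener="$other" message="$topic" reply tmp

  for ((round = 1; round <= rounds; round++)); do
    reply="$(ks_send "$speaker" "$message")"    # current speaker answers
    message="$reply"                            # the reply becomes the next prompt
    tmp="$speaker"; speaker="$listener"; listener="$tmp"   # swap roles for the next round
    sleep "$(awk -v ms="$delay_ms" 'BEGIN {print ms / 1000}')"  # rate_limit.message_delay_ms
  done
}

# Example: run_dialogue 10 scientist philosopher \
#   "How does consciousness emerge from neural activity?" 3000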

Test Assertions

# Testing example with assertions
conversation:
  id: "test-event-capture"
  
  messages:
    - from: "user"
      content: "I think LLMs are like compressed libraries"
      
    - from: "assistant"
      expect:
        # Flexible assertion types
        event:
          type: "thought"
          topic: 
            contains: "llm"
        response:
          contains: ["interesting", "insightful", "thought"]
          not_contains: ["error", "unclear"]
        success: true
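
As a rough sketch, the response-level checks above (contains / not_contains) could be implemented in lib/assertions.sh as substring tests over the assistant's reply; event assertions would inspect the captured event log in a similar way. The function names, and the "all listed terms must appear" semantics, are assumptions.

#!/usr/bin/env bash
# assertions.sh (sketch) -- simple substring checks over the assistant's response text.
assert_response_contains() {
  local response="$1"; shift
  local term
  for term in "$@"; do
    if ! grep -qiF -- "$term" <<<"$response"; then
      echo "ASSERTION FAILED: response does not contain '$term'" >&2
      return 1
    fi
  done
}

assert_response_not_contains() {
  local response="$1"; shift
  local term
  for term in "$@"; do
    if grep -qiF -- "$term" <<<"$response"; then
      echo "ASSERTION FAILED: response unexpectedly contains '$term'" >&2
      return 1
    fi
  done
}

# Example: assert_response_contains "$reply" interesting insightful thought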

Implementation Architecture

tools/conversation/
├── harness                    # Main CLI entry point
├── lib/
│   ├── config_parser.sh    # YAML/JSON parsing and validation
│   ├── conversation.sh     # Core conversation orchestration
│   ├── ks_wrapper.sh       # Interface with ks command
│   ├── assertions.sh       # Test assertion framework
│   ├── export.sh           # Output formatting (MD, JSONL)
│   └── mock_responses.sh   # Mock response system for testing
├── schemas/
│   └── conversation.schema.json  # JSON Schema for validation
└── examples/
    ├── testing/
    │   ├── basic_capture.yaml
    │   ├── error_handling.yaml
    │   └── mock_responses.yaml
    └── experiments/
        ├── claude_dialogue.yaml
        └── concept_exploration.yaml
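
The harness entry point could be a thin dispatcher over the subcommands shown in the next section. This is only an illustrative skeleton; the lib functions it calls (run_conversation, run_batch, config_validate_required) are the hypothetical helpers sketched in this issue, not a finalized interface.

#!/usr/bin/env bash
# harness (sketch) -- dispatches run / validate / batch to the lib/ scripts.
set -euo pipefail
HARNESS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$HARNESS_DIR/lib/config_parser.sh"   # hypothetical helpers sketched above
source "$HARNESS_DIR/lib/conversation.sh"

case "${1:-}" in
  run)      shift; run_conversation "$@" ;;         # execute a single config
  validate) shift; config_validate_required "$@" ;; # check config without executing
  batch)    shift; run_batch "$@" ;;                # execute many configs
  *)        echo "usage: harness {run|validate|batch} [options] <config.yaml>..." >&2; exit 64 ;;
esac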

CLI Interface

# Run single conversation
tools/conversation/harness run config.yaml

# Validate without executing  
tools/conversation/harness validate config.yaml

# Batch execution
tools/conversation/harness batch ./tests/*.yaml --parallel 3

# Run with mock responses (no API calls)
tools/conversation/harness run --mock test.yaml

# Debug mode with verbose output
tools/conversation/harness run --debug dialogue.yaml
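
Batch mode could fan out over configs with xargs, capping concurrency at the --parallel value by re-invoking the harness once per file. This is an assumption about the implementation, not something the design mandates.

# run_batch (sketch): re-invoke `harness run` once per config, at most N in flight.
# Assumes xargs supports -0 and -P; $0 is the harness path when this is sourced by it.
run_batch() {
  local parallel=1 configs=()
  while (($#)); do
    case "$1" in
      --parallel) parallel="$2"; shift 2 ;;
      *)          configs+=("$1"); shift ;;
    esac
  done
  printf '%s\0' "${configs[@]}" | xargs -0 -n 1 -P "$parallel" "$0" run
}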

Integration Points

1. Knowledge System Integration

  • Executes through the ks command for consistent behavior (see the wrapper sketch below)
  • Automatically captures events via existing infrastructure
  • Respects KS_MODEL and other environment settings
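
The exact ks invocation is not specified in this issue, so the lib/ks_wrapper.sh sketch below makes explicit assumptions: that ks reads the message on stdin, and that the working directory and environment from the setup block (including KS_MODEL) are applied around the call.

# ks_wrapper.sh (sketch) -- every ks invocation detail here is an assumption; the
# issue only states that the harness goes through ks and respects KS_MODEL.
ks_send() {
  local participant="$1" message="$2"     # participant context handling omitted here
  (
    cd "${KS_WORKING_DIR:-.}"                                     # setup.working_directory
    printf '%s' "$message" | KS_MODEL="${KS_MODEL:-sonnet}" ks    # assumes ks reads stdin
  )
}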

2. Testing Framework (#1)

  • Provides foundation for automated test scenarios
  • Assertion system validates expected behaviors
  • Mock responses enable fast, deterministic tests

3. Experimental Framework (#20)

  • Orchestrates Claude-Claude dialogues
  • Captures all exchanges as events
  • Enables systematic exploration of concept spaces

Development Phases

Phase 1: Core Infrastructure (Week 1)

  • Configuration parser with schema validation
  • Basic conversation orchestration via ks
  • Simple assertions (contains, event creation)
  • Transcript and event export
  • 3-5 working examples

Phase 2: Testing Features (Week 2)

Phase 3: Experimental Features (Week 3)

  • Multi-party dialogue orchestration
  • Conversation templates and inheritance
  • Batch execution with parallelism
  • Analysis hooks for experiments

Success Criteria

  1. Functional Requirements

    • Execute YAML-defined conversations via ks
    • Support both human→AI and AI→AI patterns
    • Export clean transcripts and event logs
    • Run assertions for testing scenarios
  2. Quality Attributes

    • Configurations are human-readable and shareable
    • Failures provide clear, actionable error messages
    • Examples demonstrate all major features
    • Documentation explains configuration options
  3. Integration Success

    • Testing team can write automated test cases
    • Researchers can run reproducible experiments
    • No modifications needed to existing ks or event capture

Example: Complete Test Case

# tests/capture_validation.yaml
conversation:
  id: "test-capture-validation"
  description: "Validate event capture with different types"
  
  messages:
    - from: "user"
      content: "Neural networks are universal function approximators"
      
    - from: "assistant"
      expect:
        event:
          type: "thought"
          content_includes: "neural"
          
    - from: "user"
      content: "How does this relate to the Church-Turing thesis?"
      
    - from: "assistant"
      expect:
        any_of:
          - event: {type: "connection"}
          - event: {type: "insight"}
        response_length_gt: 100
        
    - from: "user"
      content: "What questions does this raise?"
      
    - from: "assistant"
      expect:
        event:
          type: "question"
          count_gte: 1
          
  assertions:
    total_events_created: 3
    conversation_complete: true
    
  export:
    transcript: "./test-results/capture-validation.md"
    events: "./test-results/capture-validation.jsonl"
    results: "./test-results/capture-validation-results.json"
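
Once a run completes, the exported files let the top-level assertions be spot-checked with standard tools; for example, since events are written as JSONL (one JSON object per line), total_events_created can be approximated by a line count. The .type field name below is assumed, not defined by this issue.

# Run the test, then spot-check the exported artifacts.
tools/conversation/harness run tests/capture_validation.yaml

# One JSONL line per captured event; the count should match total_events_created.
wc -l < ./test-results/capture-validation.jsonl     # expect: 3

# With jq installed, list event types for a quick sanity check (field name assumed).
jq -r '.type' ./test-results/capture-validation.jsonl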

Future Considerations (Not in initial scope)

  • WebSocket/streaming support for real-time monitoring
  • Integration with CI/CD pipelines
  • Performance profiling and metrics
  • Alternative model support (Ollama)
  • Conversation state persistence/resume
  • Visual conversation flow editor

This harness provides essential infrastructure for both quality assurance and knowledge discovery while maintaining simplicity and extensibility.
