Configuration-Driven Conversation Harness
Overview
Create a programmable conversation harness that orchestrates automated conversations through the `ks` command. This infrastructure enables reproducible testing scenarios and experimental dialogues through declarative configuration files.
Motivation
Currently, testing the knowledge system requires manual interaction, and conducting systematic experiments (like Claude-Claude dialogues) lacks proper tooling. A configuration-driven harness provides:
- Reproducibility: Share and replay exact conversation scenarios
- Automation: Run test suites and experiments unattended
- Control: Precise conversation orchestration with assertions
- Flexibility: Support diverse use cases from testing to research
Design Principles
- Configuration as Code: All conversation logic in versionable YAML/JSON files
- Minimal Magic: Transparent operation, easy to debug
- Composability: Small configs that can be combined
- Progressive Enhancement: Start simple, add features as needed
Core Functionality
Configuration Schema
```yaml
# Minimal example
conversation:
  id: "test-001"
  messages:
    - from: "user"
      content: "What is recursion?"
```

```yaml
# Full example with all features
conversation:
  id: "experiment-memory-models"
  description: "Explore memory system comparisons"
  participants:
    human:
      id: "researcher"
      type: "human"
    assistant:
      id: "claude-default"
      type: "claude"
      context: "You are helping explore knowledge systems"
  setup:
    working_directory: "./experiments/memory"
    environment:
      KS_MODEL: "sonnet"
  messages:
    - from: "human"
      content: "How do biological and digital memory systems differ?"
      wait_after_ms: 2000
    - from: "assistant"
      capture:
        event_type: "insight"
        topic: "memory-systems"
      expect:
        contains: ["associative", "biological", "digital"]
        event_created: true
    - from: "human"
      content: "Can you relate this to event sourcing?"
  export:
    transcript: "./results/memory-dialogue.md"
    events: "./results/memory-events.jsonl"
```
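To make the execution model concrete, here is a rough bash sketch of how the harness might drive a scripted message list; `send_to_ks` is a hypothetical placeholder (the real harness would invoke `ks`), and the entries mirror the minimal example above:

```shell
#!/usr/bin/env bash
# Illustrative sketch of the core message loop, NOT the actual harness code.
# send_to_ks is a stand-in for the real ks invocation.

send_to_ks() { echo "ks reply to: $1"; }   # placeholder for: ks ... <<< "$1"

# Each scripted entry: "from|content|wait_after_ms"
script=(
  "user|What is recursion?|0"
)

transcript=()
for entry in "${script[@]}"; do
  IFS='|' read -r from content wait_ms <<< "$entry"
  transcript+=("$from: $content")
  reply="$(send_to_ks "$content")"
  transcript+=("assistant: $reply")
  sleep "$(awk "BEGIN{print $wait_ms/1000}")"   # honor wait_after_ms
done
printf '%s\n' "${transcript[@]}"
```

The real implementation would parse the YAML into this kind of turn list rather than hard-coding it.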
Multi-Party Dialogue Support
```yaml
# Claude-Claude dialogue configuration
conversation:
  id: "claude-dialogue-emergence-001"
  description: "Two Claudes discuss emergence"
  participants:
    scientist:
      id: "claude-scientist"
      type: "claude"
      context: "You study emergence in complex systems from a reductionist perspective"
    philosopher:
      id: "claude-philosopher"
      type: "claude"
      context: "You explore emergence from a holistic, phenomenological perspective"
  dialogue:
    rounds: 10
    starter: "scientist"
    topic: "How does consciousness emerge from neural activity?"
  capture_all:
    event_type: "dialogue"
    topic_prefix: "emergence-dialogue"
  rate_limit:
    message_delay_ms: 3000  # Prevent API rate issues
```
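The round-robin orchestration behind `dialogue.rounds` and `starter` could look roughly like the following sketch; `run_turn` is a hypothetical stand-in for the real `ks` call, and `rounds` is shortened to 4 for the demo:

```shell
#!/usr/bin/env bash
# Sketch of a round-robin dialogue loop (assumed design, not harness code).
# run_turn is a placeholder for invoking ks with a participant's context.

run_turn() {  # run_turn SPEAKER PROMPT -> prints the speaker's reply
  echo "[$1] responding to: $2"
}

participants=(scientist philosopher)   # starter first
rounds=4
delay_ms=0                             # rate_limit.message_delay_ms (0 here)
prompt="How does consciousness emerge from neural activity?"

speaker_idx=0
for ((round = 1; round <= rounds; round++)); do
  speaker="${participants[speaker_idx]}"
  reply="$(run_turn "$speaker" "$prompt")"
  echo "round $round: $reply"
  prompt="$reply"                                   # next turn replies to this
  speaker_idx=$(( (speaker_idx + 1) % ${#participants[@]} ))
  sleep "$(awk "BEGIN{print $delay_ms/1000}")"      # honor message_delay_ms
done
```

Each reply becomes the next speaker's prompt, which is what makes the exchange a dialogue rather than two independent sessions.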
Test Assertions
```yaml
# Testing example with assertions
conversation:
  id: "test-event-capture"
  messages:
    - from: "user"
      content: "I think LLMs are like compressed libraries"
    - from: "assistant"
      expect:
        # Flexible assertion types
        event:
          type: "thought"
          topic:
            contains: "llm"
        response:
          contains: ["interesting", "insightful", "thought"]
          not_contains: ["error", "unclear"]
        success: true
```
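The `contains`/`not_contains` checks are simple substring tests. A minimal sketch of what `lib/assertions.sh` might provide (function names are illustrative, and matching is case-sensitive for simplicity):

```shell
#!/usr/bin/env bash
# Hypothetical assertion helpers; the real lib/assertions.sh may differ.

# assert_contains RESPONSE WORD... -> fails if any word is missing
assert_contains() {
  local response="$1" word; shift
  for word in "$@"; do
    if [[ "$response" != *"$word"* ]]; then
      echo "FAIL: response does not contain '$word'" >&2
      return 1
    fi
  done
}

# assert_not_contains RESPONSE WORD... -> fails if any word is present
assert_not_contains() {
  local response="$1" word; shift
  for word in "$@"; do
    if [[ "$response" == *"$word"* ]]; then
      echo "FAIL: response unexpectedly contains '$word'" >&2
      return 1
    fi
  done
}

response="That is an interesting and insightful thought about LLMs."
assert_contains "$response" interesting insightful thought && echo "contains: ok"
assert_not_contains "$response" error unclear && echo "not_contains: ok"
```

A non-zero return plus a message on stderr gives the harness a clear, actionable failure to report.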
Implementation Architecture
```
tools/conversation/
├── harness                       # Main CLI entry point
├── lib/
│   ├── config_parser.sh          # YAML/JSON parsing and validation
│   ├── conversation.sh           # Core conversation orchestration
│   ├── ks_wrapper.sh             # Interface with ks command
│   ├── assertions.sh             # Test assertion framework
│   ├── export.sh                 # Output formatting (MD, JSONL)
│   └── mock_responses.sh         # Mock response system for testing
├── schemas/
│   └── conversation.schema.json  # JSON Schema for validation
└── examples/
    ├── testing/
    │   ├── basic_capture.yaml
    │   ├── error_handling.yaml
    │   └── mock_responses.yaml
    └── experiments/
        ├── claude_dialogue.yaml
        └── concept_exploration.yaml
```
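As a rough sketch of the cheapest validation `lib/config_parser.sh` could offer before full JSON Schema checking: a grep-based smoke test for the required keys (`validate_config` is a hypothetical name; a real implementation would use an actual YAML parser):

```shell
#!/usr/bin/env bash
# Crude required-key smoke check, assumed design only. Real validation
# would parse the YAML and apply schemas/conversation.schema.json.

validate_config() {  # validate_config FILE -> 0 if required keys present
  local file="$1"
  grep -q '^[[:space:]]*id:' "$file" \
    || { echo "missing: conversation.id" >&2; return 1; }
  grep -q '^[[:space:]]*messages:' "$file" \
    || { echo "missing: messages" >&2; return 1; }
  echo "ok: $file"
}

cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
conversation:
  id: "test-001"
  messages:
    - from: "user"
      content: "What is recursion?"
EOF
validate_config "$cfg"
```

Failing fast with a named missing key keeps errors actionable, per the quality goals below.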
CLI Interface
```bash
# Run single conversation
tools/conversation/harness run config.yaml

# Validate without executing
tools/conversation/harness validate config.yaml

# Batch execution
tools/conversation/harness batch ./tests/*.yaml --parallel 3

# Run with mock responses (no API calls)
tools/conversation/harness run --mock test.yaml

# Debug mode with verbose output
tools/conversation/harness run --debug dialogue.yaml
```
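One plausible way to implement `batch --parallel N` is to fan configs out through `xargs -P`; in this sketch `run_one` is a hypothetical stand-in for `harness run`, and the file names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch of parallel batch execution via xargs (assumed approach).
# run_one stands in for: tools/conversation/harness run "$1"

run_one() { echo "ran $1"; }
export -f run_one   # make the function visible to the child shells

printf '%s\n' tests/a.yaml tests/b.yaml tests/c.yaml \
  | xargs -n1 -P3 bash -c 'run_one "$0"'
```

Note that with `-P3` the completion order is nondeterministic, so per-config results should be written to separate files rather than interleaved on stdout.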
Integration Points
1. Knowledge System Integration
- Executes through the `ks` command for consistent behavior
- Automatically captures events via existing infrastructure
- Respects KS_MODEL and other environment settings
2. Testing Framework (#1)
- Provides foundation for automated test scenarios
- Assertion system validates expected behaviors
- Mock responses enable fast, deterministic tests
3. Experimental Framework (#20)
- Orchestrates Claude-Claude dialogues
- Captures all exchanges as events
- Enables systematic exploration of concept spaces
Development Phases
Phase 1: Core Infrastructure (Week 1)
- Configuration parser with schema validation
- Basic conversation orchestration via `ks`
- Simple assertions (contains, event creation)
- Transcript and event export
- 3-5 working examples
Phase 2: Testing Features (Week 2)
- Mock response system
- Advanced assertions (patterns, counts)
- Test report generation
- Integration with test suite (#15: Implement test suite with proper Claude API test separation)
Phase 3: Experimental Features (Week 3)
- Multi-party dialogue orchestration
- Conversation templates and inheritance
- Batch execution with parallelism
- Analysis hooks for experiments
Success Criteria
Functional Requirements
- Execute YAML-defined conversations via `ks`
- Support both human→AI and AI→AI patterns
- Export clean transcripts and event logs
- Run assertions for testing scenarios

Quality Attributes
- Configurations are human-readable and shareable
- Failures provide clear, actionable error messages
- Examples demonstrate all major features
- Documentation explains configuration options

Integration Success
- Testing team can write automated test cases
- Researchers can run reproducible experiments
- No modifications needed to existing `ks` or event capture
Example: Complete Test Case
```yaml
# tests/capture_validation.yaml
conversation:
  id: "test-capture-validation"
  description: "Validate event capture with different types"
  messages:
    - from: "user"
      content: "Neural networks are universal function approximators"
    - from: "assistant"
      expect:
        event:
          type: "thought"
          content_includes: "neural"
    - from: "user"
      content: "How does this relate to the Church-Turing thesis?"
    - from: "assistant"
      expect:
        any_of:
          - event: {type: "connection"}
          - event: {type: "insight"}
        response_length_gt: 100
    - from: "user"
      content: "What questions does this raise?"
    - from: "assistant"
      expect:
        event:
          type: "question"
          count_gte: 1
  assertions:
    total_events_created: 3
    conversation_complete: true
  export:
    transcript: "./test-results/capture-validation.md"
    events: "./test-results/capture-validation.jsonl"
    results: "./test-results/capture-validation-results.json"
```
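The `export.transcript` output is just formatted turns. A minimal sketch of what `lib/export.sh` might do for the markdown transcript (function name and turn encoding are illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical transcript exporter sketch; the real lib/export.sh may differ.

export_transcript() {  # export_transcript OUTFILE "speaker|text"...
  local out="$1" turn; shift
  : > "$out"                                 # truncate/create the file
  for turn in "$@"; do
    printf '**%s**: %s\n\n' "${turn%%|*}" "${turn#*|}" >> "$out"
  done
}

tmp="$(mktemp)"
export_transcript "$tmp" \
  "user|Neural networks are universal function approximators" \
  "assistant|They can approximate any continuous function on a compact set." 
cat "$tmp"
```

The JSONL event log would reuse the event records `ks` already captures, one JSON object per line, so no new serialization format is needed.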
Future Considerations (Not in initial scope)
- WebSocket/streaming support for real-time monitoring
- Integration with CI/CD pipelines
- Performance profiling and metrics
- Alternative model support (Ollama)
- Conversation state persistence/resume
- Visual conversation flow editor
This harness provides essential infrastructure for both quality assurance and knowledge discovery while maintaining simplicity and extensibility.