SIVA is a self-improving voice agent framework that learns from expert feedback. It extracts important concepts from conversations and validates its predictions against a local vector database of similar cases (used for few-shot prompting and confidence assessment) and, optionally, public knowledge sources. SIVA leverages Sierra's tau2-bench architecture for agent simulation and evaluation.
Requirements: Python 3.8+, uv package manager, browser with microphone access
- Setup Environment:
  # Using uv (automatically handles the virtual environment)
  uv run python --version
- Configure API Keys (.env file):
  OPENAI_API_KEY=sk-your-key-here
  CARTESIA_API_KEY=your-cartesia-key-here
  # For domain-specific evidence sources
  DOMAIN_API_KEY=your-domain-specific-key-here
- Launch the Voice Agent:
  uv run python run_voice_app.py
  Opens the voice client at http://localhost:3000/voice_client.html and dashboard at http://localhost:8000/dashboard
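Before launching, it can help to confirm that the keys from the .env example are actually visible to the process. The following is a minimal sketch, not part of SIVA: it only inspects the process environment (it does not read the .env file itself), and treating `DOMAIN_API_KEY` as optional is an assumption based on the "optionally public knowledge sources" wording.

```python
import os

# Key names come from the .env example; which ones are strictly required
# is an assumption made for this sketch.
REQUIRED_KEYS = ("OPENAI_API_KEY", "CARTESIA_API_KEY")
OPTIONAL_KEYS = ("DOMAIN_API_KEY",)


def missing_keys(env=None):
    """Return the required key names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


def report(env=None):
    """Build a one-line, human-readable status message."""
    missing = missing_keys(env)
    if missing:
        return "Missing required keys: " + ", ".join(missing)
    return "All required API keys are set."
```

Calling `print(report())` from a shell where the .env values have been exported shows whether anything is missing.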
SIVA includes a comprehensive CLI for running agent simulations and testing different scenarios using the tau2-bench framework.
Run a single patient intake simulation:
uv run python -m siva.cli run --domain patient_intake --agent llm_agent --user user_simulator --num-tasks 1 --max-steps 50
Run multiple tasks for comprehensive testing:
uv run python -m siva.cli run --domain patient_intake --agent llm_agent --user user_simulator --num-tasks 3 --max-steps 50
Test different agent types: TBD.
After running simulations, view results with:
uv run python -m siva.cli view
Example Output:
╭────────────────── Simulation Overview ───────────────────╮
│ Task ID: patient_intake_PI001                            │
│ Trial: 0                                                 │
│ Duration: 14.20s                                         │
│ Termination Reason: TerminationReason.AGENT_STOP         │
│ Agent Cost: $0.0218                                      │
│ User Cost: $0.0021                                       │
│ Reward: ✅ 1.0000 (ACTION: 1.0)                          │
│                                                          │
│ Action Checks:                                           │
│ - 0: verify_fullname ✅ 1.0                              │
│ - 1: verify_birthday ✅ 1.0                              │
│ - 2: list_prescriptions ✅ 1.0                           │
│ - 3: list_allergies ✅ 1.0                               │
│ - 4: list_conditions ✅ 1.0                              │
│ - 5: list_visit_reasons ✅ 1.0                           │
│ - 6: determine_routing ✅ 1.0                            │
╰──────────────────────────────────────────────────────────╯
- Domains: patient_intake, patient_intake-workflow
- Agents: llm_agent, llm_agent_solo, llm_agent_gt
- Users: user_simulator, dummy_user
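For scripted runs, the `run` command can be assembled programmatically. This is a rough sketch that uses only the flags and option values shown in this section; the helper names `build_run_command` and `run_simulation` are hypothetical and not part of the SIVA CLI.

```python
import subprocess

# Option values copied from the Domains / Agents / Users lists.
DOMAINS = {"patient_intake", "patient_intake-workflow"}
AGENTS = {"llm_agent", "llm_agent_solo", "llm_agent_gt"}
USERS = {"user_simulator", "dummy_user"}


def build_run_command(domain, agent, user, num_tasks=1, max_steps=50):
    """Build the argv list for `siva.cli run`, rejecting unlisted options."""
    checks = ((domain, DOMAINS, "domain"), (agent, AGENTS, "agent"), (user, USERS, "user"))
    for value, allowed, label in checks:
        if value not in allowed:
            raise ValueError(f"unknown {label}: {value!r}")
    return [
        "uv", "run", "python", "-m", "siva.cli", "run",
        "--domain", domain,
        "--agent", agent,
        "--user", user,
        "--num-tasks", str(num_tasks),
        "--max-steps", str(max_steps),
    ]


def run_simulation(*args, **kwargs):
    """Run the command; assumes the current directory is the repository root."""
    return subprocess.run(build_run_command(*args, **kwargs), check=True)
```

Validating the option names up front gives a clear error before `uv` is invoked at all.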
Note: SIVA uses the tau2-bench approach with LLM-based user simulators that generate responses dynamically based on task instructions, eliminating the need for hardcoded response logic.
SIVA now includes a modern Next.js dashboard that provides a web interface for running simulations, viewing results, and monitoring the learning system. This is built on top of the tau2-bench framework.
1. Start the tau2-bench Backend:
# Start the new tau2-bench based backend
uv run python main_tau2.py
The backend will be available at http://localhost:8000
2. Start the Next.js Dashboard:
cd frontend/nextjs
npm run dev
The dashboard will be available at http://localhost:3000
3. Access the Dashboard:
- Open http://localhost:3000 in your browser
- Use the dashboard to run simulations, view results, and monitor performance
- Overview: Performance metrics and recent simulations
- Simulations: Run new simulations and view results
- Learning: Monitor learning system status and improvements
- Real-time Updates: Background simulation processing with status updates
The new backend provides RESTful APIs:
- GET /api/health - System health check
- GET /api/domains - Available domains and agents
- POST /api/simulations/run - Start new simulations
- GET /api/simulations/status/{id} - Check simulation progress
- GET /api/learning/summary - Learning system status
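A client might poll the status endpoint until a simulation finishes. The sketch below uses only the standard library and the paths listed here; the response schema is not documented in this README, so the `"status"` field and its terminal values `"completed"` and `"failed"` are assumptions to adjust against the real backend.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def get_json(path, base_url=BASE_URL):
    """GET a JSON document from the backend."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def wait_for_simulation(sim_id, fetch=get_json, interval=2.0, max_polls=30):
    """Poll GET /api/simulations/status/{id} until a terminal state appears.

    ASSUMPTION: the response is a JSON object with a "status" field whose
    terminal values are "completed" or "failed". The real schema may differ.
    """
    for _ in range(max_polls):
        payload = fetch(f"/api/simulations/status/{sim_id}")
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"simulation {sim_id} did not finish after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling logic testable without a running server; `get_json("/api/health")` is the simplest call to check that the backend is up.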
- Legacy Backend (main.py) - Original voice agent functionality
- Modern Backend (main_tau2.py) - New tau2-bench based system
The goal is to eventually consolidate to only the tau2-bench backend once migration is complete.
Current Implementation: Patient intake and triage with "clinical pearl" extraction (clinical pearls are de-identified clinical decisions and the reasoning behind them)
Key Features:
- Voice-driven patient intake with symptom analysis
- 5-category routing system (Emergency, Urgent, Routine, Self-Care, Information)
  - Emergency: Life-threatening conditions (chest pain, stroke signs, difficulty breathing)
  - Urgent: Serious but not immediately life-threatening (high fever, severe pain)
  - Routine: Ongoing or non-urgent issues (mild symptoms, follow-ups, preventive care)
  - Self-Care: Minor issues manageable at home (mild cold, minor headache)
  - Information: Questions about medication, prevention, or general health advice
- Detection of key clinical decisions and reasoning (aka "clinical pearls") from expert corrections and conversation transcripts
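The five routing categories map naturally onto an ordered enumeration. This is an illustrative sketch only: the names `Route`, `DESCRIPTIONS`, and `most_urgent` are hypothetical, and both the urgency ordering and the "pick the most urgent candidate" policy are assumptions, not SIVA's documented behavior.

```python
from enum import IntEnum


class Route(IntEnum):
    """Routing categories; a lower value means more urgent (assumed ordering)."""
    EMERGENCY = 0
    URGENT = 1
    ROUTINE = 2
    SELF_CARE = 3
    INFORMATION = 4


# Descriptions taken from the feature list.
DESCRIPTIONS = {
    Route.EMERGENCY: "Life-threatening conditions",
    Route.URGENT: "Serious but not immediately life-threatening",
    Route.ROUTINE: "Ongoing or non-urgent issues",
    Route.SELF_CARE: "Minor issues manageable at home",
    Route.INFORMATION: "Questions about medication, prevention, or general health advice",
}


def most_urgent(candidates):
    """Return the most urgent route among several candidate routes."""
    candidates = list(candidates)
    if not candidates:
        raise ValueError("at least one candidate route is required")
    return min(candidates)
```

A conservative policy like this escalates whenever signals disagree, which is one plausible choice for triage but should be checked against the domain policy.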
Value Proposition: Captures unwritten clinical wisdom from physician conversations with zero overhead.
- Frontend: Pure HTML/JavaScript voice client with audio streaming
- API Layer: FastAPI routes and WebSocket handlers for communication
- Core Logic: Vector store + LLM judge + data manager for continuous improvement
- Business Logic: Modular conversation processor with domain-specific routing
- tau2-bench Integration: Simulation framework for dual-control agent evaluation with markdown-driven policies and task creation
- STT: OpenAI Whisper v1 (whisper-1) - Speech to text conversion
- TTS: Cartesia Sonic-2 (sonic-2) - Natural voice synthesis
- Main Agent: GPT-3.5 Turbo 1106 (gpt-3.5-turbo-1106) - Conversation processing with function calling
- LLM Judge: GPT-3.5 Turbo (gpt-3.5-turbo) - Feedback analysis and knowledge extraction
- Embeddings: text-embedding-3-small (text-embedding-3-small) - 1536D vectors for similarity search
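Similarity search over stored cases can be illustrated with plain cosine similarity. This is a simplified sketch, not SIVA's vector store: the function names are hypothetical, the stored cases are assumed to be `(vector, route)` pairs, and using the share of agreeing neighbours as a confidence score is one possible heuristic rather than the project's actual method.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cases(query, cases, k=3):
    """Return the k (score, route) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), route) for vec, route in cases]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def vote(neighbours):
    """Return the majority route and the fraction of neighbours that agree."""
    if not neighbours:
        return None, 0.0
    counts = Counter(route for _, route in neighbours)
    route, n = counts.most_common(1)[0]
    return route, n / len(neighbours)
```

In practice the vectors would be the 1536-dimensional embeddings, and the retrieved neighbours could double as few-shot examples in the prompt; the two-dimensional vectors in a quick test are just for readability.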
Real-time dashboard tracking: total conversations, vector store size, system accuracy, route distribution, learning progress, and recent activity. Access at http://localhost:8000/dashboard (auto-opens when using run_voice_app.py).
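Some of the dashboard metrics can be derived from a simple list of conversation records. The sketch below is illustrative only: the record shape (dicts with `predicted_route` and an optional `expert_route`) and the definition of accuracy as agreement with expert review are assumptions, not the dashboard's actual data model.

```python
from collections import Counter


def summarize(records):
    """Aggregate conversation records into dashboard-style metrics.

    ASSUMPTION: each record is a dict with "predicted_route" and, when an
    expert reviewed the conversation, "expert_route". Field names are
    hypothetical.
    """
    reviewed = [r for r in records if r.get("expert_route") is not None]
    correct = sum(1 for r in reviewed if r["predicted_route"] == r["expert_route"])
    return {
        "total_conversations": len(records),
        "route_distribution": dict(Counter(r["predicted_route"] for r in records)),
        # Accuracy is undefined until at least one conversation is reviewed.
        "accuracy": correct / len(reviewed) if reviewed else None,
    }
```

Returning `None` for accuracy when nothing has been reviewed avoids reporting a misleading 0% or 100% on an empty sample.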
siva/
├── pyproject.toml              # Package configuration and dependencies
├── main.py                     # Legacy FastAPI server entry point
├── main_tau2.py                # New tau2-bench based backend server
├── run_voice_app.py            # Application launcher
├── serve_client.py             # Voice client server
├── config/                     # Configuration management
│   └── settings.py             # Pydantic settings with env validation
├── frontend/                   # Web interfaces
│   ├── voice_client.html       # Legacy voice interface
│   ├── dashboard.html          # Legacy performance monitoring
│   └── nextjs/                 # Modern Next.js dashboard
│       ├── app/                # Next.js app router
│       ├── package.json        # Node.js dependencies
│       └── README.md           # Dashboard documentation
├── src/siva/                   # Main application code
│   ├── agent/                  # Agent implementations
│   ├── api_service/            # API services and endpoints
│   ├── data_model/             # Data models and schemas
│   ├── domains/                # Domain-specific implementations
│   ├── environment/            # Environment and simulation logic
│   ├── evaluator/              # Evaluation and metrics
│   ├── orchestrator/           # Orchestration and workflow
│   └── utils/                  # Utility functions
├── tests/                      # Test suite
├── data/simulations/           # tau2-bench simulation data
├── assets/                     # Media files
│   ├── siva_demo_10x.gif       # Demo recording
│   └── flowchart_self_learning_agent.jpeg  # Architecture overview
└── siva_data/                  # Learning database + knowledge pearls
SIVA transforms voice interactions into continuously improving AI systems, capturing domain expertise and building collective intelligence across any field.

