A voice-based AI assistant that uses AWS Strands for multi-agent collaboration to interact with AWS services. The system provides real-time voice interaction through Amazon Nova Sonic and routes each request to a specialized AWS agent.
- Voice Interface: Real-time voice input/output using Amazon Nova Sonic
- Multi-Agent Architecture: Supervisor agent coordinates between specialized agents
- AWS Service Integration: Comprehensive support for EC2, SSM, and AWS Backup operations
- Intelligent Routing: Automatic query routing to appropriate specialized agents
- Professional UI: AWS Cloudscape Design components with chat bubbles and event display
We would like to thank the following contributors for their valuable input and work on this project (sorted alphabetically):
- Aditya Ambati
- Anand Krishna Varanasi
- JAGDISH KOMAKULA
- Dadi T.V.R.L.Phani Kumar
The system implements a simplified multi-agent architecture:
- Supervisor Agent: Routes queries to specialized AWS agents
- Specialized Agents:
  - EC2 Agent: Instance management, status checks, and operations
  - SSM Agent: Systems Manager operations, command execution, patch management
  - Backup Agent: AWS Backup configuration, job monitoring, and management
- Voice Integration: Amazon Nova Sonic for speech-to-text and text-to-speech
- WebSocket Server: Real-time communication between frontend and backend
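The routing step in the architecture above can be illustrated with a minimal, standard-library sketch. It is not the project's implementation: the actual supervisor is an LLM-backed agent, whereas this example substitutes plain keyword matching to show the idea of dispatching a query to one specialist. The names `AGENT_KEYWORDS` and `route_query` are hypothetical.

```python
# Hypothetical keyword-based router. The real supervisor agent is LLM-driven;
# keyword matching is used here only to illustrate the dispatch step.
AGENT_KEYWORDS = {
    "ec2": ("instance", "ec2", "reboot"),
    "ssm": ("ssm", "patch", "command", "parameter"),
    "backup": ("backup", "restore", "vault"),
}


def route_query(query: str) -> str:
    """Return the name of the agent whose keywords best match the query."""
    text = query.lower()
    scores = {
        agent: sum(word in text for word in keywords)
        for agent, keywords in AGENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the supervisor itself when no keyword matches.
    return best if scores[best] > 0 else "supervisor"


if __name__ == "__main__":
    print(route_query("List my running EC2 instances"))
    print(route_query("Show failed backup jobs in the default vault"))
```

A query that matches no keyword stays with the supervisor, which mirrors the idea that only recognized requests are handed to a specialist.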
- Backend: Python 3.12+ with AWS Strands framework
- Frontend: React with AWS Cloudscape Design components
- AI Models: Claude 3 Haiku on Amazon Bedrock for all agents
- Voice Processing: Amazon Nova Sonic for audio I/O
- Package Management: Standard pip with requirements.txt
aws-strands-nova-voice-assistant/
├── backend/                              # Backend Python application
│   ├── src/
│   │   └── voice_based_aws_agent/
│   │       ├── agents/                   # Multi-agent system
│   │       │   ├── orchestrator.py       # Central agent coordinator
│   │       │   ├── supervisor_agent.py   # Query routing agent
│   │       │   ├── ec2_agent.py          # EC2 operations specialist
│   │       │   ├── ssm_agent.py          # SSM operations specialist
│   │       │   └── backup_agent.py       # Backup operations specialist
│   │       ├── config/                   # Configuration management
│   │       ├── utils/
│   │       │   ├── aws_auth.py           # AWS authentication
│   │       │   └── voice_integration/    # Nova Sonic integration
│   │       │       ├── server.py         # WebSocket server
│   │       │       ├── s2s_session_manager.py           # Stream management
│   │       │       └── supervisor_agent_integration.py  # Agent bridge
│   │       └── main.py                   # Application entry point
│   └── tools/                            # Strands tools
│       └── supervisor_tool.py            # Supervisor agent tool integration
├── frontend/                             # React web interface
│   ├── src/
│   │   ├── components/                   # React components
│   │   ├── helper/                       # Audio processing utilities
│   │   ├── App.js                        # Main React application
│   │   └── VoiceAgent.js                 # Voice interface component
│   └── package.json                      # Node.js dependencies
├── requirements.txt                      # Python dependencies
├── run_backend.sh                        # Backend startup script
├── run_frontend.sh                       # Frontend startup script
└── README.md                             # This file
Running this voice-driven AWS assistant involves several AWS services with usage-based pricing. Here's a comprehensive breakdown for planning and budgeting:
Amazon Nova Sonic
- Speech input tokens: $0.0034 per 1K tokens
- Speech output tokens: $0.0136 per 1K tokens
- Text input tokens: $0.00006 per 1K tokens
- Text output tokens: $0.00024 per 1K tokens
Amazon Bedrock (Claude 3 Haiku)
- Input tokens: $0.00025 per 1K tokens
- Output tokens: $0.00125 per 1K tokens
Amazon EC2 (for testing/demo instances)
- t3.micro Linux: $0.0104 per hour (~$7.49/month if running 24/7)
- Most users run instances only during testing
AWS Systems Manager
- Basic operations: Free tier available
- Advanced parameters: $0.05 per parameter per month
- Data transfer: $0.90 per GB (rarely needed for typical usage)
AWS Backup
- Storage: ~$0.05 per GB per month (standard warm storage)
- Cross-region data transfer: $0.02 per GB
- Restore operations: $0.02 per GB
Light Development Usage (2-3 hours/day, 5 days/week)
- Voice interactions: ~50 conversations/week
- Average conversation: 4-6 exchanges
- Estimated monthly cost: $8-15
Moderate Testing (Daily usage, multiple team members)
- Voice interactions: ~200 conversations/week
- Extended conversations: 8-10 exchanges each
- Estimated monthly cost: $25-45
Heavy Development (Continuous testing, multiple environments)
- Voice interactions: ~500+ conversations/week
- Complex operations: Long-running AWS tasks
- Estimated monthly cost: $60-100+
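The scenario estimates above combine per-token prices with usage volume. The sketch below shows that arithmetic under stated assumptions: it takes the speech and text token prices listed earlier to be the Amazon Nova Sonic rates, and the token counts per exchange are illustrative guesses, not measurements from this project. The function names are hypothetical.

```python
# Per-1K-token prices copied from the pricing list earlier in this document
# (assumed here to be the Amazon Nova Sonic rates).
PRICE_PER_1K = {
    "speech_in": 0.0034,
    "speech_out": 0.0136,
    "text_in": 0.00006,
    "text_out": 0.00024,
}


def voice_cost(tokens: dict[str, int]) -> float:
    """Return the cost in USD for the given token counts per category."""
    return sum(PRICE_PER_1K[kind] * count / 1000 for kind, count in tokens.items())


def ec2_cost(hourly_rate: float, hours: float) -> float:
    """Return the EC2 cost in USD for an instance running `hours` hours."""
    return hourly_rate * hours


if __name__ == "__main__":
    # Assumed workload: 50 conversations/week, 5 exchanges each, 4 weeks,
    # 500 speech-input and 1,000 speech-output tokens per exchange.
    exchanges = 50 * 5 * 4
    monthly_tokens = {"speech_in": 500 * exchanges, "speech_out": 1000 * exchanges}
    print(f"Voice tokens: ${voice_cost(monthly_tokens):.2f}/month")
    # t3.micro at the listed hourly rate, running 24/7 for 30 days.
    print(f"EC2 t3.micro: ${ec2_cost(0.0104, 24 * 30):.2f}/month")
```

Replacing the assumed token counts with figures from your own usage reports gives a more realistic estimate; LLM token costs for the agents are not included here.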
The solution includes several built-in cost controls:
- Voice response truncation: Limited to 800 characters to reduce token usage
- Efficient agent routing: Minimizes unnecessary LLM calls
- Conversation management: Sliding window to control context size
- Free tier utilization: Leverages AWS free tier services where available
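Two of the controls above, response truncation and the sliding conversation window, can be sketched with the standard library alone. This is an illustration rather than the project's code: the 800-character limit comes from the text, while the window size of 10 messages, the sentence-boundary heuristic, and the names `truncate_for_voice` and `SlidingWindow` are assumptions.

```python
from collections import deque

MAX_VOICE_CHARS = 800  # limit stated in the list above


def truncate_for_voice(text: str, limit: int = MAX_VOICE_CHARS) -> str:
    """Cut a response to at most `limit` characters, preferring a sentence end."""
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    last_stop = clipped.rfind(". ")
    # Keep whole sentences when a boundary exists in the second half of the cut.
    if last_stop >= limit // 2:
        return clipped[: last_stop + 1]
    return clipped


class SlidingWindow:
    """Keep only the most recent `max_messages` conversation messages."""

    def __init__(self, max_messages: int = 10) -> None:  # size is an assumption
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        # The deque silently drops the oldest message once it is full.
        self._messages.append({"role": role, "content": content})

    def history(self) -> list[dict[str, str]]:
        return list(self._messages)
```

Bounding both the reply length and the retained history keeps the number of tokens sent per turn roughly constant, which is what makes the cost per conversation predictable.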
- Conversation Length: Longer conversations consume more tokens
- Voice vs Text: Voice processing costs more than text-only interactions
- AWS Operations: Complex operations requiring multiple API calls
- Instance Usage: EC2 instances for testing (can be stopped when not needed)
- Backup Storage: Depends on data volume being backed up
For production deployments, additional costs may include:
- Load balancing: Application Load Balancer (~$16/month)
- High availability: Multi-AZ deployments
- Monitoring: CloudWatch logs and metrics
- Security: WAF, additional IAM roles
- Scaling: Auto Scaling groups and larger instance types
Important Notes:
- Pricing shown is for US East (N. Virginia) region as of August 2025
- Actual costs depend heavily on usage patterns and conversation frequency
- The voice-optimized design helps keep token usage predictable
- Consider using AWS Cost Explorer and budgets for monitoring
- Python 3.12+ with pip
- Node.js 16+ and npm
- AWS Account with access to:
  - Amazon Bedrock (Claude 3 Haiku model)
  - Amazon Nova Sonic
  - EC2, SSM, and AWS Backup services
- AWS CLI configured with appropriate credentials
- Audio hardware (microphone and speakers for voice mode)
Clone and prepare the environment:
# Clone the repository
git clone <repository-url>
cd sample-aws-strands-nova-voice-assistant
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Unix/macOS
# or
.venv\Scripts\activate # On Windows
# Install Python dependencies
pip install -r requirements.txt
This application uses two different AWS authentication mechanisms:
- Nova Sonic Integration: Requires AWS credentials as environment variables
- Other AWS Services: Uses boto3 with AWS profiles
Set up AWS credentials for Nova Sonic (required as environment variables):
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_SESSION_TOKEN=<your-session-token> # Only if using temporary credentials
export AWS_DEFAULT_REGION=<your-region>
Configure AWS CLI profile for other services:
aws configure --profile <your-profile-name>
# Enter your AWS Access Key ID, Secret Access Key, and default region
Apply the required IAM permissions to your AWS user/role:
- Amazon Bedrock model invocation
- EC2 instance management
- SSM operations
- AWS Backup functionality
- Supporting services (KMS, STS)
Test your configuration:
# Test AWS CLI access
aws sts get-caller-identity --profile <your-profile-name>
# Test environment variables
echo $AWS_ACCESS_KEY_ID
Security Note: Follow the principle of least privilege and grant only the permissions needed for your specific use case. Customize the provided IAM policy based on which AWS services you plan to use with the voice assistant.
AWS Permissions Example:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BedrockPermissions",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:GetFoundationModel",
"bedrock:ListFoundationModels"
],
"Resource": "*"
},
{
"Sid": "EC2ReadPermissions",
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeSecurityGroups",
"ec2:DescribeImages",
"ec2:DescribeVpcs",
"ec2:DescribeInstanceStatus"
],
"Resource": "*"
},
{
"Sid": "EC2WritePermissions",
"Effect": "Allow",
"Action": [
"ec2:StartInstances",
"ec2:StopInstances",
"ec2:RebootInstances"
],
"Resource": "*"
},
{
"Sid": "SSMReadPermissions",
"Effect": "Allow",
"Action": [
"ssm:DescribeInstanceInformation",
"ssm:GetCommandInvocation",
"ssm:ListCommands",
"ssm:ListCommandInvocations",
"ssm:DescribeDocument",
"ssm:ListDocuments"
],
"Resource": "*"
},
{
"Sid": "SSMWritePermissions",
"Effect": "Allow",
"Action": [
"ssm:SendCommand",
"ssm:StartSession",
"ssm:CreateDocument"
],
"Resource": "*"
},
{
"Sid": "BackupReadPermissions",
"Effect": "Allow",
"Action": [
"backup:ListBackupJobs",
"backup:DescribeBackupVault",
"backup:ListBackupPlans",
"backup:ListBackupVaults",
"backup:DescribeBackupJob"
],
"Resource": "*"
},
{
"Sid": "BackupWritePermissions",
"Effect": "Allow",
"Action": [
"backup:CreateBackupVault",
"backup:CreateBackupPlan",
"backup:StartBackupJob",
"backup:StartRestoreJob"
],
"Resource": "*"
},
{
"Sid": "BackupStoragePermissions",
"Effect": "Allow",
"Action": [
"backup-storage:StartObject",
"backup-storage:PutObject",
"backup-storage:ListObjects"
],
"Resource": "*"
},
{
"Sid": "SupportingPermissions",
"Effect": "Allow",
"Action": [
"kms:DescribeKey",
"sts:GetCallerIdentity"
],
"Resource": "*"
}
]
}
# Navigate to frontend directory
cd frontend
# Install Node.js dependencies
npm install
Start the backend server:
# From the project root (recommended)
./run_backend.sh
# Or with custom parameters:
./run_backend.sh --profile <your-profile> --region <your-region> --voice matthew
Start the frontend in a new terminal:
# Development mode (recommended)
./run_frontend.sh
# Or manually:
cd frontend
npm start
The system supports multiple voice options for text-to-speech:
# Use different voice options
./run_backend.sh --voice matthew # Default male voice
./run_backend.sh --voice tiffany # Female voice option
./run_backend.sh --voice amy # Alternative female voice
- Start the backend server as described above
- Start the React frontend in development mode
- Open your browser to http://localhost:3000
- Configure WebSocket URL if needed (default: ws://localhost:8080)
- Click "Start Conversation" to begin voice interaction
- Grant microphone permissions when prompted
- Default port changed: The default backend port is now 8080 (changed from 80 to avoid requiring administrator privileges). To use a different port:
  # Use a custom port
  ./run_backend.sh --port 3001
  Then update the WebSocket URL in the frontend to match your chosen port.
- AWS Profile: Make sure your AWS profile has the necessary permissions for Bedrock and other AWS services.
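When the frontend cannot connect, a quick way to confirm that the backend is listening on the configured port is a plain TCP probe. The sketch below is a hypothetical helper using only the standard library; it checks that the host and port in the WebSocket URL accept connections and does not perform a WebSocket handshake.

```python
import socket
from urllib.parse import urlparse


def backend_reachable(ws_url: str = "ws://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(ws_url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's standard port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "wss" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    url = "ws://localhost:8080"  # default from the steps above
    state = "reachable" if backend_reachable(url) else "not reachable"
    print(f"{url} is {state}")
```

If you started the backend with `--port 3001`, pass `ws://localhost:3001` instead so the probe matches the URL configured in the frontend.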
EC2 Agent
- List and describe EC2 instances
- Start, stop, and reboot instances
- Instance status monitoring
- Security group and VPC information
SSM Agent
- Execute commands on instances
- Patch management operations
- Parameter Store interactions
- Session Manager connections
Backup Agent
- List backup jobs and vaults
- Configure backup plans
- Monitor backup status
- Restore operations
To avoid ongoing charges, clean up your resources when you're done testing:
- Terminate any test instances created during the demo
- Remove the custom IAM roles and policies created for this solution
- Remove any backup plans or vaults created while testing
- Delete any snapshots or AMIs created during the tests
If you encounter issues during setup or operation, these common solutions can help resolve most problems:
- Ensure microphone permissions are granted in the browser
- Try using Firefox for better Web Audio API compatibility
- Verify system audio settings are configured correctly
- Verify the backend server is running on the correct port
- Check firewall settings for WebSocket traffic
- Ensure the frontend WebSocket URL matches the backend configuration
- Verify IAM policies include necessary service permissions
- Check AWS CLI configuration and credentials
- Ensure the AWS profile has access to required regions
- Simple session management without complex recovery mechanisms
- Basic WebSocket server without automatic reconnection loops
- Straightforward tool processing with the supervisor agent
- Clean error handling that asks users to restart rather than automatic recovery
- Single Tool: Uses one supervisorAgent tool that routes to specialized agents
- Voice Optimization: Responses truncated to 800 characters for better voice experience
- User-Controlled Recovery: When errors occur, users manually restart conversations
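The user-controlled recovery approach above can be sketched as a single tool call with no retry loop. This is an illustration under assumptions, not the project's code: `run_tool_once` and `RESTART_MESSAGE` are hypothetical names, and the tool is modelled as any callable that takes a query string and returns a reply.

```python
from typing import Callable

RESTART_MESSAGE = "Something went wrong. Please restart the conversation."


def run_tool_once(tool: Callable[[str], str], query: str) -> str:
    """Call the tool a single time; on failure, ask the user to restart."""
    try:
        return tool(query)
    except Exception:
        # No retries or automatic recovery: the user decides what happens next.
        return RESTART_MESSAGE
```

Keeping the failure path this small trades resilience for predictability, which matches the simplified design described in the list.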
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions:
- Check the troubleshooting section above
- Review AWS service documentation
- Ensure all prerequisites are met
- Verify AWS permissions and credentials
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.