Goal: Systematically test the project to find real bugs worth fixing
- Send a simple message to the agent
- Test with long messages (>1000 chars)
- Test special characters and emojis
- Test code blocks in messages
- Verify response time is reasonable
- Check if conversation history persists
How to test:
# Terminal 1
node milady.mjs start
# Terminal 2
cd apps/ui && bun run dev
# Browser: http://localhost:18789
# Send: "Hello, what's 2+2?"
# Send: "Write a Python function to reverse a string"
# Send: Long message with emojis 🚀✨Expected: Agent responds coherently using Claude Haiku Document: Any errors, slow responses, or broken features
- Check if "Status" button shows agent state
- Verify uptime is displayed
- Check if model name shows correctly (Claude 3.5 Haiku)
- Test Pause/Resume buttons
- Test Stop/Restart buttons
How to test: Click all buttons in the UI and observe behavior
- Navigate to "Plugins" tab
- Verify Anthropic plugin shows as loaded
- Check if other plugins are listed
- Try installing a new plugin (if possible)
Look for: Missing plugins, load errors, UI glitches
- Open "Config" tab
- Verify current settings display
- Try changing a setting
- Check if changes persist after restart
- Navigate to "Skills" tab
- Check if any skills are loaded
- Test refreshing skills list
- Open "Logs" tab
- Verify logs are streaming
- Check if log levels are correct
- Test filtering/searching logs
-
GET /v1/modelsreturns a list of models -
POST /v1/chat/completionsreturns an OpenAI-shaped response -
POST /v1/chat/completionssupportsstream: true(SSE) -
POST /v1/messagesreturns an Anthropic-shaped response -
POST /v1/messagessupportsstream: true(SSE)
How to test:
# Terminal 1
bun run start
# If you have MILADY_API_TOKEN set, add this to curl:
# -H "Authorization: Bearer $MILADY_API_TOKEN"
curl -sS http://localhost:2138/v1/models | jq .
curl -sS http://localhost:2138/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "milady",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Say hello in one sentence." }
]
}' | jq .
curl -N http://localhost:2138/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "milady",
"stream": true,
"messages": [
{ "role": "user", "content": "Stream a short haiku." }
]
}'
curl -sS http://localhost:2138/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "milady",
"max_tokens": 256,
"system": "You are a helpful assistant.",
"messages": [
{ "role": "user", "content": "What is 2+2?" }
]
}' | jq .
curl -N http://localhost:2138/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "milady",
"stream": true,
"max_tokens": 256,
"messages": [
{ "role": "user", "content": "Stream the answer to 2+2." }
]
}'Expected:
- OpenAI route returns
{ id, object, created, model, choices: [...] } - Anthropic route returns
{ id, type: \"message\", role: \"assistant\", content: [{type:\"text\", text:\"...\"}] } - Streaming routes produce
data:SSE chunks and complete without hanging
cd a:\programa\ai\milady
# Build everything
bun run build
# Build desktop app
cd apps/app
bun install
bun run build
npx cap sync @capacitor-community/electron
# Build Electron
cd electron
npm install
bun run build
# Run in dev mode
bun run electron:start-live- Verify app window opens
- Check system tray integration
- Test minimize to tray
- Test native notifications
- Check if shortcuts work
- Verify auto-updater (if configured)
Look for: Crashes, missing features, UI bugs
- Test on fresh Windows install (if possible)
- Check file path handling (Windows uses backslashes)
- Verify .env file is read correctly
- Test with spaces in installation path
- Check permissions issues
- Test first-run experience (delete ~/.milady and restart)
- Verify onboarding wizard works
- Test API key input
- Test wallet generation
- Test with invalid API key
- Test with missing .env file
- Test with corrupt config file
- Test with unsupported model name
- Test with no internet (should still work locally)
- Test with slow connection
- Test API rate limiting
- Test with low memory
- Test with multiple instances running
- Test long-running sessions (hours)
- Navigate to "Inventory" tab
- Test wallet address display
- Test balance checking (if API keys configured)
- Test wallet export/import
- Check if browser plugin is available
- Test screenshot functionality
- Test webpage scraping
- Test creating multiple agents
- Test switching between agents
- Test agent-specific configs
cd a:\programa\ai\milady
npx tsc --noEmitDocument: Any type errors that should be fixed
# If eslint is configured
bun run lint# Check for outdated packages
bun outdated
# Check for security vulnerabilities
bun audit- Verify all commands in README work
- Check for outdated screenshots
- Test installation instructions
- Verify links aren't broken
- Check if environment variables are documented
- Verify plugin system is explained
- Look for undocumented features
When you find a bug, document it in bugs.md with:
### Bug #XXX
**Bug ID**: XXX
**Severity**: Critical / High / Medium / Low
**Component**: (chat / ui / config / plugins / desktop / etc)
**Platform**: Windows 11 / macOS / Linux
**Description**: Clear description of what's broken
**Steps to Reproduce**:
1. Step 1
2. Step 2
3. Step 3
**Expected**: What should happen
**Actual**: What actually happened
**Error Output**: Full error message
**Screenshots**: (if UI issue)
**Status**: Open / In Progress / FixedFix First (High Impact):
- Crashes or data loss
- Security issues
- Features that don't work at all
- Poor UX that blocks users
Fix Later (Nice to Have):
- Minor UI glitches
- Performance optimizations
- Code quality improvements
- Documentation updates
Easy Wins (Good first PRs):
- Fix typos in docs
- Update outdated dependencies
- Fix broken links
- Add missing error messages
- Improve log messages
Medium Complexity:
- Fix WebSocket stub (implement or remove cleanly)
- Improve Windows path handling
- Add better error handling
- Fix TypeScript strict mode issues
High Impact:
- Fix desktop app build issues
- Improve first-run experience
- Add missing tests
- Performance improvements
- Start with Phase 1 (Web Dashboard testing)
- Document everything you find
- Ask the person who hired you: What features are priority?
- Focus on Windows since that's your platform
- Create small, focused PRs (easier to review)
Good luck! 🚀