Hum It Out is a voice-to-music conversion system built for the Cascadia JS hackathon. Users call a phone number, hum or sing a melody, and receive studio-ready music files (MIDI, backing tracks, stems) via SMS, typically within 45-90 seconds.
- Phone-first interface - No app downloads, works from any phone
- PIN-based authentication - Call from anywhere using your unique PIN
- AI-powered music generation - Multi-agent system creates professional backing tracks
- Instant DAW integration - Files ready for GarageBand, Logic Pro, Ableton Live
- Frontend: React + Tailwind CSS (GitHub dark mode aesthetic)
- Backend: Node.js + Express
- Database: Neon PostgreSQL
- Voice Infrastructure: Twilio Voice + SMS
- AI Processing: OpenAI Whisper + GPT-4
- Multi-Agent Orchestration: AG2 (AutoGen)
- Audio Synthesis: Tone.js + Web Audio API
- Twilio - Voice calls, recordings, SMS notifications
- OpenAI - Audio transcription, music analysis, lyric enhancement
- Neon - User data, session storage, file metadata
- AG2/AutoGen - Multi-agent music generation workflow
Phone Call → Twilio → PIN Authentication → Recording →
OpenAI Whisper → AG2 Multi-Agent Processing →
Music Generation → File Creation → SMS Delivery →
Web Dashboard Access
- User registers on web dashboard → Gets unique 6-digit PIN
- User calls 555-HUMS (555-4867) from any phone
- System prompts for PIN → Validates against database
- An authenticated user can record up to 30 seconds of audio
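
The PIN step above can be sketched as two TwiML responses plus a lookup. This is a minimal illustration, not the project's actual handler: the route paths (`/voice/pin`, `/voice/recording`), the function names, and the in-memory `Map` standing in for the Neon lookup are all assumptions. The TwiML verbs and attributes (`<Gather numDigits>`, `<Record maxLength>`) are standard Twilio ones.

```javascript
// Escape text before placing it inside TwiML attributes or elements.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap TwiML lines in the required <Response> envelope.
function twiml(...lines) {
  return ['<?xml version="1.0" encoding="UTF-8"?>', "<Response>", ...lines, "</Response>"].join("\n");
}

// Ask the caller for a 6-digit PIN; Twilio posts the keyed digits to actionUrl.
function promptForPin(actionUrl, message = "Welcome to Hum It Out. Please enter your six digit PIN.") {
  return twiml(
    `  <Gather input="dtmf" numDigits="6" timeout="10" action="${escapeXml(actionUrl)}" method="POST">`,
    `    <Say>${escapeXml(message)}</Say>`,
    "  </Gather>",
    "  <Say>We did not receive a PIN. Goodbye.</Say>"
  );
}

// Start a recording capped at 30 seconds.
function startRecording(actionUrl) {
  return twiml(
    "  <Say>Thanks. Hum or sing your melody after the beep.</Say>",
    `  <Record maxLength="30" playBeep="true" action="${escapeXml(actionUrl)}" method="POST"/>`
  );
}

// usersByPin is a hypothetical stand-in for a database query.
function handlePin(digits, usersByPin) {
  const user = /^\d{6}$/.test(digits ?? "") ? usersByPin.get(digits) : undefined;
  if (!user) {
    return { user: null, twiml: promptForPin("/voice/pin", "That PIN was not recognized. Please try again.") };
  }
  return { user, twiml: startRecording("/voice/recording") };
}
```

In an Express route, the returned string would be sent with `res.type("text/xml").send(result.twiml)`, with `digits` read from the `Digits` field Twilio includes in the `<Gather>` callback.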
- User hums/sings melody with or without lyrics
- Twilio captures high-quality audio recording
- OpenAI Whisper transcribes the lyrics; tempo is estimated in the same analysis pass (Whisper itself returns text, not tempo)
- AG2 multi-agent system processes musical elements:
  - MusicAnalyst: Determines key, mood, genre
  - ChordComposer: Creates chord progressions
  - GenreSpecialist: Generates style variations
  - ArrangementDirector: Finalizes instrumentation
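
To make the agent handoff concrete, here is a plain JavaScript stand-in for the four roles. AG2 itself is a Python framework, so this is not AG2 code. It only shows the idea of each agent enriching a shared state in turn. The placeholder values (mood, genre, variations) and the I-V-vi-IV progression are assumptions chosen for illustration.

```javascript
const NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
const MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]; // semitone offsets of the major scale

// Placeholder analysis: a real agent would infer these from the recording.
function musicAnalyst(state) {
  return { ...state, key: state.key ?? "C", mood: "upbeat", genre: "pop" };
}

// Builds a I-V-vi-IV progression in a major key (sharps only, simplified spelling).
function chordComposer(state) {
  const root = NOTES.indexOf(state.key);
  if (root === -1) throw new Error(`Unknown key: ${state.key}`);
  const degree = (n) => NOTES[(root + MAJOR_STEPS[n - 1]) % 12];
  return { ...state, chords: [degree(1), degree(5), degree(6) + "m", degree(4)] };
}

// Placeholder style variations.
function genreSpecialist(state) {
  return { ...state, variations: [state.genre, "lo-fi", "acoustic"] };
}

// Instrument list mirrors the stems named in the generated package.
function arrangementDirector(state) {
  return { ...state, instruments: ["drums", "bass", "chords", "melody guide"] };
}

const AGENTS = [musicAnalyst, chordComposer, genreSpecialist, arrangementDirector];

// Pass the state through each agent in order.
function runAgents(input) {
  return AGENTS.reduce((state, agent) => agent(state), input);
}
```

Calling `runAgents({ key: "G" })` yields a state whose `chords` are G, D, Em, C. Each later agent can read what the earlier ones produced.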
Generated package includes:
- Backing track (.wav) - Full mixed track ready to play
- Individual stems (.wav) - Drums, bass, chords, melody guide
- MIDI file (.mid) - For DAW import and instrument customization
- Lyrics sheet (.txt) - Clean, formatted lyrics
- Session metadata (tempo, key, chord progression)
- SMS notification with download links sent immediately
- Web dashboard provides session history and re-downloads
- Versioning support - "Build on session 1234" for iterations
- Twilio provides high-quality voice recording (.wav format)
- OpenAI Whisper is trained on audio of widely varying quality, so it tolerates phone-line recordings
- Preprocessing normalization for consistent results
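
One simple form the preprocessing normalization could take is peak normalization, sketched below. The source does not say which method is used, so treat the technique, the function name, and the 0.9 target peak as assumptions.

```javascript
// Scale samples so the loudest one reaches targetPeak (values in -1..1).
function peakNormalize(samples, targetPeak = 0.9) {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  // Pure silence has no peak to scale; return an unchanged copy.
  if (peak === 0) return Float32Array.from(samples);
  const gain = targetPeak / peak;
  return Float32Array.from(samples, (s) => s * gain);
}
```

A quiet hum with a peak of 0.45 gets a gain of 2, so recordings made at different phone volumes reach the later stages at a similar level.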
- Async pipeline prevents call timeout
- User gets immediate confirmation, processing continues
- SMS notification when files are ready (typically 45-90 seconds)
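
The async pattern above might look like this: the recording webhook queues a job and answers the call immediately, and a separate worker does the slow part. The in-memory `jobs` array and the `generate` and `notify` callbacks are hypothetical stand-ins for a real queue, the generation pipeline, and the SMS sender.

```javascript
const jobs = []; // stand-in for a durable job queue

// Called from the <Record> action webhook; returns TwiML without waiting for processing.
function onRecordingComplete(params) {
  jobs.push({ callSid: params.CallSid, recordingUrl: params.RecordingUrl, status: "queued" });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Say>Got it. We will text you when your files are ready.</Say>",
    "  <Hangup/>",
    "</Response>",
  ].join("\n");
}

// Worker step: process one queued job, then notify the user.
async function processNextJob(generate, notify) {
  const job = jobs.find((j) => j.status === "queued");
  if (!job) return null;
  job.status = "processing";
  try {
    const links = await generate(job.recordingUrl);
    await notify(job, links);
    job.status = "done";
  } catch (err) {
    job.status = "failed";
    job.error = String(err);
  }
  return job;
}
```

Because the webhook returns as soon as the job is queued, the call ends cleanly even when generation takes the full 45-90 seconds.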
- PIN system allows any phone to access user account
- No app installation required
- Web dashboard for session management
- Standardized file formats (.mid, .wav, .txt)
- Tempo-locked stems for instant sync
- Descriptive filenames include BPM and key information
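
A descriptive filename scheme could be built like this. The exact pattern (`humitout_` prefix, field order, `sharp` spelling) is an assumption. The source only states that filenames carry BPM and key.

```javascript
// Build a filesystem-safe name that carries BPM and key for quick DAW setup.
function assetFilename({ sessionId, part, bpm, key, ext }) {
  // "#" is replaced because it is awkward in URLs and some file systems.
  const safeKey = key.replace(/#/g, "sharp").replace(/\s+/g, "");
  const safePart = part
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
  return `humitout_${sessionId}_${safePart}_${Math.round(bpm)}bpm_${safeKey}.${ext}`;
}
```

For example, `assetFilename({ sessionId: 1234, part: "Melody Guide", bpm: 119.6, key: "F# minor", ext: "wav" })` returns `humitout_1234_melody-guide_120bpm_Fsharpminor.wav`.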
- Authentication via unique 6-digit PIN
- Email for account recovery
- Phone number for SMS notifications
- Links to user account
- Original audio file URL
- Transcribed lyrics and musical analysis
- Processing timestamps
- Links to session
- Version control for iterations
- File URLs for all generated assets
- Generation parameters for reproducibility
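
The three record types above (users, sessions, generated files) and the links between them can be sketched as plain objects. Field names here are illustrative guesses. The source lists what each record holds but not its schema.

```javascript
// User: PIN for phone auth, email for recovery, phone for SMS.
function makeUser({ id, pin, email, phone }) {
  return { id, pin, email, phone };
}

// Session: belongs to a user; analysis fields are filled in after processing.
function makeSession({ id, userId, audioUrl, createdAt = new Date() }) {
  return { id, userId, audioUrl, lyrics: null, analysis: null, createdAt, processedAt: null };
}

// File: belongs to a session; version and params support iterations and reproducibility.
function makeFile({ id, sessionId, version = 1, url, params = {} }) {
  return { id, sessionId, version, url, params };
}

// Follow the links file -> session -> user to list everything a user can re-download.
function filesForUser(userId, sessions, files) {
  const sessionIds = new Set(sessions.filter((s) => s.userId === userId).map((s) => s.id));
  return files.filter((f) => sessionIds.has(f.sessionId));
}
```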
- PIN-based authentication (no passwords over phone)
- Rate limiting on authentication attempts
- Temporary file cleanup (30-day retention)
- Audio recordings stored securely in cloud storage
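
Two of the measures above are easy to sketch: rate limiting of PIN attempts and the 30-day retention check. The fixed-window approach, the default limits (5 attempts per 15 minutes), and the function names are assumptions. Only the 30-day retention period comes from the list above.

```javascript
// Fixed-window limiter: at most maxAttempts per key (e.g. caller number) per window.
function createAttemptLimiter({ maxAttempts = 5, windowMs = 15 * 60 * 1000 } = {}) {
  const attempts = new Map(); // key -> { count, windowStart }
  return {
    // Returns true if the attempt is allowed, false if the key is locked out.
    tryAttempt(key, now = Date.now()) {
      const entry = attempts.get(key);
      if (!entry || now - entry.windowStart >= windowMs) {
        attempts.set(key, { count: 1, windowStart: now });
        return true;
      }
      if (entry.count >= maxAttempts) return false;
      entry.count += 1;
      return true;
    },
    // Clear the counter, e.g. after a successful login.
    reset(key) {
      attempts.delete(key);
    },
  };
}

// True once a file is older than the retention period and can be cleaned up.
function isExpired(createdAtMs, nowMs = Date.now(), retentionDays = 30) {
  return nowMs - createdAtMs > retentionDays * 24 * 60 * 60 * 1000;
}
```

With a 6-digit PIN there are only one million combinations, so capping attempts per caller matters more here than it would for a long password.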
- Multi-modal AI pipeline (voice → text → music)
- Near-real-time audio processing (results typically in 45-90 seconds)
- Multi-agent AI orchestration with AG2
- Cross-platform file compatibility
- Voice-to-music feels magical
- Judge participation in live demo
- Professional-quality output
- Unique phone-based UX
- Simplified processing pipeline with fallbacks
- Pre-tested audio samples
- Backup generation methods
- Local development environment
- Clear problem/solution narrative
- Live judge interaction
- Before/after audio comparison
- Real DAW integration demonstration
- Phase 1 (2 hours): Database setup, basic Twilio webhooks
- Phase 2 (2 hours): OpenAI integration, audio processing
- Phase 3 (2 hours): AG2 multi-agent workflow
- Phase 4 (2 hours): Frontend dashboard, file generation
- Phase 5 (1 hour): Integration testing, demo preparation
- Real-time collaboration (multiple callers → same session)
- Advanced mixing interface in web dashboard
- Integration with Spotify, SoundCloud APIs
- Voice recognition for user authentication
- Mobile app companion
This project demonstrates:
- Product thinking - Solving real creator problems
- Technical architecture - Scalable, modular design
- User experience - Simple but powerful workflow
- Integration skills - Multiple APIs working together
- Presentation skills - Demo-ready within a constrained timeframe
The user (developer) is experienced with sampling and music production workflows, understands DAW integration needs, and values both technical sophistication and user experience simplicity.