Real-time AI commentary for coding sessions, delivered sports-broadcast style by two AI commentators.
- Dual Commentators: Alex (play-by-play) and Morgan (color commentary) with distinct personalities
- Real-time Analysis: Uses Gemini Vision to understand what's happening on screen
- Natural Dialogue: Powered by Gemini TTS for realistic two-speaker audio (free!)
- Floating UI Controller: Pause, resume, or stop commentary with a simple floating window
- Smart Change Detection: Only comments when the screen actually changes
- Hotkey Toggle: Pause/resume commentary with Ctrl+Shift+P
# Clone the repo
git clone https://github.com/thefirebanks/esopn.git
cd esopn
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .
# Get a free API key from https://aistudio.google.com/apikey
export ESOPN_GEMINI_API_KEY=your_key_here
# Start commentary with floating UI controller
uv run esopn watch --ui

This launches a floating window with Pause, Resume, and Stop buttons. Switch to your code editor or terminal and watch the commentary roll in!

# Run without UI (use Ctrl+Shift+P to pause, Ctrl+C to stop)
uv run esopn watch

| Action | Method |
|---|---|
| Pause/Resume | Click button in UI, or press Ctrl+Shift+P |
| Stop | Click "Stop & Exit" in UI, or press Ctrl+C |
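
One way to model the controls in the table is a small shared state object that both the UI buttons and the hotkey handler call into. The sketch below is an assumption about structure, not ESOPN's actual implementation; `CommentaryControl` and its method names are hypothetical.

```python
import threading
from typing import Optional


class CommentaryControl:
    """Hypothetical shared pause/resume/stop state (not ESOPN's real class)."""

    def __init__(self) -> None:
        self._running = threading.Event()  # set = commentary is active
        self._stopped = threading.Event()  # set = shutdown was requested
        self._running.set()

    def toggle_pause(self) -> bool:
        """Flip between paused and running; return True if now paused."""
        if self._stopped.is_set():
            return False  # nothing to pause once stop was requested
        if self._running.is_set():
            self._running.clear()
        else:
            self._running.set()
        return not self._running.is_set()

    def stop(self) -> None:
        """Request shutdown and wake any loop blocked in wait_until_active()."""
        self._stopped.set()
        self._running.set()

    @property
    def paused(self) -> bool:
        return not self._running.is_set() and not self._stopped.is_set()

    @property
    def stopped(self) -> bool:
        return self._stopped.is_set()

    def wait_until_active(self, timeout: Optional[float] = None) -> bool:
        """Block while paused; return True only if commentary should proceed."""
        self._running.wait(timeout)
        return self._running.is_set() and not self._stopped.is_set()
```

A capture loop could then run `while not control.stopped:` and only take a screenshot when `control.wait_until_active(0.5)` returns True, so a paused session wakes up promptly on resume or stop.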
esopn watch # Start commentary (recommended)
esopn watch --ui # Start with floating UI controller
esopn run # Start commentary (full options)
esopn test-capture # Test screenshot capture
esopn test-tts # Test TTS synthesis
esopn test-vision # Test vision analysis
esopn info # Check system/dependencies

esopn watch [OPTIONS]
Options:
--ui Show floating UI controller window
-i, --interval FLOAT Seconds between screenshots
-d, --device TEXT TTS device (cuda, mps, cpu, auto)
-M, --mode TEXT Commentary mode (sports, wwe, freeman_mj)
-v, --verbose Enable verbose logging

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Screenshot  │ ──▶ │   Gemini    │ ──▶ │ Commentary  │ ──▶ │ Gemini TTS  │
│   (mss)     │     │   Vision    │     │  Generator  │     │   (free!)   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
       │                                                           │
       │                     ┌─────────────┐                       │
       └────────────────────▶│  Speakers   │◀──────────────────────┘
                             └─────────────┘

- Captures screenshots and detects when the screen changes (>5% difference)
- Gemini Vision analyzes what's happening (code, terminal, action)
- Commentary LLM generates sports-style dialogue between Alex & Morgan
- Gemini TTS synthesizes natural two-speaker audio with distinct voices
- Audio plays through your speakers in real-time
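
The change-detection step in the first bullet can be sketched as follows. The README gives the threshold (more than 5% difference) but not how the difference is measured, so this example assumes two raw pixel buffers of equal size compared byte by byte. `changed_fraction` and `screen_changed` are hypothetical names, not ESOPN functions.

```python
from typing import Optional


def changed_fraction(prev: bytes, curr: bytes) -> float:
    """Fraction of byte positions that differ between two frames."""
    if len(prev) != len(curr):
        return 1.0  # assumption: a size change counts as a full change
    if not curr:
        return 0.0
    differing = sum(a != b for a, b in zip(prev, curr))
    return differing / len(curr)


def screen_changed(
    prev: Optional[bytes], curr: bytes, threshold: float = 0.05
) -> bool:
    """True when the frame differs from the previous one by more than threshold."""
    if prev is None:
        return True  # first frame: nothing to compare against yet
    return changed_fraction(prev, curr) > threshold


if __name__ == "__main__":
    blank = bytes(100)
    small_edit = bytes([1] * 5 + [0] * 95)   # exactly 5% differs
    large_edit = bytes([1] * 6 + [0] * 94)   # 6% differs
    print(screen_changed(blank, small_edit))  # False: not above the threshold
    print(screen_changed(blank, large_edit))  # True
```

A byte-level comparison is the simplest option; a real implementation might downscale frames or compare per-pixel intensity instead, which is less sensitive to cursor blinks and anti-aliasing noise.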
- Python: 3.10+
- API Key: Google Gemini (free tier works great!)
- macOS/Linux: For screen capture
For screen capture on macOS, grant permissions:
- System Preferences → Privacy & Security → Screen Recording
- Add your terminal app (Terminal, iTerm2, etc.)
For active window capture, also grant:
- System Preferences → Privacy & Security → Accessibility
- Add your terminal app
Alex (play-by-play):
- High-energy, describes what's happening
- Calls out specific actions, file names, patterns
- Uses conversational descriptions (not literal code reading)
Example: "New submit handler going in! Looks like they're setting up form validation!"
Morgan (color commentary):
- Analytical with energy and enthusiasm
- Explains WHY the code matters
- Provides technical insight
Example: "That's the Strategy pattern right there - makes it easy to swap algorithms later!"
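
The two commentator personalities could be captured as data and folded into a prompt for the commentary model. This is an illustrative sketch only: the `Commentator` dataclass, the `build_prompt` helper, and the prompt wording are hypothetical and not taken from ESOPN's source. The trait strings are copied from the descriptions above.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Commentator:
    """Hypothetical persona record; field names are assumptions."""
    name: str
    role: str
    traits: Tuple[str, ...]


ALEX = Commentator(
    name="Alex",
    role="play-by-play",
    traits=(
        "High-energy, describes what's happening",
        "Calls out specific actions, file names, patterns",
        "Uses conversational descriptions (not literal code reading)",
    ),
)

MORGAN = Commentator(
    name="Morgan",
    role="color commentary",
    traits=(
        "Analytical with energy and enthusiasm",
        "Explains WHY the code matters",
        "Provides technical insight",
    ),
)


def build_prompt(
    scene: str, commentators: Sequence[Commentator] = (ALEX, MORGAN)
) -> str:
    """Assemble a plain-text prompt from a scene description and the personas."""
    lines = [
        "Write a short sports-style dialogue about this coding scene:",
        scene,
        "",
    ]
    for c in commentators:
        lines.append(f"{c.name} ({c.role}):")
        lines.extend(f"- {trait}" for trait in c.traits)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("A new submit handler is being added to a form component."))
```

Keeping the personas as data rather than hard-coding them in the prompt would make it straightforward to swap in different pairs for other commentary modes.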
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `ESOPN_GEMINI_API_KEY` | - | Google Gemini API key (required) |
| `ESOPN_CAPTURE_INTERVAL` | 3.0 | Seconds between screenshots |
| `ESOPN_TTS_PROVIDER` | gemini | TTS provider (gemini or elevenlabs) |

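
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them into a small settings object using only the standard library. ESOPN may well load its configuration differently; the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the documented variables."""
    gemini_api_key: str
    capture_interval: float = 3.0
    tts_provider: str = "gemini"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env

    api_key = env.get("ESOPN_GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ESOPN_GEMINI_API_KEY is required")

    provider = env.get("ESOPN_TTS_PROVIDER", "gemini").strip().lower()
    if provider not in {"gemini", "elevenlabs"}:
        raise ValueError(f"Unsupported TTS provider: {provider!r}")

    interval = float(env.get("ESOPN_CAPTURE_INTERVAL", "3.0"))
    return Settings(
        gemini_api_key=api_key,
        capture_interval=interval,
        tts_provider=provider,
    )


if __name__ == "__main__":
    demo = load_settings({"ESOPN_GEMINI_API_KEY": "your_key_here"})
    print(demo.capture_interval, demo.tts_provider)
```

Passing the mapping explicitly keeps the function easy to test without touching the real environment.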
Or create a .env file:
ESOPN_GEMINI_API_KEY=your_key_here
ESOPN_CAPTURE_INTERVAL=3.0

- Screenshots are sent to Gemini for analysis. Avoid capturing windows with secrets, credentials, or private data.
- Do not pass API keys as CLI flags; prefer `.env` or exported environment variables.
License: MIT