SpeakEasy Studio is a Windows-first desktop application for text-to-speech and audio-to-text workflows. It combines a modern CustomTkinter interface with multiple synthesis engines, optional summarization, transcript processing, and persistent history/configuration.
- Converts text into speech audio with selectable voices and output formats.
- Transcribes audio files to text using Whisper.
- Applies optional readability enhancements before synthesis.
- Supports summary generation before conversion.
- Tracks output history and provides built-in playback controls.
- Input sources: paste text or load TXT, MD, PDF, DOCX files.
- PDF page-range support.
- Optional summarization in the input tab:
  - `sumy` for lightweight extraction.
  - `bart` for higher-quality abstractive summaries.
- Readability enhancements with inline controls:
  - Pause enhancement level (`off`, `mild`, `strong`)
  - Newline normalization
  - Heuristic punctuation insertion
  - List pause enhancement
  - Paragraph pause enhancement
  - Edge fallback pause behavior
- Playback highlighting in the editor during audio playback.
- Dedicated Arabic Editor button (RTL mode) that opens a Qt-based editor for robust Arabic typing, wrapping, and selection.
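The pause levels and newline normalization listed above can be illustrated with a minimal sketch. The function names and the pause marks chosen for each level are assumptions made for this example, not the application's actual implementation.

```python
import re

# Hypothetical pause marks per level; the app's real values are not documented here.
PAUSE_MARKS = {"off": "", "mild": ".", "strong": "..."}


def normalize_newlines(text: str) -> str:
    """Unify line endings, join soft-wrapped lines, and keep paragraph breaks."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # A single newline inside a paragraph is treated as a soft wrap.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text).strip()


def enhance_pauses(text: str, level: str = "mild") -> str:
    """End each unpunctuated paragraph with the pause mark for the given level."""
    if level not in PAUSE_MARKS:
        raise ValueError(f"unknown pause level: {level!r}")
    mark = PAUSE_MARKS[level]
    paragraphs = normalize_newlines(text).split("\n\n")
    if not mark:
        return "\n\n".join(paragraphs)
    # Paragraphs that already end in punctuation are left untouched.
    fixed = [p if p[-1:] in ".!?:;," else p + mark for p in paragraphs]
    return "\n\n".join(fixed)
```

For example, `enhance_pauses("Title\n\nBody text.", "strong")` appends a mark only to the unpunctuated heading line, which gives a synthesis engine a cue to pause before the body.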
- Audio transcription via Whisper (`tiny`, `base`, `small`).
- Optional transcript cleaning and technical normalization.
- Optional emotion/sentiment analysis pipeline and synthesis hints.
- Output as plain text or markdown.
- Send transcript directly to Text to Speech tab.
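As a rough sketch of the transcription flow above, the snippet below calls the `openai-whisper` package and returns plain text or markdown. The helper names `transcribe` and `format_transcript` and the markdown layout are assumptions for illustration; the application's own pipeline may differ.

```python
from pathlib import Path

# Model sizes listed above; Whisper itself offers more.
SUPPORTED_MODELS = ("tiny", "base", "small")


def format_transcript(text: str, source_name: str, as_markdown: bool = False) -> str:
    """Return the transcript as plain text or as a small markdown document."""
    body = " ".join(text.split())  # collapse stray whitespace
    if not as_markdown:
        return body
    return f"# Transcript: {source_name}\n\n{body}\n"


def transcribe(audio_path: str, model_name: str = "base", as_markdown: bool = False) -> str:
    """Transcribe an audio file with openai-whisper (needs ffmpeg on PATH)."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"model must be one of {SUPPORTED_MODELS}")
    import whisper  # imported lazily because it is heavy and optional

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)
    return format_transcript(result["text"], Path(audio_path).name, as_markdown)
```

Validating the model name before importing `whisper` keeps bad input cheap to reject, since the import and model load are the slow steps.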
- Edge TTS (online neural voices).
- Piper TTS (offline local models).
- Voice browser and refresh.
- Piper catalog search for undiscovered voices from multiple sources.
- Source toggles for Piper catalog providers (Hugging Face and project catalog), persisted across app restarts.
- Undownloaded Piper voices are marked visually in the voice list.
- Selecting an undownloaded Piper voice prompts for download.
- Download progress shows percentage, downloaded/total size, transfer speed, and ETA.
- Favorites, plus per-engine memory of the last-used voice.
- Rate, pitch, and volume control.
- Output formats: MP3 and WAV.
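The download progress line described above (percentage, downloaded/total size, transfer speed, ETA) could be computed along these lines. This is a sketch under assumed names and an assumed output format, not the application's actual status text.

```python
def human_size(num_bytes: float) -> str:
    """Format a byte count using binary units, capped at GiB."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if num_bytes < 1024 or unit == "GiB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024


def progress_line(done: int, total: int, elapsed_s: float) -> str:
    """Build a status line: percent, done/total size, average speed, ETA."""
    speed = done / elapsed_s if elapsed_s > 0 else 0.0
    percent = 100.0 * done / total if total > 0 else 0.0
    if speed > 0 and total >= done:
        eta = f"{int((total - done) / speed)}s"
    else:
        eta = "unknown"  # no data yet, or the server did not report a size
    return (
        f"{percent:.0f}% {human_size(done)}/{human_size(total)} "
        f"{human_size(speed)}/s ETA {eta}"
    )
```

The speed here is the average over the whole transfer; a real progress bar might smooth over a recent window instead so the ETA reacts to changing network conditions.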
- Built-in player bar:
  - Play/pause, stop, seek, speed control, live time updates.
- Conversion history in `src/output/history.json`.
- Config persistence in `src/config.json`.
- Theme and processing preferences saved across sessions.
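One way to persist history and configuration as JSON files is sketched below. The entry fields, the `limit` parameter, and the helper names are hypothetical; the real schema of `history.json` and `config.json` is not described here.

```python
import json
import os
import tempfile
from pathlib import Path


def load_json(path: Path, default):
    """Read a JSON file, falling back to a default if it is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return default


def save_json(path: Path, data) -> None:
    """Write JSON atomically: temp file in the same folder, then replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, indent=2)
    os.replace(tmp_name, path)


def append_history(path: Path, entry: dict, limit: int = 200) -> list:
    """Append one conversion record and keep only the newest entries."""
    history = load_json(path, default=[])
    history.append(entry)
    history = history[-limit:]  # hypothetical cap to keep the file small
    save_json(path, history)
    return history
```

Writing to a temporary file and then calling `os.replace` means a crash mid-write leaves the previous file intact rather than a truncated one.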
- Language: Python 3.10+
- UI: CustomTkinter + ttk
- TTS: `edge-tts`, `piper-tts`
- STT: `openai-whisper`
- Summarization: `sumy`, `transformers`, `torch`
- Document parsing: `pdfplumber`, `python-docx`, `markdown-it-py`
- External RTL editor: `PySide6`
- Media processing/playback: ffmpeg/ffplay (winsound fallback for limited playback)
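The ffplay-first, winsound-fallback playback choice mentioned above might be selected as in this sketch. The function names and return values are assumptions for illustration, and the injected `which` and `platform` parameters exist only to make the logic easy to test.

```python
import shutil
import sys


def pick_playback_backend(which=shutil.which, platform: str = sys.platform) -> str:
    """Prefer ffplay; fall back to winsound on Windows; otherwise report none."""
    if which("ffplay"):
        return "ffplay"
    if platform == "win32":
        return "winsound"  # stdlib module, WAV playback only
    return "none"


def ffplay_command(audio_path: str) -> list[str]:
    """Command line for windowless playback that exits when the file ends."""
    return ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", audio_path]
```

The resulting command list can be passed to `subprocess.Popen`, which also gives the caller a process handle to terminate when the user presses stop.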
.
|-- justfile
|-- plan.md
|-- src/
| |-- main.py
| |-- config.json
| |-- requirements.txt
| |-- core/
| |-- ui/
| |-- models/
| `-- output/
|-- docs/
| |-- FEATURES.md
| |-- USAGE.md
| |-- JUSTFILE.md
| `-- ARCHITECTURE.md
`-- README.md
- Windows PowerShell
- Python 3.10+
- ffmpeg available in PATH
- Optional: `just` command runner
- PySide6 (installed via requirements) for the external Arabic editor window
just venv
just install
just run
just run mode="new"

python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -r .\src\requirements.txt
.\.venv\Scripts\python.exe .\src\main.py

The project ships with an automation file for local development, verification, and packaging.
Key commands:
- `just help`
- `just install`
- `just run` (legacy default)
- `just run mode="new"` (PySide6 migration UI)
- `just run-old`
- `just run-new`
- `just compile`
- `just smoke`
- `just smoke-old`
- `just smoke-new`
- `just stability-startup-cycles cycles="3"`
- `just stability-long-sessions`
- `just stability-cancel-recovery`
- `just phase5-automated`
- `just verify-tts`
- `just piper-list`
- `just piper-download-default`
- `just build`
- `just clean`
Full reference: see docs/JUSTFILE.md.
- `Ctrl+O`: open file
- `Ctrl+Enter`: start conversion
- `Ctrl+S`: start conversion
- `Ctrl+Shift+V`: focus text input tab/editor
- `Space`: play/pause (outside text-input widgets)
- Editor-focused shortcuts (IME/language-independent path):
  - `Ctrl+C`, `Ctrl+V`, `Ctrl+X`, `Ctrl+A`
  - `Ctrl+Z`, `Ctrl+Y`, `Ctrl+Shift+Z`
- Feature reference: `docs/FEATURES.md`
- Usage guide: `docs/USAGE.md`
- Architecture and flow: `docs/ARCHITECTURE.md`
- `just` commands reference: `docs/JUSTFILE.md`
- First run for some models (Whisper/BART/Piper voices) may require downloads.
- Long-running operations execute in background threads with cancellation support.
- This project is currently optimized for Windows workflows.
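Background execution with cancellation, as noted above, is commonly built on a worker thread that checks a `threading.Event` between steps. The class below is a minimal sketch of that pattern with hypothetical names; it is not the application's actual task runner.

```python
import threading


class CancellableTask:
    """Run a step-wise job in a worker thread with cooperative cancellation."""

    def __init__(self, steps, on_step):
        self._steps = list(steps)
        self._on_step = on_step
        self._cancel = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.completed = 0  # number of steps that finished

    def start(self) -> None:
        self._thread.start()

    def cancel(self) -> None:
        """Ask the worker to stop before its next step."""
        self._cancel.set()

    def join(self, timeout=None) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        for step in self._steps:
            if self._cancel.is_set():
                return  # stop between steps; a running step is never interrupted
            self._on_step(step)
            self.completed += 1
```

Cancellation here is cooperative: a long conversion would need to be split into chunks (for example, per paragraph or per audio segment) so the flag is checked often enough for the UI to feel responsive.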


