SpeakEasy Studio

SpeakEasy Studio is a Windows-first desktop application for text-to-speech and audio-to-text workflows. It combines a modern CustomTkinter interface with multiple synthesis engines, optional summarization, transcript processing, and persistent history/configuration.

Application Screenshots

Text To Speech Tab

Audio To Text Tab

Settings Tab

What This Project Does

Converts text into speech audio with selectable voices and output formats.
Transcribes audio files to text using Whisper.
Applies optional readability enhancements before synthesis.
Supports summary generation before conversion.
Tracks output history and provides built-in playback controls.

Core Features

Text To Speech

Input sources: paste text or load TXT, MD, PDF, DOCX files.
PDF page-range support.
Optional summarization in the input tab:
- sumy for lightweight extraction.
- bart for higher-quality abstractive summaries.
Readability enhancements with inline controls:
- Pause enhancement level (off, mild, strong)
- Newline normalization
- Heuristic punctuation insertion
- List pause enhancement
- Paragraph pause enhancement
- Edge fallback pause behavior
Playback highlighting in the editor during audio playback.
Dedicated Arabic Editor button (RTL mode) that opens a Qt-based editor for robust Arabic typing, wrapping, and selection.
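
To illustrate the kind of preprocessing the readability enhancements describe, here is a minimal sketch of newline normalization and heuristic punctuation insertion. The function names and the exact rules are assumptions made for illustration; they are not taken from the project's source.

```python
import re


def normalize_newlines(text: str) -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Unify Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (optionally containing whitespace) separates paragraphs.
    paragraphs = re.split(r"\n\s*\n+", text.strip())
    # Inside a paragraph, collapse every whitespace run to a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def ensure_terminal_punctuation(text: str) -> str:
    """Append a period to paragraphs that end without punctuation.

    Hypothetical heuristic: a TTS engine tends to pause on punctuation,
    so an unpunctuated paragraph end gets a period.
    """
    fixed = []
    for paragraph in text.split("\n\n"):
        if paragraph and paragraph[-1] not in ".!?:;,":
            paragraph += "."
        fixed.append(paragraph)
    return "\n\n".join(fixed)


if __name__ == "__main__":
    raw = "Hello\nworld\r\n\r\nSecond  paragraph!"
    print(ensure_terminal_punctuation(normalize_newlines(raw)))
```

Keeping each enhancement as a small pure function makes it easy to switch individual steps on or off from inline controls.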

Speech To Text

Audio transcription via Whisper (tiny, base, small).
Optional transcript cleaning and technical normalization.
Optional emotion/sentiment analysis pipeline that produces synthesis hints.
Output as plain text or markdown.
Send the transcript directly to the Text To Speech tab.
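
For orientation, transcription with the openai-whisper package generally looks like the sketch below. The `transcribe` wrapper and the `SUPPORTED_MODELS` tuple are hypothetical names; only `whisper.load_model` and `model.transcribe` come from the library, and the application's own wrapper may differ.

```python
# Model sizes listed in this README; the tuple name is an assumption.
SUPPORTED_MODELS = ("tiny", "base", "small")


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Return the transcript of an audio file as plain text."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model_name!r}; choose one of {SUPPORTED_MODELS}"
        )
    # Imported lazily: the package is heavy and may download weights on first use.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()
```

Validating the model name before importing keeps the failure fast and avoids loading the package for a request that cannot succeed.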

TTS Engines

Edge TTS (online neural voices).
Piper TTS (offline local models).

Voice And Playback

Voice browser and refresh.
Piper catalog search for undiscovered voices from multiple sources.
Source toggles for Piper catalog providers (Hugging Face and project catalog), persisted across app restarts.
Undownloaded Piper voices are marked visually in the voice list.
Selecting an undownloaded Piper voice prompts for download.
Download progress shows percentage, downloaded/total size, transfer speed, and ETA.
Favorites and per-engine last voice memory.
Rate, pitch, and volume control.
Output formats: MP3 and WAV.
Built-in player bar:
- Play/pause, stop, seek, speed control, live time updates.
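
The download progress fields listed above (percentage, downloaded/total size, transfer speed, ETA) can all be derived from three numbers. The following is a sketch under that assumption; `format_size` and `progress_line` are illustrative names, not the project's API.

```python
def format_size(num_bytes: float) -> str:
    """Render a byte count with a binary unit, capped at GB."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024 or unit == "GB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024


def progress_line(downloaded: int, total: int, elapsed_s: float) -> str:
    """Build a one-line status string from bytes downloaded, total bytes, and elapsed seconds."""
    percent = 100.0 * downloaded / total if total else 0.0
    # Average speed since the download started, in bytes per second.
    speed = downloaded / elapsed_s if elapsed_s > 0 else 0.0
    remaining = max(total - downloaded, 0)
    eta = f"{remaining / speed:.0f}s" if speed > 0 else "unknown"
    return (
        f"{percent:.0f}% ({format_size(downloaded)} / {format_size(total)}) "
        f"at {format_size(speed)}/s, ETA {eta}"
    )


if __name__ == "__main__":
    print(progress_line(512 * 1024, 1024 * 1024, 2.0))
```

An average speed is the simplest estimate; a moving average over recent chunks would give a steadier ETA on uneven connections.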

History And Persistence

Conversion history in src/output/history.json.
Config persistence in src/config.json.
Theme and processing preferences saved across sessions.
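
As a rough sketch of how a JSON-backed history such as src/output/history.json could be appended to safely, consider the following. The record fields and the `append_history` helper are assumptions; the actual schema is not documented here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_history(history_path: Path, entry: dict) -> list:
    """Append one record to a JSON list on disk and return the full list."""
    history_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        records = json.loads(history_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        records = []  # start fresh if the file is missing or corrupt
    if not isinstance(records, list):
        records = []
    records.append({**entry, "timestamp": datetime.now(timezone.utc).isoformat()})
    # Write to a temporary file and swap it in, so a crash cannot truncate the history.
    tmp_path = history_path.with_suffix(".tmp")
    tmp_path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    tmp_path.replace(history_path)
    return records
```

The same read, modify, and atomic-replace pattern would also suit a configuration file like src/config.json.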

Tech Stack

Language: Python 3.10+
UI: CustomTkinter + ttk
TTS: edge-tts, piper-tts
STT: openai-whisper
Summarization: sumy, transformers, torch
Document parsing: pdfplumber, python-docx, markdown-it-py
External RTL editor: PySide6
Media processing/playback: ffmpeg/ffplay (winsound fallback for limited playback)

Project Structure

.
|-- justfile
|-- plan.md
|-- src/
|   |-- main.py
|   |-- config.json
|   |-- requirements.txt
|   |-- core/
|   |-- ui/
|   |-- models/
|   `-- output/
|-- docs/
|   |-- FEATURES.md
|   |-- USAGE.md
|   |-- JUSTFILE.md
|   `-- ARCHITECTURE.md
`-- README.md

Getting Started

Prerequisites

Windows PowerShell
Python 3.10+
ffmpeg available in PATH
Optional: just command runner
PySide6 (installed via requirements) for the external Arabic editor window
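
Before installing, it can help to confirm the prerequisites from a Python prompt. This is a small stand-alone sketch, not a script shipped with the project; `check_prerequisites` is a hypothetical helper.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("ffmpeg",)) -> dict:
    """Report whether the interpreter and required command-line tools are available."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Passing `tools=("ffmpeg", "ffplay", "just")` extends the check to the playback tool and the optional command runner.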

Quick Start With just

just venv
just install
just run
just run mode="new"

Quick Start Without just

python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -r .\src\requirements.txt
.\.venv\Scripts\python.exe .\src\main.py

justfile Overview

The project ships with an automation file for local development, verification, and packaging.

Key commands:

just help
just install
just run (legacy default)
just run mode="new" (PySide6 migration UI)
just run-old
just run-new
just compile
just smoke
just smoke-old
just smoke-new
just stability-startup-cycles cycles="3"
just stability-long-sessions
just stability-cancel-recovery
just phase5-automated
just verify-tts
just piper-list
just piper-download-default
just build
just clean

Full reference: see docs/JUSTFILE.md.

Keyboard Shortcuts

Ctrl+O: open file
Ctrl+Enter: start conversion
Ctrl+S: start conversion
Ctrl+Shift+V: focus text input tab/editor
Space: play/pause (outside text-input widgets)
Editor-focused shortcuts (IME/language-independent path):
- Ctrl+C, Ctrl+V, Ctrl+X, Ctrl+A
- Ctrl+Z, Ctrl+Y, Ctrl+Shift+Z

Documentation Index

Feature reference: docs/FEATURES.md
Usage guide: docs/USAGE.md
Architecture and flow: docs/ARCHITECTURE.md
just commands reference: docs/JUSTFILE.md

Notes

The first run may need to download models (Whisper, BART, Piper voices).
Long-running operations execute in background threads with cancellation support.
This project is currently optimized for Windows workflows.
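
Background execution with cancellation is commonly built on `threading.Event`. The sketch below shows that general pattern; the `CancellableJob` class is hypothetical and does not describe how the application itself is structured.

```python
import threading


class CancellableJob:
    """Run work over a list of items in a background thread, stopping early on cancel."""

    def __init__(self, items, work):
        self._items = list(items)
        self._work = work
        self._cancel = threading.Event()
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def cancel(self):
        # Only sets a flag; the worker checks it between items.
        self._cancel.set()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def _run(self):
        for item in self._items:
            if self._cancel.is_set():
                break
            self.results.append(self._work(item))


if __name__ == "__main__":
    job = CancellableJob(["chunk one", "chunk two"], str.upper)
    job.start()
    job.join()
    print(job.results)
```

Cancellation here is cooperative: work already in progress on one item finishes, so splitting long text into small chunks keeps the cancel button responsive.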
