Ryen-042/Text-to-speech


SpeakEasy Studio

SpeakEasy Studio is a Windows-first desktop application for text-to-speech and audio-to-text workflows. It combines a modern CustomTkinter interface with multiple synthesis engines, optional summarization, transcript processing, and persistent history/configuration.

Application Screenshots

Text To Speech Tab

Audio To Text Tab

Settings Tab

What This Project Does

  • Converts text into speech audio with selectable voices and output formats.
  • Transcribes audio files to text using Whisper.
  • Applies optional readability enhancements before synthesis.
  • Supports summary generation before conversion.
  • Tracks output history and provides built-in playback controls.

Core Features

Text To Speech

  • Input sources: paste text or load TXT, MD, PDF, DOCX files.
  • PDF page-range support.
  • Optional summarization in the input tab:
    • sumy for lightweight extraction.
    • bart for higher-quality abstractive summaries.
  • Readability enhancements with inline controls:
    • Pause enhancement level (off, mild, strong)
    • Newline normalization
    • Heuristic punctuation insertion
    • List pause enhancement
    • Paragraph pause enhancement
    • Edge fallback pause behavior
  • Playback highlighting in the editor during audio playback.
  • Dedicated Arabic Editor button (RTL mode) that opens a Qt-based editor for robust Arabic typing, wrapping, and selection.
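As a rough illustration of what the readability enhancements above might do, here is a minimal sketch of newline normalization and paragraph pause enhancement. The function names, the punctuation chosen for each level, and the regular expressions are assumptions made for this example; they are not taken from the project's source.

```python
import re


def normalize_newlines(text: str) -> str:
    """Join soft-wrapped lines into one line per paragraph (hypothetical helper)."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Blank lines (possibly containing whitespace) separate paragraphs.
    paragraphs = re.split(r"\n\s*\n+", text.strip())
    # Inside a paragraph, a single newline is treated as a plain space.
    return "\n\n".join(re.sub(r"\s*\n\s*", " ", p).strip() for p in paragraphs)


def add_paragraph_pauses(text: str, level: str = "mild") -> str:
    """End each paragraph with punctuation so a TTS engine pauses there.

    The mapping of "mild" to "." and "strong" to "..." is an assumption
    for illustration only.
    """
    if level == "off":
        return text
    ending = "." if level == "mild" else "..."
    result = []
    for paragraph in text.split("\n\n"):
        if paragraph and paragraph[-1] not in ".!?:;":
            paragraph += ending
        result.append(paragraph)
    return "\n\n".join(result)


raw = "First line\nwrapped here\n\n\nSecond paragraph"
print(add_paragraph_pauses(normalize_newlines(raw), level="mild"))
```

Keeping each enhancement as a separate pure function makes it straightforward to toggle them independently, which matches the per-option inline controls described above.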

Speech To Text

  • Audio transcription via Whisper (tiny, base, small).
  • Optional transcript cleaning and technical normalization.
  • Optional emotion/sentiment analysis pipeline and synthesis hints.
  • Output as plain text or markdown.
  • Send the transcript directly to the Text To Speech tab.

TTS Engines

  • Edge TTS (online neural voices).
  • Piper TTS (offline local models).

Voice And Playback

  • Voice browser and refresh.
  • Piper catalog search for undiscovered voices from multiple sources.
  • Source toggles for Piper catalog providers (Hugging Face and project catalog), persisted across app restarts.
  • Undownloaded Piper voices are marked visually in the voice list.
  • Selecting an undownloaded Piper voice prompts for download.
  • Download progress shows percentage, downloaded/total size, transfer speed, and ETA.
  • Favorites, plus the last-used voice remembered per engine.
  • Rate, pitch, and volume control.
  • Output formats: MP3 and WAV.
  • Built-in player bar:
    • Play/pause, stop, seek, speed control, live time updates.
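The download progress line described above (percentage, downloaded/total size, transfer speed, ETA) can be derived from three numbers: bytes downloaded, total bytes, and elapsed time. The sketch below shows one way to compute it; `format_size` and `format_progress` are hypothetical names, and the exact text layout is an assumption rather than the app's real output.

```python
def format_size(num_bytes: float) -> str:
    """Render a byte count using binary units (hypothetical helper)."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024 or unit == "GB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} GB"  # not reached; keeps the return type explicit


def format_progress(downloaded: int, total: int, elapsed_s: float) -> str:
    """Build a one-line progress summary from raw download counters."""
    percent = 100 * downloaded / total if total else 0.0
    # Average speed since the start of the download, in bytes per second.
    speed = downloaded / elapsed_s if elapsed_s > 0 else 0.0
    if speed > 0:
        remaining = int((total - downloaded) / speed)
        eta = f"{remaining // 60:02d}:{remaining % 60:02d}"
    else:
        eta = "--:--"  # no data yet, so no estimate
    return (
        f"{percent:.0f}% | {format_size(downloaded)}/{format_size(total)} "
        f"| {format_size(speed)}/s | ETA {eta}"
    )


print(format_progress(5 * 1024 * 1024, 20 * 1024 * 1024, 5.0))
# prints: 25% | 5.0 MB/20.0 MB | 1.0 MB/s | ETA 00:15
```

An average-since-start speed is the simplest choice; a real UI might prefer a short moving window so the ETA reacts faster to changes in throughput.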

History And Persistence

  • Conversion history in src/output/history.json.
  • Config persistence in src/config.json.
  • Theme and processing preferences saved across sessions.
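For readers curious how JSON-backed history like this is typically maintained, the following is a minimal sketch. The entry fields, the `max_entries` cap, and the `append_history` name are assumptions for illustration; the actual schema of src/output/history.json is not documented here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_history(history_path: Path, entry: dict, max_entries: int = 200) -> list:
    """Append one entry to a JSON list on disk (hypothetical helper and schema)."""
    try:
        history = json.loads(history_path.read_text(encoding="utf-8"))
        if not isinstance(history, list):
            history = []
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or corrupt file starts a fresh history.
        history = []

    history.append({**entry, "created_at": datetime.now(timezone.utc).isoformat()})
    history = history[-max_entries:]  # keep only the most recent entries

    # Write to a temporary file first, then swap it in, so an interrupted
    # write does not leave a half-written history file behind.
    history_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = history_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(history, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp_path.replace(history_path)
    return history
```

A call such as `append_history(Path("src/output/history.json"), {"voice": "example-voice", "format": "mp3"})` would then record one conversion; the field names in that call are placeholders.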

Tech Stack

  • Language: Python 3.10+
  • UI: CustomTkinter + ttk
  • TTS: edge-tts, piper-tts
  • STT: openai-whisper
  • Summarization: sumy, transformers, torch
  • Document parsing: pdfplumber, python-docx, markdown-it-py
  • External RTL editor: PySide6
  • Media processing/playback: ffmpeg/ffplay (winsound fallback for limited playback)
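The ffplay-first, winsound-fallback arrangement mentioned in the last item could look roughly like the sketch below. This is an assumption about the general approach, not the project's playback code: `build_play_command` and `play` are hypothetical names, and the fallback here is limited to WAV files because `winsound` only plays WAV.

```python
import shutil
import subprocess
import sys
from typing import Callable, List, Optional


def build_play_command(
    audio_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return an ffplay command line, or None if ffplay is not in PATH."""
    ffplay = which("ffplay")
    if ffplay is None:
        return None
    # -nodisp: no video window; -autoexit: quit when playback ends.
    return [ffplay, "-nodisp", "-autoexit", "-loglevel", "quiet", audio_path]


def play(audio_path: str) -> None:
    """Play a file with ffplay, falling back to winsound for WAV on Windows."""
    command = build_play_command(audio_path)
    if command is not None:
        subprocess.run(command, check=False)
    elif sys.platform == "win32" and audio_path.lower().endswith(".wav"):
        import winsound  # Windows-only standard library module

        winsound.PlaySound(audio_path, winsound.SND_FILENAME)
    else:
        raise RuntimeError("No playback backend available for " + audio_path)
```

Passing `which` as a parameter keeps the command-building step testable on machines where ffplay is not installed.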

Project Structure

.
|-- justfile
|-- plan.md
|-- src/
|   |-- main.py
|   |-- config.json
|   |-- requirements.txt
|   |-- core/
|   |-- ui/
|   |-- models/
|   `-- output/
|-- docs/
|   |-- FEATURES.md
|   |-- USAGE.md
|   |-- JUSTFILE.md
|   `-- ARCHITECTURE.md
`-- README.md

Getting Started

Prerequisites

  • Windows PowerShell
  • Python 3.10+
  • ffmpeg available in PATH
  • Optional: just command runner
  • PySide6 (installed via requirements) for the external Arabic editor window

Quick Start With just

just venv
just install
just run
just run mode="new"

Quick Start Without just

python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -r .\src\requirements.txt
.\.venv\Scripts\python.exe .\src\main.py

justfile Overview

The project ships with an automation file for local development, verification, and packaging.

Key commands:

  • just help
  • just install
  • just run (legacy default)
  • just run mode="new" (PySide6 migration UI)
  • just run-old
  • just run-new
  • just compile
  • just smoke
  • just smoke-old
  • just smoke-new
  • just stability-startup-cycles cycles="3"
  • just stability-long-sessions
  • just stability-cancel-recovery
  • just phase5-automated
  • just verify-tts
  • just piper-list
  • just piper-download-default
  • just build
  • just clean

Full reference: see docs/JUSTFILE.md.

Keyboard Shortcuts

  • Ctrl+O: open file
  • Ctrl+Enter: start conversion
  • Ctrl+S: start conversion
  • Ctrl+Shift+V: focus text input tab/editor
  • Space: play/pause (outside text-input widgets)
  • Editor-focused shortcuts (IME/language-independent path):
    • Ctrl+C, Ctrl+V, Ctrl+X, Ctrl+A
    • Ctrl+Z, Ctrl+Y, Ctrl+Shift+Z

Documentation Index

  • Feature reference: docs/FEATURES.md
  • Usage guide: docs/USAGE.md
  • Architecture and flow: docs/ARCHITECTURE.md
  • just commands reference: docs/JUSTFILE.md

Notes

  • The first run with some models (Whisper, BART, Piper voices) may require downloading them.
  • Long-running operations execute in background threads with cancellation support.
  • This project is currently optimized for Windows workflows.
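The background-thread note above describes a common pattern: run the work on a worker thread and check a cancellation flag between steps. Below is a minimal sketch of that pattern using `threading.Event`. The `CancellableTask` class is hypothetical and is not claimed to match the project's implementation.

```python
import threading
from typing import Any, Callable, Iterable, List, Optional


class CancellableTask:
    """Process items on a background thread, stopping early when cancelled."""

    def __init__(self, items: Iterable[Any], process: Callable[[Any], Any]) -> None:
        self._items = list(items)
        self._process = process
        self._cancel = threading.Event()
        self.results: List[Any] = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def cancel(self) -> None:
        # Cooperative cancellation: the worker notices the flag between items.
        self._cancel.set()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        for item in self._items:
            if self._cancel.is_set():
                break
            self.results.append(self._process(item))


task = CancellableTask(["chunk one", "chunk two"], str.upper)
task.start()
task.join()
print(task.results)  # prints: ['CHUNK ONE', 'CHUNK TWO']
```

Cancellation in this style only takes effect between items, so long operations are usually split into chunks small enough for the flag to be checked often.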
