SpeechFlow

SpeechFlow is a modular, extensible application for real-time audio transcription and intelligent conversational responses. It uses the transcription services and conversational AI models provided by the thinkhub library for speech processing and natural language understanding.

Features

Real-Time Audio Capture: Record and process audio in real-time.
Out-of-the-Box Transcription Services:
- OpenAI: Using the whisper-1 model.
- Google Speech-to-Text
Out-of-the-Box Chat Services:
- OpenAI: Supporting models like gpt-4 and gpt-3.5.
- Anthropic: Supporting Claude models.
Extensible Design: Use the Strategy Pattern to easily add new transcription and chat services.
Configuration-Driven: Control services and settings dynamically using .env files.
Interactive UI: Built with Textual for a rich terminal-based user interface.

Image Processing

OpenAI: The thinkhub library can also analyze and process images with OpenAI models. SpeechFlow does not use this capability.

Learn more about the thinkhub library at https://github.com/mfenerich/thinkhub.

Installation

Prerequisites

Python 3.11+
Poetry

Clone the Repository

git clone https://github.com/mfenerich/SpeechFlow.git
cd SpeechFlow

Install Dependencies

Use Poetry to install project dependencies:

poetry install

Set Up Environment Variables

Create a .env file in the root directory with the following content:

# Transcription service: openai or google
TRANSCRIPTION_SERVICE=openai

# Chat service: openai or anthropic
CHAT_SERVICE=openai

# OpenAI settings
CHATGPT_API_KEY=your_openai_api_key
# Chat model: gpt-4o, claude-3-5-sonnet-20240620, or any other model you have access to
CHAT_MODEL=gpt-4o

# Google
GOOGLE_APPLICATION_CREDENTIALS=your_gcp_json_path

Provide appropriate values for each variable.
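
Each variable must hold a single concrete value. The following is a minimal sketch of how an application could read and validate these settings at startup. It is not SpeechFlow's actual loading code: the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and allowed values come from the template above.

```python
import os
from dataclasses import dataclass

# Allowed values, taken from the .env template.
TRANSCRIPTION_SERVICES = {"openai", "google"}
CHAT_SERVICES = {"openai", "anthropic"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for values read from the environment."""

    transcription_service: str
    chat_service: str
    chat_model: str


def load_settings(env=None) -> Settings:
    """Read and validate settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    transcription = env.get("TRANSCRIPTION_SERVICE", "").strip().lower()
    chat = env.get("CHAT_SERVICE", "").strip().lower()
    model = env.get("CHAT_MODEL", "").strip()

    if transcription not in TRANSCRIPTION_SERVICES:
        raise ValueError(
            f"TRANSCRIPTION_SERVICE must be one of "
            f"{sorted(TRANSCRIPTION_SERVICES)}, got {transcription!r}"
        )
    if chat not in CHAT_SERVICES:
        raise ValueError(
            f"CHAT_SERVICE must be one of {sorted(CHAT_SERVICES)}, got {chat!r}"
        )
    if not model:
        raise ValueError("CHAT_MODEL must not be empty")
    return Settings(transcription, chat, model)


if __name__ == "__main__":
    example = {
        "TRANSCRIPTION_SERVICE": "google",
        "CHAT_SERVICE": "anthropic",
        "CHAT_MODEL": "claude-3-5-sonnet-20240620",
    }
    print(load_settings(example))
```

Validating early means a typo in the .env file fails with a clear message at startup, before any audio is recorded.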

Verify Environment

Make sure the .env file is listed in .gitignore to avoid accidentally committing sensitive data.

Usage

Run the Application

Start the SpeechFlow application:

poetry run python app.py

Interact with the Application

Select an audio input device.
Press K to start recording and K again to stop.
Press Q to quit the application.

Change Transcription or Chat Services

To use a different transcription or chat service, update the TRANSCRIPTION_SERVICE and CHAT_SERVICE variables in the .env file. Restart the application to apply the changes.

Project Structure

speechflow/
├── core/                              # Core utilities and constants
│   ├── constants.py                   # Shared constants (e.g., sample rate)
│   ├── audio_handler.py               # Audio capture and processing
│   └── interface.py                   # UI components
├── app.py                             # Main application entry point
├── .env                               # Environment variables (ignored by Git)
├── .env.example                       # Example environment variables
├── README.md                          # Project documentation
├── poetry.lock                        # Poetry lock file
├── pyproject.toml                     # Poetry project configuration
└── tests/                             # Unit tests

Extending SpeechFlow

SpeechFlow is designed to be modular and extensible. You can easily add new transcription or chat services by utilizing the thinkhub library.

Learn more about extending services in the thinkhub repository.
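
To illustrate the general idea, here is a minimal sketch of the Strategy Pattern with a name-based registry. It does not reproduce thinkhub's API: `TranscriptionStrategy`, `register`, `get_transcription_service`, and the toy `EchoTranscription` class are all hypothetical names chosen for this example, so check the thinkhub repository for the real base classes and registration mechanism.

```python
from abc import ABC, abstractmethod


class TranscriptionStrategy(ABC):
    """Hypothetical interface; thinkhub's real base class may differ."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the text recognized in the given audio buffer."""


# Maps a service name (such as the value of TRANSCRIPTION_SERVICE) to a class.
_REGISTRY: dict[str, type[TranscriptionStrategy]] = {}


def register(name: str):
    """Class decorator that stores a strategy in the registry under `name`."""

    def decorator(cls: type[TranscriptionStrategy]) -> type[TranscriptionStrategy]:
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("echo")
class EchoTranscription(TranscriptionStrategy):
    """Toy strategy for illustration: reports the size of the audio buffer."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes of audio>"


def get_transcription_service(name: str) -> TranscriptionStrategy:
    """Instantiate the strategy registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown transcription service: {name!r}") from None


if __name__ == "__main__":
    service = get_transcription_service("echo")
    print(service.transcribe(b"\x00" * 4))  # prints: <4 bytes of audio>
```

With this layout, adding a service means writing one new class and registering it. The calling code only depends on the `transcribe` method and never needs to change.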

Testing

TODO

Workflow Diagram

graph TD
    A[User] -->|Interacts via UI| B[AudioTranscriptionApp]
    B -->|Captures Audio| C[AudioHandler]
    C -->|Sends Audio Chunks| D[TranscriptionService]
    D -->|Transcribes Audio| E[Transcription API - Google or other service]
    E -->|Returns Transcription| D
    D -->|Sends Transcription| F[ChatService]
    F -->|Queries Chat API| G[Chat API - OpenAI GPT]
    G -->|Returns Response| F
    F -->|Sends Response| B
    B -->|Displays Response| H[User Interface]
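
The flow in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the application's code: `run_pipeline` and the two stub functions are hypothetical, the stubs stand in for the real transcription and chat APIs, and the sketch joins all chunks before transcribing, whereas the real application may stream them.

```python
from typing import Callable, Iterable

Transcriber = Callable[[bytes], str]
Responder = Callable[[str], str]


def run_pipeline(
    chunks: Iterable[bytes],
    transcribe: Transcriber,
    respond: Responder,
    display: Callable[[str], None],
) -> None:
    """Pass captured audio through transcription and chat, then show the reply."""
    audio = b"".join(chunks)  # AudioHandler hands audio to the transcription service
    text = transcribe(audio)  # transcription service calls its API
    reply = respond(text)     # chat service queries its API with the transcript
    display(reply)            # the app shows the response in the UI


def fake_transcribe(audio: bytes) -> str:
    """Stub standing in for a real transcription API call."""
    return f"heard {len(audio)} bytes"


def fake_respond(text: str) -> str:
    """Stub standing in for a real chat API call."""
    return f"You said: {text}"


if __name__ == "__main__":
    run_pipeline([b"ab", b"cd"], fake_transcribe, fake_respond, print)
    # prints: You said: heard 4 bytes
```

Because each stage is passed in as a plain callable, the stubs can be swapped for real services without touching `run_pipeline`.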

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch: git checkout -b feature-name.
Commit your changes: git commit -m "Add feature-name".
Push to the branch: git push origin feature-name.
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Textual for the interactive UI framework.
thinkhub for providing transcription and chat services.
Google Cloud Speech-to-Text for transcription services.
OpenAI for conversational AI.

Contact

For questions or support, please contact [email protected] or open an issue in the repository.
