SpeechFlow is a modular, extensible application for real-time audio transcription and intelligent conversational responses. It uses the transcription services and conversational AI models from the thinkhub library to handle speech processing and natural language understanding.
- Real-Time Audio Capture: Record and process audio in real time.
- Out-of-the-Box Transcription Services:
  - OpenAI: Using the whisper-1 model.
  - Google Speech-to-Text
- Out-of-the-Box Chat Services:
  - OpenAI: Supporting models like gpt-4 and gpt-3.5.
  - Anthropic: Supporting Claude.ai.
- Extensible Design: Use the Strategy Pattern to easily add new transcription and chat services.
- Configuration-Driven: Control services and settings dynamically using .env files.
- Interactive UI: Built with Textual for a rich terminal-based user interface.
- OpenAI: Analyze and process images with AI models (not used in this project).
Learn more about the thinkhub library at https://github.com/mfenerich/thinkhub.
- Python 3.11+
- Poetry
git clone https://github.com/mfenerich/SpeechFlow.git
cd SpeechFlow
Use Poetry to install project dependencies:
poetry install
Create a .env file in the root directory with the following content:
# Transcription service: openai or google
TRANSCRIPTION_SERVICE=openai
# Chat service: openai or anthropic
CHAT_SERVICE=openai
# OpenAI settings
CHATGPT_API_KEY=your_openai_api_key
# Chat model: gpt-4o, claude-3-5-sonnet-20240620, or any other model you have access to
CHAT_MODEL=gpt-4o
# Google
GOOGLE_APPLICATION_CREDENTIALS=your_gcp_json_path
Provide appropriate values for each variable.
Make sure the .env file is listed in .gitignore to avoid accidentally committing sensitive data.
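As an illustration of how the service variables above could be checked before the application starts, here is a minimal sketch that uses only the standard library. The names `Settings`, `parse_env_text`, and `load_settings` are hypothetical and are not part of SpeechFlow or thinkhub; how the project actually reads its .env file is not shown in this README.

```python
from dataclasses import dataclass
from typing import Mapping

# Allowed values, taken from the .env template in this README.
TRANSCRIPTION_SERVICES = {"openai", "google"}
CHAT_SERVICES = {"openai", "anthropic"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; SpeechFlow's real configuration may differ."""

    transcription_service: str
    chat_service: str
    chat_model: str


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate the service names and fail early with a readable message."""
    transcription = env.get("TRANSCRIPTION_SERVICE", "").lower()
    chat = env.get("CHAT_SERVICE", "").lower()
    model = env.get("CHAT_MODEL", "")
    if transcription not in TRANSCRIPTION_SERVICES:
        raise ValueError(
            f"TRANSCRIPTION_SERVICE must be one of {sorted(TRANSCRIPTION_SERVICES)}"
        )
    if chat not in CHAT_SERVICES:
        raise ValueError(f"CHAT_SERVICE must be one of {sorted(CHAT_SERVICES)}")
    if not model:
        raise ValueError("CHAT_MODEL is required")
    return Settings(transcription, chat, model)
```

Calling `load_settings(parse_env_text(open(".env").read()))` would then raise a `ValueError` on a misspelled or missing service name instead of failing later at the first API call.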
Start the SpeechFlow application:
poetry run python app.py
- Select an audio input device.
- Press K to start recording and K again to stop.
- Press Q to quit the application.
To use a different transcription or chat service, update the TRANSCRIPTION_SERVICE and CHAT_SERVICE variables in the .env file. Restart the application to apply the changes.
speechflow/
├── core/ # Core utilities and constants
│ ├── constants.py # Shared constants (e.g., sample rate)
│ ├── audio_handler.py # Audio capture and processing
│ └── interface.py # UI components
├── app.py # Main application entry point
├── .env # Environment variables (ignored by Git)
├── .env.example # Example environment variables
├── README.md # Project documentation
├── poetry.lock # Poetry lock file
├── pyproject.toml # Poetry project configuration
└── tests/ # Unit tests
SpeechFlow is designed to be modular and extensible. You can easily add new transcription or chat services by utilizing the thinkhub library.
Learn more about extending services in the thinkhub repository.
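To give a feel for the Strategy Pattern mentioned in this README, the sketch below defines a service interface, one stand-in implementation, and a small name-based registry. Everything here (`TranscriptionStrategy`, `FakeTranscription`, `register`, `create`) is hypothetical and only illustrates the pattern; the actual base classes and registration functions are defined by thinkhub and may look different.

```python
from abc import ABC, abstractmethod


class TranscriptionStrategy(ABC):
    """Hypothetical interface; thinkhub's real base class may differ."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the text recognized in the given audio payload."""


class FakeTranscription(TranscriptionStrategy):
    """Stand-in service that reports the payload size instead of calling an API."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes of audio>"


# Maps a service name (as it might appear in .env) to its strategy class.
_REGISTRY: dict[str, type[TranscriptionStrategy]] = {}


def register(name: str, cls: type[TranscriptionStrategy]) -> None:
    """Make a strategy selectable by name."""
    _REGISTRY[name] = cls


def create(name: str) -> TranscriptionStrategy:
    """Instantiate the strategy registered under the given name."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown transcription service: {name!r}") from None


register("fake", FakeTranscription)
service = create("fake")
print(service.transcribe(b"\x00" * 320))  # prints "<320 bytes of audio>"
```

The calling code depends only on the `transcribe` method, so adding a service means writing one class and registering it under a new name.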
TODO
graph TD
A[User] -->|Interacts via UI| B[AudioTranscriptionApp]
B -->|Captures Audio| C[AudioHandler]
C -->|Sends Audio Chunks| D[TranscriptionService]
D -->|Transcribes Audio| E[Transcription API - Google or other service]
E -->|Returns Transcription| D
D -->|Sends Transcription| F[ChatService]
F -->|Queries Chat API| G[Chat API - OpenAI GPT]
G -->|Returns Response| F
F -->|Sends Response| B
B -->|Displays Response| H[User Interface]
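The flow in the diagram can be sketched as a single function that passes audio chunks through a transcriber and hands the joined text to a chat responder. This is a simplified, synchronous illustration with hypothetical names (`run_pipeline`, `fake_transcribe`, `fake_respond`); the real application runs inside a Textual UI and calls thinkhub services, whose interfaces are not shown here.

```python
from typing import Callable, Iterable

# Hypothetical callable types standing in for the two service roles.
Transcriber = Callable[[bytes], str]
Responder = Callable[[str], str]


def run_pipeline(
    chunks: Iterable[bytes], transcribe: Transcriber, respond: Responder
) -> tuple[str, str]:
    """Transcribe each chunk, join the non-empty results, and query chat once."""
    pieces = (transcribe(chunk) for chunk in chunks)
    transcript = " ".join(piece for piece in pieces if piece)
    return transcript, respond(transcript)


def fake_transcribe(chunk: bytes) -> str:
    """Stand-in transcriber: treats the bytes as UTF-8 text."""
    return chunk.decode("utf-8")


def fake_respond(text: str) -> str:
    """Stand-in chat service: echoes the transcript back."""
    return f"You said: {text}"


transcript, reply = run_pipeline(
    [b"hello", b"", b"world"], fake_transcribe, fake_respond
)
print(transcript)  # prints "hello world"
print(reply)       # prints "You said: hello world"
```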
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch: git checkout -b feature-name.
- Commit your changes: git commit -m "Add feature-name".
- Push to the branch: git push origin feature-name.
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
- Textual for the interactive UI framework.
- thinkhub for providing transcription and chat services.
- Google Cloud Speech-to-Text for transcription services.
- OpenAI for conversational AI.
For questions or support, please open an issue in the repository.