# OpenAI Virtual Teleprompter

Welcome to the OpenAI Virtual Teleprompter! This innovative application transforms your spoken words into real-time text, powered by OpenAI's advanced language processing capabilities. Designed for professionals, content creators, and anyone who needs to maintain eye contact while delivering information, this tool acts as a smart, responsive teleprompter that adapts to your speech.
The OpenAI Virtual Teleprompter is not just a transcription tool; it's an intelligent assistant that can help clarify your thoughts, expand on your ideas, and even suggest improvements to your speech in real-time. By leveraging the power of OpenAI's API, it provides a seamless experience for users who need to speak coherently and confidently, whether in virtual meetings, presentations, or content creation sessions.
Key features include:
- Real-time speech-to-text conversion
- Intelligent responses and suggestions from OpenAI
- Customizable floating interface for easy viewing during use
- Ability to pause and resume speech recognition
- Adjustable opacity for the on-screen display
Whether you're a professional speaker looking to improve your delivery, a content creator needing assistance with scripting, or anyone who wants to enhance their verbal communication, the OpenAI Virtual Teleprompter is here to support and elevate your speaking experience.
## Table of Contents

- Features
- Screenshots
- Prerequisites
- Installation
- Usage
- Configuration
- Logging
- Testing
- Utilities
- Contributing
- License
- Acknowledgments
## Features

- Real-time Audio Capture: Captures audio from your microphone with adjustable settings for channels, rate, and chunk size.
- OpenAI Integration: Sends audio data to OpenAI's API for processing and receives responses.
- WebSocket Communication: Utilizes WebSockets for real-time communication between the backend and frontend.
- Electron Frontend: Provides a desktop application interface built with Electron and React.
- Draggable Floating Prompter: A customizable and movable UI component for displaying assistant responses.
- Error Handling and Logging: Comprehensive logging and error handling mechanisms for robust performance.
- Extensible Architecture: Modular design for easy extension and integration with other services.
- Automatic Reconnection: Works around OpenAI's 15-minute session limit by automatically reconnecting.
- Keyboard Control: Start and stop listening with the spacebar (initial start requires clicking the "Start Listening" button).
Note: This application is a Proof of Concept (PoC). While it has been successfully tested, users should expect potential bugs and issues.
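The automatic-reconnection feature above can be sketched as a simple retry loop. This is an illustrative sketch, not the project's actual code: `run_with_reconnect` and its parameters are hypothetical names, and the session-time check is an assumption about how the 15-minute limit might be handled.

```python
import time

def run_with_reconnect(connect, max_session_seconds=14 * 60,
                       retry_delay=2.0, max_retries=None):
    """Keep a session alive by restarting it whenever it ends.

    `connect` is any callable that opens a session and blocks until it
    ends (e.g. a realtime API session). Restarting it is one way to
    work around a fixed per-session time limit.
    """
    attempts = 0
    while max_retries is None or attempts < max_retries:
        started = time.monotonic()
        try:
            connect()
        except ConnectionError:
            pass  # transient drop: fall through and reconnect
        attempts += 1
        # If the session ended early (a failure, not the time limit),
        # pause briefly so we don't spin in a tight loop.
        if time.monotonic() - started < max_session_seconds:
            time.sleep(retry_delay)
    return attempts
```

With `max_retries=None` (the default) the loop runs indefinitely, which matches the teleprompter's always-on behavior; a finite `max_retries` is useful for testing.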
## Screenshots

This screenshot showcases the OpenAI Virtual Teleprompter interface, featuring:
- A "Start Listening" button to begin speech recognition
- A response area displaying the assistant's transcription and suggestions
- An API call counter to track usage
- An opacity slider for adjusting the interface transparency
## Prerequisites

- Python 3.7+
- Node.js 14+
- npm
- An OpenAI API Key: You need to have an API key from OpenAI to use their services.
- PortAudio: Required for PyAudio installation.
- PyAudio: For audio input/output in Python.
- Electron: For running the frontend application.
## Installation

### Backend Setup

- Clone the Repository

  ```bash
  git clone https://github.com/yourusername/voice-assistant-application.git
  cd voice-assistant-application/backend
  ```
- Create a Virtual Environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install Python Dependencies

  Ensure you have PortAudio installed on your system, which is required for PyAudio.

  - On Ubuntu/Debian:

    ```bash
    sudo apt-get install libportaudio2
    ```

  - On macOS (using Homebrew):

    ```bash
    brew install portaudio
    ```

  Then install the Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set Up Configuration

  - Environment Variables: Set the `OPENAI_API_KEY` environment variable:

    - On Linux/macOS:

      ```bash
      export OPENAI_API_KEY='your_openai_api_key'
      ```

    - On Windows:

      ```cmd
      set OPENAI_API_KEY=your_openai_api_key
      ```

  - Configuration File: The `config.py` file contains settings you can adjust:

    - `max_api_calls`: Maximum number of API calls (`-1` for unlimited).
    - `silence_threshold`, `cooldown_duration`: Silence detection settings.
    - `rate`, `channels`, `frame_duration_ms`: Audio settings.
### Frontend Setup

- Navigate to the Root Directory

  ```bash
  cd ..  # Assuming you're in the backend directory
  ```

- Install Node.js Dependencies

  ```bash
  npm install
  ```

- Build the Frontend

  ```bash
  npm run build
  ```

- Ensure Electron is Installed

  If Electron is not installed globally, you can install it as a dev dependency (already included in `package.json`):

  ```bash
  npm install electron --save-dev
  ```
## How It Works

The OpenAI Virtual Teleprompter operates through a sophisticated pipeline:

- Audio Capture: The application uses PyAudio to capture real-time audio from your microphone. It is specifically configured for the Logitech Blue Yeti microphone, ensuring high-quality audio input.
- Speech Processing: The captured audio is sent to OpenAI's API, which converts the speech to text and processes the content.
- Real-time Display: The transcribed text and any AI-generated suggestions are immediately displayed on the floating interface, allowing you to read and react in real time.
- Intelligent Assistance: Beyond mere transcription, the OpenAI API can provide context-aware suggestions, clarifications, or expansions on your speech, enhancing your delivery.
- User Interaction: You can control the application using voice commands or the on-screen interface, allowing you to pause, resume, or adjust settings as needed.
- Customizable Interface: The floating prompter can be moved around the screen and its opacity adjusted, ensuring it doesn't interfere with other applications or your camera during use.
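The silence detection implied by the `silence_threshold` setting can be sketched as a simple RMS check on each captured audio frame. The helper names and the 16-bit mono PCM assumption below are illustrative, not the project's actual implementation:

```python
import math
import struct

def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian mono PCM."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_silence(frame: bytes, silence_threshold: float = 500.0) -> bool:
    """True if the frame's RMS level falls below the threshold."""
    return frame_rms(frame) < silence_threshold
```

In a pipeline like the one described above, frames classified as silence for longer than the cooldown would mark the end of an utterance, triggering a request to the API.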
## Usage

When using the OpenAI Virtual Teleprompter during virtual meetings, note that you'll need two microphones:

- One microphone (preferably the Logitech Blue Yeti) for the Teleprompter application
- Another microphone (such as your webcam's built-in mic) for the actual meeting audio

This setup ensures that the Teleprompter can capture your speech without interfering with your meeting audio.
- Start Listening: Click the "Start Listening" button to begin speech recognition.
- Voice Input: Speak naturally; the Teleprompter processes your speech in real-time.
- Pause/Resume: Use the spacebar or the on-screen button to control listening.
- View Transcripts and Suggestions: Read the real-time transcripts and AI suggestions on the floating interface.
- Adjust Opacity: Use the slider to change the transparency of the floating prompter.
- Reposition: Click and drag the top bar to move the floating prompter on your screen.
By leveraging these features, you can maintain natural eye contact and body language while having the support of an intelligent, real-time teleprompter.
## Configuration

The application can be customized using the `config.py` file in the `backend` directory.

- API Key and URL: Set your OpenAI API key and endpoint URL.
- Audio Settings:
  - `rate`: Sample rate (default is 48000 Hz).
  - `channels`: Number of audio channels (default is 1).
  - `frame_duration_ms`: Duration of each audio frame in milliseconds.
- Assistant Settings:
  - `max_api_calls`: Maximum number of API calls (`-1` for unlimited).
  - `silence_threshold`: Threshold for detecting silence.
  - `cooldown_duration`: Duration to wait before listening again after a response.
  - `instructions`: Instructions or guidelines for the assistant's responses.
- Opacity: Adjusted within the UI using the slider.
- Keyboard Shortcuts:
  - Spacebar: Toggle pause/resume listening.
- Window Behavior:
  - The Electron window is set to always be on top and is transparent, providing an unobtrusive overlay.
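Putting the documented options together, a `config.py` along these lines would match the settings above. Only `rate`, `channels`, and the meaning of `max_api_calls` are documented defaults; the remaining values and the placeholder URL are illustrative:

```python
# config.py -- illustrative sketch of the documented settings
import os

# API settings (key read from the environment, as set during installation)
openai_api_key = os.environ.get("OPENAI_API_KEY", "")
openai_api_url = "wss://example.invalid/realtime"  # placeholder endpoint

# Audio settings
rate = 48000            # sample rate in Hz (documented default)
channels = 1            # mono input (documented default)
frame_duration_ms = 20  # illustrative frame length

# Assistant settings
max_api_calls = -1        # -1 means unlimited (documented)
silence_threshold = 500   # illustrative silence level
cooldown_duration = 1.0   # illustrative seconds to wait after a response
instructions = "You are a concise real-time teleprompter assistant."
```

Keeping all tunables in one flat module like this lets the backend import them with `from config import rate, channels` without any parsing step.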
## Logging

Logging is configured using the `common_logging.py` module in the `backend` directory. It sets up both file and console logging with options for rotation and formatting.

You can adjust the logging level in your scripts when initializing the logger:

```python
from common_logging import setup_logging

logger = setup_logging('your_module_name', debug_to_console=True)
```

- Parameters:
  - `name`: Name of the logger.
  - `debug_to_console` (bool): If `True`, logs will also output to the console.
  - `filter_response_done` (bool): If `True`, applies a filter to only log specific messages.
- Logs are stored in the `logs` directory within `backend`.
- Each module has its own log file, e.g., `voice_assistant.log`, `openai_client.log`.
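A minimal `setup_logging` matching this signature might look like the following sketch. Only the parameter names come from the documentation above; the rotation size, log format, and the `"response.done"` filter string are assumptions:

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def setup_logging(name, debug_to_console=False, filter_response_done=False,
                  log_dir="logs"):
    """Create a logger with a rotating file handler and optional console output."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    # One log file per module, e.g. logs/voice_assistant.log
    file_handler = RotatingFileHandler(
        os.path.join(log_dir, f"{name}.log"),
        maxBytes=1_000_000, backupCount=3)
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)

    if debug_to_console:
        console = logging.StreamHandler()
        console.setFormatter(fmt)
        logger.addHandler(console)

    if filter_response_done:
        # Only pass records that mention a completed response (assumed marker).
        logger.addFilter(lambda record: "response.done" in record.getMessage())

    return logger
```

Using `logging.getLogger(name)` means repeated calls with the same name return the same logger, so each module shares one log file as described above.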
## Testing

The `tests` directory contains scripts to verify the functionality of audio devices and API interactions.

```bash
cd tests
python test_audio.py
```

- This script tests audio recording from your microphone and saves it to `output.wav`.
- Ensure your microphone is properly connected and recognized by the system.
- The script searches for a "Blue Yeti" microphone by default. Modify the device search in the script if you have a different microphone.
- PulseAudio and PyAudio Tests: `test_pulseaudio_and_pyaudio.py` can help diagnose audio issues on systems using PulseAudio.
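The device search that `test_audio.py` performs can be approximated as a name-substring match over PyAudio's device list. Here it is factored into a pure function; `find_input_device` is a hypothetical helper, and the PyAudio wiring shown in the docstring is how such a list is typically collected:

```python
def find_input_device(devices, name_substring="Blue Yeti"):
    """Return the index of the first input device whose name contains
    `name_substring`, or None if no such device exists.

    `devices` is a list of dicts shaped like PyAudio's
    `get_device_info_by_index()` results, e.g. collected with:

        pa = pyaudio.PyAudio()
        devices = [pa.get_device_info_by_index(i)
                   for i in range(pa.get_device_count())]
    """
    for index, info in enumerate(devices):
        if (info.get("maxInputChannels", 0) > 0
                and name_substring.lower() in info.get("name", "").lower()):
            return index
    return None
```

Keeping the match logic separate from PyAudio makes it easy to adapt the search string for a different microphone, as the note above suggests.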
## Utilities

The `utils/kill_ports.py` script checks for processes running on specific ports (e.g., 8000 and 3000) and terminates them. This is useful for ensuring that the required ports are free before starting the application.

```bash
cd utils
python kill_ports.py
```

- The script uses `lsof` to find and kill processes. It may require elevated permissions depending on your system configuration.
- Modify the script if you need to check different ports.
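A script along the documented lines might shell out to `lsof` and parse the PIDs it reports. This is a sketch, not the project's actual `kill_ports.py`; the exact `lsof` flags used by the real script are an assumption:

```python
import os
import signal
import subprocess

def parse_lsof_pids(output):
    """Parse `lsof -t` output (one PID per line) into a set of ints."""
    return {int(line) for line in output.split() if line.strip().isdigit()}

def pids_listening_on(port):
    """Return the set of PIDs that `lsof` reports listening on `port`."""
    try:
        result = subprocess.run(
            ["lsof", "-t", "-i", f":{port}", "-sTCP:LISTEN"],
            capture_output=True, text=True, check=False)
    except FileNotFoundError:  # lsof not installed
        return set()
    return parse_lsof_pids(result.stdout)

def kill_ports(ports=(8000, 3000)):
    """Terminate every process listening on the given ports."""
    for port in ports:
        for pid in pids_listening_on(port):
            os.kill(pid, signal.SIGTERM)
```

`lsof -t` prints one PID per line with no header, which keeps the parsing trivial; `SIGTERM` gives the target processes a chance to shut down cleanly.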
## Contributing

We welcome contributions from the community! To contribute:

- Fork the Repository

- Create a Feature Branch

  ```bash
  git checkout -b feature/YourFeature
  ```

- Commit Your Changes

  ```bash
  git commit -m "Add your message"
  ```

- Push to Your Fork

  ```bash
  git push origin feature/YourFeature
  ```

- Create a Pull Request

  Navigate to the original repository and open a pull request.
## Platform Support

The OpenAI Virtual Teleprompter has been developed and tested exclusively on Linux Mint. It is not designed for or tested on any other operating system.

- Linux Mint Exclusivity: This application is specifically designed for Linux Mint and has not been tested or configured for any other operating system, including other Linux distributions, macOS, or Windows.
- Microphone Configuration: The software is optimized for use with the Logitech Blue Yeti microphone. While it may function with other microphones, optimal performance is only guaranteed with the specified model.
- Dual Microphone Setup for Virtual Meetings: When using the Teleprompter during virtual meetings, you must use two separate microphones:
  - One dedicated to the Teleprompter application (preferably the Logitech Blue Yeti)
  - Another for your meeting audio (e.g., your webcam's built-in microphone)
- Performance Considerations: The application's performance may vary depending on your system specifications and the quality of your audio input.
- No Cross-Platform Support: This application is not cross-platform and is not intended for use on any system other than Linux Mint. There are currently no plans to extend support to other operating systems or platforms.
## License

This project is licensed under the MIT License.
## Acknowledgments

- OpenAI: For providing the powerful API that drives our intelligent teleprompter.
- Electron: For the framework that enables our desktop application interface.
- PyAudio: For reliable audio input processing in Python.
- Contributors: A heartfelt thanks to all who have contributed to this project.
- Community: For the ongoing support, feedback, and inspiration.
We welcome your questions, feedback, and contributions! Feel free to open an issue if you need assistance or have suggestions for improvement.
## Running the Application

To run the OpenAI Virtual Teleprompter, you'll need two terminal windows:

- Terminal 1 - Backend Server: Navigate to the `backend` directory and start the Voice Assistant Python script:

  ```bash
  cd backend
  python voice_assistant.py
  ```

  This will start the backend server.

- Terminal 2 - Electron App: In a new terminal window, navigate to the root directory of the project and launch the Electron app:

  ```bash
  npm start
  ```

Make sure to start the backend server (Terminal 1) before launching the Electron app (Terminal 2).
If there is anything you'd like clarified or expanded in this README, please open an issue. We're committed to making this documentation as informative and helpful as possible.