A Chrome extension that enables speech-to-text transcription using Google's Gemini AI. Activate with a keyboard shortcut and dictate directly into any text input field on the web.
- Keyboard Shortcut Activation: Press Ctrl+Shift+S (or Cmd+Shift+S on Mac) to start dictating
- Smart Input Detection: Only works when a text input field is focused
- Real-time Transcription: Powered by Gemini 2.0 Flash for accurate speech recognition
- Visual Feedback: Clear indicators when dictation is active
- Privacy-Focused: Your API key stays local, and audio is sent directly to Gemini
- Beautiful UI: Modern, gradient-styled options page
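As a rough illustration of the "Smart Input Detection" feature, the check below decides whether a focused element should accept dictation. It is a minimal sketch and not the extension's actual code: the FocusedElementLike interface, the isDictationTarget function, and the list of accepted input types are all assumptions made for this example.

```typescript
// Minimal structural view of a focused element so the check can run
// without a DOM. In a content script, document.activeElement should fit
// this shape (assumption; the extension's real check may differ).
interface FocusedElementLike {
  tagName: string;
  type?: string; // only meaningful for <input>
  isContentEditable?: boolean;
  readOnly?: boolean;
  disabled?: boolean;
}

// <input> types treated as free-form text (illustrative list, an assumption).
const TEXT_INPUT_TYPES: string[] = ["text", "search", "email", "url", "tel", ""];

function isDictationTarget(el: FocusedElementLike | null): boolean {
  if (el === null) return false;
  // Never dictate into fields the user cannot edit.
  if (el.disabled || el.readOnly) return false;
  if (el.isContentEditable) return true;

  const tag = el.tagName.toUpperCase();
  if (tag === "TEXTAREA") return true;
  if (tag === "INPUT") {
    // An <input> with no type attribute behaves like type="text".
    const inputType = (el.type ?? "text").toLowerCase();
    return TEXT_INPUT_TYPES.indexOf(inputType) >= 0;
  }
  return false;
}
```

With a check like this, the keyboard shortcut can be ignored whenever the focused element is a button, a checkbox, or nothing at all.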
This repository contains two versions of Gemini Dictate:
- Chrome Extension: Works directly in your browser. Supports built-in Chrome Speech-to-Text (No API key required) or Gemini AI (Premium accuracy).
- macOS Electron App: A system-wide application that works in any Mac app (Notes, Slack, etc.). Requires a Gemini API key.
You can skip this part if you only want to use the built-in Chrome Speech-to-Text engine in the Chrome Extension. You only need an API key if you want to use the high-accuracy Gemini engine in the Chrome Extension, or if you are using the Electron App.
- Create a Google Cloud Project: Follow the Google Cloud documentation to create a new project.
- Get your API Key:
  - Recommended: Visit Google AI Studio to generate a free API key.
  - Alternative: Use the Google Cloud Vertex AI documentation for enterprise-grade setup.
- Copy the key for the next steps.
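To give a sense of how the key is used, here is a hedged sketch of a transcription request to the Gemini REST API's generateContent endpoint. This README does not show how either app actually calls Gemini, so the buildTranscriptionRequest helper, the prompt text, and the GeminiRequest type are assumptions for illustration only. The model name comes from the feature list above.

```typescript
// Shape of the request this sketch produces; it can be handed to fetch().
interface GeminiRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Model named in this README's feature list.
const GEMINI_MODEL = "gemini-2.0-flash";

// Hypothetical helper: builds (but does not send) a transcription request.
function buildTranscriptionRequest(
  apiKey: string,
  base64Audio: string,
  mimeType: string
): GeminiRequest {
  const key = apiKey.trim();
  if (key === "") throw new Error("A Gemini API key is required");

  // One text part with the instruction, one inline audio part.
  const body = {
    contents: [
      {
        parts: [
          { text: "Transcribe this audio. Return only the transcript." },
          { inline_data: { mime_type: mimeType, data: base64Audio } },
        ],
      },
    ],
  };

  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${GEMINI_MODEL}:generateContent`,
    init: {
      method: "POST",
      // The key travels in a header, so it never appears in the URL.
      headers: { "Content-Type": "application/json", "x-goog-api-key": key },
      body: JSON.stringify(body),
    },
  };
}
```

The result would typically be sent with fetch(request.url, request.init), with the transcript read from the first candidate's text part in the JSON response. Error handling and audio capture are left out of this sketch.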
The Chrome extension can use your browser's built-in transcription engine immediately, or you can supply a Gemini key for significantly better results.
- Download or clone this repository.
- Open Chrome and navigate to chrome://extensions/.
- Enable "Developer mode" (toggle in the top-right corner).
- Click "Load unpacked".
- Select the root folder of this repository (where manifest.json is located).
- (Optional) Click the extension icon in Chrome's toolbar, select Options, enter your Gemini API key, and select the Gemini engine.
- Focus: Click into any text input field on a webpage.
- Activate: Press Cmd+Shift+S (Mac) or Ctrl+Shift+S (Windows).
- Dictate: Speak clearly. A visual indicator will appear.
- Stop: Press the shortcut again or click away.
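The focus, activate, and stop steps above can be read as a small state machine. The sketch below models that flow under the assumption that dictation has only two states; the type names and the nextDictationState function are hypothetical and are not taken from the extension's source.

```typescript
type DictationState = "idle" | "listening";
type DictationEvent = "shortcut" | "blur";

// Returns the state that follows an event, mirroring the usage steps:
// the shortcut toggles dictation, and clicking away (blur) always stops it.
function nextDictationState(
  state: DictationState,
  event: DictationEvent,
  inputFocused: boolean
): DictationState {
  if (event === "blur") return "idle";

  // event === "shortcut"
  if (state === "listening") return "idle";
  // Dictation only starts when a text input field has focus.
  return inputFocused ? "listening" : "idle";
}
```

Keeping the transition logic in one pure function makes it easy to drive the visual indicator from a single source of truth: show it whenever the state is "listening".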
A native application for system-wide dictation. Note: This version requires a Gemini API key.
- Ensure you have Node.js installed.
- Open your terminal and navigate to the electron-app directory:
  cd electron-app
- Install the dependencies:
  npm install
- Start the app:
  npm run start
- Configure: On first launch, enter your Gemini API key in the Settings window.
- Key Bindings: Use Command+Shift+S (default) while focused on any application.
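A system-wide key binding like this is commonly set up in Electron with the globalShortcut module. The sketch below shows one way that could look; it is an assumption about the approach, not the app's actual code. The ShortcutRegistrar interface and the registerDictationShortcut function are hypothetical names, and the registrar is passed in so the logic can be tried without Electron installed.

```typescript
// The subset of Electron's globalShortcut module that this sketch relies on:
// register(accelerator, callback) returns false if registration fails.
interface ShortcutRegistrar {
  register(accelerator: string, callback: () => void): boolean;
}

// Default binding from this README.
const DEFAULT_ACCELERATOR = "Command+Shift+S";

// Hypothetical helper: binds the dictation toggle to a global shortcut and
// reports whether the operating system accepted the binding.
function registerDictationShortcut(
  registrar: ShortcutRegistrar,
  onToggle: () => void,
  accelerator: string = DEFAULT_ACCELERATOR
): boolean {
  // Registration can fail when another application already owns the combo.
  return registrar.register(accelerator, onToggle);
}
```

In an Electron main process this would be called as registerDictationShortcut(globalShortcut, toggleDictation) once app.whenReady() has resolved, where toggleDictation stands in for whatever starts and stops recording. A false return value is a cue to let the user pick a different combination.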
To create a standalone .app bundle:
npm run make
Repository layout:
- /: Root directory contains the Chrome Extension files.
- /electron-app: Contains the native macOS application.
- Microphone Access: Ensure Chrome or the Electron app has permission to access your microphone in System Settings.
- Engine Selection: If you haven't provided an API key in the extension settings, it will fall back to the default Chrome Web Speech engine.
- API Errors: Use the "Test Connection" button on the Settings page to verify your key if using Gemini/Chirp.
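The engine fallback described in the troubleshooting notes above can be sketched as a small pure function. The EngineSettings shape, the engine identifiers, and the resolveEngine name are assumptions for illustration; the extension's stored settings may be structured differently.

```typescript
type Engine = "gemini" | "web-speech";

// Hypothetical settings shape, as might be saved from the options page.
interface EngineSettings {
  preferredEngine: Engine;
  apiKey?: string;
}

// Picks the engine to use: Gemini only when it is selected AND a non-empty
// key is present; otherwise the built-in Chrome Web Speech engine.
function resolveEngine(settings: EngineSettings): Engine {
  const hasKey = (settings.apiKey ?? "").trim().length > 0;
  return settings.preferredEngine === "gemini" && hasKey ? "gemini" : "web-speech";
}
```

For example, resolveEngine({ preferredEngine: "gemini" }) yields "web-speech" because no key was supplied, which matches the fallback behaviour described above.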
MIT License - feel free to use, modify, and distribute!
Made with ❤️ using Gemini AI