This is the current architecture and file-ownership map for the repository.
It is based on the actual code in src/ after the renderer modular refactor.
- Main process boot
  - `src/main.js` calls `startApplication()` in `src/main-process/start-application.js`.
- Environment + state
  - `.env` is loaded/sanitized via `src/bootstrap/environment.js`.
  - Persisted UI/runtime choices are loaded from `cache/app-state.json` via `src/services/state/app-state.js`.
- Main-process composition
  - `start-application.js` wires:
    - window controller
    - screenshot manager
    - Gemini runtime
    - AssemblyAI streaming service
    - all IPC handlers
- Preload bridge
  - `src/windows/assistant/preload.js` exposes `window.electronAPI` through `contextBridge`.
  - The API is modularized in `src/windows/assistant/preload/*`.
- Renderer UI
  - `src/windows/assistant/renderer.html` + `styles.css` define the UI.
  - `src/windows/assistant/renderer.js` orchestrates feature managers under `src/windows/assistant/renderer/features/*`.
- `package.json`: scripts, dependencies, and `electron-builder` configuration.
- `.env` / `.env.example`: runtime secrets and defaults contract.
- `README.md`: setup, usage, and high-level docs.
- `BUILD_INSTRUCTIONS.md`: build/package walkthrough.
- `SETUP-VOSK.md`: legacy STT setup notes (the project now uses AssemblyAI modules in the active flow).
- `notes.md`: this architecture map.
- `src/`: all application source.
- `assets/`: icons and packaging assets.
- `cache/`: generated persisted state in development (`app-state.json`).
- `.stealth_screenshots/`: generated screenshots in development.
- `dist/`: packaged output.
- Thin entrypoint. Starts the app by calling `startApplication()` and exits on fatal startup failure.
- Central source of truth for:
- Gemini model list
- AssemblyAI speech model list
- programming-language list
- global shortcut definitions
- Exposes getters/resolvers and shortcut lookup helpers.
- Handles env file path resolution (dev vs packaged).
- Loads `.env` with `dotenv`.
- Normalizes values (booleans, ints, key arrays).
- Validates required keys (`GEMINI_API_KEY`, `ASSEMBLY_AI_API_KEY`).
- Persists settings back to `.env` through `saveApplicationEnvironment`.
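
The normalization and validation steps above could look roughly like the following. This is a minimal sketch, not the code in `src/bootstrap/environment.js`: the helper names, the accepted truthy spellings, and the assumption that key arrays are comma-separated are all hypothetical.

```javascript
// Hypothetical helpers; names and accepted spellings are assumptions.
function parseBoolean(value, fallback = false) {
  if (value === undefined || value === null || value === '') return fallback;
  return ['1', 'true', 'yes', 'on'].includes(String(value).trim().toLowerCase());
}

function parseInteger(value, fallback) {
  const parsed = Number.parseInt(String(value ?? '').trim(), 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Split a comma-separated key list, dropping blanks and duplicates.
function parseKeyArray(value) {
  const keys = String(value ?? '').split(',').map((key) => key.trim()).filter(Boolean);
  return [...new Set(keys)];
}

// Return the names of required variables that are missing or blank.
function findMissingKeys(env, required = ['GEMINI_API_KEY', 'ASSEMBLY_AI_API_KEY']) {
  return required.filter((name) => !String(env[name] ?? '').trim());
}
```

Returning the list of missing names, rather than throwing on the first one, lets the caller report every missing key in a single startup message.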
- Main composition root for Electron app lifecycle.
- Initializes and wires:
- window controller
- screenshot manager
- Gemini runtime
- AssemblyAI service
- assistant/settings/assembly IPC registrations
- Loads persisted app state and restores model/language/opacity/key-index.
- Registers global shortcuts and lifecycle hooks (`whenReady`, `activate`, `will-quit`).
- Logs startup config values (keys presence, selected defaults, lists).
- Safe wrapper for `webContents.send` to avoid sending to a destroyed/crashed renderer.
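
A guarded send of this kind might be sketched as follows. The function name `safeSend` and its boolean return value are assumptions; `isDestroyed()`, `isCrashed()`, and `send()` are the Electron calls it relies on.

```javascript
// Hypothetical helper: returns true if the message was handed to the renderer.
function safeSend(win, channel, ...args) {
  if (!win || win.isDestroyed()) return false;
  const contents = win.webContents;
  if (!contents || contents.isDestroyed() || contents.isCrashed()) return false;
  try {
    contents.send(channel, ...args);
    return true;
  } catch (error) {
    // The window can still go away between the checks and the send.
    console.warn(`safeSend failed on "${channel}":`, error.message);
    return false;
  }
}
```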
- Window defaults and constraints:
- default/min width/height
- opacity bounds and default
- stealth opacity delta
- Owns BrowserWindow runtime behavior:
- create/destroy/get window
- opacity application + stealth mode toggle
- emergency hide behavior
- guarded recovery/reload handlers
- set/get bounds with work-area clamping
- global shortcut registration and movement shortcuts
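
To illustrate the work-area clamping mentioned above, here is a minimal sketch. The function name and the minimum size are assumptions; in Electron the work area would typically come from `screen.getDisplayMatching(bounds).workArea`.

```javascript
// Hypothetical helper: keep a window rectangle inside the display work area.
function clampBoundsToWorkArea(bounds, workArea, minSize = { width: 320, height: 240 }) {
  // Size first: at least the minimum, at most the work area.
  const width = Math.min(Math.max(bounds.width, minSize.width), workArea.width);
  const height = Math.min(Math.max(bounds.height, minSize.height), workArea.height);
  // Then position, so the whole rectangle stays visible.
  const x = Math.min(Math.max(bounds.x, workArea.x), workArea.x + workArea.width - width);
  const y = Math.min(Math.max(bounds.y, workArea.y), workArea.y + workArea.height - height);
  return { x, y, width, height };
}
```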
- Runtime controller around `GeminiService`:
- model/language/key configuration
- active-key index tracking and persistence callback
- key rotation/failover logic on quota/auth errors
- wrapper for executing operations with automatic key fallback
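
The key rotation/failover behavior could be sketched like this. It is an illustration only: the function names, the callback signature, and the use of HTTP-style `status` codes to detect quota/auth errors are assumptions, not the runtime's actual detection logic.

```javascript
// Hypothetical classifier; the real quota/auth detection is not shown in this map.
function isKeyRelatedError(error) {
  const status = error && error.status;
  return status === 401 || status === 403 || status === 429;
}

// Try the operation with the active key, rotating to the next key on
// quota/auth errors until every key has been tried once.
async function executeWithKeyFallback(keys, startIndex, operation, onKeyIndexChange) {
  let lastError;
  for (let attempt = 0; attempt < keys.length; attempt += 1) {
    const index = (startIndex + attempt) % keys.length;
    try {
      const result = await operation(keys[index]);
      // Report the new active index so the caller can persist it.
      if (attempt > 0 && onKeyIndexChange) onKeyIndexChange(index);
      return result;
    } catch (error) {
      lastError = error;
      if (!isKeyRelatedError(error)) throw error; // unrelated failure: do not rotate
    }
  }
  throw lastError ?? new Error('No API keys configured');
}
```

Errors that are not key-related are rethrown immediately, so a malformed request does not burn through every key.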
- Screenshot lifecycle:
- stealth capture with temporary low-opacity window
- screenshot directory management
- screenshot retention cap cleanup
- conversion to Gemini multimodal image parts
- clear/cleanup helpers
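
Two of the steps above, retention-cap cleanup and conversion to image parts, lend themselves to a short sketch. The function names and the `mtimeMs` field are assumptions; the `inlineData` shape is the one the Google Generative AI JavaScript SDK accepts for inline images, which is assumed to be the SDK in use here.

```javascript
// Hypothetical helper: given file entries with modification times, return
// the ones that exceed the retention cap, oldest first.
function selectScreenshotsToDelete(files, maxKept) {
  const newestFirst = [...files].sort((a, b) => b.mtimeMs - a.mtimeMs);
  return newestFirst.slice(Math.max(maxKept, 0)).reverse();
}

// Hypothetical helper: wrap raw image bytes as an inline multimodal part.
function toInlineImagePart(buffer, mimeType = 'image/png') {
  return { inlineData: { data: buffer.toString('base64'), mimeType } };
}
```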
- Assistant and AI-related IPC handlers:
- screenshot analysis
- Ask AI with transcript + optional screenshots
- suggestions/notes/insights/email/QA helpers
- clear conversation/history
- close app
- Maps raw Gemini/runtime errors into user-facing messages.
- Settings IPC handlers:
- `get-settings` returns current keys, models, languages, shortcuts, opacity
- `save-settings` persists `.env` + `app-state.json`, reapplies runtime config
- Core Gemini service wrapper:
- model init/re-init
- request queue + rate limiting + retry/backoff
- quota/auth error detection helpers
- conversation history storage
- feature methods (`analyzeScreenshots`, `askAiWithSessionContext`, notes/insights/etc.)
- Prompt builder library for all Gemini tasks:
- screenshot analysis
- ask-ai session mode
- suggestions
- meeting notes
- follow-up email
- direct question answering
- conversation insights
- Applies programming-language preference policy and language-specific guidance.
- AssemblyAI streaming backend:
- per-source WS connect/start/stop
- partial/final transcript events to renderer
- audio chunk intake and heartbeat/drop debug
- source state resets and cleanup
- non-streaming transcription endpoint flow (`upload` -> `transcript` -> polling)
- Merges and buffers final STT segments per source.
- Flushes merged transcripts into Gemini history on pause/stop/termination.
- Handles overlap-aware transcript merge to reduce duplicate fragments.
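
One way to picture the overlap-aware merge is the word-level sketch below. It is an assumed approach (longest suffix/prefix match, ignoring case and punctuation), not necessarily what the buffer module does.

```javascript
// Hypothetical merge: drop the words at the start of `next` that repeat
// the words at the end of `previous`.
function mergeWithOverlap(previous, next) {
  const prevWords = previous.trim().split(/\s+/).filter(Boolean);
  const nextWords = next.trim().split(/\s+/).filter(Boolean);
  const normalize = (word) => word.toLowerCase().replace(/[^\p{L}\p{N}']/gu, '');
  const maxOverlap = Math.min(prevWords.length, nextWords.length);
  // Prefer the longest overlap so repeated fragments are removed in full.
  for (let size = maxOverlap; size > 0; size -= 1) {
    const tail = prevWords.slice(-size).map(normalize);
    const head = nextWords.slice(0, size).map(normalize);
    if (tail.every((word, i) => word === head[i])) {
      return [...prevWords, ...nextWords.slice(size)].join(' ');
    }
  }
  return [...prevWords, ...nextWords].join(' ');
}
```

For example, merging `"we should ship the"` with `"ship the release today"` keeps a single copy of `"ship the"`.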
- IPC adapter for AssemblyAI service:
- start/stop voice recognition
- audio chunk forwarding
- desktop source listing
- offline transcription call
- Persisted app-state read/write/sanitize for `cache/app-state.json`.
- Stores key index, selected models/language, and window opacity level.
- BrowserWindow creation/config for transparent overlay window.
- Permission handlers for media/microphone.
- Content protection setup (`setContentProtection`).
- Initial visibility behavior (`launchHidden` aware).
- Exposes `window.electronAPI` through `contextBridge`.
- Uses the `createElectronApi` factory from `preload/create-electron-api.js`.
- Composes invoke and event API modules into one renderer-facing object.
- All `ipcRenderer.invoke` wrappers used by the renderer.
- Includes fallbacks and consistent logging through a helper factory.
- Renderer event subscription wrappers for all push events from main process.
- Utility factories:
- `invokeWithFallback`
- `createEventListener`
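
The two factories might have roughly the following shape. The signatures are assumptions made for illustration; only the names come from this map. The `ipc` object is passed in so the sketch runs without Electron; in a preload script it would be `ipcRenderer`, whose `invoke`, `on`, and `removeListener` methods are used here.

```javascript
// Hypothetical signature: returns an invoker that resolves to `fallbackValue`
// (and logs) instead of rejecting when the main process call fails.
function invokeWithFallback(ipc, channel, fallbackValue, logger = console) {
  return async (...args) => {
    try {
      return await ipc.invoke(channel, ...args);
    } catch (error) {
      logger.warn(`[preload] invoke "${channel}" failed:`, error.message);
      return fallbackValue;
    }
  };
}

// Hypothetical signature: returns a subscribe function; subscribing returns
// an unsubscribe function so renderer code can clean up.
function createEventListener(ipc, channel) {
  return (callback) => {
    // Strip the IPC event object so the renderer only sees the payload.
    const handler = (_event, ...payload) => callback(...payload);
    ipc.on(channel, handler);
    return () => ipc.removeListener(channel, handler);
  };
}
```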
- Main UI layout:
- top controls and action buttons
- transcription monitor
- AI chat area + composer
- settings panel
- close confirmation modal
- loading/emergency overlays
- resize handles
- Full visual system for overlay UI:
- glass theme variables
- chat/transcription/settings styling
- dark theme support
- resize-handle and interaction styling
- responsive layout behavior
- Global renderer typing for `window.electronAPI`.
- AudioWorklet processor that batches PCM float samples and posts chunks to main thread.
- Renderer composition root.
- Instantiates managers and wires dependencies:
- message store/context bundle
- chat UI manager
- window adjustment manager
- shortcut manager
- settings panel manager
- transcription manager
- listener modules
- Owns high-level UI actions and feature flows:
- Ask AI / Screen AI
- suggestions/notes/insights
- theme switching
- feedback/loading overlays
- close confirmation
- Message-type classification and context/summary line formatting rules.
- In-memory chat message records:
- add/clear/find
- toggle `includeInAi`
- inclusion rules for AI context
- Builds token-budgeted AI context bundle from included messages.
- Produces:
- `contextString`
- `transcriptContext`
- `sessionSummary`
- enabled screenshot IDs
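
A token-budgeted bundle of this kind could be built as sketched below. The message shape (`id`, `type`, `text`, `includeInAi`), the `screenshotIds` field name, the newest-first selection policy, and the four-characters-per-token estimate are all assumptions for illustration.

```javascript
// Rough token estimate (~4 characters per token); an assumption, not a real tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function buildContextBundle(messages, tokenBudget) {
  const included = messages.filter((message) => message.includeInAi);
  const selected = [];
  let used = 0;
  // Walk newest to oldest so recent messages win when the budget is tight.
  for (let i = included.length - 1; i >= 0; i -= 1) {
    const cost = estimateTokens(included[i].text);
    if (used + cost > tokenBudget) break;
    selected.unshift(included[i]);
    used += cost;
  }
  return {
    contextString: selected.map((m) => `[${m.type}] ${m.text}`).join('\n'),
    transcriptContext: selected
      .filter((m) => m.type === 'transcript')
      .map((m) => m.text)
      .join('\n'),
    sessionSummary: `${selected.length} of ${included.length} included messages, ~${used} tokens`,
    screenshotIds: selected.filter((m) => m.type === 'screenshot').map((m) => m.id),
  };
}
```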
- Applies include/exclude toggle state to rendered chat message DOM.
- Chat rendering and local UX behavior:
- message card rendering
- AI formatting for markdown-like blocks
- auto-scroll behavior
- composer auto-resize
- manual context message submission
- Renderer-side window resize handle behavior.
- Uses `electronAPI.getWindowBounds` / `setWindowBounds` and pointer events.
- Enforces chat fill layout after viewport changes.
- All DOM/event wiring in one place:
- button clicks
- chat input handlers
- keyboard shortcuts
- context-menu/select/drag suppression
- All renderer IPC subscriptions:
- screenshot/analysis/status events
- STT status/partial/final/error/stopped events
- global shortcut events
- STT debug event relay to monitor log
- global renderer error/unhandled rejection logging
- Parses accelerator strings and evaluates keyboard events against shortcut ids.
- Renders read-only shortcut list in settings panel.
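
Parsing an accelerator string and matching it against a keyboard event could look like this sketch. The function names, the alias table, and the `isMac` parameter are assumptions; the modifier names follow Electron's accelerator syntax and the event fields follow the DOM `KeyboardEvent` interface.

```javascript
// Partial alias table from accelerator key names to KeyboardEvent.key values.
const KEY_ALIASES = new Map([
  ['up', 'arrowup'], ['down', 'arrowdown'], ['left', 'arrowleft'], ['right', 'arrowright'],
  ['space', ' '], ['esc', 'escape'], ['return', 'enter'], ['plus', '+'],
]);

function parseAccelerator(accelerator, isMac = false) {
  const parts = accelerator.split('+').map((part) => part.trim().toLowerCase());
  const rawKey = parts.pop(); // the last segment is the non-modifier key
  const has = (...names) => names.some((name) => parts.includes(name));
  const cmdOrCtrl = has('commandorcontrol', 'cmdorctrl');
  return {
    key: KEY_ALIASES.get(rawKey) ?? rawKey,
    ctrl: has('ctrl', 'control') || (cmdOrCtrl && !isMac),
    meta: has('cmd', 'command', 'super', 'meta') || (cmdOrCtrl && isMac),
    alt: has('alt', 'option'),
    shift: has('shift'),
  };
}

function matchesShortcut(event, shortcut) {
  return (
    event.key.toLowerCase() === shortcut.key &&
    event.ctrlKey === shortcut.ctrl &&
    event.metaKey === shortcut.meta &&
    event.altKey === shortcut.alt &&
    event.shiftKey === shortcut.shift
  );
}
```

Requiring every modifier flag to match exactly means `Ctrl+S` does not also fire on `Ctrl+Shift+S`.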
- Settings panel behavior:
- load and populate fields/options from `getSettings`
- opacity label handling
- save/apply settings via `saveSettings`
- Renderer-side transcription orchestration:
- source toggles and status state
- monitor UI updates and monitor logs
- mic/system start/stop lifecycle
- partial/final transcript handling
- buffering/flush integration
- Shared source selection/status/active state model for renderer.
- WebAudio capture and processing pipeline:
- desktop/mic stream handling
- downsampling to 16 kHz
- frame batching and PCM16 conversion
- chunk emission via `sendAudioChunk`
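
The downsampling and PCM16 conversion steps can be sketched as two pure functions. The averaging strategy and the function names are assumptions; the actual pipeline may resample differently.

```javascript
// Downsample by averaging each group of source samples that maps onto one
// output sample (a simple box filter).
function downsample(input, inputRate, targetRate = 16000) {
  if (targetRate >= inputRate) return Float32Array.from(input);
  const ratio = inputRate / targetRate;
  const output = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < output.length; i += 1) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);
    let sum = 0;
    for (let j = start; j < end; j += 1) sum += input[j];
    output[i] = sum / (end - start);
  }
  return output;
}

// Convert float samples in [-1, 1] to signed 16-bit PCM, clamping outliers.
function floatToPcm16(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i += 1) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return pcm;
}
```

The resulting `Int16Array` (or its underlying `buffer`) is what would be handed to `sendAudioChunk`.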
- Final transcript merge/buffer/flush logic in renderer before chat commit.
- Legacy Whisper worker implementation (not in active production flow).
- Legacy backup renderer for old Whisper-based approach.
- Legacy experimental Web Speech renderer (deprecated/broken path).
- `src/main-process/**`: Electron main-process orchestration and IPC registration.
- `src/services/**`: reusable domain logic (AI, STT, state persistence).
- `src/windows/assistant/preload/**`: secure IPC bridge exposed to the renderer.
- `src/windows/assistant/renderer/features/**`: renderer feature modules only.
- `src/windows/legacy/**`: reference-only old experiments.
- Add new renderer behavior under `renderer/features/*` and keep `renderer.js` as composition/orchestration.
- Add new invoke/listener APIs under `windows/assistant/preload/*` and register matching IPC in the main process.
- Keep configurable lists/defaults/shortcuts in `src/config.js`.
- When adding env fields, update together:
- `src/bootstrap/environment.js`
- `.env.example`
- `README.md`