| 1 | +# Web Mode Implementation - Technical Design |
| 2 | + |
| 3 | +> **Status**: 🚧 Draft - Not Yet Implemented |
| 4 | +> **Priority**: Medium |
| 5 | +> **Related**: [Roadmap](../roadmap.md) - "Web Mode Fallback (BROKEN)" |
| 6 | + |
| 7 | +## Problem Statement |
| 8 | + |
| 9 | +The current "Gemini Web (Free)" option ([`sidepanel.js:746`](file:///Users/NOBKP/ai-sidekick/src/sidepanel.js#L746)) simply opens `gemini.google.com` in a new tab, providing no integration with the extension. Users without API keys have no fallback for in-panel chat. |
| 10 | + |
| 11 | +## Goals |
| 12 | + |
| 13 | +1. **Zero-Friction Fallback**: Users without API keys can still use AI assistants directly from the side panel. |
| 14 | +2. **Multi-Provider Support**: Extend to Gemini, DeepSeek, ChatGPT, and Claude. |
| 15 | +3. **Seamless UX**: Minimize context-switching and manual copy-paste. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## Technical Approaches |
| 20 | + |
| 21 | +### Option A: Iframe Embedding ⚠️ Limited Viability |
| 22 | + |
| 23 | +**Concept**: Embed web chat interfaces directly in the side panel using `<iframe>`. |
| 24 | + |
| 25 | +**Pros**: |
| 26 | +- User stays in side panel. |
| 27 | +- No tab switching. |
| 28 | + |
| 29 | +**Cons**: |
| 30 | +- **Security Restrictions**: Most AI providers block `<iframe>` embedding via `X-Frame-Options: DENY` or a CSP `frame-ancestors` directive. |
| 31 | +  - ✅ **Gemini**: May work (needs testing). |
| 32 | +  - ❌ **ChatGPT**: Blocked. |
| 33 | +  - ❌ **Claude**: Blocked. |
| 34 | +  - ⚠️ **DeepSeek**: Unknown (needs testing). |
| 35 | + |
| 36 | +**Verdict**: Potentially viable for Gemini only, pending testing. Not a universal solution. |
| 37 | + |
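Whether a provider blocks framing can be checked from its response headers before any iframe work is attempted. The sketch below is a minimal, conservative check, assuming the headers have already been collected into a plain object (for example from a manual request during testing). The function name `isFramingBlocked` is hypothetical and not part of the extension.

```javascript
// Hypothetical helper: decides whether response headers forbid iframe embedding.
// `headers` is a plain object of header name -> value; names are matched
// case-insensitively.
function isFramingBlocked(headers) {
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = String(value).toLowerCase();
  }

  // X-Frame-Options: DENY and SAMEORIGIN both block a cross-origin embedder.
  const xfo = normalized['x-frame-options'];
  if (xfo && (xfo.includes('deny') || xfo.includes('sameorigin'))) {
    return true;
  }

  // The CSP frame-ancestors directive restricts who may embed the page.
  const csp = normalized['content-security-policy'] || '';
  const directive = csp
    .split(';')
    .map((part) => part.trim())
    .find((part) => part.startsWith('frame-ancestors'));
  if (!directive) return false;

  // Assumption: only a wildcard source is treated as "embeddable".
  // Anything narrower is reported as blocked, which is deliberately conservative.
  const sources = directive.split(/\s+/).slice(1);
  return !sources.includes('*');
}
```

A provider that passes this check still needs a manual test in the side panel, since login flows and cookies can fail inside a frame even when the headers allow it.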
| 38 | +--- |
| 39 | + |
| 40 | +### Option B: Managed Tab with Context Injection |
| 41 | + |
| 42 | +**Concept**: Open the web interface in a new tab and inject the user's prompt automatically. |
| 43 | + |
| 44 | +**Implementation**: |
| 45 | +1. User selects "Gemini Web (Free)" and types a message. |
| 46 | +2. Extension opens `https://gemini.google.com/app` in a new tab. |
| 47 | +3. Extension uses `chrome.scripting.executeScript` (requires the `scripting` permission plus host permissions for the provider's origin) to: |
| 48 | +   - Wait for page load. |
| 49 | +   - Find chat input field. |
| 50 | +   - Insert user's prompt. |
| 51 | +   - Optionally trigger "Send" button. |
| 52 | + |
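The steps above can be sketched as follows. This is an illustration under stated assumptions, not a tested integration: `fillPrompt` and `openAndInject` are hypothetical names, the selectors passed in are placeholders that would have to be verified per provider, and the manifest is assumed to grant `scripting` plus host permissions for the target origin.

```javascript
// Runs inside the provider page. It must be self-contained because
// chrome.scripting.executeScript serializes `func` and cannot carry closures.
function fillPrompt(selectors, prompt, doc = document) {
  for (const selector of selectors) {
    const input = doc.querySelector(selector);
    if (!input) continue;
    input.focus();
    if ('value' in input) {
      input.value = prompt; // <textarea> or <input>
    } else {
      input.textContent = prompt; // contenteditable element
    }
    // Notify the page's framework that the field changed.
    input.dispatchEvent(new Event('input', { bubbles: true }));
    return true;
  }
  return false; // No selector matched: fall back to manual paste.
}

// Runs in the extension. Opens the tab, waits for it to finish loading,
// then injects fillPrompt. Only callable where the `chrome` APIs exist.
async function openAndInject(url, selectors, prompt) {
  const tab = await chrome.tabs.create({ url });
  await new Promise((resolve) => {
    chrome.tabs.onUpdated.addListener(function listener(tabId, info) {
      if (tabId === tab.id && info.status === 'complete') {
        chrome.tabs.onUpdated.removeListener(listener);
        resolve();
      }
    });
  });
  const [{ result }] = await chrome.scripting.executeScript({
    target: { tabId: tab.id },
    func: fillPrompt,
    args: [selectors, prompt],
  });
  return result; // true if an input was found and filled
}
```

A `complete` status does not guarantee that a single-page app has rendered its input yet, so a real implementation would likely need a retry or a `MutationObserver`. That extra machinery is part of the maintenance cost noted below.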
| 53 | +**Pros**: |
| 54 | +- Works with all providers (no iframe restrictions). |
| 55 | +- User sees response in native interface. |
| 56 | + |
| 57 | +**Cons**: |
| 58 | +- **Login Required**: User must be logged into each service. |
| 59 | +- **DOM Fragility**: Injection relies on stable DOM selectors (e.g., `textarea[aria-label="Chat input"]`). Providers can break this at any time. |
| 60 | +- **Privacy Concerns**: Users may not want an extension manipulating web pages. |
| 61 | + |
| 62 | +**Verdict**: Technically feasible but fragile and requires maintenance. |
| 63 | + |
| 64 | +--- |
| 65 | + |
| 66 | +### Option C: Hybrid Approach (Recommended) |
| 67 | + |
| 68 | +**Concept**: Show an "onboarding dialog" when the user selects Web Mode without an API key. |
| 69 | + |
| 70 | +**UX Flow**: |
| 71 | +``` |
| 72 | +User selects "Gemini Web (Free)" → No API key detected |
| 73 | + |
| 74 | +┌─────────────────────────────────────────────┐ |
| 75 | +│ ⚠️ No API Key Found │ |
| 76 | +│ │ |
| 77 | +│ To use AI Sidekick, you need: │ |
| 78 | +│ │ |
| 79 | +│ [Option 1] Enter your API keys (Recommended)│ |
| 80 | +│ → Private, encrypted, full control │ |
| 81 | +│ → Get keys: [Gemini Key] [DeepSeek Key] │ |
| 82 | +│ │ |
| 83 | +│ [Option 2] Use free web version │ |
| 84 | +│ → Opens gemini.google.com in new tab │ |
| 85 | +│ → Requires Google account login │ |
| 86 | +│ │ |
| 87 | +│ [ Set Up API Keys ] [ Continue to Web ] │ |
| 88 | +└─────────────────────────────────────────────┘ |
| 89 | +``` |
| 90 | + |
| 91 | +If the user clicks **"Continue to Web"**: |
| 92 | +- Copy the user's prompt to the clipboard automatically (before the new tab takes focus). |
| 93 | +- Open `gemini.google.com/app` in a new tab. |
| 94 | +- Show toast: "Prompt copied! Paste it in Gemini to continue." |
| 95 | + |
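A minimal sketch of the "Continue to Web" handler is shown here. It copies the prompt before opening the tab, on the assumption that the side panel may lose focus once the new tab opens and a clipboard write could then be rejected. The name `continueToWeb` and the injectable `deps` object are hypothetical; the defaults assume an extension page with the `clipboardWrite` permission.

```javascript
// Hypothetical handler for the "Continue to Web" button.
// `provider` is an object with `name` and `url`; `deps` allows the clipboard,
// tab, and toast functions to be replaced (for example in tests).
async function continueToWeb(provider, prompt, deps = {}) {
  const writeText = deps.writeText || ((text) => navigator.clipboard.writeText(text));
  const openTab = deps.openTab || ((url) => chrome.tabs.create({ url }));
  const showToast = deps.showToast || ((message) => console.log(message));

  let copied = true;
  try {
    // Copy first, while the side panel still has focus.
    await writeText(prompt);
  } catch (err) {
    copied = false; // Still open the tab; the user can copy manually.
  }

  await openTab(provider.url);
  showToast(
    copied
      ? `Prompt copied! Paste it in ${provider.name} to continue.`
      : 'Could not copy the prompt. Please copy it manually.'
  );
  return copied;
}
```

Reporting a failed copy instead of assuming success keeps the toast honest when the clipboard write is rejected.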
| 96 | +**Pros**: |
| 97 | +- ✅ **Clear UX**: Sets expectations (no magic integration). |
| 98 | +- ✅ **No Fragility**: No reliance on DOM selectors. |
| 99 | +- ✅ **Privacy-Friendly**: Extension doesn't manipulate web pages. |
| 100 | +- ✅ **Extensible**: Easy to add ChatGPT, Claude, DeepSeek with the same pattern. |
| 101 | + |
| 102 | +**Cons**: |
| 103 | +- The user must paste the prompt manually (one extra step). |
| 104 | + |
| 105 | +**Verdict**: Best balance of simplicity, reliability, and user experience. |
| 106 | + |
| 107 | +--- |
| 108 | + |
| 109 | +## Recommended Implementation (Phase 1) |
| 110 | + |
| 111 | +### 1. Update Model Selection Logic |
| 112 | + |
| 113 | +**File**: `src/sidepanel.js` |
| 114 | + |
| 115 | +**Current Code** (Line 746): |
| 116 | +```javascript |
| 117 | +if (model === 'gemini-web') { |
| 118 | +  window.open('https://gemini.google.com/app', '_blank'); |
| 119 | +  return; |
| 120 | +} |
| 121 | +``` |
| 122 | + |
| 123 | +**New Code**: |
| 124 | +```javascript |
| 125 | +if (model === 'gemini-web') { |
| 126 | +  showWebModeDialog('gemini', userMessage); |
| 127 | +  return; |
| 128 | +} |
| 129 | +``` |
| 130 | + |
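If more `*-web` model values are added later, the hard-coded comparison could be replaced by a small lookup. The sketch below assumes that every Web Mode value follows the `<provider>-web` naming seen in `gemini-web`; the helper name `getWebProvider` is hypothetical.

```javascript
// Hypothetical helper: maps a model value such as "gemini-web" to a provider key.
// Returns null for API-backed models so the caller can continue as before.
const WEB_SUFFIX = '-web';

function getWebProvider(model) {
  if (typeof model !== 'string' || !model.endsWith(WEB_SUFFIX)) return null;
  const provider = model.slice(0, -WEB_SUFFIX.length);
  return provider.length > 0 ? provider : null;
}

// Intended use inside the existing handler in src/sidepanel.js:
//   const provider = getWebProvider(model);
//   if (provider) {
//     showWebModeDialog(provider, userMessage);
//     return;
//   }
```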
| 131 | +### 2. Create Web Mode Dialog |
| 132 | + |
| 133 | +**File**: `src/lib/web-mode-dialog.js` (new) |
| 134 | + |
| 135 | +```javascript |
| 136 | +function showWebModeDialog(provider, userPrompt) { |
| 137 | +  const providerConfig = { |
| 138 | +    gemini: { |
| 139 | +      name: 'Gemini', |
| 140 | +      url: 'https://gemini.google.com/app', |
| 141 | +      keyUrl: 'https://aistudio.google.com/app/apikey' |
| 142 | +    }, |
| 143 | +    deepseek: { |
| 144 | +      name: 'DeepSeek', |
| 145 | +      url: 'https://chat.deepseek.com', |
| 146 | +      keyUrl: 'https://platform.deepseek.com/api_keys' |
| 147 | +    }, |
| 148 | +    chatgpt: { |
| 149 | +      name: 'ChatGPT', |
| 150 | +      url: 'https://chatgpt.com', |
| 151 | +      keyUrl: null // No API key setup offered for this provider |
| 152 | +    }, |
| 153 | +    claude: { |
| 154 | +      name: 'Claude', |
| 155 | +      url: 'https://claude.ai', |
| 156 | +      keyUrl: 'https://console.anthropic.com/' |
| 157 | +    } |
| 158 | +  }; |
| 159 | + |
| 160 | +  const config = providerConfig[provider]; // undefined for an unknown provider key |
| 161 | + |
| 162 | +  // Show dialog with options: |
| 163 | +  // 1. "Set Up API Keys" → chrome.runtime.openOptionsPage() |
| 164 | +  // 2. "Continue to Web" → copyToClipboard(userPrompt) + open(config.url) |
| 165 | +} |
| 166 | +``` |
| 167 | + |
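The dialog body is left as comments above. One way to keep it testable is to derive its content as plain data first and render it separately. The sketch below assumes a `providerConfig` object of the same shape as in the stub; `buildWebModeDialog` and the action ids are hypothetical names, and hiding the key setup button when `keyUrl` is `null` is an assumption about the intended behavior.

```javascript
// Hypothetical helper: derives the dialog's title, note, and buttons from one
// provider entry. Returns null for an unknown provider key.
function buildWebModeDialog(providerConfig, provider) {
  const config = providerConfig[provider];
  if (!config) return null;

  const actions = [];
  if (config.keyUrl) {
    // Assumption: only offer key setup when the entry has a key URL.
    actions.push({ id: 'setup-keys', label: 'Set Up API Keys', keyUrl: config.keyUrl });
  }
  actions.push({ id: 'continue-web', label: 'Continue to Web', url: config.url });

  return {
    title: 'No API Key Found',
    note: `Opens ${new URL(config.url).hostname} in a new tab`,
    actions,
  };
}
```

The click handlers would then map `setup-keys` to `chrome.runtime.openOptionsPage()` and `continue-web` to the copy-and-open flow.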
| 168 | +### 3. Confirm Clipboard Permission |
| 169 | + |
| 170 | +**File**: `manifest.json` |
| 171 | + |
| 172 | +```json |
| 173 | +"permissions": [ |
| 174 | +  "clipboardWrite", // Already exists ✅ |
| 175 | +  // ... |
| 176 | +] |
| 177 | +``` |
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +## Future Enhancements (Phase 2+) |
| 182 | + |
| 183 | +### Multi-Provider Dropdown |
| 184 | + |
| 185 | +Instead of just "🌐 Gemini Web (Free)", show: |
| 186 | +```html |
| 187 | +<optgroup label="🌐 Web Mode (Free, No Keys)"> |
| 188 | +  <option value="gemini-web">Gemini (Google Account)</option> |
| 189 | +  <option value="deepseek-web">DeepSeek (Free Account)</option> |
| 190 | +  <option value="chatgpt-web">ChatGPT (OpenAI Account)</option> |
| 191 | +  <option value="claude-web">Claude (Anthropic Account)</option> |
| 192 | +</optgroup> |
| 193 | +``` |
| 194 | + |
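To keep the dropdown and the dialog configuration from drifting apart, the option group could be generated from a provider list rather than written by hand. This is a sketch only: `renderWebModeOptions`, `escapeHtml`, and the `{ key, label }` entry shape are hypothetical.

```javascript
// Escapes text for safe use in HTML text and double-quoted attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: renders the Web Mode <optgroup> markup.
// Each provider entry is { key, label }; the option value becomes `${key}-web`.
function renderWebModeOptions(providers) {
  const options = providers.map(
    ({ key, label }) =>
      `  <option value="${escapeHtml(key)}-web">${escapeHtml(label)}</option>`
  );
  return [
    '<optgroup label="🌐 Web Mode (Free, No Keys)">',
    ...options,
    '</optgroup>',
  ].join('\n');
}
```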
| 195 | +### Quick Switch Integration |
| 196 | + |
| 197 | +Add a "Quick Switch" button in the side panel: |
| 198 | +``` |
| 199 | +[API Mode: Gemini Flash ▼] |
| 200 | + → Gemini 2.5 Pro (API) |
| 201 | + → DeepSeek R1 (API) |
| 202 | + ─────────────── |
| 203 | + → Gemini Web (Free) 🌐 |
| 204 | + → ChatGPT Web (Free) 🌐 |
| 205 | +``` |
| 206 | + |
| 207 | +--- |
| 208 | + |
| 209 | +## Testing Checklist |
| 210 | + |
| 211 | +- [ ] Verify the dialog appears when Web Mode is selected without an API key. |
| 212 | +- [ ] Confirm clipboard copy works (test on Mac/Windows/Linux). |
| 213 | +- [ ] Ensure correct URL opens for each provider. |
| 214 | +- [ ] Test that the "Set Up API Keys" button opens the Options page. |
| 215 | +- [ ] Validate toast notification shows after clipboard copy. |
| 216 | + |
| 217 | +--- |
| 218 | + |
| 219 | +## Open Questions |
| 220 | + |
| 221 | +1. **Should we support iframe for Gemini if it works?** |
| 222 | + → Decision: Test first. If unreliable, use unified "open tab + copy" approach. |
| 223 | + |
| 224 | +2. **Should we inject prompt automatically using content scripts?** |
| 225 | + → Decision: No. Too fragile. Manual paste is an acceptable trade-off. |
| 226 | + |
| 227 | +3. **Should we track which web interface the user prefers?** |
| 228 | + → Decision: Phase 2. Store in `chrome.storage.local`. |
| 229 | + |
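Storing the preferred web interface could look like the sketch below. It assumes the `storage` permission and the promise-based `chrome.storage.local` API available in Manifest V3; the key name `preferredWebProvider` and both function names are hypothetical.

```javascript
const PREF_KEY = 'preferredWebProvider'; // hypothetical storage key

// Persists the provider key (for example "gemini").
// `storage` defaults to chrome.storage.local and is injectable for tests.
async function savePreferredProvider(provider, storage = chrome.storage.local) {
  await storage.set({ [PREF_KEY]: provider });
}

// Reads the stored provider key, or returns `fallback` when nothing is stored.
async function loadPreferredProvider(fallback = 'gemini', storage = chrome.storage.local) {
  const stored = await storage.get(PREF_KEY);
  return stored[PREF_KEY] ?? fallback;
}
```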
| 230 | +--- |
| 231 | + |
| 232 | +## References |
| 233 | + |
| 234 | +- Current Implementation: [`src/sidepanel.js:746`](file:///Users/NOBKP/ai-sidekick/src/sidepanel.js#L746) |
| 235 | +- Roadmap Task: [`docs/roadmap.md`](file:///Users/NOBKP/ai-sidekick/docs/roadmap.md) |