Feat/extension with chatbot and voice #25

Open
Sathursan-S wants to merge 89 commits into master from feat/extension-with-chatbot-and-voice

@Sathursan-S Sathursan-S commented Nov 30, 2025

This pull request introduces several documentation, configuration, and changelog updates to the Browser.AI project, focusing on improving developer experience, transparency, and support for new features such as the Chrome extension and intelligent website selection. The most significant changes include the creation of a comprehensive root-level changelog, documentation of a major new intelligent site selection feature, and updates to the GUI documentation to reflect new capabilities.

Project Documentation and Changelog Improvements

  • Created a root-level CHANGELOG.md documenting all major changes, features, and updates across the Browser.AI project, following the Keep a Changelog format and including migration guides and version history.
  • Added a summary changelog file CHANGELOG_SUMMARY.md describing the structure, purpose, and usage of the new changelog system for maintainers and users.

Major Feature Release: Intelligent Website Selection

  • Documented the new "Intelligent Website Selection" feature in CHANGELOG_INTELLIGENT_SELECTION.md, detailing a new action for dynamic site selection, multi-site fallback strategies, enhanced prompts, backward compatibility, and migration notes.

GUI and Extension Documentation Updates

  • Updated GUI_README.md to include the Chrome extension as a supported interface, describe its features, and provide installation instructions and usage workflows.
  • Added extension documentation references to .github/copilot-instructions.md to clarify the docs directory structure.

Configuration and Environment Updates

  • Updated .env.example to remove the hardcoded Gemini API key and clarify the environment variable setup for LLM providers.
  • Updated .python-version to specify Python 3.12 for the project.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added intelligent website selection for shopping tasks based on location and purpose
    • Introduced location detection for localized shopping experiences with currency support
    • Launched Chrome extension interface with side panel UI for task management
    • Added conversational chatbot mode for interactive task clarification
    • Implemented voice input and text-to-speech capabilities in extension
    • New GIF history visualization of agent actions
  • Documentation

    • Comprehensive setup guides and quick-start instructions
    • CDP WebSocket server documentation for developers
  • Chores

    • Updated Python version support to 3.12
    • Enhanced build tooling and development configuration


Sathursan-S and others added 30 commits October 4, 2025 14:19
… script, devtools, options, popup, and new tab functionalities

- Added background script to handle messages from popup
- Implemented content script with basic logging
- Created devtools page with a link to the repository
- Developed options page to sync and display count from popup
- Built popup with increment and decrement functionality for count
- Designed new tab page displaying the current time with a background image
- Established side panel to show synchronized count from popup
- Configured manifest for Chrome extension with necessary permissions and pages
- Set up Vite configuration for building the extension
- Included TypeScript configuration for type safety
- Added CSS styles for various components to enhance UI
…live logs

Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
…e extension

Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
…ference

Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
- Implemented a WebSocket server to facilitate communication between the Chrome extension and Browser.AI using the Chrome DevTools Protocol (CDP).
- Created CDPBrowserContextManager to manage browser contexts and commands for each tab.
- Added event handling for attaching and detaching tabs, sending CDP commands, and starting tasks.
- Defined WebSocket events for client-server communication, including connection acknowledgment and task status updates.
- Included detailed logging and error handling for better debugging and monitoring.
- Documented the WebSocket server architecture, usage examples, and command formats in the README.
… handling

- Implemented notification HTML structure and CSS styles.
- Created Notification component to display messages based on URL parameters or background script messages.
- Added loading spinner and dynamic notification types (user interaction, task complete, error).
- Developed index file to render Notification component.
- Introduced ChatInput component for user message input with send functionality.
- Created ControlButtons component for task control (start, pause, resume, stop).
- Implemented ExecutionLog component to display logs with different levels and metadata.
- Added TaskStatus component to show current task status with visual indicators.
- Created LOG_STREAMING_FIX.md to document the problem and solution for log streaming issues between the Browser.AI server and Chrome extension.
- Unified LogEvent dataclass definitions in event_adapter.py and protocol.py to ensure compatibility and consistency in data types.
- Updated websocket_server.py to simplify event broadcasting and remove redundant serialization methods.
- Enhanced PROTOCOL.md with detailed WebSocket communication protocol, including data structures and event definitions.
- Summarized protocol implementation changes in PROTOCOL_IMPLEMENTATION_SUMMARY.md, highlighting updates to TypeScript and Python protocol files.
- Added type-safe protocol definitions in protocol.ts for TypeScript and protocol.py for Python, ensuring type safety and consistency.
- Developed verify_log_streaming_fix.py to validate the compatibility of LogEvent objects with the protocol and ensure proper serialization.
…Chrome APIs

Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
…actor WebSocket server error handling and testing
…enhance event emission and error handling in websocket server
…rowser.AI Chrome extension

- Added detailed documentation on state management and persistence in STATE_MANAGEMENT.md
- Summarized state management changes and benefits in STATE_MANAGEMENT_SUMMARY.md
- Introduced modern UI features and components in UI_FEATURES.md
- Created a visual guide for the extension's UI in UI_GUIDE.md
- Enhanced user experience with a professional settings page in UX_IMPROVEMENTS.md
- Improved chat interface and execution log formatting for better clarity
- Integrated Chrome API for settings management and real-time updates
- Optimized rendering and memory management for better performance
…d-b0a6-3dd7274a776d

Add Chrome Extension with Side Panel UI, Settings Page, User-Friendly Messages, Global State Persistence, and WebSocket Control for Browser Automation
… script, devtools, options, popup, and new tab functionalities

- Added background script to handle messages from popup
- Implemented content script with basic logging
- Created devtools page with a link to the repository
- Developed options page to sync and display count from popup
- Built popup with increment and decrement functionality for count
- Designed new tab page displaying the current time with a background image
- Established side panel to show synchronized count from popup
- Configured manifest for Chrome extension with necessary permissions and pages
- Set up Vite configuration for building the extension
- Included TypeScript configuration for type safety
- Added CSS styles for various components to enhance UI
…live logs

Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
…e extension

Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
…ference

Co-authored-by: Sathursan-S <84266926+Sathursan-S@users.noreply.github.com>
- Implemented a WebSocket server to facilitate communication between the Chrome extension and Browser.AI using the Chrome DevTools Protocol (CDP).
- Created CDPBrowserContextManager to manage browser contexts and commands for each tab.
- Added event handling for attaching and detaching tabs, sending CDP commands, and starting tasks.
- Defined WebSocket events for client-server communication, including connection acknowledgment and task status updates.
- Included detailed logging and error handling for better debugging and monitoring.
- Documented the WebSocket server architecture, usage examples, and command formats in the README.
Sathursan-S and others added 27 commits October 9, 2025 22:14
- Added VoiceConversationService to manage continuous voice interactions.
- Integrated voice recognition and text-to-speech functionalities.
- Implemented automatic turn-taking, silence detection, and message handling.
- Enhanced UI with Live Voice Mode, including status indicators and live transcript display.
- Updated ConversationMode component to support live voice interactions.
- Removed deprecated VoiceInput and VoiceSettings components.
- Updated CSS styles for new live voice features.
- Modified agent configuration to use the latest planner model and added planner interval setting.
- Created comprehensive documentation for Live Voice Mode including guides, quick start, and implementation details.
- Added state diagrams and user journey visualizations to enhance understanding of the Live Voice Mode workflow.
- Introduced a new service `VoiceConversation.ts` to manage the voice interaction flow.
- Enhanced UI components to support Live Voice Mode with real-time transcript display and animated status indicators.
- Implemented error handling and configuration options for a better user experience.
- Removed deprecated styles and components related to conversation header and buttons.
- Introduced new input field structure with integrated voice and send buttons.
- Enhanced voice button functionality to handle live and manual voice input modes.
- Updated button states and titles for better user experience.
- Improved styling for input fields, buttons, and live voice status to ensure consistency and clarity.
… ChatInput components; remove unused ConversationView files
- Introduced new PNG icons in various sizes (16x16, 32x32, 48x48, 128x128) for the browser AI extension.
- Icons are added to the public icons and img directories to enhance the visual representation of the extension.
…nd visual guides

- Added Voice Input (Speech-to-Text) and Speech Output (Text-to-Speech) services
- Integrated UI components for microphone and speech toggle functionality
- Enhanced styling for a professional appearance and smooth user experience
- Created comprehensive documentation including implementation details, usage examples, and error handling
- Developed visual guides and workflow diagrams for user interactions and troubleshooting
- Established a memory tracking pattern for intelligent website selection workflows
- Refactor `Controller` by moving actions to `browser_ai/actions/` with categorized modules (navigation, interaction, extraction, utility).
- Refactor `Agent` by extracting media generation logic to `browser_ai/agent/media.py`.
- Refactor `browser_ai_gui` by moving `TaskManager` to `browser_ai_gui/services/task_manager.py`.
- Fix `EventType` usage in `test_extension_server.py`.
- General cleanup of imports and structure.
- Unify Chat and Agent modes into a single seamless interface.
- Implement "Smart Steps" visualization to group technical logs into high-level task steps.
- Add Light/Dark mode theming with a clean "Copilot-like" default.
- Inject a floating status overlay into the active web page.
- Update Tailwind configuration and React components to support the new design.
- Unify Chat and Agent modes into a single seamless interface.
- Implement "Smart Steps" visualization to group technical logs into high-level task steps.
- Add Light/Dark mode theming with a clean "Copilot-like" default.
- Inject a floating status overlay into the active web page.
- Add "Concentration Mode" with Jarvis-style voice visualization.
- Add Sticky Task Status header that expands on completion.
- Update Tailwind configuration and React components to support the new design.
Refactor: Clean code and restructure folders
- Removed unused ChatMessage component.
- Updated ConversationMode.css for improved styling and added visual effects.
- Enhanced ConversationMode.tsx to include audio visualization logic using Web Audio API.
- Modified TaskStatusHeader to remove auto-expand functionality and improve user control.
- Introduced VoiceVisualizer component for dynamic audio level visualization during voice interactions.
- Added task and chat history management functions in state.ts for persistent storage.
@gemini-code-assist

Important

Installation incomplete: to start using Gemini Code Assist, please ask the organization owner(s) to visit the Gemini Code Assist Admin Console and sign the Terms of Services.


coderabbitai bot commented Nov 30, 2025

Walkthrough

This PR introduces significant enhancements to Browser.AI including intelligent website selection, a comprehensive event bus system, location detection services, and a full Chrome extension with real-time chat and voice capabilities. Additionally, documentation is restructured and environment/build configurations are expanded.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Environment & Configuration | `.env.example`, `.gitignore`, `.python-version`, `browser_ai_extension/browse_ai/.*`, `browser_ai_extension/browse_ai/tsconfig.*`, `browser_ai_extension/browse_ai/vite.config.ts`, `browser_ai_extension/browse_ai/tailwind.config.cjs`, `browser_ai_extension/browse_ai/postcss.config.cjs`, `browser_ai_extension/browse_ai/package.json`, `browser_ai_extension/browse_ai/pnpm-workspace.yaml` | Added Python 3.12 requirement, extension .gitignore, npm workspace configuration, Vite/Tailwind/PostCSS build tooling, and EditorConfig standards. |
| Documentation & Changelogs | `CHANGELOG.md`, `CHANGELOG_INTELLIGENT_SELECTION.md`, `CHANGELOG_SUMMARY.md`, `GUI_README.md`, `LAUNCHER_SUMMARY.md`, `QUICK_START.md`, `RUN_PROJECT_GUIDE.md`, `browser_ai_extension/browse_ai/CHANGELOG.md`, `browser_ai_extension/browse_ai/README.md`, `browser_ai_extension/docs/*` | Comprehensive restructuring of project documentation with new changelogs for major features, launcher/quick start guides, extension-specific documentation (CDP WebSocket, chatbot feature), and Chrome extension README. |
| Intelligent Website Selection Feature | `browser_ai/actions/navigation.py`, `browser_ai/actions/extraction.py`, `browser_ai/controller/views.py` | Introduced FindBestWebsiteAction, enhanced SearchEcommerceAction to accept optional site parameter, and added actions for intelligent website discovery, location detection, and multi-site e-commerce search with fallback strategies. |
| Location Detection Service | `browser_ai/location_service.py`, `browser_ai/actions/extraction.py`, `browser_ai/controller/service.py` | New LocationDetector class with geolocation capabilities, location database for multiple countries, currency/ecommerce context helpers, and integration into action pipeline for location-aware shopping workflows. |
| Event Bus System | `browser_ai/event_bus/core.py`, `browser_ai/event_bus/events.py`, `browser_ai/event_bus/handlers/console.py` | Complete event-driven architecture with EventManager singleton, EventHandler base class, 40+ strongly-typed event classes covering agent, browser, DOM, controller, LLM, validation, planning, state, error, metrics, user interaction, and extension domains. |
| Agent & Controller Enhancements | `browser_ai/agent/service.py`, `browser_ai/agent/prompts.py`, `browser_ai/agent/media.py`, `browser_ai/agent/views.py`, `browser_ai/controller/service.py`, `browser_ai/browser/context.py` | Major refactoring with latency tracking, shopping-action auto-detection, GIF history generation, user intervention/pause/resume flows, enhanced error handling, expanded prompts for multi-site workflows, and new user_input_request field in ActionResult. |
| Action Modules | `browser_ai/actions/__init__.py`, `browser_ai/actions/navigation.py`, `browser_ai/actions/interaction.py`, `browser_ai/actions/extraction.py`, `browser_ai/actions/utility.py` | Introduced modularized action system with public exports for navigation (Google/YouTube search, AI-assisted search, website finding), interaction (clicks, scrolling, text input, dropdown handling), extraction (content, location detection, ecommerce search), and utility (task completion, tab management, user help/questions). |
| Utilities & Helpers | `browser_ai/utils.py` | Added singleton decorator, LatencyAnalyzer for performance tracking with CSV export, and type annotation updates. |
| Chrome Extension: Core | `browser_ai_extension/browse_ai/src/background/index.ts`, `browser_ai_extension/browse_ai/src/manifest.ts`, `browser_ai_extension/browse_ai/src/global.d.ts`, `browser_ai_extension/browse_ai/*.html` | Background service worker with messaging and notification handling, MV3 manifest configuration, global type declarations, and HTML entry points (popup, options, devtools, sidepanel, notification). |
| Chrome Extension: UI Components | `browser_ai_extension/browse_ai/src/components/*`, `browser_ai_extension/browse_ai/src/ui/*` | Reusable UI primitives (Button, Card, Input, Badge) and application components (CustomScroll, SamplePrompts, Overlay). |
| Chrome Extension: Sidepanel & Conversation | `browser_ai_extension/browse_ai/src/sidepanel/SidePanel.tsx`, `browser_ai_extension/browse_ai/src/sidepanel/components/ConversationMode.tsx`, `browser_ai_extension/browse_ai/src/sidepanel/components/ChatInput.tsx`, `browser_ai_extension/browse_ai/src/sidepanel/components/ChatMessages.tsx`, `browser_ai_extension/browse_ai/src/sidepanel/components/*` | Main sidepanel UI with WebSocket integration, conversation mode with real-time chat, voice input/output (text-to-speech, speech recognition, voice conversation), task status tracking, execution logs, and control buttons. |
| Chrome Extension: Services | `browser_ai_extension/browse_ai/src/services/TextToSpeech.ts`, `browser_ai_extension/browse_ai/src/services/VoiceRecognition.ts`, `browser_ai_extension/browse_ai/src/services/VoiceConversation.ts` | Speech synthesis service with voice selection, speech recognition with error handling and transcript management, and autonomous voice conversation orchestration with state machine. |
| Chrome Extension: Styling | `browser_ai_extension/browse_ai/src/devtools/DevTools.css`, `browser_ai_extension/browse_ai/src/notification/Notification.css`, `browser_ai_extension/browse_ai/src/options/Options.css`, `browser_ai_extension/browse_ai/src/sidepanel/SidePanel.css`, `browser_ai_extension/browse_ai/src/sidepanel/components/*.css`, `browser_ai_extension/browse_ai/src/styles/tailwind.css`, `browser_ai_extension/browse_ai/src/sidepanel/index.css` | Comprehensive dark/light themed styling with Tailwind integration, animations, responsive layouts, and interactive states across all extension UI surfaces. |
| Chrome Extension: Pages & Utilities | `browser_ai_extension/browse_ai/src/devtools/DevTools.tsx`, `browser_ai_extension/browse_ai/src/notification/Notification.tsx`, `browser_ai_extension/browse_ai/src/options/Options.tsx`, `browser_ai_extension/browse_ai/src/popup/Popup.tsx`, `browser_ai_extension/browse_ai/src/content/Overlay.tsx`, `browser_ai_extension/browse_ai/src/utils/helpers.ts`, `browser_ai_extension/browse_ai/src/utils/state.ts`, `browser_ai_extension/browse_ai/src/utils/theme.tsx` | DevTools interface, notification system, options page with server configuration, popup with counter, content overlay, settings/state persistence helpers, and theme context provider. |
| Chrome Extension: Protocol & Config | `browser_ai_extension/browse_ai/src/types/protocol.ts`, `browser_ai_extension/browse_ai/src/zip.js` | WebSocket protocol definition with strongly-typed events (LogLevel, EventType, ExtensionSettings, TaskStatus, ActionResult), client/server event maps, type guards, and build script for packaging the extension. |
| DOM Styling Update | `browser_ai/dom/buildDomTree.js` | Updated element highlight color palette from vibrant to grayscale/neutral palette. |

Sequence Diagrams

sequenceDiagram
    participant User
    participant Extension as Chrome Extension<br/>(SidePanel)
    participant Server as Browser.AI<br/>Server
    participant Agent as Agent Service
    participant LLM as LLM (Gemini)
    participant Browser as Browser<br/>Controller

    User->>Extension: Enter task
    Extension->>Server: emit start_task (task, CDP endpoint)
    Server->>Agent: Initialize with task
    Agent->>LLM: Get initial planning
    LLM-->>Agent: Plan steps
    
    loop Agent Execution Loop
        Agent->>Agent: Get state (screenshots, DOM)
        Agent->>LLM: Decide next action
        LLM-->>Agent: Return action (type, params)
        Agent->>Browser: Execute action
        Browser-->>Agent: Action result
        Agent->>Server: emit log_event (step completed)
        Server->>Extension: Broadcast log_event
        Extension->>Extension: Update UI (step status)
    end
    
    Agent->>LLM: Final result assessment
    LLM-->>Agent: Task complete
    Agent->>Server: emit task_completed
    Server->>Extension: Broadcast status (completed)
    Extension->>User: Display result & offer GIF download

sequenceDiagram
    participant User
    participant Extension as Extension:<br/>ConversationMode
    participant VoiceRec as Voice<br/>Recognition
    participant VoiceConv as Voice<br/>Conversation Service
    participant TTS as Text-to-<br/>Speech
    participant Server as Server
    participant Chatbot as Chatbot<br/>Service

    User->>Extension: Click "Start Live Voice"
    Extension->>VoiceRec: Initialize speech recognition
    VoiceRec->>VoiceRec: Start listening
    
    User->>VoiceRec: Speak task intent
    VoiceRec->>VoiceConv: Transcript received (interim/final)
    VoiceConv->>Server: Send user message
    Server->>Chatbot: Process message
    Chatbot->>Chatbot: Generate response + detect intent
    Chatbot-->>Server: Response + intent
    Server->>Extension: Emit chat_response
    Extension->>TTS: Synthesize bot response
    TTS->>User: Play audio
    
    alt If intent detected (task ready)
        Chatbot-->>Server: Task detected
        Server->>Extension: Emit intent (task_description, is_ready)
        Extension->>Extension: Show "Ready to automate" prompt
        User->>Extension: Confirm automation
        Extension->>Server: emit start_clarified_task
        Server->>Agent: Start task execution
    else
        VoiceConv->>VoiceRec: Resume listening
        User->>VoiceRec: Continue conversation
    end

sequenceDiagram
    participant User as User/<br/>Website
    participant Agent as Agent
    participant Controller as Controller
    participant LocationDet as Location<br/>Detector
    participant WebsiteFinder as Find Best<br/>Website Action
    participant Navigator as Search<br/>Navigation
    participant Ecommerce as Search<br/>Ecommerce Action

    Agent->>Agent: Detect shopping task
    Agent->>Controller: Execute find_best_website
    Controller->>LocationDet: Detect user location
    LocationDet->>LocationDet: Query geolocation API
    LocationDet-->>Controller: Location (country, currency, region)
    
    Controller->>WebsiteFinder: find_best_website(purpose, category)
    WebsiteFinder->>Navigator: search_google_with_ai (location-aware query)
    Navigator->>User: Open Google + AI mode
    User-->>Navigator: Render AI search results
    Navigator-->>WebsiteFinder: Extract top websites
    
    WebsiteFinder-->>Controller: Recommended website + context
    
    Controller->>Ecommerce: search_ecommerce(query, site=recommended)
    Ecommerce->>Navigator: Navigate to website + currency context
    Navigator->>User: Load ecommerce site
    User-->>Navigator: Render products
    Navigator-->>Ecommerce: Product list + page content
    
    Ecommerce-->>Agent: Search completed with results
    Agent->>Agent: Continue with product selection/purchase

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Complexity factors:

  • Heterogeneous scope: Combines Python backend enhancements (event bus, location service, agent refactoring), new action modules, and a complete Chrome extension with TypeScript/React UI, services, and WebSocket communication—each requiring separate reasoning.
  • High logic density: Agent refactoring includes latency tracking, shopping-action auto-detection, pause/resume flows, and enhanced error handling. Event system spans 40+ event class definitions with varied payloads. Extension integrates speech synthesis, recognition, and voice conversation orchestration.
  • Multiple integration points: Event bus wires into agent/controller; location service integrates into actions; extension communicates via WebSocket with socket.io; voice services coordinate with conversation mode and TTS/recognition APIs.
  • Diverse file types: Python modules, TypeScript/React components, CSS stylesheets, HTML templates, configuration files, and documentation—each requires attention to different patterns and conventions.
  • Interaction complexity: Agent lifecycle now includes user intervention detection, page URL monitoring for auto-resume, GIF generation on exit, and latency CSV export.

Areas requiring extra attention:

  • browser_ai/agent/service.py — Major refactoring with new initialization patterns, latency integration, shopping-action injection, and user-intervention/pause/resume logic; verify interaction with existing controller and history flows.
  • browser_ai/event_bus/core.py and events.py — Complete event system with 40+ event classes; ensure topic naming and event payload structure align with publisher expectations; validate singleton and thread-safety for sync/async handlers.
  • browser_ai_extension/browse_ai/src/sidepanel/components/ConversationMode.tsx — Complex state management for voice recognition, TTS, and conversation service; verify audio context cleanup, event listener removal, and state transitions; check WebSocket message handling.
  • browser_ai_extension/browse_ai/src/utils/state.ts and helpers.ts — Cross-tab state synchronization and chrome.storage.* APIs; verify error handling, race conditions in concurrent load/save, and listener cleanup.
  • Action modules (navigation.py, extraction.py, interaction.py, utility.py) — Multiple new public functions with varying parameter signatures; cross-check with action registry, LLM prompt expectations, and controller integration.
  • Browser context page reuse logic (browser_ai/browser/context.py) — Conditional page reuse based on CDP URL presence; verify compatibility with existing session workflows and multi-tab scenarios.
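
For the concurrent load/save concern raised for state.ts, one common mitigation is to funnel every storage operation through a single promise chain so that read-modify-write cycles cannot interleave. The sketch below is an illustration under stated assumptions rather than the PR's code: `SerialQueue`, `appendHistory`, and the in-memory `store` are hypothetical stand-ins, and no `chrome.storage` call is made.

```typescript
// Hypothetical helper (not from the PR): runs async tasks one at a time,
// in the order they were submitted.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve()

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start the task once the previous one has settled, whether it
    // resolved or rejected.
    const result = this.tail.then(task, task)
    // Keep the chain usable even if this task fails.
    this.tail = result.catch(() => undefined)
    return result
  }
}

// In-memory stand-in for persistent storage (assumption for the demo).
const store: { history: string[] } = { history: [] }
const queue = new SerialQueue()

const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// A read-modify-write that would lose an update if two calls interleaved.
function appendHistory(entry: string, latencyMs: number): Promise<void> {
  return queue.run(async () => {
    const snapshot = [...store.history]  // "load"
    await delay(latencyMs)               // simulated storage latency
    store.history = [...snapshot, entry] // "save"
  })
}

async function demo(): Promise<string[]> {
  // The slower write is submitted first; the queue still keeps both entries.
  await Promise.all([appendHistory('task-1', 20), appendHistory('task-2', 5)])
  return store.history
}

demo().then((history) => console.log(history))
```

Without the queue, both calls would snapshot an empty history and the later save would overwrite the earlier one. Whether this fits the extension depends on how state.ts is actually structured, which the review does not show.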

Poem

🐰 A warren of features springs to life,
Location, voice, and chat through strife,
Events cascade in harmony's dance,
While extensions waltz in a UI trance!
From Python depths to React's bright shore,
The Browser.AI opens a brand-new door. 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.53% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Feat/extension with chatbot and voice' accurately reflects the major additions in this changeset, which includes substantial Chrome extension infrastructure, chatbot/conversation features, and voice recognition/synthesis services. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
browser_ai/agent/prompts.py (1)

125-265: Fix inconsistent rule numbering in important_rules().

The rule numbers are inconsistent and duplicated:

  • Line 125: "6. SEARCH STRATEGIES"
  • Line 149: "7. LOCATION-AWARE SHOPPING"
  • Line 166: "7. INTELLIGENT WEBSITE SELECTION" (duplicate 7; should be 8)
  • Line 186: "8. FAST PRODUCT RESULTS" (should be 9)
  • Line 197: "9. MULTI-SITE SEARCH STRATEGY" (should be 10)
  • Line 206: "10. TASK COMPLETION" (should be 11)
  • Line 228: "9. VISUAL CONTEXT" (should be 12)
  • Line 236: "10. Form filling" (should be 13)
  • Line 239: "11. ACTION SEQUENCING" (should be 14)
  • Line 254: "9. Long tasks" (should be 15)
  • Line 258: "10. SCROLLING BEHAVIOR" (should be 16)
  • Line 266: "11. Extraction" (should be 17)
  • Line 269: "12. DOCUMENT DOWNLOADING" (should be 18)
  • Line 283: "13. EMAIL SENDING" (should be 19)

This could confuse the LLM when referencing specific rules.
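
Drift of this kind can be caught mechanically by scanning the prompt text for numbered headings and comparing each number with its position in the sequence. The following is a minimal sketch, not part of the PR: `findNumberingIssues` is a hypothetical helper, it assumes each rule starts a line with `N. ` followed by text, and it treats the first number it finds as the intended starting point.

```typescript
// Hypothetical helper (not from the PR): reports rule headings whose number
// does not match their position in the sequence.
interface NumberingIssue {
  line: number     // 1-based line index within the scanned text
  found: number    // number written in the text
  expected: number // number implied by sequential ordering
}

function findNumberingIssues(text: string): NumberingIssue[] {
  const issues: NumberingIssue[] = []
  let expected = -1 // -1 means "no heading seen yet"
  text.split('\n').forEach((raw, index) => {
    // Assumption: a rule heading looks like "7. SOME TITLE" at line start.
    const match = /^\s*(\d+)\.\s+\S/.exec(raw)
    if (match === null) return
    const found = Number(match[1])
    if (expected === -1) expected = found // first heading sets the base
    if (found !== expected) issues.push({ line: index + 1, found, expected })
    expected += 1
  })
  return issues
}

// Sample input built from the first four headings quoted in the review.
const sample = [
  '6. SEARCH STRATEGIES',
  '7. LOCATION-AWARE SHOPPING',
  '7. INTELLIGENT WEBSITE SELECTION',
  '8. FAST PRODUCT RESULTS',
].join('\n')

console.log(findNumberingIssues(sample))
```

On the sample, the duplicate "7." and the heading after it are both reported, because fixing the duplicate shifts every later rule by one.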

🟠 Major comments (27)
browser_ai_extension/browse_ai/src/sidepanel/components/Visuals/VoiceVisualizer.tsx-17-142 (1)

17-142: Stale closure: audioLevel changes won't update the visualization.

The draw() function captures audioLevel from the closure when useEffect runs. Since audioLevel is not in the dependency array (line 142), the animation will always use the initial/stale value and won't respond to real-time audio level changes—defeating the visualizer's purpose.

Use a ref to hold the current audioLevel value so the animation loop always reads the latest value without restarting:

 export const VoiceVisualizer: React.FC<VoiceVisualizerProps> = ({
   isListening,
   isSpeaking,
   audioLevel = 0,
 }) => {
   const canvasRef = useRef<HTMLCanvasElement>(null)
   const animationRef = useRef<number>()
+  const audioLevelRef = useRef(audioLevel)
+
+  // Keep ref in sync with prop
+  useEffect(() => {
+    audioLevelRef.current = audioLevel
+  }, [audioLevel])

   useEffect(() => {
     const canvas = canvasRef.current
     ...
     const draw = () => {
+      const currentAudioLevel = audioLevelRef.current
       ...
-      const dynamicScale = isListening && audioLevel > 0.01 ? 1 + audioLevel * 0.5 : 1
+      const dynamicScale = isListening && currentAudioLevel > 0.01 ? 1 + currentAudioLevel * 0.5 : 1
       // Apply the same pattern to all other audioLevel usages in draw()
       ...
     }
     ...
   }, [isListening, isSpeaking])

Committable suggestion skipped: line range outside the PR's diff.
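
The stale-closure behavior described in this comment can be reproduced without React. This is a minimal sketch under assumed names (`makeStaleDraw`, `makeRefDraw`, and the `Ref` type are illustrative and not from the PR): the first callback copies the level once, the way a function defined inside an effect captures a prop, while the second reads it through a mutable ref object on every call.

```typescript
// Illustrative only: mimics the { current } shape of a React ref without React.
type Ref<T> = { current: T }

// Captures `level` by value at creation time, like a draw() defined inside
// an effect that does not re-run when the prop changes.
function makeStaleDraw(level: number): () => number {
  return () => 1 + level * 0.5
}

// Reads the latest value through the ref each time it is called.
function makeRefDraw(levelRef: Ref<number>): () => number {
  return () => 1 + levelRef.current * 0.5
}

let audioLevel = 0
const audioLevelRef: Ref<number> = { current: audioLevel }

const staleDraw = makeStaleDraw(audioLevel)
const refDraw = makeRefDraw(audioLevelRef)

// Simulate a new prop value arriving on a later render.
audioLevel = 0.8
audioLevelRef.current = audioLevel

console.log(staleDraw()) // still computed from the initial level
console.log(refDraw())   // computed from the updated level
```

The ref version keeps the animation loop running without a restart, which is why the suggestion syncs the ref in a separate effect instead of adding `audioLevel` to the main effect's dependency array.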

browser_ai_extension/browse_ai/src/components/CustomScroll.tsx-16-58 (1)

16-58: Fix global style tag lifecycle to support multiple mounted instances

When multiple CustomScroll instances are mounted concurrently, each creates its own styleSheet reference but all share the same #custom-scroll-styles element. The cleanup logic uses instance-specific reference comparison (existingStyle === styleSheet), which causes the first instance to remove the style when unmounting—breaking scrollbar styling for remaining instances.

Since scrollbar CSS is global and should exist once per page, replace the per-instance lifecycle with idempotent creation:

-  React.useEffect(() => {
-    const scrollElement = scrollRef.current
-    if (!scrollElement) return
-
-    // Add custom scrollbar styles via JavaScript
-    const styleSheet = document.createElement('style')
-    styleSheet.textContent = `
-      .custom-scroll::-webkit-scrollbar {
-        width: 8px;
-      }
-      
-      .custom-scroll::-webkit-scrollbar-track {
-        background: rgba(255, 255, 255, 0.05);
-        border-radius: 4px;
-      }
-      
-      .custom-scroll::-webkit-scrollbar-thumb {
-        background: rgba(255, 255, 255, 0.2);
-        border-radius: 4px;
-        transition: background-color 0.2s ease;
-      }
-      
-      .custom-scroll::-webkit-scrollbar-thumb:hover {
-        background: rgba(255, 255, 255, 0.3);
-      }
-      
-      .custom-scroll::-webkit-scrollbar-corner {
-        background: transparent;
-      }
-    `
-    
-    if (!document.head.querySelector('#custom-scroll-styles')) {
-      styleSheet.id = 'custom-scroll-styles'
-      document.head.appendChild(styleSheet)
-    }
-
-    return () => {
-      const existingStyle = document.head.querySelector('#custom-scroll-styles')
-      if (existingStyle && existingStyle === styleSheet) {
-        document.head.removeChild(styleSheet)
-      }
-    }
-  }, [])
+  React.useEffect(() => {
+    if (typeof document === 'undefined') return
+
+    if (!document.head.querySelector('#custom-scroll-styles')) {
+      const styleSheet = document.createElement('style')
+      styleSheet.id = 'custom-scroll-styles'
+      styleSheet.textContent = `
+        .custom-scroll::-webkit-scrollbar {
+          width: 8px;
+        }
+        
+        .custom-scroll::-webkit-scrollbar-track {
+          background: rgba(255, 255, 255, 0.05);
+          border-radius: 4px;
+        }
+        
+        .custom-scroll::-webkit-scrollbar-thumb {
+          background: rgba(255, 255, 255, 0.2);
+          border-radius: 4px;
+          transition: background-color 0.2s ease;
+        }
+        
+        .custom-scroll::-webkit-scrollbar-thumb:hover {
+          background: rgba(255, 255, 255, 0.3);
+        }
+        
+        .custom-scroll::-webkit-scrollbar-corner {
+          background: transparent;
+        }
+      `
+      document.head.appendChild(styleSheet)
+    }
+  }, [])
browser_ai_extension/browse_ai/LICENSE-1-21 (1)

1-21: Replace the ** placeholder with the actual copyright holder

The MIT text is correct, but Line 3 still has ** as a placeholder. Replace this with the actual copyright owner name (individual or organization) so the license is legally clear and unambiguous.

browser_ai_extension/browse_ai/src/popup/Popup.tsx-4-58 (1)

4-58: Replace boilerplate popup with actual Browser.AI functionality.

This appears to be template/boilerplate code from create-chrome-ext with a counter demo. The popup should reflect actual Browser.AI features (e.g., status display, quick actions, settings) rather than a counter interface with a generator watermark.

Do you want me to suggest a popup design aligned with the Browser.AI agent functionality described in the PR?

browser_ai_extension/browse_ai/src/devtools/DevTools.css-90-94 (1)

90-94: Remove duplicate main selector.

The main element is already styled at lines 8-17 with different properties. This duplicate definition will override previous styles and cause conflicts (e.g., padding: 1em vs padding: 32px 20px).

-main {
-  text-align: center;
-  padding: 1em;
-  margin: 0 auto;
-}

If center alignment is needed, add it to the first main block (lines 8-17).

browser_ai_extension/browse_ai/src/sidepanel/components/ExecutionLog.tsx-152-152 (1)

152-152: Remove console.log statement.

Debug logging should be removed from production code to avoid console clutter and potential information leakage.

Apply this diff:

   const formatUserMessage = (log: LogEvent): string => {
     // In dev mode, return original message
     if (devMode) return log.message
 
-    console.log('Formatting user message:', log)
-
     const msg = log.message.toLowerCase()
browser_ai/agent/prompts.py-430-442 (1)

430-442: Avoid mutable default argument.

Using [] as a default argument is a Python anti-pattern: the default list is created once, when the function is defined, so every call that omits the argument shares (and can mutate) the same list instance.

 def __init__(
     self,
     state: BrowserState,
     result: Optional[List[ActionResult]] = None,
-    include_attributes: list[str] = [],
+    include_attributes: Optional[list[str]] = None,
     max_error_length: int = 400,
     step_info: Optional[AgentStepInfo] = None,
 ):
     self.state = state
     self.result = result
     self.max_error_length = max_error_length
-    self.include_attributes = include_attributes
+    self.include_attributes = include_attributes if include_attributes is not None else []
     self.step_info = step_info
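
The sharing behaviour is easy to reproduce in isolation. The sketch below uses two hypothetical classes, `SharedDefault` and `FreshDefault`, which are not part of browser_ai; they only mirror the before and after shape of the `include_attributes` parameter.

```python
from typing import Optional


class SharedDefault:
    """Hypothetical stand-in for the constructor before the fix."""

    def __init__(self, include_attributes: list[str] = []):
        # The default list is created once, when the function is defined.
        self.include_attributes = include_attributes


class FreshDefault:
    """Hypothetical stand-in for the constructor after the fix."""

    def __init__(self, include_attributes: Optional[list[str]] = None):
        # A new list is created on every call that omits the argument.
        self.include_attributes = (
            include_attributes if include_attributes is not None else []
        )


a, b = SharedDefault(), SharedDefault()
a.include_attributes.append("href")
leaked = b.include_attributes  # b sees the item appended through a

c, d = FreshDefault(), FreshDefault()
c.include_attributes.append("href")
isolated = d.include_attributes  # d keeps its own empty list
```

With the `None` sentinel, a caller that passes its own list still gets exactly that list, so existing call sites should behave as before.
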
browser_ai_extension/browse_ai/src/notification/index.tsx-6-7 (1)

6-7: Add runtime check for missing notification-root element.

Unlike devtools/index.tsx, this code uses a non-null assertion without first verifying the element exists. If notification-root is missing from notification.html, this will throw a cryptic error at runtime.

Apply this diff to add proper error handling:

-const container = document.getElementById('notification-root')
-const root = createRoot(container!)
+const container = document.getElementById('notification-root')
+if (!container) {
+  throw new Error(
+    'Notification root element #notification-root not found. Ensure notification.html includes this element.',
+  )
+}
+const root = createRoot(container)
browser_ai_extension/browse_ai/notification.html-6-6 (1)

6-6: Fix broken icon path that references non-existent file.

The notification.html references /img/logo-48.png which does not exist in the repository. Devtools.html and options.html use /icons/logo.ico. The actual icon files are located at public/icons/logo.ico and public/icons/logo.svg. Update notification.html to use the correct icon path consistent with the other files.

browser_ai_extension/browse_ai/src/sidepanel/components/ExecutionLog.css-1-36 (1)

1-36: Remove duplicate style definitions.

Lines 1-36 define a complete set of styles that are immediately overridden by lines 37-318. All class names (.execution-log-container, .execution-log-header, .log-badge, etc.) are defined twice with different values, making the first 36 lines dead code that will never be applied.

This appears to be an incomplete refactoring from dark to light theme. Consider:

  • Option 1 (recommended): Remove lines 1-36 entirely and use CSS custom properties with theme classes for dark/light mode support (similar to the pattern in browser_ai_extension/browse_ai/src/sidepanel/index.css).
  • Option 2: Keep only lines 37-318 if light theme is the final design direction.

Apply this diff to remove the duplicate definitions:

-.execution-log-container { display:flex; flex-direction:column; height:100%; background: rgba(255,255,255,0.02); border-radius:12px; overflow:hidden; }
-.execution-log-header { display:flex; align-items:center; justify-content:space-between; padding:12px 16px; border-bottom:1px solid rgba(255,255,255,0.02); }
-.log-header-title { display:flex; align-items:center; gap:10px; color: rgba(255,255,255,0.9); }
-.log-header-title svg { color: #8aa0ff; filter: drop-shadow(0 6px 18px rgba(2,6,23,0.6)); }
-.log-header-title h3 { font-size:14px; font-weight:600; margin:0; }
-.log-count { display:inline-flex; align-items:center; justify-content:center; min-width:24px; height:20px; padding:0 6px; background:#667eea; color:white; font-size:11px; font-weight:600; border-radius:10px; }
-.log-clear-btn { display:flex; align-items:center; gap:4px; padding:6px 12px; background: transparent; border:1px solid rgba(255,255,255,0.04); border-radius:6px; font-size:12px; font-weight:500; color: rgba(255,255,255,0.8); cursor:pointer; transition: all 0.2s ease; }
-.log-clear-btn:hover { background: rgba(255,255,255,0.03); border-color: rgba(255,255,255,0.06); color: white; }
-.execution-log-content { flex:1; overflow-y:auto; background: transparent; }
-.log-empty-state { display:flex; flex-direction:column; align-items:center; justify-content:center; height:100%; padding:40px 20px; color: rgba(255,255,255,0.6); text-align:center; }
-.log-empty-state svg { margin-bottom:16px; opacity:0.5; }
-.log-empty-state p { font-size:15px; font-weight:600; color: rgba(255,255,255,0.9); margin:0 0 6px 0; }
-.log-empty-state span { font-size:13px; color: rgba(255,255,255,0.6); }
-.log-entries { padding:12px; }
-.log-entry { position:relative; background: rgba(255,255,255,0.02); border:1px solid rgba(255,255,255,0.03); border-radius:8px; padding:12px 14px; margin-bottom:8px; transition: all 0.2s ease; opacity:0; animation: slideIn 0.3s ease forwards; }
-.log-entry:hover { box-shadow: 0 6px 18px rgba(2,6,23,0.6); border-color: rgba(255,255,255,0.06); }
-.log-entry-header { display:flex; align-items:center; gap:8px; margin-bottom:8px; }
-.log-icon { font-size:16px; line-height:1; }
-.log-timestamp { font-size:11px; color: rgba(255,255,255,0.6); font-weight:500; font-family: 'Courier New', monospace; }
-.log-badge { font-size:10px; font-weight:600; padding:2px 8px; border-radius:4px; text-transform:uppercase; letter-spacing:0.5px; }
-.log-badge.log-info { background: rgba(138,160,255,0.12); color: #8aa0ff; }
-.log-badge.log-error { background: rgba(255,120,120,0.12); color: #ff9a9a; }
-.log-badge.log-warning { background: rgba(255,210,120,0.08); color: #fbbf24; }
-.log-badge.log-result { background: rgba(34,197,94,0.08); color: #86efac; }
-.log-badge.log-debug { background: rgba(255,255,255,0.02); color: rgba(255,255,255,0.8); }
-.log-message { font-size:13px; line-height:1.6; color: rgba(255,255,255,0.9); word-wrap: break-word; white-space: pre-wrap; }
-.log-entry.log-error { border-left: 3px solid rgba(255,120,120,0.9); }
-.log-entry.log-warning { border-left: 3px solid rgba(255,210,120,0.9); }
-.log-entry.log-result { border-left: 3px solid rgba(16,185,129,0.9); }
-.log-entry.log-step-entry { border-left: 3px solid #667eea; }
-.log-metadata { margin-top:10px; padding:8px 10px; background: rgba(255,255,255,0.02); border-radius:4px; border:1px solid rgba(255,255,255,0.02); font-size:11px; }
-.metadata-item { display:flex; gap:6px; padding:2px 0; font-family: 'Courier New', monospace; }
-.metadata-key { color: rgba(255,255,255,0.6); font-weight:600; }
-.metadata-value { color: rgba(255,255,255,0.9); word-break:break-all; }
-
-@keyframes slideIn { from { opacity:0; transform: translateY(10px);} to { opacity:1; transform: translateY(0);} }
 .execution-log-container {

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/manifest.ts-4-59 (1)

4-59: Update @crxjs/vite-plugin to the latest stable version to resolve type mismatches.

The manifest contains 15 @ts-ignore directives suppressing type checking. The project currently uses version 2.0.0-beta.26; upgrading to the latest stable release 2.2.1 (Oct 2025) should resolve these type mismatches, as the stable version has improved TypeScript support.

After updating the dependency, remove the @ts-ignore comments and address any remaining type errors with explicit type assertions if needed.

browser_ai_extension/browse_ai/src/options/Options.tsx-94-120 (1)

94-120: Network requests lack timeout handling.

The fetch calls at Lines 99 and 136 have no timeout. If the server is unresponsive, the UI will hang indefinitely in "loading" or "saving" state.

Add a timeout using AbortController:

 const loadServerConfig = async () => {
   setConfigStatus('loading')
   setConnectionStatus('connecting')
+  const controller = new AbortController()
+  const timeoutId = setTimeout(() => controller.abort(), 10000) // 10s timeout

   try {
-    const response = await fetch(`${settings.serverUrl}/api/config`)
+    const response = await fetch(`${settings.serverUrl}/api/config`, {
+      signal: controller.signal,
+    })
+    clearTimeout(timeoutId)
     // ...
   } catch (error) {
+    clearTimeout(timeoutId)
     console.error('Failed to load server config:', error)
     // ...
   }
 }

Also applies to: 122-161

browser_ai_extension/browse_ai/src/sidepanel/components/ConversationMode.tsx-319-336 (1)

319-336: Stale closure: messages array captured at callback registration time.

The setMessages([...messages, userMessage]) inside the callback uses the messages value from when toggleLiveVoiceMode was called, not the current state. Use functional update to avoid stale state.

         // Message ready callback - auto send
         (message: string) => {
           console.log('🎙️ Auto-sending message:', message)

           // Add user message to chat
           const userMessage: Message = {
             role: 'user',
             content: message,
             timestamp: new Date().toISOString(),
           }

-          setMessages([...messages, userMessage])
+          setMessages((prevMessages) => [...prevMessages, userMessage])
           setIsProcessing(true)
           isWaitingForResponseRef.current = true
           setLiveTranscript('')

           // Send to backend
           socket.emit('chat_message', { message })
         },
browser_ai/location_service.py-269-318 (1)

269-318: Location detection navigates away from user's current page.

detect_location_from_browser uses page.goto() to navigate to ipapi.co, which replaces the user's current page content. Consider restoring the original page afterwards, or running the lookup in a separate tab that is closed once detection finishes, so the user's browsing session is not disrupted.

     async def detect_location_from_browser(self, browser_context) -> Optional[LocationInfo]:
         try:
             page = await browser_context.get_current_page()
+            original_url = page.url
             
             # Use a geolocation detection service
             await page.goto("https://ipapi.co/json/", wait_until="networkidle")
-            await page.wait_for_load_state("networkidle")
             
             # ... extraction logic ...
             
+            # Navigate back to original page if it wasn't blank
+            if original_url and original_url != "about:blank":
+                await page.goto(original_url, wait_until="load")
+            
             return location_info

Alternatively, use a new tab pattern similar to search_google_with_ai.

Committable suggestion skipped: line range outside the PR's diff.

browser_ai/actions/navigation.py-184-190 (1)

184-190: Use browser.navigate_to() instead of direct page.goto() for consistent URL validation.

The go_to_url function bypasses the allowlist security checks implemented in BrowserContext.navigate_to(). Replace line 186 with await browser.navigate_to(params.url) to enforce URL validation if allowed_domains are configured.

browser_ai_extension/browse_ai/src/sidepanel/components/ConversationMode.tsx-514-521 (1)

514-521: Incomplete cleanup on unmount: MediaStream tracks not stopped.

The unmount cleanup only closes AudioContext but doesn't stop the MediaStream tracks, which keeps the microphone indicator active.

   // Cleanup audio on unmount
   useEffect(() => {
     return () => {
+      if (streamRef.current) {
+        streamRef.current.getTracks().forEach(track => track.stop())
+      }
       if (audioContextRef.current) {
         audioContextRef.current.close()
       }
     }
   }, [])
browser_ai_extension/browse_ai/src/sidepanel/components/ConversationMode.tsx-436-512 (1)

436-512: Audio stream not stopped on cleanup; potential resource leak.

The MediaStream obtained from getUserMedia is never stopped. When cleaning up, the stream's tracks should be stopped to release the microphone.

+  const streamRef = useRef<MediaStream | null>(null)
+
   // Audio analysis for visualization
   useEffect(() => {
     if ((isListening || isLiveVoiceMode) && !audioContextRef.current) {
       const initAudio = async () => {
         try {
           const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
+          streamRef.current = stream

           // ... rest of setup ...
         } catch (err) {
           console.error('Error initializing audio visualization:', err)
         }
       }

       initAudio()
     } else if (!isListening && !isLiveVoiceMode && audioContextRef.current) {
       // Cleanup
       if (animationFrameRef.current) {
         cancelAnimationFrame(animationFrameRef.current)
       }
+      if (streamRef.current) {
+        streamRef.current.getTracks().forEach(track => track.stop())
+        streamRef.current = null
+      }
       if (sourceRef.current) {
         sourceRef.current.disconnect()
       }
browser_ai/agent/media.py-154-156 (1)

154-156: regular_font.path may not exist on default fonts.

When fonts fail to load (lines 67-71), ImageFont.load_default() is used, which doesn't have a .path attribute. Line 154 then attempts regular_font.path, which will raise an AttributeError.

Add a fallback or skip the larger font creation when using default fonts:

+ # Check if we can create a larger font
+ if hasattr(regular_font, 'path'):
      larger_font = ImageFont.truetype(
          regular_font.path, regular_font.size + 16
      )
+ else:
+     larger_font = regular_font  # Use default font as-is
browser_ai/agent/media.py-67-71 (1)

67-71: goal_font is assigned but never used.

The variable goal_font is loaded from the font file but never referenced anywhere in the code. This is dead code.

Remove the unused variable or use it where intended (perhaps in _add_overlay_to_image?):

              regular_font = ImageFont.truetype(font_name, font_size)
              title_font = ImageFont.truetype(font_name, title_font_size)
-             goal_font = ImageFont.truetype(font_name, goal_font_size)
              font_loaded = True
              break

Based on Ruff hint (F841).

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/sidepanel/SidePanel.tsx-240-271 (1)

240-271: task_started handler saves potentially stale state to history.

The handler captures taskStatus, taskResult, logs, mode, and messages at socket creation time due to stale closures. When task_started fires later, these values may be outdated, causing incorrect history entries.

Use refs or functional state updates to capture current values:

  newSocket.on('task_started', (data: { message: string }) => {
+   // Use functional updates to access current state
-   if (taskStatus.current_task || taskResult || logs.length > 0) {
-     setTaskHistory((prev) => [
-       ...prev,
-       {
-         task: taskStatus.current_task || 'Unknown Task',
-         result: taskResult,
-         logs: [...logs],
+   setTaskHistory((prev) => {
+     // Access current values via refs if needed
+     return [...prev, /* ... */]
+   })

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/sidepanel/SidePanel.tsx-175-332 (1)

175-332: Socket effect has missing dependencies and potential stale closure issues.

The socket connection useEffect at lines 176-332 has several concerns:

  1. Missing dependencies: The effect uses mode, messages, and taskStatus but doesn't include them in the dependency array. This causes stale closure issues where event handlers capture outdated values.

  2. Stale mode and messages: The connect handler (line 192) and task_started handler (lines 240-265) reference mode and messages which won't update when these values change.

  3. Stale taskStatus: The task_started handler references taskStatus (line 242) but it's not in deps.

Consider using refs for values needed in socket callbacks, or restructuring the effect:

+ const modeRef = useRef(mode)
+ const messagesRef = useRef(messages)
+ const taskStatusRef = useRef(taskStatus)
+ 
+ useEffect(() => { modeRef.current = mode }, [mode])
+ useEffect(() => { messagesRef.current = messages }, [messages])
+ useEffect(() => { taskStatusRef.current = taskStatus }, [taskStatus])

  // Then in socket handlers, use modeRef.current instead of mode

Committable suggestion skipped: line range outside the PR's diff.

browser_ai/agent/media.py-326-444 (1)

326-444: _create_frame function appears to be dead code with deprecated API usage.

This function:

  1. Is never called anywhere in the file
  2. Uses deprecated draw.textsize() (removed in Pillow 10.0.0)
  3. Has unused local variables (max_text_width, title_font)
  4. Uses a bare except: clause

Consider removing this dead code entirely, or if it's needed for future use, fix the deprecated API calls and issues:

- def _create_frame(
-     screenshot: str,
-     text: str,
-     step_number: int,
-     width: int = 1200,
-     height: int = 800,
- ) -> Image.Image:
-     ...entire function...

Based on Ruff hints (F841, E722) and Pillow API deprecation.

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/services/VoiceConversation.ts-234-241 (1)

234-241: Infinite restart loop possible on persistent recognition errors.

If startListening continuously fails (e.g., microphone permission denied), the error handler will keep retrying every 1 second indefinitely. This could drain battery and spam logs.

Add a retry limit or exponential backoff:

+ private recognitionRetryCount: number = 0
+ private static readonly MAX_RECOGNITION_RETRIES = 3

  (error: string) => {
    console.error('Recognition error:', error)
    if (this.onError) this.onError(error)

    // On error, try to restart if still active
-   if (this.isActive) {
+   if (this.isActive && this.recognitionRetryCount < VoiceConversationService.MAX_RECOGNITION_RETRIES) {
+     this.recognitionRetryCount++
      setTimeout(() => {
        if (this.isActive) {
          this.startListening()
        }
      }, 1000)
+   } else if (this.isActive) {
+     this.stop()
+     if (this.onError) this.onError('Max recognition retries exceeded')
    }
  },

Committable suggestion skipped: line range outside the PR's diff.
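
The retry-limit and backoff ideas can be combined in one small policy object. The sketch below is written in Python purely as a language-neutral illustration; `RetryBudget`, its parameters, and the `reset()` call on a successful recognition are assumptions and do not appear in VoiceConversation.ts or in the suggested diff.

```python
from typing import Optional


class RetryBudget:
    """Hypothetical helper: bounded retries with exponential backoff."""

    def __init__(self, max_retries: int = 3, base_delay_ms: int = 1000,
                 cap_ms: int = 8000) -> None:
        self.max_retries = max_retries
        self.base_delay_ms = base_delay_ms
        self.cap_ms = cap_ms
        self.attempts = 0

    def next_delay_ms(self) -> Optional[int]:
        """Return the wait before the next retry, or None when exhausted."""
        if self.attempts >= self.max_retries:
            return None
        # Double the delay on each attempt, but never exceed the cap.
        delay = min(self.base_delay_ms * 2 ** self.attempts, self.cap_ms)
        self.attempts += 1
        return delay

    def reset(self) -> None:
        """Call after a successful recognition so later errors start fresh."""
        self.attempts = 0


budget = RetryBudget()
delays = [budget.next_delay_ms() for _ in range(4)]

budget.reset()
after_reset = budget.next_delay_ms()
```

A `None` result is the signal to stop listening and surface the error once, instead of scheduling another restart. Without some form of reset, the counter in the diff above would only ever grow over the lifetime of the service.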

browser_ai/agent/media.py-32-40 (1)

32-40: Duplicate validation checks for history.

Lines 32-34 check if history.history is empty, and lines 38-40 check the same condition again. The second check is redundant.

Consolidate the checks:

  def create_history_gif(...) -> None:
      """Create a GIF from the agent's history with overlaid task and goal text."""
-     if not history.history:
-         logger.warning("No history to create GIF from")
-         return
-
-     images = []
-     # if history is empty or first screenshot is None, we can't create a gif
-     if not history.history or not history.history[0].state.screenshot:
+     if not history.history or not history.history[0].state.screenshot:
          logger.warning("No history or first screenshot to create GIF from")
          return
+
+     images = []
browser_ai/agent/service.py-62-62 (1)

62-62: Avoid mutable default argument: Controller() called in parameter defaults.

Calling Controller() in the function signature creates a single shared instance across all calls where no controller is provided. This can lead to unexpected state sharing between Agent instances.

-        controller: Controller = Controller(),
+        controller: Controller | None = None,

Then initialize within __init__:

self.controller = controller if controller is not None else Controller()
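
A call expression in a parameter default is evaluated once, at definition time, which is what makes the instance shared. The following sketch illustrates this with a hypothetical `Controller` stand-in and two hypothetical agent classes; none of these mirror the real browser_ai signatures beyond the `controller` parameter.

```python
from typing import Optional


class Controller:
    """Hypothetical stand-in; the real Controller lives in browser_ai."""

    def __init__(self) -> None:
        self.registry: dict[str, object] = {}


class AgentBefore:
    # Controller() runs once, when this def statement is executed.
    def __init__(self, controller: Controller = Controller()):
        self.controller = controller


class AgentAfter:
    # A fresh Controller is built per agent when none is supplied.
    def __init__(self, controller: Optional[Controller] = None):
        self.controller = controller if controller is not None else Controller()


shared = AgentBefore().controller is AgentBefore().controller
separate = AgentAfter().controller is not AgentAfter().controller
```

Both flags evaluate to `True`: agents built with the old signature share one controller, while the `None` sentinel gives each agent its own.
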
browser_ai/agent/service.py-90-90 (1)

90-90: Unused parameter: tool_call_in_content.

This parameter is accepted but never used in the constructor or elsewhere. Either implement its functionality or remove it to avoid confusion.

browser_ai/agent/service.py-453-463 (1)

453-463: Critical: return inside finally block silences exceptions.

The return statement on line 460 inside the finally block will silence any exceptions that were raised in the try or except blocks. This can mask errors and make debugging difficult.

Additionally, the actions variable (lines 454-458) is assigned but never used.

         finally:
-            actions = (
-                [a.model_dump(exclude_unset=True) for a in model_output.action]
-                if model_output
-                else []
-            )
             if not result:
-                return
+                result = []
 
             if state:
                 self._make_history_item(model_output, state, result)
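
The effect of a `return` inside `finally` can be shown with two minimal functions. Both are hypothetical reductions of the flagged control flow, not the actual `step` implementation in service.py.

```python
def step_with_return_in_finally():
    """Mirrors the flagged shape: the return in finally wins."""
    result = None
    try:
        raise ValueError("model call failed")
    finally:
        if not result:
            return "swallowed"  # the in-flight ValueError is discarded here


def step_without_return_in_finally():
    """Mirrors the suggested shape: finally only normalises result."""
    result = None
    try:
        raise ValueError("model call failed")
    finally:
        if not result:
            result = []  # no return, so the ValueError keeps propagating


outcome = step_with_return_in_finally()

try:
    step_without_return_in_finally()
    propagated = False
except ValueError:
    propagated = True
```

The first function returns normally even though its body raised, so the caller never learns about the failure. The second lets the exception reach the caller after the cleanup has run.
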
🟡 Minor comments (34)
browser_ai/dom/buildDomTree.js-46-59 (1)

46-59: Verify visual distinguishability with this grayscale palette: the minimum color distance is only about 3.5 in Euclidean RGB terms (under 1% of the maximum possible distance).

The grayscale palette is technically correct but raises valid distinguishability concerns. Color distance analysis reveals #C0C0C0 and #BEBEBE (indices 4 and 5) differ by only 3.5 in RGB space—these are virtually indistinguishable. When highlighting 10+ elements, users cycling through similar gray shades via the modulo indexing (line 60) will struggle to match numeric labels to visual borders.

The 12-color palette cycles in a way that puts these nearly identical colors adjacent (index 4→5), which is problematic when multiple elements are highlighted simultaneously.

Consider visual testing with sequential element highlighting to confirm users can reliably distinguish between highlighted borders at different indices. If distinguishability proves insufficient, increase contrast between adjacent palette entries or reduce the palette size.
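
The distance figure can be checked with a few lines of Python. This sketch assumes the comparison is plain Euclidean distance over 8-bit RGB channels, which is one common but perceptually rough metric; the helper names are illustrative only.

```python
import math


def hex_to_rgb(color: str) -> tuple:
    """Convert '#RRGGBB' into an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a: str, b: str) -> float:
    """Euclidean distance between two hex colors in RGB space."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


# Largest possible distance: black to white.
MAX_DISTANCE = math.dist((0, 0, 0), (255, 255, 255))

distance = rgb_distance("#C0C0C0", "#BEBEBE")
share_of_max = distance / MAX_DISTANCE
```

Each channel differs by 2 (192 versus 190), so the distance is the square root of 12, roughly 3.46, which is well under 1% of the black-to-white distance.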

.github/copilot-instructions.md-113-113 (1)

113-113: Fix minor typo in docs path description

Change “relevent” to “relevant” in the line:

docs/ - Documentation and technical specifications (keep all docs and md files here within relevent folders)

to avoid the spelling error in user-facing documentation.

browser_ai_extension/browse_ai/CHANGELOG.md-12-12 (1)

12-12: Verify the version timestamp.

The timestamp 2025.10.04 (October 4) predates the PR creation (November 30, 2025). Ensure this date reflects the actual initial version release or update it to align with current development.

browser_ai_extension/browse_ai/package.json-5-6 (1)

5-6: Complete the package metadata.

The author field contains a placeholder (**) and description is empty. These should be populated before publication.

-  "author": "**",
-  "description": "",
+  "author": "Your Name <email@example.com>",
+  "description": "Browser.AI Chrome extension with intelligent browsing capabilities",
browser_ai_extension/browse_ai/CHANGELOG.md-15-15 (1)

15-15: Fix markdown syntax for the link.

The image syntax ![create-chrome-ext](url) will attempt to render as an image. Use link syntax instead.

-- feat: generator by ![create-chrome-ext](https://github.com/guocaoyi/create-chrome-ext)
+- feat: generator by [create-chrome-ext](https://github.com/guocaoyi/create-chrome-ext)
browser_ai_extension/browse_ai/src/devtools/DevTools.css-73-75 (1)

73-75: Remove duplicate utility class definitions.

.h-10 (line 74) duplicates line 46, and .w-full (line 75) duplicates line 47. These redundant definitions can cause confusion and maintenance issues.

-/* pill button style (bottom) */
-.rounded-full { border-radius: 9999px; }
-.h-10 { height: 40px; }
-.w-full { width: 100%; }
+/* pill button style (bottom) */
+.rounded-full { border-radius: 9999px; }

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/ui/Card.tsx-31-44 (1)

31-44: Fix type mismatch in CardTitle.

CardTitle declares React.forwardRef<HTMLParagraphElement, ...> but renders an <h3> element (HTMLHeadingElement). The ref type should match the actual element type.

Apply this diff:

 const CardTitle = React.forwardRef<
-  HTMLParagraphElement,
+  HTMLHeadingElement,
   React.HTMLAttributes<HTMLHeadingElement>
 >(({ className, ...props }, ref) => (
CHANGELOG.md-262-262 (1)

262-262: Fix typo in branch name.

The branch name contains a typo: "extention" should be "extension". This should be corrected if the actual branch name is feat/extension-with-chatbot-and-voice (as indicated in the PR title).

Apply this diff:

 - **Project**: Browser.AI
 - **Repository**: Browser.AI by Sathursan-S
-- **Branch**: feat/browser-extention (development)
+- **Branch**: feat/extension-with-chatbot-and-voice (development)
 - **License**: See LICENSE file
browser_ai_extension/browse_ai/src/manifest.ts-57-59 (1)

57-59: Remove the unused debugger permission from manifest.

The debugger permission (line 57) is declared but not actively used anywhere in the codebase—all code utilizing chrome.debugger is commented out and disabled. The extension uses direct CDP connections via local Playwright setup instead of the extension-proxy mode that required this permission.

The <all_urls> host_permissions (line 59) is legitimate and necessary to support the content scripts that inject on all URLs (manifest line 43). However, this powerful permission scope should be documented in the README to clarify its necessity for users.

Actions:

  1. Remove 'debugger' from the permissions array (line 57)
  2. Add a brief note to README.md explaining why <all_urls> host_permissions are required (e.g., "The extension injects content scripts across all websites to enable browser automation capabilities")
browser_ai_extension/browse_ai/src/options/Options.css-437-452 (1)

437-452: Resolve duplicate font-size on .btn to satisfy linter and avoid confusion

In the .btn rule you define font-size twice (15px then 13px), and the linter rightly flags this as suspicious:

.btn {
  ...
  font-size: 15px;
  ...
  font-size: 13px;
}

Since only the last declaration takes effect, you should remove one of them (probably the earlier 15px) to make the intended size explicit and clear to tooling:

 .btn {
   padding: 16px 32px;
   border: none;
   border-radius: 12px;
-  font-size: 15px;
   font-weight: 600;
   cursor: pointer;
   transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1);
   font-family: inherit;
   position: relative;
   overflow: hidden;
   text-transform: uppercase;
   letter-spacing: 0.5px;
   font-size: 13px;
 }
browser_ai_extension/browse_ai/src/ui/Button.tsx-6-41 (1)

6-41: Default button type to "button" to prevent accidental form submissions

The <button> element currently relies on the browser default (type="submit"), which can cause unintended form submissions if this component is used inside a <form> element. As a shared UI primitive, it's safer to default to type="button" and let callers explicitly opt into "submit" when needed.

Suggested change:

const Button = React.forwardRef<HTMLButtonElement, ButtonProps>(
-  ({ className = '', variant = 'default', size = 'default', ...props }, ref) => {
+  ({ className = '', variant = 'default', size = 'default', type = 'button', ...props }, ref) => {
     return (
       <button
         className={`${getButtonClasses(variant, size)} ${className}`}
         ref={ref}
+        type={type}
         {...props}
       />
     )
   }
 )
browser_ai_extension/browse_ai/src/sidepanel/components/ConversationMode.css-556-559 (1)

556-559: Missing animation keyframe definitions.

The livePulse (Line 558) and recordPulse (Line 608) animations are referenced but not defined in this file. This will cause the animations to silently fail.

Add the missing keyframe definitions:

+@keyframes livePulse {
+  0%, 100% {
+    transform: scale(1);
+    box-shadow: 0 0 20px rgba(16, 185, 129, 0.4);
+  }
+  50% {
+    transform: scale(1.05);
+    box-shadow: 0 0 30px rgba(16, 185, 129, 0.6);
+  }
+}
+
+@keyframes recordPulse {
+  0%, 100% {
+    transform: scale(1);
+    box-shadow: 0 0 20px rgba(239, 68, 68, 0.4);
+  }
+  50% {
+    transform: scale(1.05);
+    box-shadow: 0 0 30px rgba(239, 68, 68, 0.6);
+  }
+}

Also applies to: 608-609

browser_ai_extension/browse_ai/src/sidepanel/components/ControlButtons.tsx-123-126 (1)

123-126: Status indicator color inconsistent with paused state.

The status dot is always green (bg-green-400) even when isPaused is true and the text shows "Paused". Consider using yellow/amber for the paused state to match the Pause button styling.

       <div className="flex items-center gap-2">
-        <div className="w-3 h-3 rounded-full bg-green-400 animate-pulse shadow-lg shadow-green-400/40"></div>
+        <div className={`w-3 h-3 rounded-full animate-pulse shadow-lg ${
+          isPaused 
+            ? 'bg-yellow-400 shadow-yellow-400/40' 
+            : 'bg-green-400 shadow-green-400/40'
+        }`}></div>
         <span className="text-sm font-medium text-white/90">{isPaused ? 'Paused' : 'Running'}</span>
       </div>
browser_ai_extension/browse_ai/src/options/Options.tsx-88-92 (1)

88-92: Missing loadServerConfig in useEffect dependency array.

loadServerConfig is called inside the effect but not listed as a dependency. Since loadServerConfig references settings.serverUrl, this can lead to stale closures. Either add loadServerConfig to dependencies (with useCallback), or inline the fetch logic.

Consider wrapping loadServerConfig with useCallback:

+import { useState, useEffect, useCallback } from 'react'
-import { useState, useEffect } from 'react'
...
-  const loadServerConfig = async () => {
+  const loadServerConfig = useCallback(async () => {
     setConfigStatus('loading')
     // ... rest of function
-  }
+  }, [settings.serverUrl])

   useEffect(() => {
     if (settings.serverUrl) {
       loadServerConfig()
     }
-  }, [settings.serverUrl])
+  }, [settings.serverUrl, loadServerConfig])

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/options/Options.tsx-346-349 (1)

346-349: parseInt without NaN validation.

If the user clears the input field, parseInt('') returns NaN, which will be stored in state. This could cause unexpected behavior downstream.

Add validation:

-                  onChange={(e) => handleChange('maxLogs', parseInt(e.target.value))}
+                  onChange={(e) => {
+                    const val = parseInt(e.target.value, 10)
+                    if (!isNaN(val)) handleChange('maxLogs', val)
+                  }}

The same issue applies to Lines 462-463 (parseFloat), 561-562 (parseInt), and 581-582 (parseInt).

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/utils/theme.tsx-18-20 (1)

18-20: Validate localStorage value before using as Theme.

localStorage.getItem('theme') could return any string (e.g., if manually edited). The type assertion as Theme is unsafe and could lead to unexpected values being used.

The suggested fix above addresses this by explicitly checking for 'dark'.

browser_ai_extension/browse_ai/src/notification/Notification.tsx-19-24 (1)

19-24: Validate type parameter before assignment.

The type assertion on line 20 is unsafe. params.get('type') can return any string or null, but the code assumes it's a valid NotificationData['type']. This could lead to unexpected behavior if an invalid type is passed.

     const params = new URLSearchParams(window.location.search)
-    const type = params.get('type') as NotificationData['type']
+    const rawType = params.get('type')
+    const validTypes = ['user_interaction', 'task_complete', 'error'] as const
+    const type = validTypes.includes(rawType as any) ? rawType as NotificationData['type'] : null
     const message = params.get('message') || ''
browser_ai_extension/browse_ai/src/notification/Notification.tsx-144-147 (1)

144-147: Handle invalid timestamp gracefully.

If notification.timestamp contains an invalid date string, new Date(notification.timestamp).toLocaleString() will display "Invalid Date" to users. Consider adding validation.

           <div className="notification-timestamp">
-            {new Date(notification.timestamp).toLocaleString()}
+            {(() => {
+              const date = new Date(notification.timestamp)
+              return isNaN(date.getTime()) ? 'Unknown time' : date.toLocaleString()
+            })()}
           </div>
browser_ai/utils.py-60-68 (1)

60-68: Singleton decorator is not thread-safe.

If wrapper is called concurrently from multiple threads, the check-then-set on instance[0] can result in multiple instances being created.

If thread-safety is needed:

+import threading
+
 def singleton(cls):
     instance = [None]
+    lock = threading.Lock()

     def wrapper(*args, **kwargs):
-        if instance[0] is None:
-            instance[0] = cls(*args, **kwargs)
+        if instance[0] is None:
+            with lock:
+                if instance[0] is None:
+                    instance[0] = cls(*args, **kwargs)
         return instance[0]

     return wrapper
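
The double-checked locking pattern suggested above can be exercised with a small, self-contained sketch. The SlowService class, the creations list, and the artificial sleep are hypothetical and exist only to widen the race window; they are not part of browser_ai/utils.py.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def singleton(cls):
    """Singleton decorator using double-checked locking."""
    instance = [None]
    lock = threading.Lock()

    def wrapper(*args, **kwargs):
        if instance[0] is None:          # fast path: no lock once the instance exists
            with lock:
                if instance[0] is None:  # re-check while holding the lock
                    instance[0] = cls(*args, **kwargs)
        return instance[0]

    return wrapper


creations = []  # records every constructor call (demo only)


@singleton
class SlowService:  # hypothetical class, not from the PR
    def __init__(self):
        time.sleep(0.01)  # make concurrent construction more likely
        creations.append(1)


# Request the singleton from several threads at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    instances = list(pool.map(lambda _: SlowService(), range(8)))
```

With the lock in place the constructor should run once even when all eight workers arrive before the first instance exists. The outer check keeps later calls lock-free.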
browser_ai_extension/browse_ai/src/utils/helpers.ts-17-27 (1)

17-27: Add error handling to loadSettings.

saveSettings checks chrome.runtime.lastError, but loadSettings does not. Add consistent error handling for storage operations.

 export async function loadSettings(): Promise<ExtensionSettings> {
   return new Promise((resolve) => {
     chrome.storage.sync.get(['settings'], (result: any) => {
+      if (chrome.runtime.lastError) {
+        console.warn('Failed to load settings:', chrome.runtime.lastError)
+        resolve(DEFAULT_SETTINGS)
+        return
+      }
       if (result.settings) {
         resolve({ ...DEFAULT_SETTINGS, ...result.settings })
       } else {
         resolve(DEFAULT_SETTINGS)
       }
     })
   })
 }
browser_ai/controller/service.py-40-46 (1)

40-46: Avoid mutable default argument.

exclude_actions: list[str] = [] is a mutable default that can cause unexpected behavior if modified. Use None and initialize inside the function.

     def __init__(
         self,
-        exclude_actions: list[str] = [],
+        exclude_actions: Optional[list[str]] = None,
         output_model: Optional[Type[BaseModel]] = None,
         latency_analyzer: Optional[LatencyAnalyzer] = None,
     ):
-        self.exclude_actions = exclude_actions
+        self.exclude_actions = exclude_actions or []
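
A minimal sketch of why the shared default matters. The add_shared and add_fresh helpers are hypothetical and only illustrate the pattern; they do not mirror the Controller API.

```python
from typing import Optional


def add_shared(name: str, exclude: list[str] = []) -> list[str]:
    # The default list is created once, when the function is defined,
    # and reused by every call that omits the argument.
    exclude.append(name)
    return exclude


def add_fresh(name: str, exclude: Optional[list[str]] = None) -> list[str]:
    # A None sentinel lets each call build its own list.
    exclude = [] if exclude is None else exclude
    exclude.append(name)
    return exclude


first = add_shared("a")
second = add_shared("b")    # same list object as `first`, now holding both names

fresh_one = add_fresh("a")
fresh_two = add_fresh("b")  # independent list
```

Note that `exclude_actions or []` also replaces an explicitly passed empty list with a new one, which is harmless here but differs slightly from an `is None` check.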
browser_ai/actions/extraction.py-137-138 (1)

137-138: Add error handling and timeout for navigation.

page.goto() can fail due to network errors, DNS failures, or invalid constructed URLs (especially for the generic fallback at line 89). Consider wrapping in try/except and adding a timeout.

-    await page.goto(search_url)
-    await page.wait_for_load_state()
+    try:
+        await page.goto(search_url, timeout=30000)
+        await page.wait_for_load_state(timeout=30000)
+    except Exception as e:
+        logger.warning(f"Navigation to {search_url} failed: {e}")
+        return ActionResult(extracted_content=f"⚠️ Failed to navigate to {site}: {e}", include_in_memory=True)
browser_ai/actions/extraction.py-64-64 (1)

64-64: Use proper URL encoding for search queries.

replace(" ", "+") only handles spaces. Special characters like &, =, #, ? in queries will break URLs or cause incorrect searches.

+from urllib.parse import quote_plus
+
 async def search_ecommerce(
     params: SearchEcommerceAction, browser: BrowserContext, location_detector: LocationDetector
 ):
     page = await browser.get_current_page()
-    search_query = params.query.replace(" ", "+")
+    search_query = quote_plus(params.query)

Committable suggestion skipped: line range outside the PR's diff.
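
To make the difference concrete, here is a small sketch comparing the manual replacement with quote_plus. The example.com URL and the sample query are placeholders, not values from the PR.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

query = "salt & pepper #1"

naive = query.replace(" ", "+")  # only spaces are handled
encoded = quote_plus(query)      # also percent-encodes & and #

naive_url = f"https://example.com/search?q={naive}"
safe_url = f"https://example.com/search?q={encoded}"

# Parse both URLs back to see what a server would receive as `q`.
naive_q = parse_qs(urlsplit(naive_url).query).get("q")
safe_q = parse_qs(urlsplit(safe_url).query).get("q")
```

In the naive URL the `&` starts a new parameter and the `#` starts a fragment, so the server sees a truncated query. The encoded URL round-trips to the original string.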

browser_ai/location_service.py-257-257 (1)

257-257: Typo in e-commerce site URL.

"mighty ape.co.nz" should be "mightyape.co.nz" (no space).

-        preferred_ecommerce_sites=["trademe.co.nz", "themarket.co.nz", "mighty ape.co.nz"]
+        preferred_ecommerce_sites=["trademe.co.nz", "themarket.co.nz", "mightyape.co.nz"]
browser_ai/actions/navigation.py-31-40 (1)

31-40: Inconsistent URL encoding: use urllib.parse.quote_plus instead of manual replacement.

Manual space replacement with + doesn't handle other special characters. For consistency with search_google_with_ai, use urllib.parse.quote_plus.

 async def search_youtube(params: SearchYouTubeAction, browser: BrowserContext):
     page = await browser.get_current_page()
-    search_query = params.query.replace(" ", "+")
     await page.goto(
-        f"https://www.youtube.com/results?search_query={search_query}"
+        f"https://www.youtube.com/results?search_query={urllib.parse.quote_plus(params.query)}"
     )

Committable suggestion skipped: line range outside the PR's diff.

browser_ai_extension/browse_ai/src/services/TextToSpeech.ts-183-187 (1)

183-187: Inconsistent error handling: throws after calling onError.

The speak method calls onError callback and then throws. Callers might not expect both. Consider either throwing OR calling the callback, not both.

   public speak(
     text: string,
     options: TextToSpeechOptions = {},
     onProgress?: SpeechProgressCallback,
     onEnd?: SpeechEndCallback,
     onError?: SpeechErrorCallback
   ): void {
     if (!this.isSupported) {
       const error = 'Speech Synthesis not supported'
-      if (onError) onError(error)
-      throw new Error(error)
+      if (onError) {
+        onError(error)
+        return
+      }
+      throw new Error(error)
     }
browser_ai/actions/navigation.py-22-29 (1)

22-29: Missing URL encoding for search query.

The params.query is directly interpolated into the URL without encoding. Special characters (e.g., &, #, +) could break the URL or cause unexpected behavior. Use urllib.parse.quote_plus for consistency with search_google_with_ai.

+import urllib.parse
+
 async def search_google(params: SearchGoogleAction, browser: BrowserContext):
     page = await browser.get_current_page()
     # Try to avoid CAPTCHAs by not using shopping mode for general searches
-    await page.goto(f"https://www.google.com/search?q={params.query}")
+    await page.goto(f"https://www.google.com/search?q={urllib.parse.quote_plus(params.query)}")
     await page.wait_for_load_state()
browser_ai/actions/navigation.py-163-165 (1)

163-165: Missing URL encoding for find_best_website search query.

Manual space replacement doesn't handle special characters. Use urllib.parse.quote_plus for consistency.

-    encoded_query = search_query.replace(" ", "+")
-    await page.goto(f"https://www.google.com/search?q={encoded_query}")
+    await page.goto(f"https://www.google.com/search?q={urllib.parse.quote_plus(search_query)}")

Committable suggestion skipped: line range outside the PR's diff.

browser_ai/actions/interaction.py-135-139 (1)

135-139: Misleading default: browser: BrowserContext = None.

Annotating browser as BrowserContext while defaulting it to None is misleading, because the function body assumes a valid BrowserContext. If the function is called without browser, line 141 will raise AttributeError.

Either make browser required or handle the None case. A parameter without a default cannot follow parameters that have defaults, so making it required means moving it first or making it keyword-only:

  async def wait_for_url_change(
      contains_text: str = "",
      timeout_seconds: int = 10,
-     browser: BrowserContext = None
+     *,
+     browser: BrowserContext,
  ) -> ActionResult:
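
One caveat when making browser required: Python rejects a parameter without a default that follows parameters with defaults, unless it is keyword-only. The sketch below shows the keyword-only form. FakeBrowserContext and the single-check body are hypothetical stand-ins, and it assumes callers already pass browser by keyword.

```python
import asyncio


class FakeBrowserContext:
    """Hypothetical stand-in for BrowserContext, used only in this sketch."""

    current_url = "https://example.com/checkout"


async def wait_for_url_change(
    contains_text: str = "",
    timeout_seconds: int = 10,
    *,  # everything after this marker must be passed by keyword
    browser: FakeBrowserContext,
) -> bool:
    # The real action would poll until the timeout; this sketch checks once.
    return contains_text in browser.current_url


matched = asyncio.run(
    wait_for_url_change("checkout", browser=FakeBrowserContext())
)
```

Calling the function without browser now fails immediately with a TypeError at the call site instead of an AttributeError deeper in the body.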
browser_ai_extension/browse_ai/src/sidepanel/SidePanel.tsx-505-514 (1)

505-514: Unexpected behavior: clicking main content area clears conversation state.

The onClick handler on the main content div dismisses the task header and clears conversation messages/intent when clicked anywhere in the content area. This seems unintentional and could frustrate users who accidentally click and lose their conversation.

Consider removing this behavior or making it more intentional (e.g., a dedicated "clear" button):

  <div
    ref={scrollRef}
    className="flex-1 overflow-y-auto ..."
-   onClick={() => {
-     setTaskHeaderDismissed(!taskStatus.is_running)
-     if (mode === 'conversation') {
-       setMessages([])
-       setIntent(null)
-     }
-   }}
  >
browser_ai_extension/browse_ai/src/services/VoiceConversation.ts-369-369 (1)

369-369: Emoji character class is unreliable without the unicode flag.

The regex on line 369 attempts to remove emojis, but without the u flag a character class is matched one UTF-16 code unit at a time. Emojis outside the Basic Multilingual Plane are stored as surrogate pairs, so each half is matched independently. The class can then strip half of an unrelated emoji and leave a lone surrogate behind.

Add the unicode flag so each emoji is treated as a whole code point:

- .replace(/[✅🚀👋🎧🤔❓💡📝🤖🎙️🔊👂📤🚫]/g, '') // Remove common emojis
+ .replace(/[✅🚀👋🎧🤔❓💡📝🤖🎙️🔊👂📤🚫]/gu, '') // Remove common emojis

Based on static analysis hint from Biome.

browser_ai/actions/interaction.py-181-182 (1)

181-182: Bare except with pass silently swallows all errors.

This pattern hides potentially important errors and makes debugging difficult.

At minimum, catch a specific exception or log the error:

  try:
      locator = page.get_by_text(text, exact=False)
      if await locator.count() > 0:
          text_found = True
- except:
-     pass
+ except Exception as e:
+     logger.debug(f"get_by_text lookup failed: {e}")

Based on Ruff hints (E722, S110).
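
In async code a bare except has one more cost: it also catches asyncio.CancelledError, which is a BaseException since Python 3.8, so task cancellation can be swallowed. The sketch below contrasts the two forms; all function names are hypothetical and unrelated to interaction.py.

```python
import asyncio


async def swallow_everything():
    try:
        await asyncio.sleep(10)
    except:  # noqa: E722 - also catches CancelledError
        pass
    return "finished anyway"


async def swallow_exceptions_only():
    try:
        await asyncio.sleep(10)
    except Exception:  # CancelledError is not an Exception subclass
        pass
    return "finished anyway"


async def cancel_and_report(coro_fn):
    task = asyncio.create_task(coro_fn())
    await asyncio.sleep(0)  # let the task start and reach its sleep
    task.cancel()
    try:
        return await task
    except asyncio.CancelledError:
        return "cancelled"


async def main():
    return (
        await cancel_and_report(swallow_everything),
        await cancel_and_report(swallow_exceptions_only),
    )


bare_result, narrow_result = asyncio.run(main())
```

The bare except version ignores the cancellation and runs to completion, while the `except Exception` version lets the cancellation propagate as intended.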

browser_ai/agent/service.py-76-87 (1)

76-87: Avoid mutable default argument for include_attributes list.

Mutable default arguments in Python are shared across all calls, which can cause unexpected behavior if the list is modified.

-        include_attributes: list[str] = [
-            "title",
-            "type",
-            "name",
-            "role",
-            "tabindex",
-            "aria-label",
-            "placeholder",
-            "value",
-            "alt",
-            "aria-expanded",
-        ],
+        include_attributes: list[str] | None = None,

Then initialize within __init__:

self.include_attributes = include_attributes if include_attributes is not None else [
    "title", "type", "name", "role", "tabindex", "aria-label",
    "placeholder", "value", "alt", "aria-expanded",
]
browser_ai_extension/browse_ai/src/utils/state.ts-59-70 (1)

59-70: Inconsistent error handling: loadCdpEndpoint doesn't check chrome.runtime.lastError.

Unlike loadTaskStatus and other load functions, loadCdpEndpoint doesn't check for chrome.runtime.lastError. This inconsistency could hide storage errors.

 export const loadCdpEndpoint = async (): Promise<string | null> => {
   return new Promise((resolve) => {
     chrome.storage.local.get([STATE_KEYS.CDP_ENDPOINT], (result) => {
-      if (result[STATE_KEYS.CDP_ENDPOINT]) {
+      if (chrome.runtime.lastError) {
+        console.error('[State] Failed to load CDP endpoint:', chrome.runtime.lastError)
+        resolve(null)
+      } else if (result[STATE_KEYS.CDP_ENDPOINT]) {
         console.log('[State] Loaded CDP endpoint:', result[STATE_KEYS.CDP_ENDPOINT])
         resolve(result[STATE_KEYS.CDP_ENDPOINT])
       } else {
         resolve(null)
       }
     })
   })
 }

Comment on lines +240 to +244
// Create notification window
const width = 500
const height = 400
const left = Math.round((screen.width - width) / 2)
const top = Math.round((screen.height - height) / 2)

⚠️ Potential issue | 🔴 Critical

screen object may be unavailable in service worker context.

Service workers don't have access to window or screen objects. This code may throw a ReferenceError when executed.

Use fallback dimensions or query the active window:

 async function handleShowNotification(
   request: ShowNotificationMessage,
   sendResponse: (response: any) => void,
 ) {
   try {
     const { notificationType, message, details, result } = request
     const timestamp = new Date().toISOString()

     // Create notification window
     const width = 500
     const height = 400
-    const left = Math.round((screen.width - width) / 2)
-    const top = Math.round((screen.height - height) / 2)
+    // Get current window to center the popup
+    const currentWindow = await chrome.windows.getCurrent()
+    const left = currentWindow.left !== undefined && currentWindow.width !== undefined
+      ? Math.round(currentWindow.left + (currentWindow.width - width) / 2)
+      : undefined
+    const top = currentWindow.top !== undefined && currentWindow.height !== undefined
+      ? Math.round(currentWindow.top + (currentWindow.height - height) / 2)
+      : undefined

Comment on lines +1 to +3
/// <reference types="vite/client" />

declare const __APP_VERSION__: string

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and examine vite configuration files
fd 'vite\.config\.(ts|js)' -x cat -n {}

Repository: Sathursan-S/Browser.AI

Length of output: 1033


🏁 Script executed:

# Search for all occurrences of __APP_VERSION__ in the codebase
rg -n '__APP_VERSION__' -B2 -A2

Repository: Sathursan-S/Browser.AI

Length of output: 289


🏁 Script executed:

# List the structure to understand the project layout
git ls-files | head -30

Repository: Sathursan-S/Browser.AI

Length of output: 865


__APP_VERSION__ is declared but not injected in the Vite configuration.

The ambient declaration correctly references Vite client types, but the global __APP_VERSION__ constant is not defined in vite.config.ts. Either inject this constant using Vite's define option in the configuration, or remove the declaration if it's not needed.

🤖 Prompt for AI Agents
In browser_ai_extension/browse_ai/src/global.d.ts lines 1-3, the ambient
constant __APP_VERSION__ is declared but not provided by Vite; either remove
this declaration if you don't need a compile-time constant, or add the constant
to vite.config.ts using the define option (e.g. define: { __APP_VERSION__:
JSON.stringify(<source>) }) so Vite injects a string value (commonly from
package.json or an env var) matching the declared string type.

Comment on lines +1 to +20
/**
* Voice Recognition Service
*
* Provides speech-to-text constructor() {
// Check for browser support
const SpeechRecognitionClass =
(window as any).SpeechRecognition ||
(window as any).webkitSpeechRecognition

if (SpeechRecognitionClass) {
this.isSupported = true
this.recognition = new SpeechRecognitionClass() as SpeechRecognition
console.log('✅ Voice Recognition API is supported and initialized')
} else {
console.warn('❌ Speech Recognition API not supported in this browser')
console.log('Available window properties:', Object.keys(window).filter(k => k.toLowerCase().includes('speech')))
}
}y using Web Speech API.
* Handles microphone input, continuous recognition, and interim results.
*/

⚠️ Potential issue | 🔴 Critical

Fix malformed JSDoc comment containing code.

Lines 1-20 contain a corrupted JSDoc comment block with constructor code embedded inside it. This appears to be a merge error or paste mistake. The constructor is properly defined again at lines 81-93.

Apply this diff to fix the structure:

 /**
  * Voice Recognition Service
- * 
- * Provides speech-to-text  constructor() {
-    // Check for browser support
-    const SpeechRecognitionClass = 
-      (window as any).SpeechRecognition || 
-      (window as any).webkitSpeechRecognition
-
-    if (SpeechRecognitionClass) {
-      this.isSupported = true
-      this.recognition = new SpeechRecognitionClass() as SpeechRecognition
-      console.log('✅ Voice Recognition API is supported and initialized')
-    } else {
-      console.warn('❌ Speech Recognition API not supported in this browser')
-      console.log('Available window properties:', Object.keys(window).filter(k => k.toLowerCase().includes('speech')))
-    }
-  }y using Web Speech API.
+ *
+ * Provides speech-to-text functionality using Web Speech API.
  * Handles microphone input, continuous recognition, and interim results.
  */
🤖 Prompt for AI Agents
In browser_ai_extension/browse_ai/src/services/VoiceRecognition.ts around lines
1 to 20, the top JSDoc block is corrupted by embedded constructor code; remove
the constructor code from the JSDoc so the comment only contains valid
documentation and not executable code, then ensure the actual constructor
implementation remains intact (it's already correctly defined at lines ~81-93).
After editing, run a quick lint/typecheck to confirm no syntax errors introduced
and that the file still compiles.

Comment on lines +250 to +252
# Center goal text horizontally, place above step number
x_goal = (image.width - goal_width) // 2
y_goal = y_step - goal_height - padding * 4 # More space between step and goal

⚠️ Potential issue | 🔴 Critical

Potential NameError: y_step and padding may be undefined.

When display_step is False, the variables y_step and padding are never assigned, but line 252 references them:

y_goal = y_step - goal_height - padding * 4

Add fallback values or restructure the logic:

+ # Initialize defaults
+ y_step = image.height - margin - 10
+ padding = 20
+
  if display_step:
      # ... existing step number drawing code ...
+     y_step = image.height - margin - step_height - 10
+     padding = 20

  # Draw goal text (centered, bottom)
  ...
  y_goal = y_step - goal_height - padding * 4
🤖 Prompt for AI Agents
In browser_ai/agent/media.py around lines 250-252, the computation y_goal =
y_step - goal_height - padding * 4 can raise NameError because y_step and
padding may be undefined when display_step is False; ensure both variables are
always defined before use by providing sensible fallbacks or moving their
assignments outside the display_step branch. Specifically, define a default
padding (e.g., set padding = existing_padding_value or a small constant) before
any conditional, and compute y_step with a fallback (for example derive y_step
from image.height and step_height or set y_step = image.height - step_height -
padding) so y_goal can be calculated regardless of display_step; update logic so
that when display_step is False you still calculate y_goal using the fallback
values.
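
The failure mode and the fallback approach can be reproduced in isolation. This is a sketch with made-up dimensions: goal_y_buggy and goal_y_fixed are hypothetical helpers, not the functions in browser_ai/agent/media.py. Inside a function the error surfaces as UnboundLocalError, which is a subclass of NameError.

```python
def goal_y_buggy(display_step: bool, image_height: int = 600, goal_height: int = 40) -> int:
    if display_step:
        padding = 20
        y_step = image_height - 100
    # When display_step is False, neither name was ever assigned.
    return y_step - goal_height - padding * 4


def goal_y_fixed(display_step: bool, image_height: int = 600, goal_height: int = 40) -> int:
    # Defaults are assigned before the branch, so both names always exist.
    padding = 20
    y_step = image_height - 10
    if display_step:
        y_step = image_height - 100
    return y_step - goal_height - padding * 4


try:
    goal_y_buggy(display_step=False)
    error_name = None
except NameError as exc:  # UnboundLocalError derives from NameError
    error_name = type(exc).__name__

fixed_without_step = goal_y_fixed(display_step=False)
fixed_with_step = goal_y_fixed(display_step=True)
```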

Comment on lines +1156 to +1162
# Create planner message history using full message history
planner_messages = [
PlannerPrompt(self.action_descriptions).get_system_message(),
*self.message_manager.get_messages()[
1:
], # Use full message history except the first
]

⚠️ Potential issue | 🔴 Critical

self.action_descriptions is undefined - will raise AttributeError.

The _run_planner method references self.action_descriptions on line 1158, but this attribute is never set in __init__ or elsewhere in the class. This will cause a runtime error when the planner is invoked.

Based on the MessageManager initialization (line 168), the action descriptions come from self.controller.registry.get_prompt_description(). Apply this fix:

         # Create planner message history using full message history
         planner_messages = [
-            PlannerPrompt(self.action_descriptions).get_system_message(),
+            PlannerPrompt(self.controller.registry.get_prompt_description()).get_system_message(),
             *self.message_manager.get_messages()[
                 1:
             ],  # Use full message history except the first
         ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-# Create planner message history using full message history
-planner_messages = [
-PlannerPrompt(self.action_descriptions).get_system_message(),
-*self.message_manager.get_messages()[
-1:
-], # Use full message history except the first
-]
+# Create planner message history using full message history
+planner_messages = [
+PlannerPrompt(self.controller.registry.get_prompt_description()).get_system_message(),
+*self.message_manager.get_messages()[
+1:
+], # Use full message history except the first
+]
🤖 Prompt for AI Agents
In browser_ai/agent/service.py around lines 1156 to 1162, _run_planner
references self.action_descriptions which is not defined and will raise
AttributeError; set self.action_descriptions =
self.controller.registry.get_prompt_description() when the MessageManager is
initialized (around line 168) or in __init__ so the planner has the expected
descriptions available; ensure the assignment occurs before any call to
_run_planner and remove any duplicate lookups if MessageManager already stores
the same value.
