@IsmaelMartinez
Owner

Investigation for issue #1938 exploring approaches to publish camera, microphone, and in-call state to MQTT for home automation.

Key findings:
- WebRTC MediaStream monitoring is recommended (proven pattern)
- Leverages the existing disableAutogain.js technique
- Future enhancement with the Graph API once #1832 is implemented
- Low maintenance and high stability, using standard Web APIs

Recommended 2-phase approach:
1. WebRTC foundation (immediate, solves [Feat]: Extended MQTT status fields #1938)
2. Graph API enhancement (future, aligns with [Feat]: Strategic Initiative: Microsoft Graph API Integration - Enhance Teams for Linux with Official APIs #1832)

@gemini-code-assist
Contributor

Summary of Changes

Hello @IsmaelMartinez, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request presents a thorough investigation into enhancing the existing MQTT integration to provide more detailed Teams status information for home automation. It proposes a robust, two-phase solution: first, implementing WebRTC MediaStream monitoring for accurate camera, microphone, and in-call states, and second, a future enhancement leveraging the Microsoft Graph API for richer meeting context. The goal is to move beyond basic presence and enable more sophisticated automation based on actual device usage.

Highlights

  • Problem Addressed: The current MQTT integration only publishes basic Teams presence. This PR investigates solutions to provide more granular status, including camera on/off, microphone on/off, and in-call state, for enhanced home automation.
  • Recommended Approach (Phase 1): The primary recommendation is to implement WebRTC MediaStream monitoring. This involves intercepting getUserMedia() calls and listening to MediaStreamTrack events (like mute, unmute, ended) to accurately determine the real-time state of the camera and microphone. This approach leverages a proven pattern already used in disableAutogain.js.
  • Key Advantages: This WebRTC-based solution uses stable Web APIs, directly measures actual device usage, is immune to Teams UI changes, and works reliably with keyboard shortcuts, providing high stability and low maintenance.
  • Future Enhancement (Phase 2): A second phase is planned to enhance the MQTT payload with richer context by integrating with the Microsoft Graph API (once [Feat]: Strategic Initiative: Microsoft Graph API Integration - Enhance Teams for Linux with Official APIs #1832 is implemented). This would provide official presence status, calendar event details, and confidence scoring by cross-validating multiple data sources.
  • Rejected Approaches: Several alternatives were considered and rejected, including network traffic analysis (too complex, encrypted), native OS integration (platform-specific, overengineered), and keyboard shortcut interception (fragile, can desync).
  • Implementation Plan: A detailed two-phase implementation plan is outlined, including the creation of a new module (mqttExtendedStatus.js), IPC communication, configuration options for MQTT topics, and a comprehensive testing strategy.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a very well-written and thorough investigation document for adding extended MQTT status. The proposed two-phase approach using WebRTC and then enhancing with the Graph API is solid. The document clearly outlines the problem, different approaches, and a detailed implementation plan.

My review includes a few suggestions on the technical design, mainly focusing on the WebRTC monitoring logic and the example IPC handler implementation, to make the solution more robust and efficient. Overall, great work on this research.

Added critical section explaining interaction with existing
screen sharing code that disables audio to prevent echo/feedback.

Key findings:
- Teams creates SEPARATE streams for calls vs screen sharing
- Screen sharing streams have audio disabled (injectedScreenSharing.js)
- Regular call streams (camera + mic) are NOT affected
- Solution: Use same detection logic to filter screen shares

Implementation approach:
- Detect screen share streams (chromeMediaSource === "desktop")
- Skip monitoring screen share streams (no audio tracks)
- Only monitor regular call streams for mic/camera state
- Both interceptors coexist without interference
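
The filtering step above could look roughly like the following. This is a minimal sketch, not a copy of injectedScreenSharing.js: the function names are hypothetical, and the two constraint shapes it checks (`chromeMediaSource` directly on the constraint, or nested under `mandatory`) are assumptions about how the desktop source is passed to getUserMedia.

```javascript
// Sketch only: the constraint shapes checked here are assumptions,
// not taken from injectedScreenSharing.js.
function isScreenShare(constraints) {
  // Inspect both the video and the audio part of the constraints.
  const parts = [constraints?.video, constraints?.audio];
  return parts.some((part) => {
    // `true`, `false` and undefined carry no chromeMediaSource at all.
    if (!part || typeof part !== "object") return false;
    const source = part.chromeMediaSource ?? part.mandatory?.chromeMediaSource;
    return source === "desktop";
  });
}

// Hypothetical helper: only regular call streams are monitored.
function shouldMonitor(constraints) {
  return !isScreenShare(constraints);
}
```

With this shape, a plain `{ audio: true, video: true }` request is monitored, while a request whose video constraint names the desktop source is skipped and left to the existing screen sharing code.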

Updated code examples to include isScreenShare filtering.
Addresses concerns about issues #1871 and #1896.

@IsmaelMartinez changed the title from "Explore solutions for GitHub issue 1938" to "[Research]: Explore solutions for GitHub issue 1938" on Nov 12, 2025
@IsmaelMartinez moved this to In Review in 2.x on Nov 12, 2025
@IsmaelMartinez moved this from In Review to In Progress in 2.x on Nov 12, 2025
@IsmaelMartinez marked this pull request as draft on November 12, 2025 07:02
@IsmaelMartinez moved this from In Progress to Todo in 2.x on Nov 12, 2025

CRITICAL FIX: Changing MediaStreamTrack.enabled does NOT fire
mute/unmute events, yet setting track.enabled is the standard way web
apps control track state programmatically (e.g., UI button clicks,
keyboard shortcuts).

Problem:
- Teams UI buttons likely use track.enabled = false/true
- This does NOT fire mute/unmute events
- Event-only monitoring would miss most state changes

Solution - Hybrid Monitoring Approach:
1. Event listeners (mute/unmute/ended) - immediate response
2. Poll track.enabled property at 500ms intervals - catch all changes
3. Cleanup intervals when track ends - prevent memory leaks

Why 500ms polling:
- Fast enough for human perception
- Low overhead (2 checks/second per track, so ~4-6 checks/second for a typical call with 2-3 tracks)
- Negligible CPU impact
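
A minimal sketch of the hybrid approach, assuming a track object that exposes `enabled`, `muted`, `addEventListener` and `removeEventListener` (as MediaStreamTrack does). The name `monitorTrack`, the callback shape and the reason strings are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: watches a single MediaStreamTrack-like object.
function monitorTrack(track, onChange, intervalMs = 500) {
  let lastEnabled = track.enabled;

  const report = (reason) =>
    onChange({ reason, enabled: track.enabled, muted: track.muted });

  // 1. Event listeners: immediate response to mute/unmute.
  const onMute = () => report("mute");
  const onUnmute = () => report("unmute");

  // 2. Polling: track.enabled changes fire no event, so compare on a timer.
  const poll = () => {
    if (track.enabled !== lastEnabled) {
      lastEnabled = track.enabled;
      report("enabled-changed");
    }
  };
  const timer = setInterval(poll, intervalMs);

  // 3. Cleanup: stop the timer and drop the listeners to avoid leaks.
  function stop() {
    clearInterval(timer);
    track.removeEventListener("mute", onMute);
    track.removeEventListener("unmute", onUnmute);
    track.removeEventListener("ended", onEnded);
  }
  function onEnded() {
    report("ended");
    stop();
  }

  track.addEventListener("mute", onMute);
  track.addEventListener("unmute", onUnmute);
  track.addEventListener("ended", onEnded);

  // `poll` is exposed so a check can also be triggered manually.
  return { poll, stop };
}
```

The timer is per track, so the cleanup in `stop()` matters: without it, every ended track would leave an interval running for the lifetime of the page.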

Updated investigation document:
- Added "Critical: track.enabled vs mute/unmute Events" section
- Updated code examples with hybrid monitoring
- Added monitorTrackEnabled() helper function
- Updated implementation checklist
- Added to Open Questions with recommendation

Credit: Issue identified by Gemini Code review

Address two critical implementation details for MQTT publishing:

1. MQTT Payload Format:
   - MQTT payloads are strings/buffers, not JavaScript primitives
   - Boolean values must be explicitly converted with String()
   - Home Assistant matches payloads as strings (e.g. "true"/"false" configured via payload_on/payload_off), not booleans
   - Updated all examples to show String(data.camera) etc.

2. Publishing Efficiency:
   - Individual topic publishes are independent operations
   - Use Promise.all() for parallel publishing (not sequential)
   - Sequential: ~200-400ms total latency (4 publishes x ~50-100ms each, estimated)
   - Parallel: ~50-100ms total (bounded by the slowest publish, estimated)
   - 3-4x performance improvement
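
Both points can be shown together in a short sketch. It assumes a hypothetical `publish(topic, payload)` function that returns a Promise; the topic names in the usage note are placeholders, not the project's defaults.

```javascript
// Sketch: `publish` is a hypothetical (topic, payload) => Promise function.
function publishMediaState(publish, topics, state) {
  // Start every publish immediately; each payload is converted to a string.
  const jobs = Object.keys(topics).map((key) =>
    publish(topics[key], String(state[key]))
  );
  // Promise.all waits for all of them together instead of one after another.
  return Promise.all(jobs);
}
```

For example, `publishMediaState(publish, { camera: "teams/camera", microphone: "teams/microphone" }, { camera: true, microphone: false })` would send the strings "true" and "false". Note that Promise.all rejects as soon as any single publish fails, so a caller that wants per-topic error handling might prefer Promise.allSettled.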

Changes:
- Updated main process IPC handler with String() conversion
- Changed sequential await to Promise.all([...])
- Added new "MQTT Payload Format and Publishing Efficiency" section
- Updated MQTT topics section to clarify string format
- Updated implementation checklist with both requirements
- Added detailed performance comparison examples

Credit: Issues identified by Gemini Code review

Focused investigation on what user ACTUALLY requested:
- Camera on/off (Red LED)
- Microphone on/off (Orange LED)
- In-call state (Yellow LED)

REMOVED (over-engineering):
- "Approach 2: Graph API Enhancement" - user didn't ask for this
- "Approach 3: DOM Fallback" - unnecessary complexity
- Phase 1 and Phase 2 - just one simple implementation
- Full-state JSON topic - user just needs three boolean topics
- Excessive publishing efficiency discussion
- Multiple "Open Questions" - simplified to essential details

KEPT (essential):
- WebRTC stream monitoring (single, focused solution)
- Hybrid track monitoring (events + track.enabled polling)
- Screen sharing filtering (necessary for existing code)
- MQTT string conversion (necessary)
- Three simple topics: camera, microphone, in-call
- Implementation checklist
- Testing steps

ADDED:
- "Future Expansion Opportunities" section (brief, at end)
- Clear path to add Graph API later WITHOUT refactoring
- Home Assistant example for user's use case

Result: Document went from ~400 lines with 3 approaches and 2 phases
to ~240 lines with 1 clear solution. Delivers exactly what user wants,
nothing more, with expansion path for future if needed.

@github-actions

github-actions bot commented Nov 15, 2025

📦 PR Build Artifacts

Build successful!

Created testing spike to verify critical assumptions before implementation:

WHAT WE'RE TESTING:
1. getUserMedia interception works alongside injectedScreenSharing.js
2. Teams uses track.enabled (not mute events) for UI buttons
3. Screen sharing detection logic correctly identifies streams
4. Track state changes are detectable via polling

FILES ADDED:
- app/browser/tools/mqttExtendedStatusSpike.js
  - Temporary verification tool (DO NOT USE IN PRODUCTION)
  - Intercepts getUserMedia calls
  - Logs all track state changes (enabled, muted, readyState)
  - Polls track.enabled every 500ms to catch UI button clicks
  - Tests screen sharing detection logic

- MQTT_EXTENDED_STATUS_SPIKE_TESTING.md
  - Comprehensive testing guide with 7 test scenarios
  - Expected outputs for each test
  - Results checklist
  - Decision framework for next steps

- Updated app/browser/preload.js to load spike module
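
The interception itself can be sketched as a thin wrapper. This is an illustration under stated assumptions, not the contents of mqttExtendedStatusSpike.js: `interceptGetUserMedia` and its `onStream` callback are hypothetical names, and `mediaDevices` stands for `navigator.mediaDevices` in the renderer.

```javascript
// Hypothetical wrapper around a mediaDevices-like object.
function interceptGetUserMedia(mediaDevices, onStream) {
  const originalMethod = mediaDevices.getUserMedia;
  const callOriginal = originalMethod.bind(mediaDevices);

  mediaDevices.getUserMedia = function (constraints) {
    // Call through unchanged so other interceptors keep working.
    return callOriginal(constraints).then((stream) => {
      try {
        onStream(stream, constraints); // observe only, never modify
      } catch (err) {
        console.error("[MQTT_SPIKE] observer failed:", err);
      }
      return stream; // always hand the same stream back to the caller
    });
  };

  // Returns a function that restores the original method.
  return () => {
    mediaDevices.getUserMedia = originalMethod;
  };
}
```

Because the wrapper returns the stream untouched and swallows observer errors, a bug in the logging code should not be able to break a call. Whether this coexists cleanly with injectedScreenSharing.js is exactly what the spike is meant to verify.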

HOW TO USE:
1. Add "mqttExtendedStatusSpike": true to config.json
2. Run npm start
3. Open DevTools console
4. Follow testing guide in MQTT_EXTENDED_STATUS_SPIKE_TESTING.md
5. Join test call and observe [MQTT_SPIKE] logs

CRITICAL TEST:
Test 3 (Toggle Microphone) will tell us if Teams uses:
- track.enabled = false (our assumption) ← Will see "PROPERTY CHANGE"
- mute/unmute events (alternative) ← Will see "EVENT"

This determines if our hybrid approach is correct.

Next: Run spike tests, document results, then implement full solution

Replace .forEach() with for...of loops to match project code style.

Changes:
- stream.getVideoTracks().forEach() → for...of videoTracks.entries()
- stream.getAudioTracks().forEach() → for...of audioTracks.entries()
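
As an illustration of the pattern, a for...of loop over `entries()` keeps the index that forEach used to supply. The helper `labelTracks` is hypothetical and only assumes a stream-like object with `getVideoTracks()` and `getAudioTracks()` returning arrays.

```javascript
// Hypothetical helper: pairs each track with an index-based name.
function labelTracks(stream) {
  const labelled = [];
  // entries() yields [index, track] pairs, so no manual counter is needed.
  for (const [index, track] of stream.getVideoTracks().entries()) {
    labelled.push({ name: `camera-${index}`, track });
  }
  for (const [index, track] of stream.getAudioTracks().entries()) {
    labelled.push({ name: `microphone-${index}`, track });
  }
  return labelled;
}
```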

Maintains index for track naming (camera-0, microphone-0) while
following project's ESLint preferences.

Comprehensive analysis of how extended status integrates with existing
MQTT architecture using YAGNI and KISS principles.

KEY FINDINGS:

Current Architecture:
- Browser: mqttStatusMonitor.js → IPC user-status-changed
- Main: userStatusChangedHandler() → mqttClient.publishStatus()
- MQTTClient has a single method, publishStatus(), for presence only

RECOMMENDED APPROACH (KISS + YAGNI):

1. Add Generic publish() Method:
   - publish(topic, payload, options)
   - Supports strings, objects (auto-stringify)
   - Deduplication with optional key
   - Retain and QoS options
   - Backward compatible (existing publishStatus() unchanged)

2. Separate IPC Channel:
   - Keep 'user-status-changed' for presence
   - Add 'mqtt-extended-status-changed' for camera/mic/call
   - Clear separation of concerns (no conditional logic)

3. Nested Config Structure:
   - mqtt.presence { enabled, topic, checkInterval }
   - mqtt.extendedStatus { enabled, topics { camera, microphone, inCall } }
   - Backward compatible with migration
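
A rough sketch of what the generic publish() method from point 1 might look like. Everything beyond the `publish(topic, payload, options)` signature is an assumption: the class name, the `dedupeKey` option name, the default retain/QoS values, and the injected `client` object (anything exposing `publish(topic, message, options)`).

```javascript
// Sketch only: `client` is any object exposing publish(topic, message, options).
class GenericPublisher {
  constructor(client) {
    this.client = client;
    this.lastSent = new Map(); // dedupe key -> last message string
  }

  // Returns true if a message was sent, false if skipped as a duplicate.
  publish(topic, payload, options = {}) {
    // Defaults are assumptions; pass dedupeKey: null to disable deduplication.
    const { retain = true, qos = 0, dedupeKey = topic } = options;

    // Objects are auto-stringified; everything else goes through String().
    const message =
      payload !== null && typeof payload === "object"
        ? JSON.stringify(payload)
        : String(payload);

    if (dedupeKey !== null) {
      if (this.lastSent.get(dedupeKey) === message) return false;
      this.lastSent.set(dedupeKey, message);
    }

    this.client.publish(topic, message, { retain, qos });
    return true;
  }
}
```

Keeping publishStatus() untouched and adding this alongside it is what makes the change backward compatible: existing presence publishing never goes through the new code path unless it is migrated deliberately.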

REJECTED APPROACHES:

❌ Specialized methods (publishCameraState, etc) - method explosion
❌ Event publisher/adapter pattern - over-engineered
❌ Flat config - too messy
❌ Reused IPC channel - mixed concerns

REQUIRED SPIKES:

Spike 1: Generic publish() method (verify no breaking changes)
Spike 2: IPC integration pattern (verify messages reach broker)
Spike 3: Config backward compatibility (verify migration works)

Next: Run 3 integration spikes before implementing production code

Changed config structure from generic "extendedStatus" to clear
semantic categories: camera, microphone, call.

PROBLEM WITH "EXTENDED":
- "Extended" is a technical term, not a semantic category
- Just means "stuff we added later"
- Unclear what "extended" contains
- Hard to understand at a glance

NEW SEMANTIC APPROACH:
{
  "mqtt": {
    "presence": { enabled, topic },    // Availability status
    "camera": { enabled, topic },      // Video device
    "microphone": { enabled, topic },  // Audio device
    "call": { enabled, topic }         // Call state
  }
}

BENEFITS:
✅ Clear what each setting controls (no guessing)
✅ Each independently configurable
✅ Self-documenting (camera = camera, not "extended field 1")
✅ Flat structure (simpler than nested groupings)
✅ Easy to add new categories (screen, recording, etc.)

IPC CHANNEL:
- Renamed: mqtt-extended-status-changed → mqtt-media-status-changed
- Rationale: Single IPC for all stream data (camera/mic/call detected together)
- Handler selectively publishes based on config
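
Selective publishing could be sketched as below, assuming the `{ enabled, topic }` shape per category shown in the config above. The function name, the shape of the `status` object and the injected `publish(topic, payload)` function are hypothetical.

```javascript
// Sketch: publishes only the categories that are enabled in the MQTT config.
function handleMediaStatus(mqttConfig, status, publish) {
  const categories = ["camera", "microphone", "call"];
  const published = [];
  for (const name of categories) {
    const category = mqttConfig[name];
    // Skip categories that are missing, disabled, or absent from this update.
    if (!category?.enabled || !(name in status)) continue;
    publish(category.topic, String(status[name]));
    published.push(name);
  }
  return published; // handy for logging which topics were updated
}
```

In the main process, a function like this would presumably be registered as the listener for the mqtt-media-status-changed channel, with `publish` bound to the MQTT client.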

FILES UPDATED:
- MQTT_INTEGRATION_ARCHITECTURE_ANALYSIS.md
- docs-site/docs/development/research/mqtt-extended-status-investigation.md

This is more KISS-compliant: each thing is named for what it is.

Document how semantic category pattern scales to future use cases
(notifications, calendar, messages, recording, reactions, etc.)

FUTURE CATEGORIES ANALYZED:

Messages & Notifications:
- messageCount, newMessage, mentions
- Detection: DOM title monitoring (already exists!)
- Topics: teams/messages/unread, teams/messages/mentions

Calendar & Meetings:
- nextMeeting, meetingStarting, meetingDuration
- Detection: Graph API (wait for #1832) or DOM
- Topics: teams/calendar/next (with JSON payload)

Screen Sharing:
- screenSharing
- Detection: IPC events (already implemented!)
- Topics: teams/screen-sharing

Reactions & Engagement:
- handRaised, reactions
- Detection: DOM monitoring
- Topics: teams/hand-raised, teams/reactions/latest

Recording & Privacy:
- recording, transcription
- Detection: DOM (recording indicator)
- Topics: teams/recording (privacy use case)

Participants:
- participantCount
- Detection: DOM roster panel
- Topics: teams/participants/count

PATTERN SCALES PERFECTLY:

✅ Each category = what it represents (not grouped by tech)
✅ Independently configurable (enable what you need)
✅ Self-documenting (camera = camera, not "media field 1")
✅ Privacy-friendly (opt-in per category)
✅ Generic publish() supports all future categories

IMPLEMENTATION PRIORITY:

High: Screen sharing, message count (easy - already detected)
Medium: Calendar (wait for Graph API #1832), recording
Low: Reactions, participant count (wait for user requests)

DETECTION STRATEGY:

Prefer stable APIs (WebRTC, IPC, Graph) over fragile DOM scraping.
Add features ONLY when users request them (YAGNI).

This validates our semantic category decision - it scales!

Consolidated MQTT semantic categories expansion roadmap into
the main mqtt-extended-status-investigation.md research document.

CHANGES:
- Removed separate MQTT_SEMANTIC_CATEGORIES_EXPANSION.md file
- Expanded "Future Expansion Opportunities" section with comprehensive details:
  - Messages & Notifications (messageCount, newMessage, mentions)
  - Calendar & Meetings (nextMeeting, meetingStarting)
  - Screen Sharing (screenSharing)
  - Recording & Privacy (recording, transcription)
  - Reactions & Engagement (handRaised, reactions)
  - Participant Count (participantCount)

- Added detection strategy table showing fragility levels
- Added implementation priority guidance (YAGNI)
- Added rationale for semantic categories scaling

BENEFITS:
✅ All research in one place (easier to find)
✅ Comprehensive future planning (but YAGNI - don't build yet)
✅ Clear implementation priorities (stable APIs first)
✅ Validates semantic category decision (it scales!)

Keep all research for a feature together, not scattered.

@IsmaelMartinez moved this from Todo to In Progress in 2.x on Nov 16, 2025
@IsmaelMartinez changed the title from "[Research]: Explore solutions for GitHub issue 1938" to "[Research]: Explore solutions MQTT Publish camera, microphone and in-call state" on Nov 17, 2025