[Research]: Explore solutions MQTT Publish camera, microphone and in-call state #1941
Conversation
Investigation for issue #1938 exploring approaches to publish camera, microphone, and in-call state to MQTT for home automation.

Key findings:
- WebRTC MediaStream monitoring recommended (proven pattern)
- Leverages existing disableAutogain.js technique
- Future enhancement with Graph API when #1832 is implemented
- Low maintenance, high stability using standard Web APIs

Recommended 2-phase approach:
1. WebRTC foundation (immediate, solves #1938)
2. Graph API enhancement (future, aligns with #1832)
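The WebRTC foundation can be illustrated with a minimal sketch. The `deriveCallState` helper and the IPC comment are hypothetical, not the project's actual code; the getUserMedia wrapping mirrors the interception technique the description attributes to disableAutogain.js.

```javascript
// Hypothetical sketch: derive the three published states from a stream's
// tracks. A track contributes to camera/microphone only while it is live
// and enabled; any live track at all means a call is in progress.
function deriveCallState(videoTracks, audioTracks) {
  const camera = videoTracks.some((t) => t.readyState === "live" && t.enabled);
  const microphone = audioTracks.some((t) => t.readyState === "live" && t.enabled);
  const inCall = [...videoTracks, ...audioTracks].some((t) => t.readyState === "live");
  return { camera, microphone, inCall };
}

// In the browser process this would hang off an intercepted getUserMedia,
// using the same wrapping pattern as disableAutogain.js:
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  const original = navigator.mediaDevices.getUserMedia.bind(navigator.mediaDevices);
  navigator.mediaDevices.getUserMedia = async (constraints) => {
    const stream = await original(constraints);
    const state = deriveCallState(stream.getVideoTracks(), stream.getAudioTracks());
    // publish `state` to the main process over IPC here
    return stream;
  };
}
```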
Summary of Changes (Gemini Code Assist): This pull request presents a thorough investigation into enhancing the existing MQTT integration to provide more detailed Teams status information for home automation. It proposes a two-phase solution: first, WebRTC MediaStream monitoring for accurate camera, microphone, and in-call states; second, a future enhancement leveraging the Microsoft Graph API for richer meeting context. The goal is to move beyond basic presence and enable more sophisticated automation based on actual device usage.
Code Review
This is a very well-written and thorough investigation document for adding extended MQTT status. The proposed two-phase approach using WebRTC and then enhancing with the Graph API is solid. The document clearly outlines the problem, different approaches, and a detailed implementation plan.
My review includes a few suggestions on the technical design, mainly focusing on the WebRTC monitoring logic and the example IPC handler implementation, to make the solution more robust and efficient. Overall, great work on this research.
Added critical section explaining interaction with existing screen sharing code that disables audio to prevent echo/feedback.

Key findings:
- Teams creates SEPARATE streams for calls vs screen sharing
- Screen sharing streams have audio disabled (injectedScreenSharing.js)
- Regular call streams (camera + mic) are NOT affected
- Solution: use the same detection logic to filter screen shares

Implementation approach:
- Detect screen share streams (chromeMediaSource === "desktop")
- Skip monitoring screen share streams (no audio tracks)
- Only monitor regular call streams for mic/camera state
- Both interceptors coexist without interference

Updated code examples to include isScreenShare filtering. Addresses concerns about issues #1871 and #1896.
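The filtering described above can be sketched as a small predicate on the getUserMedia constraints. The helper name is illustrative; the `chromeMediaSource === "desktop"` check is the detection logic the commit attributes to injectedScreenSharing.js, and it accounts for the Chromium-specific `mandatory` constraint wrapper.

```javascript
// Hypothetical helper: a desktop-capture (screen share) stream is requested
// with chromeMediaSource "desktop" in its video constraints, either at the
// top level or under the Chromium-specific `mandatory` block.
function isScreenShare(constraints) {
  const video = constraints && constraints.video;
  if (!video || typeof video !== "object") return false;
  const source =
    (video.mandatory && video.mandatory.chromeMediaSource) ||
    video.chromeMediaSource;
  return source === "desktop";
}
```

The monitoring interceptor would return early for streams where this is true, so screen-share streams (whose audio is disabled) never flip the published microphone state.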
CRITICAL FIX: MediaStreamTrack.enabled property changes do NOT fire mute/unmute events. Setting track.enabled is the standard way web apps control track state programmatically (e.g., UI button clicks, keyboard shortcuts).

Problem:
- Teams UI buttons likely use track.enabled = false/true
- This does NOT fire mute/unmute events
- Event-only monitoring would miss most state changes

Solution - hybrid monitoring approach:
1. Event listeners (mute/unmute/ended) - immediate response
2. Poll the track.enabled property at 500ms intervals - catch all changes
3. Clean up intervals when the track ends - prevent memory leaks

Why 500ms polling:
- Fast enough for human perception
- Low overhead (~4-6 checks/second for a typical call)
- Negligible CPU impact

Updated investigation document:
- Added "Critical: track.enabled vs mute/unmute Events" section
- Updated code examples with hybrid monitoring
- Added monitorTrackEnabled() helper function
- Updated implementation checklist
- Added to Open Questions with recommendation

Credit: issue identified by Gemini Code review
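The hybrid approach above can be sketched as follows. This is an illustrative shape, not the document's actual `monitorTrackEnabled()` helper; the 500ms default matches the interval discussed in the commit.

```javascript
// Hypothetical sketch of hybrid track monitoring: mute/unmute events give
// immediate response, while polling track.enabled catches programmatic
// changes (which fire no event). `stop` clears the interval so a finished
// track does not leak its timer.
function monitorTrack(track, onChange, intervalMs = 500) {
  let lastEnabled = track.enabled;
  const poll = () => {
    if (track.enabled !== lastEnabled) {
      lastEnabled = track.enabled;
      onChange({ enabled: track.enabled, via: "poll" });
    }
  };
  const timer = setInterval(poll, intervalMs);
  const stop = () => clearInterval(timer);
  if (typeof track.addEventListener === "function") {
    track.addEventListener("mute", () => onChange({ enabled: false, via: "event" }));
    track.addEventListener("unmute", () => onChange({ enabled: true, via: "event" }));
    track.addEventListener("ended", stop); // cleanup when the track ends
  }
  return { poll, stop };
}
```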
Address two critical implementation details for MQTT publishing:

1. MQTT payload format:
- MQTT payloads are strings/buffers, not JavaScript primitives
- Boolean values must be explicitly converted with String()
- Home Assistant expects "true"/"false" strings, not booleans
- Updated all examples to show String(data.camera) etc.

2. Publishing efficiency:
- Individual topic publishes are independent operations
- Use Promise.all() for parallel publishing (not sequential)
- Sequential: 200-400ms total latency (4x ~50-100ms)
- Parallel: 50-100ms total (all complete simultaneously)
- 3-4x performance improvement

Changes:
- Updated main process IPC handler with String() conversion
- Changed sequential await to Promise.all([...])
- Added new "MQTT Payload Format and Publishing Efficiency" section
- Updated MQTT topics section to clarify string format
- Updated implementation checklist with both requirements
- Added detailed performance comparison examples

Credit: issues identified by Gemini Code review
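Both details combine into a handler shaped roughly like this. The function name, topic names, and `mqttClient.publish` signature are assumptions for illustration, not the project's actual API.

```javascript
// Hypothetical main-process handler sketch: booleans are stringified
// (MQTT carries strings/buffers, and Home Assistant expects "true"/"false"),
// and the three topics are published in parallel with Promise.all rather
// than awaited one after another.
async function publishMediaStatus(mqttClient, data) {
  await Promise.all([
    mqttClient.publish("teams/camera", String(data.camera)),
    mqttClient.publish("teams/microphone", String(data.microphone)),
    mqttClient.publish("teams/in-call", String(data.inCall)),
  ]);
}
```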
Focused the investigation on what the user ACTUALLY requested:
- Camera on/off (red LED)
- Microphone on/off (orange LED)
- In-call state (yellow LED)

REMOVED (over-engineering):
- "Approach 2: Graph API Enhancement" - user didn't ask for this
- "Approach 3: DOM Fallback" - unnecessary complexity
- Phase 1 and Phase 2 - just one simple implementation
- Full-state JSON topic - user just needs three boolean topics
- Excessive publishing efficiency discussion
- Multiple "Open Questions" - simplified to essential details

KEPT (essential):
- WebRTC stream monitoring (single, focused solution)
- Hybrid track monitoring (events + track.enabled polling)
- Screen sharing filtering (necessary for existing code)
- MQTT string conversion (necessary)
- Three simple topics: camera, microphone, in-call
- Implementation checklist
- Testing steps

ADDED:
- "Future Expansion Opportunities" section (brief, at end)
- Clear path to add Graph API later WITHOUT refactoring
- Home Assistant example for the user's use case

Result: the document went from ~400 lines with 3 approaches and 2 phases to ~240 lines with 1 clear solution. It delivers exactly what the user wants, nothing more, with an expansion path for the future if needed.
Created a testing spike to verify critical assumptions before implementation.

What we're testing:
1. getUserMedia interception works alongside injectedScreenSharing.js
2. Teams uses track.enabled (not mute events) for UI buttons
3. Screen sharing detection logic correctly identifies streams
4. Track state changes are detectable via polling

Files added:
- app/browser/tools/mqttExtendedStatusSpike.js
  - Temporary verification tool (DO NOT USE IN PRODUCTION)
  - Intercepts getUserMedia calls
  - Logs all track state changes (enabled, muted, readyState)
  - Polls track.enabled every 500ms to catch UI button clicks
  - Tests screen sharing detection logic
- MQTT_EXTENDED_STATUS_SPIKE_TESTING.md
  - Comprehensive testing guide with 7 test scenarios
  - Expected outputs for each test
  - Results checklist
  - Decision framework for next steps
- Updated app/browser/preload.js to load the spike module

How to use:
1. Add "mqttExtendedStatusSpike": true to config.json
2. Run npm start
3. Open the DevTools console
4. Follow the testing guide in MQTT_EXTENDED_STATUS_SPIKE_TESTING.md
5. Join a test call and observe [MQTT_SPIKE] logs

CRITICAL TEST: Test 3 (Toggle Microphone) will tell us whether Teams uses:
- track.enabled = false (our assumption) → will see "PROPERTY CHANGE"
- mute/unmute events (alternative) → will see "EVENT"

This determines whether our hybrid approach is correct.

Next: run spike tests, document results, then implement the full solution.
Replace .forEach() with for...of loops to match project code style.

Changes:
- stream.getVideoTracks().forEach() → for...of videoTracks.entries()
- stream.getAudioTracks().forEach() → for...of audioTracks.entries()

Maintains the index for track naming (camera-0, microphone-0) while following the project's ESLint preferences.
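The style change can be sketched like this; the function and `label` field are hypothetical, but `Array.prototype.entries()` is what preserves the index that the old `.forEach()` callback provided.

```javascript
// Illustrative sketch: for...of over entries() keeps the index needed for
// names like camera-0 / microphone-0 while matching the project's
// preference for for...of over .forEach().
function nameTracks(videoTracks, audioTracks) {
  const names = [];
  for (const [index, track] of videoTracks.entries()) {
    names.push(`camera-${index}: ${track.label}`);
  }
  for (const [index, track] of audioTracks.entries()) {
    names.push(`microphone-${index}: ${track.label}`);
  }
  return names;
}
```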
Comprehensive analysis of how extended status integrates with existing
MQTT architecture using YAGNI and KISS principles.
KEY FINDINGS:
Current Architecture:
- Browser: mqttStatusMonitor.js → IPC user-status-changed
- Main: userStatusChangedHandler() → mqttClient.publishStatus()
- MQTTClient has single method publishStatus() for presence only
RECOMMENDED APPROACH (KISS + YAGNI):
1. Add Generic publish() Method:
- publish(topic, payload, options)
- Supports strings, objects (auto-stringify)
- Deduplication with optional key
- Retain and QoS options
- Backward compatible (existing publishStatus() unchanged)
2. Separate IPC Channel:
- Keep 'user-status-changed' for presence
- Add 'mqtt-extended-status-changed' for camera/mic/call
- Clear separation of concerns (no conditional logic)
3. Nested Config Structure:
- mqtt.presence { enabled, topic, checkInterval }
- mqtt.extendedStatus { enabled, topics { camera, microphone, inCall } }
- Backward compatible with migration
REJECTED APPROACHES:
❌ Specialized methods (publishCameraState, etc) - method explosion
❌ Event publisher/adapter pattern - over-engineered
❌ Flat config - too messy
❌ Reused IPC channel - mixed concerns
REQUIRED SPIKES:
Spike 1: Generic publish() method (verify no breaking changes)
Spike 2: IPC integration pattern (verify messages reach broker)
Spike 3: Config backward compatibility (verify migration works)
Next: Run 3 integration spikes before implementing production code
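The generic publish() method proposed above might look roughly like this. The class name, the `lastPayloads` dedup map, and the underlying client's `publish(topic, payload, options)` signature (node-mqtt style) are assumptions for illustration, not the project's implementation.

```javascript
// Hypothetical sketch of a generic publish() wrapper: objects are
// auto-stringified, an optional dedupKey suppresses republishing an
// unchanged payload, and retain/QoS pass through to the broker client.
// The existing publishStatus() would remain untouched for compatibility.
class MQTTPublisher {
  constructor(client) {
    this.client = client;
    this.lastPayloads = new Map(); // dedupKey -> last payload sent
  }

  publish(topic, payload, { retain = false, qos = 0, dedupKey } = {}) {
    const body = typeof payload === "string" ? payload : JSON.stringify(payload);
    if (dedupKey) {
      if (this.lastPayloads.get(dedupKey) === body) return false; // unchanged, skip
      this.lastPayloads.set(dedupKey, body);
    }
    this.client.publish(topic, body, { retain, qos });
    return true;
  }
}
```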
Changed config structure from generic "extendedStatus" to clear
semantic categories: camera, microphone, call.
PROBLEM WITH "EXTENDED":
- "Extended" is a technical term, not a semantic category
- Just means "stuff we added later"
- Unclear what "extended" contains
- Hard to understand at a glance
NEW SEMANTIC APPROACH:
{
"mqtt": {
"presence": { enabled, topic }, // Availability status
"camera": { enabled, topic }, // Video device
"microphone": { enabled, topic }, // Audio device
"call": { enabled, topic } // Call state
}
}
BENEFITS:
✅ Clear what each setting controls (no guessing)
✅ Each independently configurable
✅ Self-documenting (camera = camera, not "extended field 1")
✅ Flat structure (simpler than nested groupings)
✅ Easy to add new categories (screen, recording, etc.)
IPC CHANNEL:
- Renamed: mqtt-extended-status-changed → mqtt-media-status-changed
- Rationale: Single IPC for all stream data (camera/mic/call detected together)
- Handler selectively publishes based on config
FILES UPDATED:
- MQTT_INTEGRATION_ARCHITECTURE_ANALYSIS.md
- docs-site/docs/development/research/mqtt-extended-status-investigation.md
This is more KISS-compliant: each thing is named for what it is.
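The single mqtt-media-status-changed channel plus per-category config can be sketched as a small main-process handler. The function name and topic values are illustrative; the config shape matches the semantic structure shown above.

```javascript
// Hypothetical sketch: one IPC message carries camera, microphone and call
// state together (they are detected from the same stream), and the handler
// publishes only the categories enabled in config, each to its own topic.
function handleMediaStatus(config, publish, data) {
  const published = [];
  for (const key of ["camera", "microphone", "call"]) {
    const section = config.mqtt && config.mqtt[key];
    if (section && section.enabled) {
      publish(section.topic, String(data[key])); // MQTT wants string payloads
      published.push(key);
    }
  }
  return published;
}
```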
Document how the semantic category pattern scales to future use cases (notifications, calendar, messages, recording, reactions, etc.).

FUTURE CATEGORIES ANALYZED:

Messages & Notifications:
- messageCount, newMessage, mentions
- Detection: DOM title monitoring (already exists!)
- Topics: teams/messages/unread, teams/messages/mentions

Calendar & Meetings:
- nextMeeting, meetingStarting, meetingDuration
- Detection: Graph API (wait for #1832) or DOM
- Topics: teams/calendar/next (with JSON payload)

Screen Sharing:
- screenSharing
- Detection: IPC events (already implemented!)
- Topics: teams/screen-sharing

Reactions & Engagement:
- handRaised, reactions
- Detection: DOM monitoring
- Topics: teams/hand-raised, teams/reactions/latest

Recording & Privacy:
- recording, transcription
- Detection: DOM (recording indicator)
- Topics: teams/recording (privacy use case)

Participants:
- participantCount
- Detection: DOM roster panel
- Topics: teams/participants/count

THE PATTERN SCALES WELL:
✅ Each category = what it represents (not grouped by tech)
✅ Independently configurable (enable what you need)
✅ Self-documenting (camera = camera, not "media field 1")
✅ Privacy-friendly (opt-in per category)
✅ Generic publish() supports all future categories

IMPLEMENTATION PRIORITY:
- High: screen sharing, message count (easy - already detected)
- Medium: calendar (wait for Graph API #1832), recording
- Low: reactions, participant count (wait for user requests)

DETECTION STRATEGY: prefer stable APIs (WebRTC, IPC, Graph) over fragile DOM scraping. Add features ONLY when users request them (YAGNI).

This validates our semantic category decision - it scales!
Consolidated the MQTT semantic categories expansion roadmap into the main mqtt-extended-status-investigation.md research document.

CHANGES:
- Removed the separate MQTT_SEMANTIC_CATEGORIES_EXPANSION.md file
- Expanded the "Future Expansion Opportunities" section with comprehensive details:
  - Messages & Notifications (messageCount, newMessage, mentions)
  - Calendar & Meetings (nextMeeting, meetingStarting)
  - Screen Sharing (screenSharing)
  - Recording & Privacy (recording, transcription)
  - Reactions & Engagement (handRaised, reactions)
  - Participant Count (participantCount)
- Added a detection strategy table showing fragility levels
- Added implementation priority guidance (YAGNI)
- Added rationale for how semantic categories scale

BENEFITS:
✅ All research in one place (easier to find)
✅ Comprehensive future planning (but YAGNI - don't build yet)
✅ Clear implementation priorities (stable APIs first)
✅ Validates the semantic category decision (it scales!)

Keep all research for a feature together, not scattered.