[MNNChat:Feature] Implement real-time vision capabilities for interactive voice chat. Enable "ChatGPT-like" real-time visual dialogue by sending captured frames to LLM.#4263

Merged
wangzhaode merged 3 commits into alibaba:master from JedLee6:jedlee/ft/master_260316
Mar 16, 2026

Conversation

JedLee6 (Contributor) commented Mar 15, 2026

😃 Hi, @wangzhaode @Juude. Could you please review and merge this pull request at your convenience? Thanks! This PR implements real-time vision capabilities for interactive voice chat, enabling "ChatGPT-like" real-time visual dialogue by sending captured frames to the LLM.

Screenshot Video Demo
video6174455438680005488.mp4

Key Changes:

  • Support live camera preview with front and back camera switching.
  • Integrate CameraX for high-performance, low-latency image capture.
  • Optimize image processing by combining scaling and rotation into a single operation.
  • Enable "ChatGPT-like" real-time visual dialogue by sending captured frames to the LLM.
  • Add comprehensive Javadoc and documentation for all vision and image utility logic.
  • Full-Duplex Interruption: Keeps ASR active during AI responses, allowing users to interrupt the AI by simply speaking.
  • Auto-Mute Mode: A software-based fallback for echo cancellation. It automatically toggles the microphone state based on the conversation flow.
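
To illustrate the "combining scaling and rotation into a single operation" item, here is a minimal sketch in plain desktop Java. It is not the PR's ImageUtils code: the class name `ScaleRotateSketch` and the method `scaleAndRotate` are hypothetical, and `java.awt` stands in for the Android graphics classes so the example can run on any JVM. The idea it shows is that the scale and the 90-degree rotation are composed into one affine transform, so the pixels are resampled once instead of going through an intermediate image.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

/** Hypothetical helper for illustration only; not the PR's ImageUtils. */
final class ScaleRotateSketch {
    private ScaleRotateSketch() {}

    /**
     * Scales src by the given factor and rotates it clockwise by
     * quadrants * 90 degrees, drawing the result in a single pass.
     */
    static BufferedImage scaleAndRotate(BufferedImage src, double scale, int quadrants) {
        int q = Math.floorMod(quadrants, 4);
        int scaledW = Math.max(1, (int) Math.round(src.getWidth() * scale));
        int scaledH = Math.max(1, (int) Math.round(src.getHeight() * scale));
        // An odd number of quarter turns swaps width and height.
        boolean swap = (q % 2) == 1;
        int outW = swap ? scaledH : scaledW;
        int outH = swap ? scaledW : scaledH;

        // Calls are concatenated, so the last one listed is applied to a
        // source point first: center the source on the origin, scale it,
        // rotate it, then move it to the center of the output.
        AffineTransform t = new AffineTransform();
        t.translate(outW / 2.0, outH / 2.0);
        t.quadrantRotate(q);
        t.scale(scale, scale);
        t.translate(-src.getWidth() / 2.0, -src.getHeight() / 2.0);

        BufferedImage out = new BufferedImage(outW, outH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.drawImage(src, t, null); // one resampling step for both operations
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

For example, `ScaleRotateSketch.scaleAndRotate(frame, 0.5, 1)` would turn a 40x20 landscape frame into a 10x20 portrait one. On Android the same composition would presumably be expressed with the platform's own matrix and bitmap classes.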

JedLee6 added 2 commits March 15, 2026 12:16
- Keep ASR recording active during LLM generation and greeting playback.
- Add speech detection listener to interrupt AI output when user starts speaking.
- Improve responsiveness by allowing users to skip/interrupt AI responses.
…tion function.

- Support software-based echo cancellation (Auto-Mute mode)
- Automatically mute mic when AI starts speaking/generating
- Automatically unmute mic when AI finishes speaking or is interrupted
- Maintain full-duplex interruption support in hardware AEC mode
- Refine code comments to clarify speech interruption and ASR state logic
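
The two commit messages above describe a small piece of state logic: in hardware AEC mode the microphone stays open so the user can interrupt, while in Auto-Mute mode the microphone is muted for as long as the AI is speaking or generating. The sketch below models that behavior under stated assumptions. `MicPolicySketch`, `EchoMode`, and every method name are hypothetical and do not come from VoiceChatPresenter or any other class in the PR.

```java
/** Hypothetical model of the mic policy described in the commits above. */
final class MicPolicySketch {
    enum EchoMode { HARDWARE_AEC, AUTO_MUTE }

    private final EchoMode mode;
    private boolean aiSpeaking;
    private boolean micMuted;

    MicPolicySketch(EchoMode mode) {
        this.mode = mode;
    }

    /** Call when the AI starts generating text or playing audio. */
    void onAiOutputStarted() {
        aiSpeaking = true;
        // Software fallback: without hardware AEC, mute so the AI's own
        // voice is not picked up by the recognizer.
        if (mode == EchoMode.AUTO_MUTE) {
            micMuted = true;
        }
    }

    /** Call when the AI finishes speaking or has been interrupted. */
    void onAiOutputEnded() {
        aiSpeaking = false;
        micMuted = false;
    }

    /**
     * Call when the speech detector hears the user.
     * Returns true if the caller should stop the AI's current output.
     */
    boolean onUserSpeechDetected() {
        if (micMuted || !aiSpeaking) {
            return false; // nothing to interrupt, or the mic is muted
        }
        onAiOutputEnded();
        return true;
    }

    boolean isMicMuted() {
        return micMuted;
    }

    boolean isAiSpeaking() {
        return aiSpeaking;
    }
}
```

In this model, interruption is only possible in `HARDWARE_AEC` mode, because a muted microphone never reports speech. That matches the commit's wording ("Maintain full-duplex interruption support in hardware AEC mode"), though how the real implementation wires these events together is not shown on this page.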
JedLee6 changed the title from "[MNNChat:BugFix] Fix the speech interruption (duplex) function and auto-mute (software AEC) and speech interruption function." to "[MNNChat:BugFix] Fix the speech interruption (duplex) function and auto-mute (software AEC) function." Mar 15, 2026
@wangzhaode wangzhaode requested a review from Juude March 15, 2026 06:16
…tive voice chat.

- Support live camera preview with front and back camera switching.
- Integrate CameraX for high-performance, low-latency image capture.
- Optimize image processing by combining scaling and rotation into a single operation.
- Enable "ChatGPT-like" real-time visual dialogue by sending captured frames to LLM.
- Add comprehensive Javadoc and documentation for all vision and image utility logic.
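
One plausible way to pair a live camera stream with a turn-based LLM request is to keep only the newest frame and attach it when the user's utterance is ready to send. The sketch below shows that pattern. It is an assumption about the general approach, not a description of this PR's code: `LatestFrameSketch`, `Frame`, `offer`, and `takeFresh` are hypothetical names, and the frame is modeled as plain encoded bytes.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical "keep only the newest frame" mailbox; not taken from the PR. */
final class LatestFrameSketch {

    /** Immutable snapshot of one captured frame. */
    static final class Frame {
        final byte[] encodedBytes;
        final long capturedAtMillis;

        Frame(byte[] encodedBytes, long capturedAtMillis) {
            this.encodedBytes = encodedBytes.clone();
            this.capturedAtMillis = capturedAtMillis;
        }
    }

    private final AtomicReference<Frame> latest = new AtomicReference<>();
    private final long maxAgeMillis;

    LatestFrameSketch(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Called from the camera thread; silently replaces any older frame. */
    void offer(Frame frame) {
        latest.set(frame);
    }

    /**
     * Called when a request is about to be sent to the LLM. Returns the
     * newest frame and clears the slot, or null if no frame is available
     * or the newest one is older than maxAgeMillis.
     */
    Frame takeFresh(long nowMillis) {
        Frame frame = latest.getAndSet(null);
        if (frame == null || nowMillis - frame.capturedAtMillis > maxAgeMillis) {
            return null;
        }
        return frame;
    }
}
```

Dropping older frames keeps memory use flat and means the model sees what the camera sees at the moment the user finishes speaking, rather than a backlog of stale images.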
JedLee6 force-pushed the jedlee/ft/master_260316 branch from 81b2409 to 47cb56d March 15, 2026 07:06
JedLee6 changed the title from "[MNNChat:BugFix] Fix the speech interruption (duplex) function and auto-mute (software AEC) function." to "[MNNChat:Feature] Implement real-time vision capabilities for interactive voice chat." Mar 15, 2026
JedLee6 changed the title from "[MNNChat:Feature] Implement real-time vision capabilities for interactive voice chat." to "[MNNChat:Feature] Implement real-time vision capabilities for interactive voice chat. Enable "ChatGPT-like" real-time visual dialogue by sending captured frames to LLM." Mar 15, 2026
Juude added a commit to Juude/MNN that referenced this pull request Mar 16, 2026
…moke test

Cherry-picked from pr-4263 (commit 47cb56d) with additional fixes:
- CameraX integration for live camera preview in VoiceChatFragment
- Vision mode: capture and send photos during voice chat
- Camera toggle button (on/off) with front/back switch support
- ImageUtils for image scaling and rotation optimization
- VoiceChatPresenter: add muteMicrophone() helper method
- Smoke test: 19_regress_vision_chat_ui.sh for E2E verification

Tested on device 1b4a0523:
- All 9/9 smoke test checks PASS
- Camera preview, toggle, switch all functional
- Echo cancellation mode preserved

🤖 Generated with [Qoder][https://qoder.com]
@wangzhaode wangzhaode merged commit 95b251b into alibaba:master Mar 16, 2026
2 checks passed
3 participants