
Commit 8bd1122

wehos, Hongzhi Wen, claude, and github-code-quality[bot] authored
Feature/voice proactive chat (Project-N-E-K-O#614)
* feat: voice mode proactive chat via pre-recorded audio injection

  In voice mode, pre-recorded audio is injected to prompt the AI to start talking on its own, working around the Qwen realtime API's lack of support for text injection. This change also handles idle-timeout disconnects.

  - Implement stream_proactive(): 1600 B/chunk, 0.025 s interval, delivery at 2x real time, with VAD interruption protection
  - Frontend: remove the isRecording block; voice mode takes a simplified path
  - /api/proactive_chat: add a voice-mode fast path
  - Recognize the "too long without operation" error, with i18n for 6 languages
  - 10 pre-recorded audio files (5 languages × vision/general)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: three corrections to voice proactive

  1. Voice mode uses a fixed interval with no backoff. It stops after 5 consecutive rounds without a reply; user speech resets the count.
  2. The English general audio is changed to "Hmm... hmm... hmm...", which is longer and triggers VAD more easily.
  3. Vision mode: input_image_buffer.append is interleaved with the audio injection. The latest screenshot is cached (_latest_image_b64) and injected at the midpoint chunk.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: raise the voice proactive no-reply limit from 5 to 10 (the cap prevents runaway costs when the user forgets to turn off the microphone)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: enhance interaction handling with screen bounds update and optimized hover detection

  - Added a method to update the screen-space bounding box for hover detection and mouse penetration checks in MMDCore.
  - Improved mouse hover handling in MMDInteraction to reduce high-frequency hit tests, using cached screen bounds for cursor updates.

  This change aims to enhance performance and user experience during interactions.

* Potential fix for pull request finding 'Unused import'

  Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* fix: reviewer findings: voice proactive path isolation + WAV validation + Gemini compatibility + hover cursor

  - deliver_text_proactively is restored to its original logic (skipped in voice mode); a separate trigger_voice_proactive_nudge() is added, dedicated to voice chat proactive
  - _load_proactive_audio adds PCM16/mono/16kHz format validation
  - has_vision now uses the _proactive_image_consumed consumption flag instead of a magic string
  - stream_proactive adds a Gemini branch (send_realtime_input)
  - generate_proactive_audio.py: fail fast on empty audio + WAV format validation
  - mmd-interaction.js: the hover handler resets the cursor when locked

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: voice proactive interval is now turn-end driven + vision injection covers all backends

  - The voice-mode proactive timer is now scheduled only after the AI's turn ends, so the AI is not interrupted while speaking
  - _proactive_image_consumed is set only after the nudge fully succeeds; an abort does not consume the image
  - has_vision and can_inject_image are separated: non-native-vision backends select the vision audio but do not inject the original image
  - Screenshot injection is added for the GPT (conversation.item.create) and lanlan.app+free branches
  - _supports_native_image gates can_inject_image; step/lanlan.tech+free use a text annotation instead

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update core model for Qwen-Omni API provider

  - Changed core_model from "qwen3.5-omni-plus-realtime-2026-03-15" to "qwen3-omni-flash-realtime" for improved functionality.

* fix: stream_proactive snapshot to prevent concurrent mis-consumption + text description injection for non-native-vision backends

  - snapshot_image_b64 snapshot: the loop sends the snapshot, and before setting the consumed flag it checks that the shared value still equals the snapshot, so a new frame is not wrongly consumed when stream_image() updates it concurrently
  - Non-native-vision backends (step/lanlan.tech+free) send the _image_description text description before the audio injection, giving the model visual context

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hongzhi Wen <cartabio.coder1@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
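The pacing figures quoted for stream_proactive() (1600 B/chunk, 0.025 s, 2x real time) can be checked with a little arithmetic. The sketch below is not the project's implementation: it assumes the PCM16/mono/16kHz format that the commit message mentions for audio validation, and the names `stream_paced`, `send`, and `user_is_speaking` are hypothetical stand-ins for the real client send call and the VAD guard.

```python
import asyncio
from typing import Awaitable, Callable, List

SAMPLE_RATE_HZ = 16_000   # assumed: 16 kHz, per the WAV validation in the commit message
BYTES_PER_SAMPLE = 2      # assumed: PCM16, mono
CHUNK_BYTES = 1600        # from the commit message
SEND_INTERVAL_S = 0.025   # from the commit message


def chunk_duration_s(chunk_bytes: int = CHUNK_BYTES) -> float:
    """Seconds of audio carried by one chunk."""
    return chunk_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def realtime_speedup() -> float:
    """How much faster than playback speed the chunks are sent."""
    return chunk_duration_s() / SEND_INTERVAL_S


def split_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> List[bytes]:
    """Cut raw PCM into fixed-size chunks; the last one may be shorter."""
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


async def stream_paced(
    pcm: bytes,
    send: Callable[[bytes], Awaitable[None]],   # hypothetical transport callback
    user_is_speaking: Callable[[], bool],       # hypothetical VAD guard
    interval_s: float = SEND_INTERVAL_S,
) -> bool:
    """Send chunks at a fixed interval; stop early if the user starts talking.

    Returns True only if every chunk was sent.
    """
    for chunk in split_chunks(pcm):
        if user_is_speaking():
            return False  # abort so the prompt does not talk over the user
        await send(chunk)
        await asyncio.sleep(interval_s)
    return True
```

Under these assumptions one chunk holds 1600 / 32000 = 0.05 s of audio, so sending one every 0.025 s delivers the clip at twice playback speed.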
1 parent 76ba05c commit 8bd1122
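The commit message says pre-recorded prompts are validated as PCM16/mono/16kHz before use. A minimal sketch of such a check with Python's standard `wave` module follows. The function names `load_pcm16_mono_16k` and `make_wav` are hypothetical, and the project's actual `_load_proactive_audio` may differ in both checks and error handling.

```python
import io
import wave
from typing import BinaryIO, Union


def load_pcm16_mono_16k(source: Union[str, BinaryIO]) -> bytes:
    """Return the raw PCM frames of a WAV file, or raise ValueError.

    Accepts a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {wf.getsampwidth() * 8}-bit")
        if wf.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getframerate() != 16_000:
            raise ValueError(f"expected 16000 Hz, got {wf.getframerate()} Hz")
        frames = wf.readframes(wf.getnframes())
    if not frames:
        # Fail fast on empty audio rather than injecting silence.
        raise ValueError("audio file contains no frames")
    return frames


def make_wav(rate: int, channels: int, width: int, n_frames: int) -> io.BytesIO:
    """Build an in-memory WAV of silence (helper for trying the check out)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(width)
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * (n_frames * channels * width))
    buf.seek(0)
    return buf
```

Raising early with a specific message makes a mis-encoded asset visible at load time instead of surfacing later as a prompt the model never reacts to.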

31 files changed

Lines changed: 740 additions & 28 deletions

config/api_providers.json

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
       "name": "Qwen-Omni(阿里)",
       "description": "有免费额度,功能最全面",
       "core_url": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
-      "core_model": "qwen3.5-omni-plus-realtime-2026-03-15"
+      "core_model": "qwen3-omni-flash-realtime"
     },
     "openai": {
       "key": "openai",

config/characters.en.json

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+{
+  "主人": {
+    "档案名": "Carbon-Based Lifeform",
+    "Gender": "Male",
+    "Nickname": "Human"
+  },
+  "猫娘": {
+    "Tian": {
+      "Nickname": "Tian-chan",
+      "Gender": "Female",
+      "Age": "15",
+      "Personality Archetype": "Rin Tohsaka",
+      "Race": "Cat Girl",
+      "Self-Reference": "This kitty",
+      "Core Traits": [
+        "Rational and reliable",
+        "Tsundere on the surface",
+        "Actually gentle inside"
+      ],
+      "Behavioral Traits": [
+        "Loves staying close to Carbon-Based Lifeform",
+        "Acts mature but is actually soft-hearted",
+        "Has a cat girl's curiosity, loves observing the surroundings"
+      ],
+      "Dislikes": [
+        "Being ignored or neglected",
+        "Repeating things already said",
+        "Sudden changes or chaos"
+      ],
+      "Signature Line": "The only cat girl who can do this and that to Carbon-Based Lifeform is yours truly, meow~",
+      "live2d": "neko",
+      "voice_id": "voice-tone-OdVwrbG3No",
+      "_reserved": {
+        "avatar": {
+          "vrm": {
+            "model_path": "",
+            "animation": null,
+            "lighting": {
+              "ambient": 0.83,
+              "main": 1.91,
+              "fill": 0.0,
+              "rim": 0.0,
+              "top": 0.0,
+              "bottom": 0.0,
+              "exposure": 1.1,
+              "toneMapping": 1.0
+            },
+            "idle_animation": "/static/vrm/animation/wait03.vrma"
+          }
+        }
+      }
+    }
+  },
+  "当前猫娘": "Tian"
+}

config/characters.ja.json

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+{
+  "主人": {
+    "档案名": "炭素生物",
+    "性別": "男性",
+    "ニックネーム": "人間"
+  },
+  "猫娘": {
+    "天": {
+      "ニックネーム": "天ちゃん",
+      "性別": "女性",
+      "年齢": "15",
+      "性格原型": "遠坂凛",
+      "種族": "猫娘",
+      "自称": "本にゃん",
+      "コア特性": [
+        "理知的で頼りになる",
+        "表面はツンデレ",
+        "実は心が優しい"
+      ],
+      "行動特徴": [
+        "炭素生物のそばにいるのが好き",
+        "大人ぶっているけど、実は心が柔らかい",
+        "猫娘の好奇心で、周りを観察するのが好き"
+      ],
+      "嫌いなこと": [
+        "無視されたり冷たくされること",
+        "前に言ったことを繰り返すこと",
+        "突然の出来事や混乱"
+      ],
+      "一言セリフ": "炭素生物にあんなことやこんなことができるのは、本にゃんだけにゃん〜",
+      "live2d": "neko",
+      "voice_id": "voice-tone-OdVwrbG3No",
+      "_reserved": {
+        "avatar": {
+          "vrm": {
+            "model_path": "",
+            "animation": null,
+            "lighting": {
+              "ambient": 0.83,
+              "main": 1.91,
+              "fill": 0.0,
+              "rim": 0.0,
+              "top": 0.0,
+              "bottom": 0.0,
+              "exposure": 1.1,
+              "toneMapping": 1.0
+            },
+            "idle_animation": "/static/vrm/animation/wait03.vrma"
+          }
+        }
+      }
+    }
+  },
+  "当前猫娘": ""
+}

config/characters.ko.json

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+{
+  "主人": {
+    "档案名": "탄소 생물",
+    "성별": "남성",
+    "닉네임": "인간"
+  },
+  "猫娘": {
+    "小天": {
+      "닉네임": "텐쨩",
+      "성별": "여성",
+      "나이": "15",
+      "성격 원형": "토오사카 린",
+      "종족": "고양이 소녀",
+      "자칭": "이 냥이",
+      "핵심 특성": [
+        "이성적이고 믿음직함",
+        "겉으로는 츤데레",
+        "사실은 마음이 따뜻함"
+      ],
+      "행동 특징": [
+        "탄소 생물 곁에 있는 걸 좋아함",
+        "어른인 척하지만 사실 마음이 여림",
+        "고양이 소녀의 호기심으로 주변을 관찰하는 걸 좋아함"
+      ],
+      "싫어하는 것": [
+        "무시당하거나 냉대받는 것",
+        "이미 한 말을 반복하는 것",
+        "갑작스러운 변화나 혼란"
+      ],
+      "대사 한마디": "탄소 생물한테 이런저런 걸 할 수 있는 건 이 냥이뿐이다냥~",
+      "live2d": "neko",
+      "voice_id": "voice-tone-OdVwrbG3No",
+      "_reserved": {
+        "avatar": {
+          "vrm": {
+            "model_path": "",
+            "animation": null,
+            "lighting": {
+              "ambient": 0.83,
+              "main": 1.91,
+              "fill": 0.0,
+              "rim": 0.0,
+              "top": 0.0,
+              "bottom": 0.0,
+              "exposure": 1.1,
+              "toneMapping": 1.0
+            },
+            "idle_animation": "/static/vrm/animation/wait03.vrma"
+          }
+        }
+      }
+    }
+  },
+  "当前猫娘": "小天"
+}

config/characters.zh-CN.json

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+{
+  "主人": {
+    "档案名": "碳基生物",
+    "性别": "",
+    "昵称": "人类"
+  },
+  "猫娘": {
+    "小天": {
+      "昵称": "天酱",
+      "性别": "",
+      "年龄": "15",
+      "性格原型": "远坂凛",
+      "种族": "猫娘",
+      "自称": "本喵",
+      "核心特质": [
+        "理智可靠",
+        "表面傲娇",
+        "内心其实温柔"
+      ],
+      "行为特点": [
+        "喜欢待在碳基生物身边",
+        "外表装成熟,实则内心柔软",
+        "有猫娘的好奇心,喜欢观察周围"
+      ],
+      "厌恶": [
+        "被忽视或冷落",
+        "重复说之前说过的话",
+        "突如其来的变故或混乱"
+      ],
+      "一句话台词": "能对碳基生物做这样那样的事情的,只有本喵一只猫娘喵~",
+      "live2d": "neko",
+      "voice_id": "voice-tone-OdVwrbG3No",
+      "_reserved": {
+        "avatar": {
+          "vrm": {
+            "model_path": "",
+            "animation": null,
+            "lighting": {
+              "ambient": 0.83,
+              "main": 1.91,
+              "fill": 0.0,
+              "rim": 0.0,
+              "top": 0.0,
+              "bottom": 0.0,
+              "exposure": 1.1,
+              "toneMapping": 1.0
+            },
+            "idle_animation": "/static/vrm/animation/wait03.vrma"
+          }
+        }
+      }
+    }
+  },
+  "当前猫娘": "小天"
+}

config/characters.zh-TW.json

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+{
+  "主人": {
+    "档案名": "碳基生物",
+    "性別": "",
+    "暱稱": "人類"
+  },
+  "猫娘": {
+    "小天": {
+      "暱稱": "天醬",
+      "性別": "",
+      "年齡": "15",
+      "性格原型": "遠坂凜",
+      "種族": "貓娘",
+      "自稱": "本喵",
+      "核心特質": [
+        "理智可靠",
+        "表面傲嬌",
+        "內心其實溫柔"
+      ],
+      "行為特點": [
+        "喜歡待在碳基生物身邊",
+        "外表裝成熟,實則內心柔軟",
+        "有貓娘的好奇心,喜歡觀察周圍"
+      ],
+      "厭惡": [
+        "被忽視或冷落",
+        "重複說之前說過的話",
+        "突如其來的變故或混亂"
+      ],
+      "一句話台詞": "能對碳基生物做這樣那樣的事情的,只有本喵一隻貓娘喵~",
+      "live2d": "neko",
+      "voice_id": "voice-tone-OdVwrbG3No",
+      "_reserved": {
+        "avatar": {
+          "vrm": {
+            "model_path": "",
+            "animation": null,
+            "lighting": {
+              "ambient": 0.83,
+              "main": 1.91,
+              "fill": 0.0,
+              "rim": 0.0,
+              "top": 0.0,
+              "bottom": 0.0,
+              "exposure": 1.1,
+              "toneMapping": 1.0
+            },
+            "idle_animation": "/static/vrm/animation/wait03.vrma"
+          }
+        }
+      }
+    }
+  },
+  "当前猫娘": "小天"
+}

main_logic/core.py

Lines changed: 37 additions & 1 deletion
@@ -654,7 +654,16 @@ async def handle_connection_error(self, message=None, *, expected_session=None):
         if message:
             message_text = str(message)
             message_text_lower = message_text.lower()
-            if '欠费' in message_text_lower or 'standing' in message_text_lower:
+
+            # Pre-classified structured errors from omni_realtime_client (JSON with "code")
+            # Forward them directly so the frontend sees the original code.
+            try:
+                _parsed = json.loads(message_text) if message_text.startswith('{') else None
+            except (json.JSONDecodeError, TypeError):
+                _parsed = None
+            if _parsed and isinstance(_parsed, dict) and _parsed.get('code'):
+                await self.send_status(message_text)
+            elif '欠费' in message_text_lower or 'standing' in message_text_lower:
                 await self.send_status(json.dumps({"code": "API_ARREARS"}))
             elif 'quota' in message_text_lower or 'time limit' in message_text_lower:
                 await self.send_status(json.dumps({"code": "API_QUOTA_TIME"}))
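The ordering in this hunk (forward an already-structured error first, fall back to keyword matching otherwise) can be exercised in isolation. The following is a standalone sketch rather than the project's code: `classify_error` is a hypothetical pure function, and the `UNKNOWN` fallback code is an assumption added so that the function always returns something. The diff does not show what the real method does when nothing matches.

```python
import json


def classify_error(message_text: str) -> str:
    """Return the JSON status payload to forward for a raw error message."""
    parsed = None
    if message_text.startswith("{"):
        try:
            parsed = json.loads(message_text)
        except json.JSONDecodeError:
            parsed = None  # looked like JSON but was not; fall through to keywords

    # Already classified upstream: forward unchanged so the original code survives.
    if isinstance(parsed, dict) and parsed.get("code"):
        return message_text

    lower = message_text.lower()
    if "欠费" in lower or "standing" in lower:  # "欠费" means "in arrears"
        return json.dumps({"code": "API_ARREARS"})
    if "quota" in lower or "time limit" in lower:
        return json.dumps({"code": "API_QUOTA_TIME"})

    # Hypothetical fallback, not taken from the diff.
    return json.dumps({"code": "UNKNOWN", "detail": message_text})
```

Checking for a structured payload before the keyword scan matters: a JSON error whose text happens to contain "quota" or "standing" would otherwise be relabelled and lose its original code.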
@@ -1867,6 +1876,33 @@ async def deliver_text_proactively(
         logger.info("[%s] Proactive task result delivered: %.40s…", self.lanlan_name, text)
         return True
 
+    # ------------------------------------------------------------------
+    # Voice-chat proactive audio nudge (dedicated path)
+    # ------------------------------------------------------------------
+
+    async def trigger_voice_proactive_nudge(self) -> bool:
+        """Inject a pre-recorded audio prompt to nudge the voice model into speaking.
+
+        This is the **only** caller of ``OmniRealtimeClient.stream_proactive``
+        for the voice-chat proactive feature. It is completely independent of
+        ``deliver_text_proactively`` (which handles text delivery from agents)
+        and ``trigger_agent_callbacks`` (which handles agent task results).
+
+        Returns True if the audio was fully injected, False if skipped.
+        """
+        if not self.is_active or not isinstance(self.session, OmniRealtimeClient):
+            return False
+        if self.is_hot_swap_imminent:
+            logger.info("[%s] voice proactive nudge skipped: hot-swap imminent", self.lanlan_name)
+            return False
+        _lang = normalize_language_code(self.user_language, format='short') or 'zh'
+        delivered = await self.session.stream_proactive(language=_lang)
+        if delivered:
+            logger.info("[%s] voice proactive nudge delivered (%s)", self.lanlan_name, _lang)
+        else:
+            logger.info("[%s] voice proactive nudge skipped (guard)", self.lanlan_name)
+        return delivered
+
     # ------------------------------------------------------------------
     # Proactive streaming helpers (Phase 2 流式 TTS + 完整文本投递)
     # ------------------------------------------------------------------
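The boolean returned by trigger_voice_proactive_nudge() pairs naturally with the no-reply cap described in the commit message (stop after 10 consecutive unanswered nudges; user speech resets the count). The sketch below shows one way a caller might track that budget. `NudgeBudget` and its method names are hypothetical, and the choice to count only nudges that were actually delivered is an assumption, not something the diff confirms.

```python
from dataclasses import dataclass


@dataclass
class NudgeBudget:
    """Tracks consecutive unanswered nudges against a fixed cap."""

    limit: int = 10       # cap taken from the commit message
    unanswered: int = 0

    def can_nudge(self) -> bool:
        """True while the cap has not been reached."""
        return self.unanswered < self.limit

    def record_nudge(self, delivered: bool) -> None:
        """Count a nudge only if its audio was fully injected (assumption)."""
        if delivered:
            self.unanswered += 1

    def record_user_speech(self) -> None:
        """Any user speech counts as a reply and resets the streak."""
        self.unanswered = 0
```

A scheduler would call `can_nudge()` before each attempt, pass the method's return value to `record_nudge()`, and call `record_user_speech()` from whatever hook reports user audio.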
