Skip to content

Qwen3 tts with websocket support#351

Open
rophec wants to merge 40 commits into
Project-N-E-K-O:mainfrom
rophec:qwen3_tts
Open

Qwen3 tts with websocket support#351
rophec wants to merge 40 commits into
Project-N-E-K-O:mainfrom
rophec:qwen3_tts

Conversation

@rophec
Copy link
Copy Markdown
Contributor

@rophec rophec commented Feb 24, 2026

Summary by CodeRabbit

  • 新功能

    • 增加本地 Qwen3 文本转语音服务,支持 WebSocket 实时音频流、热切换语音、按句/按块分发及流式/非流式合成
    • 扩展 TTS 客户端以支持本地 Qwen3 实现,包含文本缓冲、分段提交、重采样与回放保存
    • 新增示例客户端,可连接实时服务并保存合成音频输出
  • 杂项

    • 仓库配置中添加新的子模块引用以托管本地 TTS 服务代码

rophec and others added 30 commits February 2, 2026 17:21
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

新增基于 Qwen3 的本地 WebSocket TTS 子系统:添加 git 子模块、实现异步 WebSocket TTS 服务器与演示客户端,并在 TTS 调度中加入 local_qwen3_tts_worker 以支持本地 Qwen3 流式合成喵。

Changes

Cohort / File(s) Summary
Git Submodule 配置
\.gitmodules, local_server/qwen3_tts_server/Qwen3-TTS
新增 Qwen3-TTS 子模块引用(path/url/branch),仅仓库配置与子模块提交记录变更喵。
Qwen3 TTS 服务器实现
local_server/qwen3_tts_server/local_server.py
新增 QwenLocalServer 类与 main():模型加载、语音克隆提示管理、后台推理队列、流式/非流式推理、任务取消与热切换、WebSocket 协议与消息生命周期处理喵。
Qwen3 TTS 客户端演示
local_server/qwen3_tts_server/server_commit.py
新增演示客户端脚本:WebSocket 客户端、音频回调与播放、文本分片发送、WAV 保存与有序关闭流程喵。
TTS 客户端调度扩展
main_logic/tts_client.py
新增 local_qwen3_tts_worker 并调整 get_tts_worker 调度逻辑以根据 base_url 路由至本地 Qwen3 实现,包含采样率、提交策略与流式模式控制常量喵。

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client (客户端)
    participant Server as Qwen3_TTS_Server (WebSocket 服务器)
    participant Queue as TaskQueue (任务队列)
    participant Worker as Worker (后台推理线程)
    participant Model as Qwen3_Model (模型)

    Client->>Server: 建立 WebSocket 并发送文本喵
    Server->>Server: 文本缓冲与分句喵
    Server->>Queue: 入队句子任务喵
    Queue->>Worker: 取出推理任务喵
    Worker->>Model: 执行语音克隆推理喵
    Model-->>Worker: 返回音频块喵
    Worker->>Server: 推送音频块到发送队列喵
    Server-->>Client: 流式发送 PCM 音频块喵
    Client->>Server: 可选发送取消请求喵
    Server->>Queue: 处理取消并停止对应任务喵
Loading

Estimated code review effort

🎯 4 (复杂) | ⏱️ ~45 分钟

Possibly related PRs

  • Add local cosyvoice #192 — 在 main_logic/tts_client.py 中对本地 WebSocket TTS worker 的添加与路由调度进行过相关更改,代码层面密切相关喵。

Suggested reviewers

  • wehos

Poem

新服上线声波萌动喵,
文本排队句句排好喵,
推理后台悄悄奔跑喵,
流式音块轻敲你耳喵,
子模块到位代码更暖喵。

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed PR标题准确反映了主要变更内容——新增Qwen3 TTS服务并支持WebSocket通信,与提交的核心功能高度契合。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (4)
main_logic/tts_client.py (1)

1820-1826: except: pass 会吞掉所有异常喵

这里的 session.update 发送失败时,用裸 except 把所有异常都吃掉了,连日志都不打一条。如果连接有问题,这里会静默失败,后续排查起来很痛苦喵。

💊 至少记录一下嘛
                 try:
                     await ws.send(json.dumps({
                         "type": "session.update",
                         "language": "Chinese"
                     }))
-                except:
-                    pass
+                except Exception as e:
+                    logger.debug(f"session.update 发送失败(可忽略): {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1820 - 1826, The try/except around the
await ws.send(...) which sends the "session.update" message currently swallows
all exceptions; change it to catch Exception as e and log the failure instead of
passing silently (e.g., use logger.exception or logger.error with the exception)
so send failures on ws.send("session.update", language="Chinese") are recorded
for debugging; keep the send call and only suppress expected recoverable errors
after logging.
local_server/qwen3_tts_server/server_commit.py (2)

97-108: consumer_task 取消后没有 await 等待,可能产生警告喵

Line 108 调用了 consumer_task.cancel() 但之后没有 await consumer_task 来等待它真正结束。这在 asyncio 里会产生 "Task was destroyed but it is pending" 的警告喵。

🐾 建议等待取消完成
     await client.close()
     consumer_task.cancel()
+    try:
+        await consumer_task
+    except asyncio.CancelledError:
+        pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/server_commit.py` around lines 97 - 108,
consumer_task is cancelled but not awaited, which can trigger "Task was
destroyed but it is pending" warnings; after calling consumer_task.cancel()
await consumer_task (inside a try/except asyncio.CancelledError) to ensure the
task finishes and any cleanup runs. Locate the block that creates consumer_task
from client.handle_messages(), producer_task from _produce_text, and the
shutdown sequence that calls client.wait_for_response_done() and client.close();
modify it so you cancel consumer_task, then await consumer_task (catching and
swallowing asyncio.CancelledError or handling exceptions) before finishing
shutdown.

15-23: 模块级副作用:import 就会爆炸喵

喵…这个文件在模块顶层做了两件危险的事:

  1. Line 15-16: 如果环境变量没设就直接 raise ValueError,别的模块想 import 一下就炸了喵
  2. Line 22: pyaudio.PyAudio() 在模块加载时就初始化音频设备,没装 PortAudio 就会崩喵

作为 demo 脚本倒还好,但如果有人想复用里面的函数就很头疼了。建议把这些初始化挪到 main()_run_demo() 里喵~

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/server_commit.py` around lines 15 - 23, The
module has dangerous top-level side effects: checking API_KEY and calling
pyaudio.PyAudio() at import; move these into runtime initialization (e.g.,
main() or _run_demo()). Specifically, remove the top-level raise for API_KEY and
instead validate DASHSCOPE_API_KEY inside main() or _run_demo(), raising the
ValueError there; likewise, defer creating the PyAudio instance and
_audio_stream by moving the pyaudio.PyAudio() call and any audio device setup
into a lazy initializer used by main()/_run_demo() (or create a helper like
init_audio() that sets _audio_pyaudio and _audio_stream), keep
_AUDIO_SAMPLE_RATE and _audio_chunks as simple constants/state but avoid device
initialization at import. Ensure function names referenced: API_KEY,
_audio_pyaudio, _audio_stream, _AUDIO_SAMPLE_RATE, main(), and _run_demo().
local_server/qwen3_tts_server/local_server.py (1)

62-77: 硬编码的测试用路径和文本散落在构造函数里喵

Line 62 的 nyaning_voice.pt、Line 72 的日文参考文本、Line 73 的 uttid_f1.wav 都是测试/开发用的默认值,直接硬编码在类的构造函数里了喵。虽然注释说了"测试用",但这些值会在生产环境中作为 fallback 使用,可能导致不可预期的行为喵。建议至少把这些提取为类常量或模块常量,方便后续修改喵~

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 62 - 77, The
constructor currently hardcodes test/dev defaults into attributes (self.pt_path
uses "nyaning_voice.pt", self.ref_text uses a Japanese sample, self.ref_wav uses
"uttid_f1.wav"), which should be extracted to constants or config; define
module- or class-level constants (e.g. DEFAULT_VOICE_PT, DEFAULT_REF_TEXT,
DEFAULT_REF_WAV or pull from env/config) and change assignments in __init__
(references: self.pt_path, self.ref_text, self.ref_wav) to fall back to those
constants only, document them and ensure callers can override via parameters so
production never unintentionally uses test artifacts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 215-217: The except block inside _load_engine currently only logs
the exception so self.model may remain None and the server continues; change it
to abort startup like the earlier startup logic by either re-raising the caught
exception or calling sys.exit(1) after logging so the process stops when engine
loading fails (ensure you import sys if you use sys.exit); this ensures
_do_inference sees a valid self.model or the process exits instead of silently
returning no audio.
- Around line 280-320: _sender_loop reads the shared variable current_job_id
after an await which can lead to a race; fix by snapshotting the job id
immediately after dequeuing a chunk and before any awaits so the response.done
uses the captured value. Specifically, in the async function _sender_loop (which
consumes audio_queue), after chunk = await audio_queue.get() capture
local_job_id = current_job_id (or include job_id when enqueuing) and then use
local_job_id when building/sending {"type":"response.done","job_id":
local_job_id}; this avoids relying on the mutable outer current_job_id that
_stop_current_job may change.
- Around line 179-186: The model loading call to
Qwen3TTSForConditionalGeneration.from_pretrained hard-codes device_map="cuda"
and attn_implementation="flash_attention_2", which conflicts with the earlier
CPU fallback; update the device_map to use the same conditional (e.g. the
existing "cuda" if torch.cuda.is_available() else "cpu" or use self.device) and
only set attn_implementation="flash_attention_2" when running on CUDA (omit or
choose a CPU-safe attention implementation when on CPU); adjust any dtype/device
parameters accordingly so model loading works on both GPU and CPU environments.
- Line 409: The TTS server currently defaults host to "0.0.0.0" (in the
local_server.py assignment using tts_custom.get("host")), which exposes the
service to the LAN; change the default to "127.0.0.1" so the line using host =
tts_custom.get("host") or "0.0.0.0" becomes host = tts_custom.get("host") or
"127.0.0.1" (or equivalent) and ensure any related startup code that reads this
host (e.g., the TTS server bootstrap/serve function) respects that value and
documents that remote access must be explicitly configured by the user.
- Around line 254-257: The conversion from float to int16 in the block that
computes audio_data = wavs[0].flatten() and audio_int16 = (audio_data *
32767).astype(np.int16) lacks clipping and can overflow; modify the conversion
to first scale then clip to int16 range (e.g., scaled = (audio_data *
32767).round() or just (audio_data * 32767) then use np.clip(scaled, -32768,
32767)) and finally astype(np.int16); follow the same approach used in
_resample_audio for reference to avoid overflow and distortion.
- Line 87: 当前代码在调用 torch.load(pt_path, map_location=self.device,
weights_only=False)(出现在本文件的这处和另一处加载逻辑)开启了 pickle 反序列化风险;将 weights_only 参数改为 True
并确保通过 torch.serialization.add_safe_globals(...) 注册所有在序列化对象中可能出现的自定义类型(例如
VoiceClonePromptItem 及其他自定义类)以便安全加载;具体操作:在加载点(torch.load 调用处)把
weights_only=False 改为 weights_only=True,并在初始化/模块加载阶段使用
torch.serialization.add_safe_globals([...]) 列出并注册所有需要的类型,确保 pt
文件只会还原权重而不会执行任意代码。

In `@local_server/qwen3_tts_server/Qwen3-TTS`:
- Line 1: The submodule in .gitmodules points to a third‑party fork
(https://github.com/rophec/Qwen3-TTS) and must be changed to the official
upstream; update the url entry in .gitmodules to
https://github.com/QwenLM/Qwen3-TTS (replace the current url value), then in the
submodule directory check out the exact trusted commit (the specific commit hash
you want) and commit the superproject so the submodule gitlink points to that
SHA rather than tracking a branch; also remove or stop relying on the branch =
qwen3_tts setting so the repo no longer follows a floating branch on a fork.

In `@main_logic/tts_client.py`:
- Around line 1888-1894: The handling for sid is missing waiting for the server
to finish after send_append/send_commit, which can drop the last audio chunk;
update the sid is None branch (where text_buf, send_append, send_commit and
current_speech_id are used) to, after calling send_commit, wait for the server's
response.done (with the same timeout logic used elsewhere in other workers)
before clearing text_buf and continuing—ensure you reference the same response
waiting helper or await pattern used by other workers so the final audio segment
is fully received.
- Around line 1734-1735: local_qwen3_tts_worker currently ignores the voice_id
parameter causing callers to be unable to choose voices; update
local_qwen3_tts_worker to use voice_id to select the appropriate voice model or
forward it to the local TTS service: locate the function local_qwen3_tts_worker
and either map voice_id to a voice_pt_path (or other local model path) and
load/use that model, or include voice_id in the payload sent to the local TTS
server so the server can select the voice; ensure the chosen symbol names
(voice_id, voice_pt_path, and any payload or request variable) are used
consistently when selecting or sending the voice selection.
- Around line 1511-1513: The current if-block routes all ws/wss base_url values
to local_qwen3_tts_worker, making local_cosyvoice_worker unreachable; revert the
URL routing logic to match the original ws/wss validation (as used around Line
1562-1563) so that ws/wss connections can still select local_cosyvoice_worker
when appropriate (adjust the conditional in the function that checks base_url
and returns local_qwen3_tts_worker/local_cosyvoice_worker), and if
local_cosyvoice_worker is truly deprecated add an explicit deprecation log
message before returning or remove the dead return path entirely; reference the
base_url check and the symbols local_qwen3_tts_worker and local_cosyvoice_worker
when applying the change.
- Around line 1833-1846: 函数 close_connection 名称与行为不符(它实际上创建新连接且未被调用),导致
voice_pt_path 及重连逻辑未被执行;请将其改为真正的重连/关闭逻辑并在 async_worker 中在 speech_id
变化时调用,或者移除无用函数以消除混淆。具体修复:在 async_worker 中查找对 speech_id 变化的处理点并在变化时调用一个正确实现的
reconnect 或 close_then_reopen
函数(定位符:async_worker、speech_id、receive_loop、voice_pt_path、close_connection);实现应先取消/await
现有 receive_task、关闭旧 ws(await ws.close())、然后根据需要创建新 ws、发送 session.update 包含
voice_pt_path,并重建 receive_task(使用现有 receive_loop(ws))。如果决定删除功能,移除
close_connection 并确保 voice_pt_path 的配置被另行发送或文档化为不可用。
- Around line 1756-1758: CURRENT_DIR is assigned but never used (F841) while
voice_pt_path is built using os.path.dirname(__file__) directly; either remove
the unused CURRENT_DIR assignment or use it to construct voice_pt_path for
consistency. Update the code around CURRENT_DIR and voice_pt_path so you either
drop the CURRENT_DIR line entirely, or replace os.path.dirname(__file__) with
CURRENT_DIR when building voice_pt_path (ensuring path remains absolute via
os.path.abspath).
- Around line 1775-1806: The async_worker lacks a finally cleanup so WebSocket
and receive_task can leak; add a try/finally around the worker's loop (the
async_worker function that spawns receive_loop) and in the finally ensure that
if receive_task exists you cancel it and await its termination, and if ws exists
you close the websocket connection (and await close) and set receive_task/ws to
None; reference the async_worker function, the receive_loop inner coroutine, and
the ws and receive_task variables when making these changes.
- Around line 1896-1918: When speech_id changes the code only sends
send_cancel() and appends to text_buf but does not reset the audio pipeline or
recreate the connection; update the logic where current_speech_id is compared to
sid (around current_speech_id, send_cancel, send_append, send_commit) to fully
teardown and reinitialize state: after detecting a different sid call
send_cancel(), clear/reset the resampler by invoking resampler.clear() (or
equivalent reset), close/recreate any per-speech connection or stream object
used by this worker, reset text_buf and other per-speech buffers, then set
current_speech_id = sid so the new speech starts with a clean pipeline before
send_append/send_commit.

---

Nitpick comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 62-77: The constructor currently hardcodes test/dev defaults into
attributes (self.pt_path uses "nyaning_voice.pt", self.ref_text uses a Japanese
sample, self.ref_wav uses "uttid_f1.wav"), which should be extracted to
constants or config; define module- or class-level constants (e.g.
DEFAULT_VOICE_PT, DEFAULT_REF_TEXT, DEFAULT_REF_WAV or pull from env/config) and
change assignments in __init__ (references: self.pt_path, self.ref_text,
self.ref_wav) to fall back to those constants only, document them and ensure
callers can override via parameters so production never unintentionally uses
test artifacts.

In `@local_server/qwen3_tts_server/server_commit.py`:
- Around line 97-108: consumer_task is cancelled but not awaited, which can
trigger "Task was destroyed but it is pending" warnings; after calling
consumer_task.cancel() await consumer_task (inside a try/except
asyncio.CancelledError) to ensure the task finishes and any cleanup runs. Locate
the block that creates consumer_task from client.handle_messages(),
producer_task from _produce_text, and the shutdown sequence that calls
client.wait_for_response_done() and client.close(); modify it so you cancel
consumer_task, then await consumer_task (catching and swallowing
asyncio.CancelledError or handling exceptions) before finishing shutdown.
- Around line 15-23: The module has dangerous top-level side effects: checking
API_KEY and calling pyaudio.PyAudio() at import; move these into runtime
initialization (e.g., main() or _run_demo()). Specifically, remove the top-level
raise for API_KEY and instead validate DASHSCOPE_API_KEY inside main() or
_run_demo(), raising the ValueError there; likewise, defer creating the PyAudio
instance and _audio_stream by moving the pyaudio.PyAudio() call and any audio
device setup into a lazy initializer used by main()/_run_demo() (or create a
helper like init_audio() that sets _audio_pyaudio and _audio_stream), keep
_AUDIO_SAMPLE_RATE and _audio_chunks as simple constants/state but avoid device
initialization at import. Ensure function names referenced: API_KEY,
_audio_pyaudio, _audio_stream, _AUDIO_SAMPLE_RATE, main(), and _run_demo().

In `@main_logic/tts_client.py`:
- Around line 1820-1826: The try/except around the await ws.send(...) which
sends the "session.update" message currently swallows all exceptions; change it
to catch Exception as e and log the failure instead of passing silently (e.g.,
use logger.exception or logger.error with the exception) so send failures on
ws.send("session.update", language="Chinese") are recorded for debugging; keep
the send call and only suppress expected recoverable errors after logging.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57a54c2 and cb4963d.

📒 Files selected for processing (6)
  • .gitmodules
  • local_server/qwen3_tts_server/Qwen3-TTS
  • local_server/qwen3_tts_server/local_server.py
  • local_server/qwen3_tts_server/server_commit.py
  • local_server/qwen3_tts_server/tts_realtime_client.py
  • main_logic/tts_client.py

threading.Thread(target=self._worker_loop, daemon=True).start()

def _load_prompt_from_pt(self, pt_path: str):
payload = torch.load(pt_path, map_location=self.device, weights_only=False)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

torch.load weights_only parameter PyTorch 2.6 security RCE

💡 Result:

  • Why weights_only matters (security/RCE): torch.load() historically defaulted to weights_only=False, which uses Python pickle and is inherently unsafe (a malicious checkpoint can execute arbitrary code during deserialization). PyTorch announced (and warned since 2.4) that it would flip the default to reduce this risk. [1][2]
  • What changed in PyTorch 2.6: In PyTorch 2.6, the default for torch.load(..., weights_only=...) was changed from False to True as a security improvement (backward-compatibility breaking). [1][2]
  • The key RCE issue you’re referring to: CVE-2025-32434 / GHSA-53q9-r3pm-6pq6 reported that even torch.load(..., weights_only=True) could still lead to RCE in PyTorch versions < 2.6.0, and that the issue is fixed in 2.6.0. [3][4]
  • Practical takeaway: If you rely on weights_only=True as a mitigation, you should be on PyTorch ≥ 2.6.0 (or newer), and still treat untrusted checkpoints as dangerous. [3][4]

Sources
[1] PyTorch 2.6 release blog (notes default weights_only change): https://pytorch.org/blog/pytorch2-6/
[2] PyTorch dev-discuss breaking change note (default flipped to weights_only=True): https://dev-discuss.pytorch.org/t/bc-breaking-change-torch-load-is-being-flipped-to-use-weights-only-true-by-default-in-the-nightlies-after-137602/2573
[3] GitHub Advisory GHSA-53q9-r3pm-6pq6 (RCE with weights_only=True, fixed in 2.6.0): GHSA-53q9-r3pm-6pq6
[4] OSV entry for GHSA-53q9-r3pm-6pq6 / CVE-2025-32434 (affected <2.6.0, fixed 2.6.0): https://osv.dev/vulnerability/GHSA-53q9-r3pm-6pq6


🏁 Script executed:

find . -type f -name "local_server.py" | head -5

Repository: Project-N-E-K-O/N.E.K.O

Length of output: 114


🏁 Script executed:

wc -l ./local_server/qwen3_tts_server/local_server.py

Repository: Project-N-E-K-O/N.E.K.O

Length of output: 118


🏁 Script executed:

head -50 ./local_server/qwen3_tts_server/local_server.py | cat -n

Repository: Project-N-E-K-O/N.E.K.O

Length of output: 1878


🏁 Script executed:

sed -n '35,45p' ./local_server/qwen3_tts_server/local_server.py | cat -n

Repository: Project-N-E-K-O/N.E.K.O

Length of output: 584


🏁 Script executed:

sed -n '80,95p' ./local_server/qwen3_tts_server/local_server.py | cat -n

Repository: Project-N-E-K-O/N.E.K.O

Length of output: 798


🏁 Script executed:

sed -n '195,205p' ./local_server/qwen3_tts_server/local_server.py | cat -n

Repository: Project-N-E-K-O/N.E.K.O

Length of output: 663


torch.load(..., weights_only=False) 确实存在安全风险喵

Line 87 和 Line 199 都使用了 weights_only=False 来加载 .pt 文件。这允许 pickle 反序列化在解包任意恶意 .pt 文件时执行代码,存在 RCE 漏洞喵!PyTorch 官方已在 2.6 版本将默认值改为 weights_only=True 以应对这个安全隐患呢。

Line 40 的 torch.serialization.add_safe_globals([VoiceClonePromptItem]) 只能作为部分防御,但无法完全消除 weights_only=False 带来的风险喵。建议改用 weights_only=True 并通过 add_safe_globals 注册所有需要的类型,这样才是真正安全的做法喵~

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` at line 87, 当前代码在调用
torch.load(pt_path, map_location=self.device,
weights_only=False)(出现在本文件的这处和另一处加载逻辑)开启了 pickle 反序列化风险;将 weights_only 参数改为 True
并确保通过 torch.serialization.add_safe_globals(...) 注册所有在序列化对象中可能出现的自定义类型(例如
VoiceClonePromptItem 及其他自定义类)以便安全加载;具体操作:在加载点(torch.load 调用处)把
weights_only=False 改为 weights_only=True,并在初始化/模块加载阶段使用
torch.serialization.add_safe_globals([...]) 列出并注册所有需要的类型,确保 pt
文件只会还原权重而不会执行任意代码。

Comment on lines +179 to +186
raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
model_path,
torch_dtype=torch.bfloat16, # 给 transformers 看
dtype=torch.bfloat16, # 给 qwen 看
attn_implementation="flash_attention_2",
device_map="cuda",
low_cpu_mem_usage=True
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

device_map="cuda" 硬编码,与 Line 61 的 CPU fallback 矛盾喵

Line 61 明明写了 "cuda" if torch.cuda.is_available() else "cpu",但 Line 184 又把 device_map 硬编码成 "cuda"。在没有 GPU 的机器上,self.device"cpu" 但模型加载会因为 device_map="cuda" 而失败喵!

🐾 保持一致喵
             raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
                 model_path,
                 torch_dtype=torch.bfloat16,
                 dtype=torch.bfloat16,
                 attn_implementation="flash_attention_2",
-                device_map="cuda",
+                device_map=self.device,
                 low_cpu_mem_usage=True
             )

注意:flash_attention_2 也只在 GPU 上可用,CPU 环境下也需要条件判断喵。

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
model_path,
torch_dtype=torch.bfloat16, # 给 transformers 看
dtype=torch.bfloat16, # 给 qwen 看
attn_implementation="flash_attention_2",
device_map="cuda",
low_cpu_mem_usage=True
)
raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
model_path,
torch_dtype=torch.bfloat16, # 给 transformers 看
dtype=torch.bfloat16, # 给 qwen 看
attn_implementation="flash_attention_2",
device_map=self.device,
low_cpu_mem_usage=True
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 179 - 186, The
model loading call to Qwen3TTSForConditionalGeneration.from_pretrained
hard-codes device_map="cuda" and attn_implementation="flash_attention_2", which
conflicts with the earlier CPU fallback; update the device_map to use the same
conditional (e.g. the existing "cuda" if torch.cuda.is_available() else "cpu" or
use self.device) and only set attn_implementation="flash_attention_2" when
running on CUDA (omit or choose a CPU-safe attention implementation when on
CPU); adjust any dtype/device parameters accordingly so model loading works on
both GPU and CPU environments.

Comment thread local_server/qwen3_tts_server/local_server.py
Comment thread local_server/qwen3_tts_server/local_server.py Outdated
Comment on lines +280 to +320
async def handle_tts(self, websocket):
logger.info(f"客户端连接: {websocket.remote_address}")
loop = asyncio.get_running_loop()

# 智能拼句缓冲区
sentence_buffer = ""

current_job_id = None
cancel_event = threading.Event()
audio_queue = asyncio.Queue()
last_voice_version = self.voice_version

async def _stop_current_job(keep_buffer: bool = False):
nonlocal current_job_id, cancel_event, sentence_buffer
cancel_event.set()
current_job_id = None
cancel_event = threading.Event()
while not audio_queue.empty():
try:
audio_queue.get_nowait()
except:
break
if not keep_buffer:
sentence_buffer = ""

# 发送循环
async def _sender_loop():
while True:
try:
chunk = await audio_queue.get()
if chunk == b"__END__":
if current_job_id:
# 适配 N.E.K.O 客户端协议
response_done = {"type": "response.done", "job_id": current_job_id}
await websocket.send(json.dumps(response_done))
continue
await websocket.send(chunk)
except Exception:
break

sender_task = asyncio.create_task(_sender_loop())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

_sender_loop 中收到 __END__ 时用 current_job_id 存在竞态喵

_sender_loop(Line 306-318)是一个独立的 async task,它在 Line 311 读取外部作用域的 current_job_id。但 current_job_id 可能在 handle_tts 的消息处理循环中被修改(比如 _stop_current_job 会设成 None)。这两个协程虽然在同一个事件循环里不会真正并发,但在 await audio_queue.get() 挂起和恢复之间,current_job_id 的值可能已经变了,导致 response.done 发送时用了错误的 job_id 或者根本不发送喵。

🧰 Tools
🪛 Ruff (0.15.2)

[error] 300-300: Do not use bare except

(E722)


[warning] 317-317: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 280 - 320,
_sender_loop reads the shared variable current_job_id after an await which can
lead to a race; fix by snapshotting the job id immediately after dequeuing a
chunk and before any awaits so the response.done uses the captured value.
Specifically, in the async function _sender_loop (which consumes audio_queue),
after chunk = await audio_queue.get() capture local_job_id = current_job_id (or
include job_id when enqueuing) and then use local_job_id when building/sending
{"type":"response.done","job_id": local_job_id}; this avoids relying on the
mutable outer current_job_id that _stop_current_job may change.

Comment thread main_logic/tts_client.py Outdated
Comment thread main_logic/tts_client.py
Comment thread main_logic/tts_client.py Outdated
Comment thread main_logic/tts_client.py Outdated
Comment thread main_logic/tts_client.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main_logic/tts_client.py (1)

208-244: ⚠️ Potential issue | 🟡 Minor

等待完成的超时日志写成 30 秒但实际是 20 秒,会误导排障喵
Line 220 的 timeout=20.0,但 Line 223 的日志写了“30秒”,建议统一成同一个常量或修正文案喵。

建议修改(修正文案)
-                            except asyncio.TimeoutError:
-                                logger.warning("等待响应完成超时(30秒),强制关闭连接")
+                            except asyncio.TimeoutError:
+                                logger.warning("等待响应完成超时(20秒),强制关闭连接")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 208 - 244, The timeout log is
inconsistent: asyncio.wait_for(response_done.wait(), timeout=20.0) uses 20
seconds but the warning message says “30秒”; update the code so the timeout value
and log message match by either extracting the timeout into a single constant
(e.g., TIMEOUT_SECONDS) used by the wait_for call and in the logger.warning, or
change the warning text to “20秒”; modify the symbols response_done.wait(),
timeout=20.0 and the logger.warning(...) to use the same value.
♻️ Duplicate comments (7)
main_logic/tts_client.py (4)

1775-1847: ⚠️ Potential issue | 🟠 Major

async_worker 缺少 finally 清理且 close_connection 行为/命名不符,容易泄漏与误导喵
Line 1775-1921 的主循环退出时不会关闭 ws、也不会取消 receive_task 喵;而 Line 1833-1847 的 close_connection 名为“关闭”实为“新建连接”,且当前代码里没被调用喵。
建议喵:

  • async_worker 外层加 try/finally,统一 cancel/close 喵。
  • close_connection 要么删掉减少迷惑,要么实现成真正的 “close_then_reconnect”(先 cancel/close,再 connect,再发送更新)并在需要的地方调用喵。

Also applies to: 1922-1925

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1775 - 1847, The async_worker lacks a
finally cleanup and close_connection is misnamed/misimplemented, so add a
try/finally around async_worker's main logic to ensure receive_task is
cancelled/awaited and ws is closed (use receive_task.cancel() then await it with
suppressions, and call await ws.close() if ws is not None), and either remove
close_connection or refactor it into a true close_then_reconnect that first
cancels/awaits receive_task and closes the existing ws before creating a new
connection via websockets.connect; update callers to use connect() for initial
open and the new close_then_reconnect (or delete unused close_connection) so
there is no resource leak.

1734-1759: ⚠️ Potential issue | 🟠 Major

local_qwen3_tts_worker 仍然忽略 voice_id,调用方没法选音色喵
Line 1734 的 voice_id 在函数体内没被用到(静态分析也提示 ARG001)喵。
至少需要二选一喵:

  • voice_id 映射为 voice_pt_path 并在握手/session.update 里发给服务端喵;或喵
  • 明确声明“本地 Qwen3 音色由服务端 active_voice.json 控制,voice_id 仅占位”并用 _ = voice_id 消掉告警喵。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1734 - 1759, The function
local_qwen3_tts_worker currently ignores the voice_id parameter; either use it
or explicitly mark it unused—preferably map voice_id to the local voice file and
send it to the TTS server during handshake/session update: add a small mapping
or resolution from voice_id -> voice_pt_path (e.g., switch/dict) and include
that resolved path (or a voice name) in the initial handshake or session.update
call to the server (referencing local_qwen3_tts_worker, voice_id, voice_pt_path,
and session.update/handshake) so callers can select voices; alternatively, if
you choose not to support selection, replace the unused param with a deliberate
noop (_ = voice_id) and update the function docstring to state that voice
selection is server-controlled.

1887-1905: ⚠️ Potential issue | 🟠 Major

sid is None 只 commit 不等待完成也不隔离连接,可能尾音丢失或下一轮串音喵
Line 1888-1895:flush 之后立刻把 current_speech_id=None 然后继续循环喵。
如果下一条 speech 很快进来,因为 current_speech_id is None,Line 1898 的 cancel 分支不会触发,服务端可能还在吐上一轮音频,结果就把两轮混在一起了喵。
建议至少喵:在 sid is None 的 flush 后等待服务端一个“完成”信号(或协议加 job_id 对齐 done),再允许下一轮开始;做不到的话就把该分支改成“flush + 关闭连接 + 下轮重连”,并在重连时重置 resampler 喵。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1887 - 1905, 在处理 sid is None
分支中,当前只是调用 send_append/send_commit 后立即把 current_speech_id
置空并继续,导致下一轮语音可能与上一轮未完全完成的音频混音;请修改 main_logic/tts_client.py 中该分支(涉及
sid、current_speech_id、send_append、send_commit、send_cancel、resampler、soxr.ResampleStream、src_rate、DST_RATE)以确保在
flush 后要么等待服务端明确的“完成/done”信号再将 current_speech_id 置空并继续,要么在不支持等待的情况下改为 flush
后立即关闭并重建连接并重置 resampler(用 new soxr.ResampleStream(src_rate, DST_RATE, 1,
dtype="float32"))以避免尾音丢失或串音。

1501-1514: ws/wss 自定义 TTS 现在固定走 local_qwen3_tts_worker,需要明确迁移/退化策略喵
Line 1509-1513:http(s) → GPT-SoVITS v3、ws(s) → Qwen3,本地 CosyVoice WS 分支被注释掉了喵。
如果这是明确弃用,建议加一条显式 warning(提示“ws/wss 将按 Qwen3 协议解释”)并更新文档喵;如果仍需兼容旧 CosyVoice WS,则需要一个显式 selector(例如 tts_custom.api_type)避免用户配置“看起来对但实际连不上”喵。

#!/bin/bash
# 核对是否还有地方依赖/文档提到 ws/wss=CosyVoice
rg -n "local_cosyvoice_worker|get_tts_worker\\(|tts_custom" -S
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1501 - 1514, The current logic in
get_config_manager()/tts_config inspects tts_custom.base_url and silently maps
http(s)→gptsovits_tts_worker and ws(s)→local_qwen3_tts_worker while the
local_cosyvoice_worker branch is commented out; update this by either (A)
explicitly logging a warning when base_url startswith "ws://" or "wss://" that
"ws/wss will be interpreted as Qwen3 protocol" and keep returning
local_qwen3_tts_worker, or (B) add a clear selector field
tts_config.get('api_type') (e.g., "qwen3" | "gptsovits" | "cosyvoice") and use
that to choose between gptsovits_tts_worker, local_qwen3_tts_worker, or
local_cosyvoice_worker respectively (falling back to a safe default and logging
the chosen mapping); ensure you reference tts_config, base_url,
gptsovits_tts_worker, local_qwen3_tts_worker and local_cosyvoice_worker and emit
a log.warn when falling back so users aren’t surprised.
local_server/qwen3_tts_server/local_server.py (3)

414-446: ⚠️ Potential issue | 🟠 Major

_sender_loop 用外部 current_job_id 发送 response.done 有竞态,建议把 job_id 跟 chunk 一起入队喵
chunk = await audio_queue.get() 这个 await 期间 current_job_id 可能被 _stop_current_job() 改掉喵。
最稳的方式是:audio_queue 里直接放 (job_id, chunk)__END__ 也带 job_id,这样 sender 不需要读共享可变变量喵。

建议修改(chunk 入队携带 job_id)
-            loop.call_soon_threadsafe(audio_queue.put_nowait, audio_int16.tobytes())
+            loop.call_soon_threadsafe(audio_queue.put_nowait, (job_id, audio_int16.tobytes()))
@@
-                    loop.call_soon_threadsafe(audio_queue.put_nowait, chunk)
+                    loop.call_soon_threadsafe(audio_queue.put_nowait, (job_id, chunk))
@@
-            loop.call_soon_threadsafe(audio_queue.put_nowait, b"__END__")
+            loop.call_soon_threadsafe(audio_queue.put_nowait, (job_id, b"__END__"))
         async def _sender_loop():
             while True:
                 try:
-                    chunk = await audio_queue.get()
-                    if chunk == b"__END__":
-                        if current_job_id:
+                    job_id, chunk = await audio_queue.get()
+                    if chunk == b"__END__":
+                        if job_id:
                             # 适配 N.E.K.O 客户端协议
-                            response_done = {"type": "response.done", "job_id": current_job_id}
+                            response_done = {"type": "response.done", "job_id": job_id}
                             await websocket.send(json.dumps(response_done))
                         continue
                     await websocket.send(chunk)
                 except Exception:
                     break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 414 - 446,
_sender_loop reads shared current_job_id while waiting on audio_queue which
creates a race; change the queue protocol so audio_queue carries tuples (job_id,
chunk) and treat the end marker as (job_id, "__END__") so the sender uses the
job_id from the dequeued tuple instead of the external current_job_id. Update
places that put data into audio_queue (where chunks are enqueued and in
_stop_current_job when signaling end) to enqueue (current_job_id, chunk) or
(current_job_id, "__END__") and modify _sender_loop to unpack job_id, chunk =
await audio_queue.get(), send chunk and, when chunk == "__END__", build
response_done using the dequeued job_id rather than reading current_job_id.

208-234: ⚠️ Potential issue | 🟠 Major

CPU fallback 还不完整:flash_attention_2 / device_map 需要条件分支否则会直接启动失败喵
你前面把 self.device = "cuda" if ... else "cpu" 做好了喵,但 Line 220 仍然无条件 attn_implementation="flash_attention_2" 喵。
这个在 CPU 环境大概率直接炸(以及 device_map=self.device 是否被 transformers/accelerate 接受也需要确认)喵。

建议修改(按 device 组装 kwargs)
-            raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
-                model_path,
-                torch_dtype=torch.bfloat16,  # 给 transformers 看
-                dtype=torch.bfloat16,  # 给 qwen 看
-                attn_implementation="flash_attention_2",
-                device_map=self.device,  # 旧的是cuda 硬编码
-                low_cpu_mem_usage=True
-            )
+            model_kwargs = {
+                "model_path": model_path,
+                "low_cpu_mem_usage": True,
+            }
+            if self.device == "cuda":
+                model_kwargs.update({
+                    "torch_dtype": torch.bfloat16,
+                    "dtype": torch.bfloat16,
+                    "attn_implementation": "flash_attention_2",
+                    "device_map": "cuda",
+                })
+            else:
+                # CPU: 不要强行 flash_attention_2
+                model_kwargs.update({
+                    "torch_dtype": torch.float32,
+                    "dtype": torch.float32,
+                    "device_map": "cpu",
+                })
+
+            raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(**model_kwargs)
Transformers `from_pretrained` 的 `device_map` 参数支持直接传入字符串 "cuda"/"cpu" 吗?以及 `attn_implementation="flash_attention_2"` 在 CPU 环境会怎样?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 208 - 234, The
current _load_engine unconditionally passes
attn_implementation="flash_attention_2" and device_map=self.device into
Qwen3TTSForConditionalGeneration.from_pretrained which will fail on CPU; change
_load_engine to assemble a kwargs dict for from_pretrained: include
torch_dtype/dtype=torch.bfloat16 and attn_implementation="flash_attention_2" and
device_map=self.device only when self.device indicates CUDA (e.g. startswith
"cuda"), otherwise set torch_dtype/dtype to torch.float32 (or omit), set
device_map to "cpu" or None as appropriate, and skip the bfloat16
per-parameter/buffer conversion loops unless running on CUDA so the code paths
in _load_engine, Qwen3TTSForConditionalGeneration.from_pretrained,
attn_implementation, and device_map are conditional on self.device.

102-104: ⚠️ Potential issue | 🟠 Major

torch.load(..., weights_only=False) 仍然是 RCE 面,尤其 voice 路径可热切换时更危险喵
Line 103 与 Line 310 都显式 weights_only=False,这会把 .pt 当 pickle 反序列化喵。
如果 active_voice.json/配置允许指向任意 .pt,那等于把“加载任意代码执行”这个开关交给了文件内容喵。
建议优先改成 weights_only=True,并把确实需要反序列化的自定义类型全部注册到 torch.serialization.add_safe_globals(...) 喵。

建议修改(方向性示例,需要按当前 torch 版本验证)
-        payload = torch.load(pt_path, map_location=self.device, weights_only=False)
+        payload = torch.load(pt_path, map_location=self.device, weights_only=True)
-                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=False)
+                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=True)
#!/bin/bash
# 1) 盘点所有 torch.load 使用点与 weights_only 参数
rg -n "torch\\.load\\(" -S
rg -n "weights_only\\s*=" -S

# 2) 尝试定位仓库里对 torch/pytorch 版本的约束(不保证存在)
fd -HI "requirements.*|pyproject\\.toml|poetry\\.lock|pdm\\.lock|environment\\.yml|dockerfile|Dockerfile" . \
  -x rg -n "torch==|pytorch|cuda|flash_attn|transformers" {} -S

Also applies to: 308-313

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 102 - 104, The
torch.load call in _load_prompt_from_pt (and the other torch.load usage nearby)
currently uses weights_only=False which allows full pickle deserialization and
enables RCE when voice paths are user-switchable; change these torch.load calls
to use weights_only=True, and for any true custom objects that must be
deserialized register them explicitly with
torch.serialization.add_safe_globals(...) before loading; locate all torch.load
usages (including the one in _load_prompt_from_pt and the similar load around
the 308-313 area) and switch to weights_only=True, then add safe_globals
registrations for each custom class/type referenced by those .pt files and
validate active_voice.json/config only points to trusted files.
🧹 Nitpick comments (2)
local_server/qwen3_tts_server/local_server.py (2)

16-46: 模块 import 阶段就 basicConfig + 改 sys.path,副作用有点大喵
现在只要有人 import 这个模块,就会全局改 logging 配置、并把 MODEL_SOURCE_DIR 插进 sys.path 喵。
建议把 logging.basicConfig(...)sys.path 注入尽量挪到 main() / if __name__ == "__main__" 里,并在注入前显式检查目录存在与否,错误信息更可控喵。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 16 - 46, Move
side-effectful initialization out of module import: remove
logging.basicConfig(...) and the sys.path insertion from top-level module scope
and place them inside a main entry (e.g., under if __name__ == "__main__" or a
main() function). Before inserting MODEL_SOURCE_DIR into sys.path, check
os.path.isdir(MODEL_SOURCE_DIR) and only modify sys.path if the directory
exists; log an explicit error via logger and exit gracefully if it does not.
Keep the try/except import block (Qwen3TTSForConditionalGeneration,
Qwen3TTSProcessor, Qwen3TTSModel, VoiceClonePromptItem) but perform those
imports after configuring logging and sys.path in main to avoid global side
effects during import; ensure
torch.serialization.add_safe_globals([VoiceClonePromptItem]) still runs after
successful import.

536-573: 默认 host=127.0.0.1 很安全,但建议对非本地监听显式 warning/鉴权选项喵
Line 538 把默认 host 设为 127.0.0.1 这点很棒喵。
建议额外喵:当用户配置成 0.0.0.0 / 非本地地址时,启动时打一条显式 warning(提示“无鉴权,可能暴露到局域网/公网”),并考虑加一个可选的 token(header/query)校验喵。
Based on learnings: Review server binding and port exposure across the repository. Ensure Python servers bind to localhost/127.0.0.1 by default and that container/docker-compose port mappings do not expose sensitive services to the LAN unless explicitly intended. 喵

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 536 - 573, The
server currently defaults host to "127.0.0.1" but doesn't warn or enforce safety
when a non-local host is configured; before the async with websockets.serve(...)
block (after reading host/port and before creating QwenLocalServer/use of
server.handle_tts) add a check on the host value and log a clear warning if it's
not a loopback address (e.g., "0.0.0.0" or any non-local IP) indicating the
service may be exposed and unauthenticated, and add optional token-based auth
support by reading a config like tts_custom.get("auth_token") and wiring a
simple header/query token validation into the request handling path
(QwenLocalServer.handle_tts) so that when auth_token is set the server rejects
requests without the correct token; keep defaults unchanged (localhost no
warning, no auth).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 267-281: The reviewer flagged Ruff errors E701 (multiple
statements on one line), E722/S112 (bare except), etc.; update the _worker_loop
and related functions (_do_inference usage, task_queue handling, cancel_event
checks, and any other blocks between lines ~419-509) to eliminate one-line
multiple statements by splitting them onto separate lines and replace any bare
"except:" with "except Exception as e:" and log or handle the exception (e.g.,
use logger.exception or logger.error with e). Run ruff --fix (or apply the same
manual fixes) to ensure all occurrences (including task_queue.task_done() calls
and other multi-statement lines) conform to style and CI requirements.
- Around line 178-207: The prompt_cache currently keys only by path so
_ensure_active_prompt_loaded may return a stale prompt after the file at
desired_path is overwritten; change the cache to account for mtime (either use a
composite key like (desired_path, mtime) or store a tuple value (prompt, mtime)
in prompt_cache) and update the lookup logic in _ensure_active_prompt_loaded to
compare the stored mtime with the newly read mtime; if mtimes differ, call
_load_prompt_from_pt(desired_path), replace the cached entry, and then update
cached_prompt, active_voice_mtime, active_voice_path, pt_path and increment
voice_version as before.

In `@main_logic/tts_client.py`:
- Around line 1808-1868: The code uses bare except and many try/except: pass
patterns causing E722/S110/TRY201; update connect(), close_connection(),
send_cancel(), send_append(), and send_commit() to catch explicit Exception as e
(not bare except), replace logger.error(... + raise e) with logger.error(...);
raise (bare) to preserve traceback where reconnect fails, and inside the
lightweight handlers (previously except: pass) call logger.debug with the
exception (e.g., logger.debug("ws send failed", exc_info=e)) instead of
swallowing errors so lint rules are satisfied and failures are still visible;
ensure JSON sends that used ensure_ascii=False remain unchanged.

---

Outside diff comments:
In `@main_logic/tts_client.py`:
- Around line 208-244: The timeout log is inconsistent:
asyncio.wait_for(response_done.wait(), timeout=20.0) uses 20 seconds but the
warning message says “30秒”; update the code so the timeout value and log message
match by either extracting the timeout into a single constant (e.g.,
TIMEOUT_SECONDS) used by the wait_for call and in the logger.warning, or change
the warning text to “20秒”; modify the symbols response_done.wait(), timeout=20.0
and the logger.warning(...) to use the same value.

---

Duplicate comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 414-446: _sender_loop reads shared current_job_id while waiting on
audio_queue which creates a race; change the queue protocol so audio_queue
carries tuples (job_id, chunk) and treat the end marker as (job_id, "__END__")
so the sender uses the job_id from the dequeued tuple instead of the external
current_job_id. Update places that put data into audio_queue (where chunks are
enqueued and in _stop_current_job when signaling end) to enqueue
(current_job_id, chunk) or (current_job_id, "__END__") and modify _sender_loop
to unpack job_id, chunk = await audio_queue.get(), send chunk and, when chunk ==
"__END__", build response_done using the dequeued job_id rather than reading
current_job_id.
- Around line 208-234: The current _load_engine unconditionally passes
attn_implementation="flash_attention_2" and device_map=self.device into
Qwen3TTSForConditionalGeneration.from_pretrained which will fail on CPU; change
_load_engine to assemble a kwargs dict for from_pretrained: include
torch_dtype/dtype=torch.bfloat16 and attn_implementation="flash_attention_2" and
device_map=self.device only when self.device indicates CUDA (e.g. startswith
"cuda"), otherwise set torch_dtype/dtype to torch.float32 (or omit), set
device_map to "cpu" or None as appropriate, and skip the bfloat16
per-parameter/buffer conversion loops unless running on CUDA so the code paths
in _load_engine, Qwen3TTSForConditionalGeneration.from_pretrained,
attn_implementation, and device_map are conditional on self.device.
- Around line 102-104: The torch.load call in _load_prompt_from_pt (and the
other torch.load usage nearby) currently uses weights_only=False which allows
full pickle deserialization and enables RCE when voice paths are
user-switchable; change these torch.load calls to use weights_only=True, and for
any true custom objects that must be deserialized register them explicitly with
torch.serialization.add_safe_globals(...) before loading; locate all torch.load
usages (including the one in _load_prompt_from_pt and the similar load around
the 308-313 area) and switch to weights_only=True, then add safe_globals
registrations for each custom class/type referenced by those .pt files and
validate active_voice.json/config only points to trusted files.

In `@main_logic/tts_client.py`:
- Around line 1775-1847: The async_worker lacks a finally cleanup and
close_connection is misnamed/misimplemented, so add a try/finally around
async_worker's main logic to ensure receive_task is cancelled/awaited and ws is
closed (use receive_task.cancel() then await it with suppressions, and call
await ws.close() if ws is not None), and either remove close_connection or
refactor it into a true close_then_reconnect that first cancels/awaits
receive_task and closes the existing ws before creating a new connection via
websockets.connect; update callers to use connect() for initial open and the new
close_then_reconnect (or delete unused close_connection) so there is no resource
leak.
- Around line 1734-1759: The function local_qwen3_tts_worker currently ignores
the voice_id parameter; either use it or explicitly mark it unused—preferably
map voice_id to the local voice file and send it to the TTS server during
handshake/session update: add a small mapping or resolution from voice_id ->
voice_pt_path (e.g., switch/dict) and include that resolved path (or a voice
name) in the initial handshake or session.update call to the server (referencing
local_qwen3_tts_worker, voice_id, voice_pt_path, and session.update/handshake)
so callers can select voices; alternatively, if you choose not to support
selection, replace the unused param with a deliberate noop (_ = voice_id) and
update the function docstring to state that voice selection is
server-controlled.
- Around line 1887-1905: 在处理 sid is None 分支中,当前只是调用 send_append/send_commit 后立即把
current_speech_id 置空并继续,导致下一轮语音可能与上一轮未完全完成的音频混音;请修改 main_logic/tts_client.py
中该分支(涉及
sid、current_speech_id、send_append、send_commit、send_cancel、resampler、soxr.ResampleStream、src_rate、DST_RATE)以确保在
flush 后要么等待服务端明确的“完成/done”信号再将 current_speech_id 置空并继续,要么在不支持等待的情况下改为 flush
后立即关闭并重建连接并重置 resampler(用 new soxr.ResampleStream(src_rate, DST_RATE, 1,
dtype="float32"))以避免尾音丢失或串音。
- Around line 1501-1514: The current logic in get_config_manager()/tts_config
inspects tts_custom.base_url and silently maps http(s)→gptsovits_tts_worker and
ws(s)→local_qwen3_tts_worker while the local_cosyvoice_worker branch is
commented out; update this by either (A) explicitly logging a warning when
base_url startswith "ws://" or "wss://" that "ws/wss will be interpreted as
Qwen3 protocol" and keep returning local_qwen3_tts_worker, or (B) add a clear
selector field tts_config.get('api_type') (e.g., "qwen3" | "gptsovits" |
"cosyvoice") and use that to choose between gptsovits_tts_worker,
local_qwen3_tts_worker, or local_cosyvoice_worker respectively (falling back to
a safe default and logging the chosen mapping); ensure you reference tts_config,
base_url, gptsovits_tts_worker, local_qwen3_tts_worker and
local_cosyvoice_worker and emit a log.warn when falling back so users aren’t
surprised.

---

Nitpick comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 16-46: Move side-effectful initialization out of module import:
remove logging.basicConfig(...) and the sys.path insertion from top-level module
scope and place them inside a main entry (e.g., under if __name__ == "__main__"
or a main() function). Before inserting MODEL_SOURCE_DIR into sys.path, check
os.path.isdir(MODEL_SOURCE_DIR) and only modify sys.path if the directory
exists; log an explicit error via logger and exit gracefully if it does not.
Keep the try/except import block (Qwen3TTSForConditionalGeneration,
Qwen3TTSProcessor, Qwen3TTSModel, VoiceClonePromptItem) but perform those
imports after configuring logging and sys.path in main to avoid global side
effects during import; ensure
torch.serialization.add_safe_globals([VoiceClonePromptItem]) still runs after
successful import.
- Around line 536-573: The server currently defaults host to "127.0.0.1" but
doesn't warn or enforce safety when a non-local host is configured; before the
async with websockets.serve(...) block (after reading host/port and before
creating QwenLocalServer/use of server.handle_tts) add a check on the host value
and log a clear warning if it's not a loopback address (e.g., "0.0.0.0" or any
non-local IP) indicating the service may be exposed and unauthenticated, and add
optional token-based auth support by reading a config like
tts_custom.get("auth_token") and wiring a simple header/query token validation
into the request handling path (QwenLocalServer.handle_tts) so that when
auth_token is set the server rejects requests without the correct token; keep
defaults unchanged (localhost no warning, no auth).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb4963d and f077cae.

📒 Files selected for processing (2)
  • local_server/qwen3_tts_server/local_server.py
  • main_logic/tts_client.py

Comment thread local_server/qwen3_tts_server/local_server.py
Comment thread local_server/qwen3_tts_server/local_server.py
Comment thread main_logic/tts_client.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
local_server/qwen3_tts_server/local_server.py (4)

444-456: _sender_loop 读取 current_job_id 存在竞态,response.done 可能丢失喵

这个问题之前也提过但没修!__END__ 被推入队列后、_sender_loop 处理之前,_stop_current_job() 可能已经把 current_job_id 置为 None,导致 response.done 无法发出喵。

建议在 await audio_queue.get() 拿到 __END__ 的瞬间立刻快照 job_id

chunk = await audio_queue.get()
snapshot_job_id = current_job_id   # 立刻快照
if chunk == b"__END__":
    if snapshot_job_id:
        await websocket.send(json.dumps({"type": "response.done", "job_id": snapshot_job_id}))
    continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 444 - 456,
_sender_loop can miss sending the response.done when __END__ is dequeued because
current_job_id may be cleared by _stop_current_job between await
audio_queue.get() and the check; fix by snapshotting current_job_id immediately
after receiving chunk from audio_queue (in _sender_loop) into a local variable
(e.g., snapshot_job_id) and use that snapshot when forming/sending the
{"type":"response.done","job_id":...} over websocket so the job_id remains
stable even if _stop_current_job clears the shared current_job_id concurrently.

222-229: ⚠️ Potential issue | 🟠 Major

attn_implementation="flash_attention_2" 在 CPU 环境下会直接报错喵

上次评审已经提到这个问题了,device_map 修好了,但 attn_implementation="flash_attention_2" 还是硬编码,CPU 上根本不支持 Flash Attention 2,模型加载会直接崩喵!

🐾 修复建议
             raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
                 model_path,
                 torch_dtype=torch.bfloat16,
                 dtype=torch.bfloat16,
-                attn_implementation="flash_attention_2",
+                attn_implementation="flash_attention_2" if torch.cuda.is_available() else "eager",
                 device_map=self.device,
                 low_cpu_mem_usage=True
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 222 - 229, The
call to Qwen3TTSForConditionalGeneration.from_pretrained hardcodes
attn_implementation="flash_attention_2", which crashes on CPU; change it to set
attn_implementation only when running on a GPU that supports flash attention
(e.g., check self.device or torch.cuda.is_available()), otherwise omit the
parameter or pass None/leave default so CPU loads normally; update the call site
around model_path and device_map to compute attn_impl based on the environment
and pass that variable into from_pretrained instead of the hardcoded
"flash_attention_2".

102-103: ⚠️ Potential issue | 🟠 Major

torch.load(..., weights_only=False) 的安全风险还没修喵(老问题又来了)

_load_prompt_from_pt(Line 103)和 _do_inference(Line 318)两处都还在用 weights_only=False,允许 pickle 反序列化执行任意代码喵。上次评审里的修复建议还没落实呢!

Line 41 的 add_safe_globals([VoiceClonePromptItem]) 只注册了一个类,不足以完全消除风险喵。建议将两处都改为 weights_only=True,同时通过 add_safe_globals 把所有需要的自定义类型注册进去喵。

Also applies to: 316-321

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 102 - 103, The
code is insecure because torch.load is called with weights_only=False in
_load_prompt_from_pt and _do_inference; change both calls to torch.load(...,
weights_only=True) and register every custom type used in the saved payload with
add_safe_globals (not just VoiceClonePromptItem) so the safe deserializer can
reconstruct objects; locate _load_prompt_from_pt and _do_inference and update
the torch.load call signatures and extend the add_safe_globals invocation to
include all custom classes/structs used by the prompts (e.g.,
VoiceClonePromptItem plus any other prompt item classes, enums or dataclasses
referenced).

473-473: E701:一行多语句,Ruff error 级别,可能卡 CI 喵

Line 473 和 Line 497 的单行 if ... : statement 是 Ruff error(E701),之前已经提过还没修喵!

🧹 修复示意
-                if "text" in data and not msg_type: msg_type = "legacy.text"
+                if "text" in data and not msg_type:
+                    msg_type = "legacy.text"
-                            if not sentence: continue
+                            if not sentence:
+                                continue

Also applies to: 497-497

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` at line 473, The single-line
if statement "if 'text' in data and not msg_type: msg_type = 'legacy.text'" (and
the similar occurrence around the same block) triggers Ruff E701; replace the
one-line form with a normal multi-line if block so it's two statements: an if
header and a separate indented assignment inside, e.g., use "if 'text' in data
and not msg_type:" followed on the next line by "msg_type = 'legacy.text'";
update both occurrences that reference msg_type and data accordingly.
main_logic/tts_client.py (1)

1735-1736: voice_id 传进来就被忽略了喵(老问题还没修呢)

这个问题之前就提过,voice_id 参数至今仍完全没用上。即使 session.update 消息里加了 voice_pt_path,服务端的 handle_tts 也不解析该字段,所以音色选择路径对客户端来说完全是死路喵。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1735 - 1736, The worker function
local_qwen3_tts_worker currently ignores the voice_id parameter and the TTS
pipeline also doesn't parse voice_pt_path from session.update, so voice
selection is never applied; fix by wiring voice selection end-to-end: make
local_qwen3_tts_worker accept and use voice_id (or mapped voice file path) when
constructing/initializing the local TTS engine, update the server-side
handle_tts to parse voice_pt_path from incoming session.update messages and pass
that value (or resolve it to a voice_id) into the call that enqueues work for
local_qwen3_tts_worker, and ensure the queue payload keys match (e.g. include
"voice_pt_path" or "voice_id") so the worker uses that value when generating
audio. Ensure function names to change include local_qwen3_tts_worker and
handle_tts and that session.update handling reads voice_pt_path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main_logic/tts_client.py`:
- Around line 1776-1783: The worker is missing the response_done_event used by
receive_loop and the shutdown logic; inside async_worker() initialize
response_done_event = asyncio.Event() alongside ws, receive_task,
current_speech_id, text_buf, etc., so receive_loop and the termination branches
can await/set/clear it without raising NameError; locate async_worker, add the
Event initialization (response_done_event) at the top of the function and ensure
existing uses in receive_loop and the termination/cleanup code reference this
variable.

---

Duplicate comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 444-456: _sender_loop can miss sending the response.done when
__END__ is dequeued because current_job_id may be cleared by _stop_current_job
between await audio_queue.get() and the check; fix by snapshotting
current_job_id immediately after receiving chunk from audio_queue (in
_sender_loop) into a local variable (e.g., snapshot_job_id) and use that
snapshot when forming/sending the {"type":"response.done","job_id":...} over
websocket so the job_id remains stable even if _stop_current_job clears the
shared current_job_id concurrently.
- Around line 222-229: The call to
Qwen3TTSForConditionalGeneration.from_pretrained hardcodes
attn_implementation="flash_attention_2", which crashes on CPU; change it to set
attn_implementation only when running on a GPU that supports flash attention
(e.g., check self.device or torch.cuda.is_available()), otherwise omit the
parameter or pass None/leave default so CPU loads normally; update the call site
around model_path and device_map to compute attn_impl based on the environment
and pass that variable into from_pretrained instead of the hardcoded
"flash_attention_2".
- Around line 102-103: The code is insecure because torch.load is called with
weights_only=False in _load_prompt_from_pt and _do_inference; change both calls
to torch.load(..., weights_only=True) and register every custom type used in the
saved payload with add_safe_globals (not just VoiceClonePromptItem) so the safe
deserializer can reconstruct objects; locate _load_prompt_from_pt and
_do_inference and update the torch.load call signatures and extend the
add_safe_globals invocation to include all custom classes/structs used by the
prompts (e.g., VoiceClonePromptItem plus any other prompt item classes, enums or
dataclasses referenced).
- Line 473: The single-line if statement "if 'text' in data and not msg_type:
msg_type = 'legacy.text'" (and the similar occurrence around the same block)
triggers Ruff E701; replace the one-line form with a normal multi-line if block
so it's two statements: an if header and a separate indented assignment inside,
e.g., use "if 'text' in data and not msg_type:" followed on the next line by
"msg_type = 'legacy.text'"; update both occurrences that reference msg_type and
data accordingly.

In `@main_logic/tts_client.py`:
- Around line 1735-1736: The worker function local_qwen3_tts_worker currently
ignores the voice_id parameter and the TTS pipeline also doesn't parse
voice_pt_path from session.update, so voice selection is never applied; fix by
wiring voice selection end-to-end: make local_qwen3_tts_worker accept and use
voice_id (or mapped voice file path) when constructing/initializing the local
TTS engine, update the server-side handle_tts to parse voice_pt_path from
incoming session.update messages and pass that value (or resolve it to a
voice_id) into the call that enqueues work for local_qwen3_tts_worker, and
ensure the queue payload keys match (e.g. include "voice_pt_path" or "voice_id")
so the worker uses that value when generating audio. Ensure function names to
change include local_qwen3_tts_worker and handle_tts and that session.update
handling reads voice_pt_path.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f077cae and 4409a7e.

📒 Files selected for processing (2)
  • local_server/qwen3_tts_server/local_server.py
  • main_logic/tts_client.py

Comment thread main_logic/tts_client.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
main_logic/tts_client.py (1)

1735-1736: ⚠️ Potential issue | 🟠 Major

voice_id 至今仍未使用(ARG001),audio_api_key 有占位但 voice_id 没有喵

Line 1736 给 audio_api_key 做了 _ = audio_api_key 的占位抑制,但 voice_id 既没有占位也没有实际使用。整个函数里 voice_pt_path 始终硬编码为 nyaning_voice.pt,与传入的 voice_id 完全无关喵。

至少应该加一行 _ = voice_id 占位,或者按注释里说的把 voice_id 映射到 voice_pt_path 的选择逻辑实现了喵。

💡 最小修复(抑制 ARG001 并与 audio_api_key 保持一致)
 def local_qwen3_tts_worker(request_queue, response_queue, audio_api_key, voice_id):
     _ = audio_api_key  # 本地不需要 key
+    _ = voice_id       # TODO: 映射到不同的 voice_pt_path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1735 - 1736, In local_qwen3_tts_worker
add handling for the unused voice_id: either suppress the linter by adding a
placeholder reference (e.g., _ = voice_id) like audio_api_key is suppressed, or
implement mapping logic that uses voice_id to set voice_pt_path instead of the
hardcoded "nyaning_voice.pt" (update the assignment of voice_pt_path within
local_qwen3_tts_worker to select the correct .pt filename based on voice_id).
🧹 Nitpick comments (1)
main_logic/tts_client.py (1)

1786-1812: receive_loop 缺少对 ConnectionClosedCancelledError 的独立处理,正常重连也会刷错误日志喵

speech_id 变化触发 reconnect() 时,_cleanup_ws() 会 cancel receive_task,这时 receive_loop 会收到 asyncio.CancelledError,被现在的 except Exception 吞掉并打出一条 error 日志。同样地,WebSocket 正常关闭会抛 websockets.exceptions.ConnectionClosed,也会被当作错误记录喵。

对比同文件里 local_cosyvoice_worker(Line 1613-1618)、gptsovits_tts_worker(Line 1326-1331)的 receive_loop,都会把这两种情况静默处理喵。websockets 官方文档也明确建议捕获 ConnectionClosed 来防止它被日志记录喵。

♻️ 建议修复(对齐其他 worker 的模式)
         async def receive_loop(ws_conn):
             nonlocal src_rate, resampler
             try:
                 async for message in ws_conn:
                     if isinstance(message, str):
                         try:
                             evt = json.loads(message)
                         except Exception as e:
                             logger.debug(f"local_qwen3 parse message error:{e}")
                             continue
                         if evt.get("type") == "response.done":
                             response_done_event.set()
                             continue
                         if evt.get("type") == "audio.meta":
                             sr = int(evt.get("sample_rate", src_rate))
                             if sr != src_rate:
                                 src_rate = sr
                                 resampler = soxr.ResampleStream(src_rate, DST_RATE, 1, dtype="float32")
                         continue
 
                     if isinstance(message, bytes):
                         audio_i16 = np.frombuffer(message, dtype=np.int16)
                         resampled = _resample_audio(audio_i16, src_rate, DST_RATE, resampler)
                         response_queue.put(resampled)
+            except websockets.exceptions.ConnectionClosed:
+                logger.debug("local_qwen3 receive_loop: connection closed")
+            except asyncio.CancelledError:
+                pass
             except Exception as e:
                 logger.error(f"local_qwen3 receive_loop error: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1786 - 1812, The receive_loop
currently catches all exceptions and logs them as errors, which incorrectly
treats normal websocket closure and task cancellation as errors; update
receive_loop to explicitly catch and silently handle asyncio.CancelledError and
websockets.exceptions.ConnectionClosed (or ConnectionClosedError)—matching the
pattern used in local_cosyvoice_worker and gptsovits_tts_worker—so that when
_cleanup_ws cancels receive_task during reconnect() or the socket closes
normally no error is logged, and only unexpected exceptions fall through to
logger.error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@main_logic/tts_client.py`:
- Line 1600: The new variable response_done_event declared inside async_worker
of local_cosyvoice_worker is never used (no .set()/.wait()/.clear() calls) and
appears to be leftover from copying local_qwen3_tts_worker; remove the unused
response_done_event declaration to clean up the code and avoid the F841 lint
warning.
- Around line 1757-1759: The path assigned to voice_pt_path is left as a
relative/non-normalized path (it contains "..") because the os.path.abspath call
was commented out; restore normalization by passing voice_pt_path through
os.path.abspath before sending it to the server (e.g., re-enable the
os.path.abspath(voice_pt_path) call) so voice_pt_path becomes an absolute,
canonical path; update any code that uses voice_pt_path (the variable named
voice_pt_path in this module) to use the normalized value.

---

Duplicate comments:
In `@main_logic/tts_client.py`:
- Around line 1735-1736: In local_qwen3_tts_worker add handling for the unused
voice_id: either suppress the linter by adding a placeholder reference (e.g., _
= voice_id) like audio_api_key is suppressed, or implement mapping logic that
uses voice_id to set voice_pt_path instead of the hardcoded "nyaning_voice.pt"
(update the assignment of voice_pt_path within local_qwen3_tts_worker to select
the correct .pt filename based on voice_id).

---

Nitpick comments:
In `@main_logic/tts_client.py`:
- Around line 1786-1812: The receive_loop currently catches all exceptions and
logs them as errors, which incorrectly treats normal websocket closure and task
cancellation as errors; update receive_loop to explicitly catch and silently
handle asyncio.CancelledError and websockets.exceptions.ConnectionClosed (or
ConnectionClosedError)—matching the pattern used in local_cosyvoice_worker and
gptsovits_tts_worker—so that when _cleanup_ws cancels receive_task during
reconnect() or the socket closes normally no error is logged, and only
unexpected exceptions fall through to logger.error.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4409a7e and 5a24d0f.

📒 Files selected for processing (1)
  • main_logic/tts_client.py

Comment thread main_logic/tts_client.py
receive_task = None
current_speech_id = None

response_done_event = asyncio.Event()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

response_done_event 赋值后从未使用喵(F841)

这行是新加进来的,但在整个 local_cosyvoice_workerasync_worker 里找不到任何 .set().wait().clear() 的调用喵。是不是复制 local_qwen3_tts_worker 时顺手带进来的残留代码?直接删掉喵~

🧹 建议清理
     async def async_worker():
         ws = None
         receive_task = None
         current_speech_id = None
-        response_done_event = asyncio.Event()
 
         resampler = soxr.ResampleStream(SRC_RATE, 48000, 1, dtype='float32')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
response_done_event = asyncio.Event()
async def async_worker():
ws = None
receive_task = None
current_speech_id = None
resampler = soxr.ResampleStream(SRC_RATE, 48000, 1, dtype='float32')
🧰 Tools
🪛 Ruff (0.15.2)

[error] 1600-1600: Local variable response_done_event is assigned to but never used

Remove assignment to unused variable response_done_event

(F841)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` at line 1600, The new variable response_done_event
declared inside async_worker of local_cosyvoice_worker is never used (no
.set()/.wait()/.clear() calls) and appears to be leftover from copying
local_qwen3_tts_worker; remove the unused response_done_event declaration to
clean up the code and avoid the F841 lint warning.

Comment thread main_logic/tts_client.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (6)
local_server/qwen3_tts_server/local_server.py (4)

102-104: ⚠️ Potential issue | 🔴 Critical

torch.load 使用 weights_only=False 仍有反序列化风险
这两处加载 .pt 仍走 pickle,若文件非完全可信会有 RCE 风险。建议改为 weights_only=True 并继续使用 add_safe_globals 白名单喵。

🔐 建议修改
-        payload = torch.load(pt_path, map_location=self.device, weights_only=False)
+        payload = torch.load(pt_path, map_location=self.device, weights_only=True)
-                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=False)
+                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=True)
PyTorch torch.load weights_only 安全性与默认行为(近版本)

Also applies to: 316-319

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 102 - 104, The
torch.load call in _load_prompt_from_pt currently uses weights_only=False which
still performs pickle deserialization; change it to weights_only=True to avoid
executing untrusted code and apply the existing safe global whitelist via
add_safe_globals when reconstructing any needed objects; update the
corresponding loader at the other location (the load block referenced around
lines 316-319) to mirror this change, handle the case where weights-only load
omits non-tensor metadata by explicitly reconstructing or validating expected
fields after load, and wrap the load operations in a try/except that logs and
raises a clear error if the file cannot be safely loaded.

444-453: ⚠️ Potential issue | 🟠 Major

response.done 的 job_id 可能串线
_sender_loop 在 await 后读取外层 current_job_id,在 cancel/切换 job 时可能发送错误的 job_id。建议把 job_id 随 chunk 一起入队,发送时用入队时的快照喵。

🧷 建议修改
-                    chunk = await audio_queue.get()
-                    if chunk == b"__END__":
-                        if current_job_id:
+                    job_id, chunk = await audio_queue.get()
+                    if chunk == b"__END__":
+                        if job_id:
                             # 适配 N.E.K.O 客户端协议
-                            response_done = {"type": "response.done", "job_id": current_job_id}
+                            response_done = {"type": "response.done", "job_id": job_id}
                             await websocket.send(json.dumps(response_done))
                         continue
                     await websocket.send(chunk)
-                loop.call_soon_threadsafe(audio_queue.put_nowait, audio_int16.tobytes())
+                loop.call_soon_threadsafe(audio_queue.put_nowait, (job_id, audio_int16.tobytes()))
...
-            loop.call_soon_threadsafe(audio_queue.put_nowait, b"__END__")
+            loop.call_soon_threadsafe(audio_queue.put_nowait, (job_id, b"__END__"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 444 - 453,
_sender_loop reads the outer current_job_id after an await which can race when
jobs are cancelled/switched; fix by enqueuing a snapshot of the job id with each
chunk (i.e. change audio_queue items to (job_id, chunk) where producers push
(current_job_id, chunk) including the "__END__" marker) and update _sender_loop
to unpack (job_id, chunk) and use that job_id for sending response_done and all
sends so the job_id is consistent per-chunk.

464-473: ⚠️ Potential issue | 🟡 Minor

Ruff E701/S112 可能卡 CI:拆行并记录解析失败
if "text" in data and not msg_type: ... 是 E701,同时 JSON 解析异常被静默吞掉触发 S112。建议拆行并做 debug 记录喵。

🧹 建议修改
-                try:
-                    data = json.loads(message)
-                except Exception:
-                    continue
+                try:
+                    data = json.loads(message)
+                except json.JSONDecodeError as e:
+                    logger.debug(f"invalid json from client: {e}")
+                    continue
...
-                if "text" in data and not msg_type: msg_type = "legacy.text"
+                if "text" in data and not msg_type:
+                    msg_type = "legacy.text"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 464 - 473, The
single-line conditional `if "text" in data and not msg_type: msg_type =
"legacy.text"` causes E701 and the bare except around json.loads(message)
silently swallows errors (S112); update the websocket message loop (the `async
for message in websocket` block) to split the conditional into two lines (check
`"text" in data` then `if not msg_type` set `msg_type = "legacy.text"`) and
change the `except Exception:` that wraps `data = json.loads(message)` to log
the exception (e.g., using the module logger or websocket logger with
exception/debug level and include the exception info) before continuing so parse
failures are recorded.

221-228: ⚠️ Potential issue | 🟠 Major

flash_attention_2 在 CPU 上可能直接失败
当前无条件设置 attn_implementation="flash_attention_2",CPU 环境下通常不可用,可能导致启动失败。建议仅在 CUDA 上启用喵。

🧯 建议修改
-            raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
-                model_path,
-                torch_dtype=torch.bfloat16,  # 给 transformers 看
-                dtype=torch.bfloat16,  # 给 qwen 看
-                attn_implementation="flash_attention_2",
-                device_map=self.device,  # 旧的是cuda 硬编码
-                low_cpu_mem_usage=True
-            )
+            model_kwargs = dict(
+                torch_dtype=torch.bfloat16,  # 给 transformers 看
+                dtype=torch.bfloat16,  # 给 qwen 看
+                device_map=self.device,
+                low_cpu_mem_usage=True,
+            )
+            if self.device == "cuda":
+                model_kwargs["attn_implementation"] = "flash_attention_2"
+            raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(model_path, **model_kwargs)
Transformers attn_implementation="flash_attention_2" 是否支持 CPU
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 221 - 228, The
code unconditionally passes attn_implementation="flash_attention_2" into
Qwen3TTSForConditionalGeneration.from_pretrained which can fail on CPU; change
the call so attn_implementation is set only when running on CUDA (e.g., check
torch.cuda.is_available() or that self.device contains "cuda") and omit or use
the default attn_implementation for CPU. Locate the from_pretrained call around
Qwen3TTSForConditionalGeneration.from_pretrained(model_path, torch_dtype=...,
dtype=..., attn_implementation=..., device_map=self.device, ...) and make
attn_implementation conditional based on the device check.
main_logic/tts_client.py (2)

1735-1758: ⚠️ Potential issue | 🟠 Major

local_qwen3_tts_worker 忽略 voice_id,调用方无法选音色
voice_id 完全未使用,导致调用方传入的音色无效且触发 ARG001。建议至少将 voice_id 映射到 voice_pt_path(或显式标注不支持)喵。

🐾 一个最小可用的处理示例
-    voice_pt_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "local_server", "qwen3_tts_server", "nyaning_voice.pt"))
+    voice_pt_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "local_server", "qwen3_tts_server", "nyaning_voice.pt"))
+    if voice_id and os.path.exists(voice_id):
+        voice_pt_path = os.path.abspath(voice_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1735 - 1758, local_qwen3_tts_worker
currently ignores the voice_id parameter which causes caller-supplied voice
selection to be lost and triggers ARG001; update the function to either map
voice_id to the correct local model file (e.g. select different paths for
voice_pt_path based on voice_id) or explicitly validate and raise/log a clear
unsupported-voice error; modify the code paths that set voice_pt_path (and any
downstream usage) to use this mapping/validation so callers can choose voice,
and ensure you reference local_qwen3_tts_worker, voice_id, and voice_pt_path
when implementing the change.

1596-1602: ⚠️ Potential issue | 🟡 Minor

删除未使用的 response_done_event 变量
Line 1600 声明后未被 set/wait/clear,属于死代码且会触发 F841。直接移除更干净喵。

🧹 建议修改
-        response_done_event = asyncio.Event()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1596 - 1602, The variable
response_done_event declared inside async_worker is never used (no
set/wait/clear) and triggers an unused-variable warning; remove the
response_done_event = asyncio.Event() line from the async_worker function (and
any related unused references if they exist) so the function no longer defines
an unused asyncio.Event object.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 102-104: The torch.load call in _load_prompt_from_pt currently
uses weights_only=False which still performs pickle deserialization; change it
to weights_only=True to avoid executing untrusted code and apply the existing
safe global whitelist via add_safe_globals when reconstructing any needed
objects; update the corresponding loader at the other location (the load block
referenced around lines 316-319) to mirror this change, handle the case where
weights-only load omits non-tensor metadata by explicitly reconstructing or
validating expected fields after load, and wrap the load operations in a
try/except that logs and raises a clear error if the file cannot be safely
loaded.
- Around line 444-453: _sender_loop reads the outer current_job_id after an
await which can race when jobs are cancelled/switched; fix by enqueuing a
snapshot of the job id with each chunk (i.e. change audio_queue items to
(job_id, chunk) where producers push (current_job_id, chunk) including the
"__END__" marker) and update _sender_loop to unpack (job_id, chunk) and use that
job_id for sending response_done and all sends so the job_id is consistent
per-chunk.
- Around line 464-473: The single-line conditional `if "text" in data and not
msg_type: msg_type = "legacy.text"` causes E701 and the bare except around
json.loads(message) silently swallows errors (S112); update the websocket
message loop (the `async for message in websocket` block) to split the
conditional into two lines (check `"text" in data` then `if not msg_type` set
`msg_type = "legacy.text"`) and change the `except Exception:` that wraps `data
= json.loads(message)` to log the exception (e.g., using the module logger or
websocket logger with exception/debug level and include the exception info)
before continuing so parse failures are recorded.
- Around line 221-228: The code unconditionally passes
attn_implementation="flash_attention_2" into
Qwen3TTSForConditionalGeneration.from_pretrained which can fail on CPU; change
the call so attn_implementation is set only when running on CUDA (e.g., check
torch.cuda.is_available() or that self.device contains "cuda") and omit or use
the default attn_implementation for CPU. Locate the from_pretrained call around
Qwen3TTSForConditionalGeneration.from_pretrained(model_path, torch_dtype=...,
dtype=..., attn_implementation=..., device_map=self.device, ...) and make
attn_implementation conditional based on the device check.

In `@main_logic/tts_client.py`:
- Around line 1735-1758: local_qwen3_tts_worker currently ignores the voice_id
parameter which causes caller-supplied voice selection to be lost and triggers
ARG001; update the function to either map voice_id to the correct local model
file (e.g. select different paths for voice_pt_path based on voice_id) or
explicitly validate and raise/log a clear unsupported-voice error; modify the
code paths that set voice_pt_path (and any downstream usage) to use this
mapping/validation so callers can choose voice, and ensure you reference
local_qwen3_tts_worker, voice_id, and voice_pt_path when implementing the
change.
- Around line 1596-1602: The variable response_done_event declared inside
async_worker is never used (no set/wait/clear) and triggers an unused-variable
warning; remove the response_done_event = asyncio.Event() line from the
async_worker function (and any related unused references if they exist) so the
function no longer defines an unused asyncio.Event object.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a24d0f and 08fd718.

📒 Files selected for processing (2)
  • local_server/qwen3_tts_server/local_server.py
  • main_logic/tts_client.py

@rophec
Copy link
Copy Markdown
Contributor Author

rophec commented Feb 25, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
main_logic/tts_client.py (2)

1667-1667: ⚠️ Potential issue | 🟡 Minor

response_done_event 在这里是未使用变量,会触发 F841 喵

Line [1667] 声明后没有任何 .set/.wait/.clear 调用,建议直接删除,避免 lint 报错喵。

🧹 建议修复
-        response_done_event = asyncio.Event()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` at line 1667, The variable response_done_event
declared at the top-level of the tts client (response_done_event =
asyncio.Event()) is unused and triggers lint F841; remove that unused
declaration from main_logic/tts_client.py (delete the response_done_event =
asyncio.Event() line) and any redundant imports if they become unused as a
result so no .set/.wait/.clear calls remain missing.

1802-1803: ⚠️ Potential issue | 🟠 Major

voice_id 仍被忽略,调用方无法按请求切换音色喵

Line [1802] 接收了 voice_id,但 Line [1825] 固定写死 voice_pt_path,这会让上层传入音色参数失效喵。建议至少把 voice_id 映射到 voice_pt_path(或直接透传给服务端)喵。

🎛️ 建议修复
-    voice_pt_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "local_server", "qwen3_tts_server", "nyaning_voice.pt"))
+    default_pt = os.path.abspath(os.path.join(
+        os.path.dirname(__file__), "..", "local_server", "qwen3_tts_server", "nyaning_voice.pt"
+    ))
+    voice_pt_path = voice_id.strip() if voice_id and voice_id.strip() else default_pt

Also applies to: 1822-1826

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main_logic/tts_client.py` around lines 1802 - 1803, The function
local_qwen3_tts_worker currently ignores the voice_id parameter and always uses
a hardcoded voice_pt_path; update local_qwen3_tts_worker to honor voice_id by
either mapping voice_id to the correct voice_pt_path (e.g., via a local
dictionary or lookup function) or by including/forwarding voice_id in the
request payload sent to the local TTS service instead of the hardcoded path;
make the change where voice_pt_path is set and where the request payload/body is
assembled so callers can switch voices dynamically (referencing
local_qwen3_tts_worker and the voice_pt_path usage).
local_server/qwen3_tts_server/local_server.py (4)

109-109: ⚠️ Potential issue | 🟠 Major

torch.load(..., weights_only=False) 仍有反序列化执行风险喵

Line [109]、Line [329]、Line [498] 显式使用了 weights_only=False,只要 .pt 文件来源不完全可信,就有反序列化执行风险喵。建议统一改为 weights_only=True,并继续通过 add_safe_globals 维护必要白名单喵。

🔒 建议修复
-        payload = torch.load(pt_path, map_location=self.device, weights_only=False)
+        payload = torch.load(pt_path, map_location=self.device, weights_only=True)
...
-                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=False)
+                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=True)
...
-                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=False)
+                    prompt_snapshot = torch.load(prompt_snapshot, map_location=self.device, weights_only=True)

Also applies to: 329-330, 498-499

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` at line 109, Replace unsafe
torch.load calls that pass weights_only=False with weights_only=True to avoid
deserialization execution risks: update the torch.load invocations (e.g., the
assignment to payload at the torch.load(pt_path, map_location=self.device, ...)
locations and the other similar calls around the same file) to use
weights_only=True while preserving map_location=self.device; if the code relies
on executing pickled globals, continue to expose only vetted symbols via
add_safe_globals (maintain or extend the whitelist there) so loading still works
without enabling full deserialization.

475-477: ⚠️ Potential issue | 🟠 Major

0.6B 引擎加载失败后仍继续启动会导致“连上但无声”喵

Line [475]-Line [477] 仅记录错误但不退出,后续服务会继续接连接但推理路径直接空返回,排障会很痛苦喵。这里建议与 _load_engine 对齐:失败即中止启动喵。

🚨 建议修复
         except Exception as e:
             logger.error(f"❌ 0.6B 加载引擎异常: {e}")
+            sys.exit(1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 475 - 477, The
except block that logs "❌ 0.6B 加载引擎异常" must abort startup like _load_engine
does; update the exception handler around the 0.6B engine load so it not only
logs the error (logger.error(...)) but also stops the process (e.g., call
sys.exit(1) or re-raise the exception) to prevent the server from coming up in a
non-functional "connected but silent" state; mirror the failure behavior used in
_load_engine to ensure a hard abort on load failure.

228-235: ⚠️ Potential issue | 🟠 Major

CPU fallback 与 FlashAttention2 配置仍然冲突喵

Line [62] 明确支持 CPU fallback,但 Line [232] 和 Line [445] 仍无条件使用 attn_implementation="flash_attention_2",CPU 环境下容易直接加载失败喵。建议按设备条件设置 attention 实现与 dtype 喵。

🛠️ 建议修复
+            attn_impl = "flash_attention_2" if self.device == "cuda" else "eager"
+            load_dtype = torch.bfloat16 if self.device == "cuda" else torch.float32
             raw_model = Qwen3TTSForConditionalGeneration.from_pretrained(
                 model_path,
-                torch_dtype=torch.bfloat16,  # 给 transformers 看
-                dtype=torch.bfloat16,  # 给 qwen 看
-                attn_implementation="flash_attention_2",
+                torch_dtype=load_dtype,
+                dtype=load_dtype,
+                attn_implementation=attn_impl,
                 device_map=self.device,  # 旧的是cuda 硬编码
                 low_cpu_mem_usage=True
             )

Also applies to: 442-448

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 228 - 235, The
code unconditionally sets attn_implementation="flash_attention_2" and bfloat16
dtypes when calling Qwen3TTSForConditionalGeneration.from_pretrained, which will
fail on CPU fallback; change the call so it branches on the runtime device
(self.device / device_map) and only sets attn_implementation="flash_attention_2"
and torch.bfloat16/dtype=torch.bfloat16 when the target device is CUDA (GPU);
for CPU use attn_implementation=None (or omit the kwarg) and
torch_dtype/dtype=torch.float32 (or remove dtype args) to ensure CPU loads
succeed—update both occurrences around
Qwen3TTSForConditionalGeneration.from_pretrained and the similar call later so
the attn_implementation and dtype choices are conditional on the device.

428-428: ⚠️ Potential issue | 🟠 Major

__END__ 使用外部 current_job_id 会回传错误 job_id 喵

Line [611]-Line [614] 在 sender 协程里读取可变的 current_job_id,而 __END__ 是异步到达的,可能对应的是旧任务,导致 response.done 绑定错会话喵。建议把 job_id 一起入队,避免跨协程共享状态歧义喵。

Also applies to: 578-578, 606-615

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` at line 428, The end-marker
currently enqueues only b"__END__" and the sender coroutine (which reads
current_job_id) may bind response.done to the wrong job; change the protocol to
enqueue tuples including the job id (e.g., (job_id, b"__END__") and for all
audio chunks) so the consumer uses the job_id from the queue instead of the
shared current_job_id; update places that call
loop.call_soon_threadsafe(audio_queue.put_nowait, ...) and the sender coroutine
that reads from audio_queue to unpack (job_id, chunk) and only mark
response.done for that job_id (avoid relying on mutable current_job_id and
adjust handling in the code paths around audio_queue usage and response.done
accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Around line 634-684: Add a branch for msg_type == "session.update" inside
handle_tts that reads voice_pt_path and language from data and writes them to
the running session (e.g., self.voice_pt_path = data.get("voice_pt_path") and
self.language = data.get("language")), then force a prompt/version refresh by
calling self._ensure_active_prompt_loaded() and, if the returned current_version
!= last_voice_version, set last_voice_version = current_version and await
_stop_current_job(keep_buffer=True) to trigger the hot voice switch; also
consider sending a small response back over websocket (optional) to acknowledge
the update.
- Around line 515-517: You set original_dtype = torch.get_default_dtype() then
call torch.set_default_dtype(torch.bfloat16) but never restore it (and
original_dtype is unused), causing a linter F841 and a global dtype leak; update
the try/finally that currently ends with a bare pass to restore the original
dtype by calling torch.set_default_dtype(original_dtype) in the finally block
(and keep the original_dtype variable in scope), ensuring the default dtype is
reset even on exceptions.

---

Duplicate comments:
In `@local_server/qwen3_tts_server/local_server.py`:
- Line 109: Replace unsafe torch.load calls that pass weights_only=False with
weights_only=True to avoid deserialization execution risks: update the
torch.load invocations (e.g., the assignment to payload at the
torch.load(pt_path, map_location=self.device, ...) locations and the other
similar calls around the same file) to use weights_only=True while preserving
map_location=self.device; if the code relies on executing pickled globals,
continue to expose only vetted symbols via add_safe_globals (maintain or extend
the whitelist there) so loading still works without enabling full
deserialization.
- Around line 475-477: The except block that logs "❌ 0.6B 加载引擎异常" must abort
startup like _load_engine does; update the exception handler around the 0.6B
engine load so it not only logs the error (logger.error(...)) but also stops the
process (e.g., call sys.exit(1) or re-raise the exception) to prevent the server
from coming up in a non-functional "connected but silent" state; mirror the
failure behavior used in _load_engine to ensure a hard abort on load failure.
- Around line 228-235: The code unconditionally sets
attn_implementation="flash_attention_2" and bfloat16 dtypes when calling
Qwen3TTSForConditionalGeneration.from_pretrained, which will fail on CPU
fallback; change the call so it branches on the runtime device (self.device /
device_map) and only sets attn_implementation="flash_attention_2" and
torch.bfloat16/dtype=torch.bfloat16 when the target device is CUDA (GPU); for
CPU use attn_implementation=None (or omit the kwarg) and
torch_dtype/dtype=torch.float32 (or remove dtype args) to ensure CPU loads
succeed—update both occurrences around
Qwen3TTSForConditionalGeneration.from_pretrained and the similar call later so
the attn_implementation and dtype choices are conditional on the device.
- Line 428: The end-marker currently enqueues only b"__END__" and the sender
coroutine (which reads current_job_id) may bind response.done to the wrong job;
change the protocol to enqueue tuples including the job id (e.g., (job_id,
b"__END__") and for all audio chunks) so the consumer uses the job_id from the
queue instead of the shared current_job_id; update places that call
loop.call_soon_threadsafe(audio_queue.put_nowait, ...) and the sender coroutine
that reads from audio_queue to unpack (job_id, chunk) and only mark
response.done for that job_id (avoid relying on mutable current_job_id and
adjust handling in the code paths around audio_queue usage and response.done
accordingly).

In `@main_logic/tts_client.py`:
- Line 1667: The variable response_done_event declared at the top-level of the
tts client (response_done_event = asyncio.Event()) is unused and triggers lint
F841; remove that unused declaration from main_logic/tts_client.py (delete the
response_done_event = asyncio.Event() line) and any redundant imports if they
become unused as a result so no .set/.wait/.clear calls remain missing.
- Around line 1802-1803: The function local_qwen3_tts_worker currently ignores
the voice_id parameter and always uses a hardcoded voice_pt_path; update
local_qwen3_tts_worker to honor voice_id by either mapping voice_id to the
correct voice_pt_path (e.g., via a local dictionary or lookup function) or by
including/forwarding voice_id in the request payload sent to the local TTS
service instead of the hardcoded path; make the change where voice_pt_path is
set and where the request payload/body is assembled so callers can switch voices
dynamically (referencing local_qwen3_tts_worker and the voice_pt_path usage).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08fd718 and 68f74ad.

📒 Files selected for processing (2)
  • local_server/qwen3_tts_server/local_server.py
  • main_logic/tts_client.py

Comment thread local_server/qwen3_tts_server/local_server.py Outdated
Comment on lines +634 to +684
msg_type = data.get("type")
if "text" in data and not msg_type: msg_type = "legacy.text"

if msg_type == "input_text_buffer.append":
text_fragment = data.get("text", "")
sentence_buffer += text_fragment

elif msg_type in ("input_text_buffer.commit", "legacy.text"):
if msg_type == "legacy.text":
sentence_buffer += data.get("text", "")

_, current_version = self._ensure_active_prompt_loaded()
if current_version != last_voice_version:
last_voice_version = current_version
# Hot voice switch: cancel current audio job but keep the buffered text.
# Otherwise the first committed sentence right after switching may be dropped.
await _stop_current_job(keep_buffer=True)

# 正则智能断句
parts = re.split(r'([。!?.!?\n]+)', sentence_buffer)

if len(parts) > 1:
for i in range(0, len(parts) - 1, 2):
sentence = parts[i] + parts[i + 1]
sentence = sentence.strip()
if not sentence: continue

if not current_job_id:
current_job_id = str(uuid.uuid4())
await websocket.send(json.dumps({"type": "response.start", "job_id": current_job_id}))

logger.info(f"📥 句子: {sentence[:20]}...")
prompt, _ = self._ensure_active_prompt_loaded()
self.task_queue.put((sentence, current_job_id, loop, audio_queue, cancel_event, prompt, self.language))

sentence_buffer = parts[-1]

# 缓冲区兜底 (防止一直不说话)
if len(sentence_buffer) > self.buffer_fallback_chars:
sentence = sentence_buffer
sentence_buffer = ""
if not current_job_id:
current_job_id = str(uuid.uuid4())
await websocket.send(json.dumps({"type": "response.start", "job_id": current_job_id}))
prompt, _ = self._ensure_active_prompt_loaded()
self.task_queue.put((sentence, current_job_id, loop, audio_queue, cancel_event, prompt, self.language))

elif msg_type == "cancel":
await _stop_current_job()

finally:
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

服务端未处理 session.update,客户端传的 voice_pt_path/language 目前不生效喵

当前 handle_tts 只处理 append/commit/cancel/legacy;而客户端会在连接后发送 session.update。这会让音色路径与语言参数“看起来传了但实际上没用”喵。建议补一个 msg_type == "session.update" 分支,更新运行参数并触发 prompt 热切换喵。

🧰 Tools
🪛 Ruff (0.15.2)

[error] 635-635: Multiple statements on one line (colon)

(E701)


[warning] 653-653: String contains ambiguous (FULLWIDTH EXCLAMATION MARK). Did you mean ! (EXCLAMATION MARK)?

(RUF001)


[warning] 653-653: String contains ambiguous (FULLWIDTH QUESTION MARK). Did you mean ? (QUESTION MARK)?

(RUF001)


[error] 659-659: Multiple statements on one line (colon)

(E701)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@local_server/qwen3_tts_server/local_server.py` around lines 634 - 684, Add a
branch for msg_type == "session.update" inside handle_tts that reads
voice_pt_path and language from data and writes them to the running session
(e.g., self.voice_pt_path = data.get("voice_pt_path") and self.language =
data.get("language")), then force a prompt/version refresh by calling
self._ensure_active_prompt_loaded() and, if the returned current_version !=
last_voice_version, set last_voice_version = current_version and await
_stop_current_job(keep_buffer=True) to trigger the hot voice switch; also
consider sending a small response back over websocket (optional) to acknowledge
the update.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug解除了吗

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wehos 很遗憾,这个 bug 还没解除喵 😿

检查了当前代码,handle_tts 方法里只处理了 input_text_buffer.appendinput_text_buffer.commitlegacy.textcancel 这几种消息类型,但没有处理 session.update 喵。

这意味着客户端发送的 voice_pt_pathlanguage 参数目前不会生效喵。

建议修复方案:
handle_tts 的消息处理分支里补充一个 session.update 处理器:

elif msg_type == "session.update":
    # 更新运行参数
    if "voice_pt_path" in data:
        self.voice_pt_path = data["voice_pt_path"]
    if "language" in data:
        self.language = data["language"]
    
    # 触发音色热切换
    _, current_version = self._ensure_active_prompt_loaded()
    if current_version != last_voice_version:
        last_voice_version = current_version
        await _stop_current_job(keep_buffer=True)
    
    # 可选:发送确认消息
    await websocket.send(json.dumps({"type": "session.updated"}))

这样就能让客户端动态更新音色和语言参数啦喵~ ✨

∧_∧
(´・ω・`) 需要我帮忙生成修复代码吗喵?
| ⊃/(___
|  /
 ̄ ̄ ̄


🧠 Learnings used
Learnt from: wehos
Repo: Project-N-E-K-O/N.E.K.O PR: 330
File: config/prompts_sys.py:684-684
Timestamp: 2026-02-20T09:45:24.884Z
Learning: 在 config/prompts_sys.py 中,proactive_screen_web_en/ja/ko 和 proactive_generate_ja 等 prompt 的章节结束分隔符刻意使用中文"======以上为对话历史======"而不是目标语言,这是安全水印设计,不应修改。这种开始标签用目标语言、结束标签用中文的混合模式是有意为之的。

Learnt from: wehos
Repo: Project-N-E-K-O/N.E.K.O PR: 172
File: main_routers/characters_router.py:73-133
Timestamp: 2025-12-12T04:48:39.250Z
Learning: Review server binding and port exposure across the repository. Ensure Python servers bind to localhost/127.0.0.1 by default and that container/docker-compose port mappings do not expose sensitive services to the LAN unless explicitly intended. For docker-compose, verify that ports are not unnecessarily published to the host network; prefer binding to localhost or using internal networks, and document the exposure rationale. Specifically check main_server/config on bindings and the docker-compose.yml port mappings to confirm exposure is intentional and secure.

Learnt from: wehos
Repo: Project-N-E-K-O/N.E.K.O PR: 186
File: main_routers/workshop_router.py:1707-1721
Timestamp: 2025-12-16T12:04:33.158Z
Learning: In Python code, do not review or enforce changes between logger.error and logger.exception; choosing between these logging methods is a developer preference and should not be a code-review constraint. Treat this as a stylistic choice and focus reviews on functional correctness and other logging concerns as appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants