[Bugfix] Fix padding bug in 12Hz tokenizer ConvTranspose1d decode #1241

Open
linyueqian wants to merge 1 commit into vllm-project:main from linyueqian:fix/qwen3-tts-12hz-padding-bug
@linyueqian commented Feb 6, 2026

Purpose

Fix padding bugs in the 12Hz speech tokenizer decoder (Qwen3TTSTokenizerV2CausalTransConvNet), syncing with the upstream fix from QwenLM/Qwen3-TTS@5f8581d.

Two bugs in __init__ and forward:

  1. Chained assignment bug: self.right_pad = pad = self.left_pad is a chained assignment that assigns the value of self.left_pad to both pad and self.right_pad, so right_pad always equals left_pad. The intended behavior is left_pad = 0 and right_pad = kernel_size - stride.
  2. Slicing bug: The old forward unconditionally trimmed the output on both the left and the right. In addition, when right_pad is 0, hidden_state[..., :-0] returns an empty tensor (Python gotcha: -0 == 0, so x[:-0] is x[:0]). The fix trims only from the right and skips the trim when right_pad == 0.
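
The :-0 pitfall in item 2 can be reproduced without the model. A minimal sketch, using a plain Python list in place of the tensor's last dimension; trim_right is a hypothetical helper for illustration and is not the code in this patch:

```python
def trim_right(samples, right_pad):
    """Drop right_pad trailing elements; return the input unchanged when right_pad is 0."""
    # Guard: samples[:-0] is samples[:0], which is an empty sequence.
    if right_pad > 0:
        return samples[:-right_pad]
    return samples


samples = [0.1, 0.2, 0.3, 0.4, 0.5]

unguarded = samples[:-0]          # same as samples[:0]
guarded = trim_right(samples, 0)  # keeps all five elements
trimmed = trim_right(samples, 2)  # drops the last two elements

print(unguarded)  # []
print(guarded)
print(trimmed)
```

Tensor slicing follows the same rule for the stop index, which is why the guard on right_pad == 0 is needed.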

Test Plan

python examples/offline_inference/qwen3_tts/end2end.py --query-type CustomVoice

Test Result

Offline inference with Qwen3-TTS-12Hz-1.7B-CustomVoice completed successfully, producing valid audio output (shape=(247680,), sr=24000, ~10.3s).
output_0_94945aa2-880b-4966-8a44-a52f3147e11d.wav
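
As a quick sanity check, the reported duration follows from the sample count and sample rate above (variable names are illustrative):

```python
num_samples = 247680   # from shape=(247680,)
sample_rate = 24000    # from sr=24000

duration_s = num_samples / sample_rate
print(f"{duration_s:.2f} s")  # 10.32 s
```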


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

Signed-off-by: linyueqian <linyueqian@outlook.com>

Copilot AI left a comment

Pull request overview

This pull request fixes two critical bugs in the Qwen3-TTS 12Hz tokenizer's ConvTranspose1d decoder (Qwen3TTSTokenizerV2CausalTransConvNet), syncing with an upstream fix from the official QwenLM/Qwen3-TTS repository.

Changes:

  • Fixed chained assignment bug that incorrectly set left_pad and right_pad to the same value
  • Fixed slicing bug that trimmed from both sides and caused empty tensor returns when right_pad == 0
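
For context on why a right-only trim of kernel_size - stride is the intended behavior: with PyTorch's defaults (padding=0, dilation=1, output_padding=0), ConvTranspose1d produces (L_in - 1) * stride + kernel_size output steps, so removing kernel_size - stride steps from the right leaves exactly L_in * stride. A sketch of that arithmetic; the function names and example sizes are hypothetical and not taken from the model config:

```python
def transposed_conv_length(l_in, kernel_size, stride):
    """Output length of ConvTranspose1d with padding=0, dilation=1, output_padding=0."""
    return (l_in - 1) * stride + kernel_size


def decoded_length(l_in, kernel_size, stride):
    """Length after trimming right_pad = kernel_size - stride from the right only."""
    right_pad = kernel_size - stride
    return transposed_conv_length(l_in, kernel_size, stride) - right_pad


# Illustrative sizes only: (input length, kernel size, stride).
for l_in, k, s in [(10, 4, 2), (7, 8, 8), (3, 5, 1)]:
    assert decoded_length(l_in, k, s) == l_in * s
```

The (7, 8, 8) case has right_pad == 0, which is the case the slicing guard has to handle.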

@hsliuustc0106 added the "ready" label (label to trigger buildkite CI) Feb 6, 2026