fix: resolve tensor shape mismatch and index out-of-bounds in CausalM… by Caxson · Pull Request #1848 · FunAudioLLM/CosyVoice

Caxson · 2026-03-12T06:34:06Z

…askedDiffWithDiT training

Add explicit length alignment between speech_feat and the expanded token embeddings h by truncating both to min_len. This fixes a training crash caused by subtle frame-count differences between the parquet-cached speech_token and the pipeline-extracted speech_feat (e.g. 960-frame padding misalignment; see issue #1051, "Fix popping noise before and after cosyvoice2 flow synthesis").
Replace random.randint(0, int(0.3 * j)) with min(int(0.3 * j), min_len) for the CFG condition index to prevent out-of-bounds access after truncation.
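The two changes described above can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: the helper names align_to_min_len and cond_prefix_len are hypothetical, plain Python lists stand in for the tensors, and the sketch assumes the random draw is kept with its upper bound clamped to min_len, which the description does not state explicitly.

```python
import random


def align_to_min_len(feat, h):
    """Truncate two per-frame sequences to their common length.

    Hypothetical helper: `feat` stands in for speech_feat and `h` for the
    expanded token embeddings; both are indexed by frame along axis 0.
    """
    min_len = min(len(feat), len(h))
    return feat[:min_len], h[:min_len], min_len


def cond_prefix_len(j, min_len, rng=random):
    """Pick a conditioning prefix length that never exceeds min_len.

    Assumption: the random draw is kept and only its upper bound is clamped.
    """
    upper = min(int(0.3 * j), min_len)
    return rng.randint(0, upper)  # randint is inclusive on both ends


# Toy data: 203 feature frames vs. 200 expanded-token frames, 80 dims each.
feat = [[0.0] * 80 for _ in range(203)]
h = [[0.0] * 80 for _ in range(200)]

feat, h, min_len = align_to_min_len(feat, h)
index = cond_prefix_len(j=203, min_len=min_len)
cond = feat[:index]  # prefix used as the CFG condition
```

With real tensors of shape (batch, frames, dims), the same truncation would be applied along the frame axis rather than axis 0.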

Caxson wants to merge 1 commit into FunAudioLLM:main from Caxson:fix_flow_bug

Caxson commented Mar 12, 2026
