fix: resolve tensor shape mismatch and index out-of-bounds in CausalM…#1848

Open
Caxson wants to merge 1 commit into FunAudioLLM:main from Caxson:fix_flow_bug

Conversation

Caxson commented Mar 12, 2026

…askedDiffWithDiT training

  • Add explicit length alignment between speech_feat and the expanded token embeddings h by truncating both to min_len. This fixes a training crash caused by small frame-count differences between the parquet-cached speech_token and the pipeline-extracted speech_feat (e.g. 960-frame padding misalignment; see issue "Fix popping noise before and after CosyVoice2 flow synthesis" #1051)
  • Replace random.randint(0, int(0.3 * j)) with min(int(0.3 * j), min_len) for the CFG condition index, to prevent out-of-bounds access after truncation
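
For readers who want to see the two changes side by side, here is a minimal sketch of the idea. It is not the code from this PR: plain Python lists stand in for tensors so it runs without PyTorch, the helper names `align_to_min_len` and `sample_cond_index` are hypothetical, the frame counts (100 vs. 98) and the 80 mel bins are made-up values, and it assumes the second bullet means capping the upper bound passed to `random.randint` rather than dropping the random draw altogether.

```python
import random


def align_to_min_len(speech_feat, h):
    """Truncate two frame sequences to their common length on the time axis."""
    min_len = min(len(speech_feat), len(h))
    return speech_feat[:min_len], h[:min_len], min_len


def sample_cond_index(j, min_len, rng=random):
    """Pick a CFG condition prefix length that never exceeds min_len.

    Assumption: the cap is applied to the upper bound of randint,
    which is inclusive on both ends.
    """
    upper = min(int(0.3 * j), min_len)
    return rng.randint(0, upper)


# Hypothetical lengths: 100 mel frames vs. 98 expanded token frames.
speech_feat = [[0.0] * 80 for _ in range(100)]
h = [[0.0] * 80 for _ in range(98)]
speech_feat, h, min_len = align_to_min_len(speech_feat, h)

# Copy a random-length prefix of the features into the condition buffer.
conds = [[0.0] * 80 for _ in range(min_len)]
index = sample_cond_index(j=100, min_len=min_len)
conds[:index] = [frame[:] for frame in speech_feat[:index]]
```

Truncating both sequences before building `conds` keeps every later slice within the shared length, which is the property the PR description relies on.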
