That's a known issue I'm still working to understand.
Sometimes, especially when the vocals stem contains backing vocals and/or a chorus, the alignment goes off the rails; other times it thinks a word is longer than it actually is. In general, the behavior is pretty non-deterministic.
I would appreciate any help from devs experienced with alignment models in figuring out this issue.