I've run several tests with various speakers, and I'm encountering a problem in almost all of them: the speakers seem to blend together or aren't detected correctly. For example, if there are two people in the video, sometimes only one is detected, or if two (or more) are detected, sometimes person A speaks like person B (or vice versa).