Conversation
….tune. Fix major bug in Optimizer
Hey, I am unable to use this:
@C0RE1312 sounds like a problem with PyTorch not being able to compute the FFT. Have you tried updating the dependencies of both torch and whisper? It's a pretty old PR.
Will this work with faster-whisper or any other faster version of whisper? |
BTW, I noticed that the last commit was in April 2023, so this feature has had no new commits for more than a year. Does this mean the implementation is finished but was never merged into the main branch? I noticed a note in the README of this project stating that this feature was coming soon but not ready.
@ywangwxd unfortunately I haven't had the time to work on this as much as I'd like. I prioritized other things like documentation and testing for #98
Nice job!!! Can't wait to see the next update.
Depends on #144
This PR adds a new `SpeakerAwareTranscription` pipeline that combines streaming diarization and streaming transcription to determine "who says what" in a live conversation. By default, this is shown as colored words in the terminal.

The feature works as expected with `diart.stream` and `diart.serve`/`diart.client`.

The main thing preventing full compatibility with `diart.benchmark` and `diart.tune` is the evaluation metric. Since the output of the pipeline is annotated text with the format `[speaker0]Hello [speaker1]Hi`, the metric `diart.metrics.WordErrorRate` will count labels as insertion errors.

Next steps: implement a `SpeakerWordErrorRate` that computes the (weighted?) average WER across speakers.

Changelog
TBD