Correct batched transcription word alignment, option propagation, and typing issues #1405
base: master
Conversation
This change ensures add_word_timestamps consistently returns a float as advertised by its type annotation. Previously, an empty input returned None, which could leak into callers and cause subtle failures. Returning the prior last_speech_timestamp preserves expected behavior (“no update”) in this edge case.
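A minimal sketch of the described guard, assuming a signature shaped roughly like faster-whisper's add_word_timestamps; the argument names and the elided alignment logic are illustrative, not the project's exact code:

```python
# Hedged sketch: return the prior timestamp on empty input instead of None.
from typing import List


def add_word_timestamps(
    segments: List[dict],
    last_speech_timestamp: float,
    # ... other alignment arguments elided for brevity ...
) -> float:
    if len(segments) == 0:
        # Previously this path returned None despite the float annotation;
        # "no update" is expressed by echoing the input timestamp instead.
        return last_speech_timestamp
    # ... word-alignment logic would compute an updated timestamp here ...
    return last_speech_timestamp
```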
This change prevents in-place mutation of a user-supplied vad_parameters dictionary. Previously, removing max_speech_duration_s modified the caller’s dict unexpectedly, which can cause confusing downstream behavior if the dict is reused. Copying first preserves caller expectations while maintaining the same internal logic.
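A sketch of the copy-before-pop pattern; the key name comes from the PR text, while the wrapper function is hypothetical:

```python
# Hedged sketch: copy the caller's dict so popping a key cannot mutate it.
from typing import Optional


def prepare_vad_parameters(vad_parameters: Optional[dict]) -> dict:
    vad_parameters = dict(vad_parameters or {})  # shallow copy, not in-place
    # Only the local copy loses the key; the caller's dict is untouched.
    vad_parameters.pop("max_speech_duration_s", None)
    return vad_parameters


user_params = {"threshold": 0.5, "max_speech_duration_s": 30}
prepare_vad_parameters(user_params)
assert "max_speech_duration_s" in user_params  # reuse remains safe
```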
The batched transcription API previously accepted several parameters but silently ignored them, substituting hardcoded values. This change wires those parameters through to TranscriptionOptions so callers get the behavior they requested.
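For illustration, a stand-in dataclass showing the difference between hardcoding and propagation; the specific parameter names here are assumptions, not necessarily the ones the PR touches:

```python
# Hedged sketch: forward caller-supplied values instead of fixed literals.
from dataclasses import dataclass


@dataclass
class TranscriptionOptions:  # stand-in for the real options container
    beam_size: int
    patience: float
    length_penalty: float


def build_options(
    beam_size: int = 5, patience: float = 1.0, length_penalty: float = 1.0
) -> TranscriptionOptions:
    # Before: TranscriptionOptions(beam_size=5, patience=1.0, length_penalty=1.0)
    # After: the caller's values are propagated through.
    return TranscriptionOptions(
        beam_size=beam_size,
        patience=patience,
        length_penalty=length_penalty,
    )
```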
In the batched pipeline, VAD options may legitimately be absent when VAD is disabled or when timestamps are provided directly. This change updates the type of vad_options to Optional to reflect actual runtime behavior. It improves correctness for type-checkers and avoids implying a value is always present.
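A minimal sketch of the annotation change; VadOptions and the function below are placeholders:

```python
# Hedged sketch: None is now a legitimate value for vad_options.
from typing import Optional


class VadOptions:  # placeholder for the real options type
    pass


def run_batched(audio: str, vad_options: Optional[VadOptions] = None) -> None:
    if vad_options is None:
        # VAD disabled, or timestamps were provided directly by the caller.
        pass
    else:
        # VAD path uses the supplied options.
        pass
```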
This adjusts progress reporting so the batched progress bar advances based on chunks processed rather than nested result iteration. The prior placement could drift if the output shape ever changes (e.g., filtering, empty results), leading to misleading progress. Updating once per processed batch keeps progress accurate and easier to reason about.
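A sketch of the placement, assuming a tqdm-style progress bar; the batches and inner loop are illustrative:

```python
# Hedged sketch: advance the bar once per processed batch, not per result.
from tqdm import tqdm

batches = [["seg1", "seg2"], [], ["seg3"]]  # inner shape may vary

with tqdm(total=len(batches)) as pbar:
    for batch in batches:
        results = batch  # stand-in for running inference on the batch
        for _segment in results:
            pass  # per-segment handling; no pbar.update() here anymore
        # Empty or filtered results can no longer make progress drift.
        pbar.update(1)
```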
This is both a typing fix and an implementation cleanup: get_suppressed_tokens is now consistent with how it is actually called. Callers pass lists (or None), but the function was annotated for tuples and returned a tuple while claiming to return a list. Returning a sorted List[int] matches typical generation API expectations and removes internal contradictions (e.g., asserting the input is a list despite annotating it otherwise). The behavior is unchanged; only the typing and return shape are made consistent and predictable.
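A minimal sketch consistent with that description; the tokenizer argument and the -1 handling are illustrative stand-ins:

```python
# Hedged sketch: accept a list (or None), return a sorted List[int].
from typing import List, Optional


def get_suppressed_tokens(
    tokenizer,  # stand-in; only needed when expanding defaults
    suppress_tokens: Optional[List[int]],
) -> List[int]:
    if not suppress_tokens:
        return []
    tokens = set(suppress_tokens)
    # The conventional -1 sentinel expands to a default non-speech set;
    # that expansion is elided to keep the sketch self-contained.
    tokens.discard(-1)
    # A sorted list matches the annotation instead of a tuple.
    return sorted(tokens)
```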
This corrects the return type annotation for _split_segments_by_timestamps to match its actual return value. The function returns (segments, seek, single_timestamp_ending), not a list of lists. Aligning typing with behavior improves IDE/type-checker accuracy without changing runtime logic.
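An annotation-only sketch; the body is elided and the element types are assumptions based on the description:

```python
# Hedged sketch: the Tuple annotation mirrors the actual return value.
from typing import List, Tuple


def _split_segments_by_timestamps() -> Tuple[List[dict], int, bool]:
    segments: List[dict] = []
    seek = 0
    single_timestamp_ending = False
    # ... splitting logic would populate these three values ...
    return segments, seek, single_timestamp_ending
```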
Please spare us from the AI-generated nonsense. 😁 You should understand what you are doing instead of letting AI do random changes/PRs.
I do understand somewhat, perhaps not as much as you, because this isn't my profession. Is anything actually wrong with the improvements?
AI as a drafting aid is fine, but blindly letting AI code for you when you understand it only "somewhat" isn't. |
I've always received negativity from you for over a year, so I don't expect that to change, and I'm not interested in arguing again. If you have anything constructive to say about the actual merits of the code itself - i.e. whether it's good or bad - I'm all ears though. Cheers.
In some worlds it's called sincerity. I don't remember arguing; I just deleted your angry tantrums and blocked you for a few months.
I clicked on some commit and saw AI-generated nonsense, and I've already said what I think about it. Just think for a moment: if someone wants to entertain themselves by analyzing AI-generated output, they can do it in a few clicks without your help. If you still think it's OK to open such PRs, read this: https://meta.stackoverflow.com/questions/421831/policy-generative-ai-e-g-chatgpt-is-banned
You're extremely immature and argumentative for no reason. If you have anything substantive to say about my actual PR, that's the only thing I'll respond to moving forward. I don't care if you have some kind of grudge against me and forever want to be rude and disrespectful. Thanks.
@MahmoudAshraf97 Are you one of the maintainers? @Purfview has essentially been harassing me for well over a year now on this forum as well as others. Can you please request that he stop? I'd prefer to continue contributing to the faster-whisper repository because it's near and dear to my heart intellectually, but not if every time I try to contribute I'm going to be berated and insulted over some grudge from more than a year ago. It's extremely unprofessional. I'd prefer to have my hard work respected and my pull requests judged on their substance, not on some philosophical purist position or on what @Purfview believes my education level in Python programming should be. It's very condescending and disrespectful. If there are problems with the actual substance of my PRs, I understand that, but I'd like to ask you as one of the maintainers to help put a stop to this harassment. Thanks!
See the notes for each commit, but this should improve things a little.
[UPDATE]: I'm noticing that the CI run fails because of black, but I ran black locally and it didn't detect anything...
