Context
I'm building an audio context buffer for speech recognition: previous audio is prepended to the current recording, sent to the ASR model, and word-level timestamps are used to split the result and keep only the new text.
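A minimal sketch of the splitting step, assuming the ASR result has already been turned into word records with start times in seconds, measured from the start of the concatenated (context + new) audio. The `Word` class and `split_new_words` helper are hypothetical and not part of onnx-asr:

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word-level record: text plus start/end time in seconds,
    # measured from the beginning of the audio sent to the model.
    text: str
    start: float
    end: float


def split_new_words(words: list[Word], context_seconds: float) -> list[Word]:
    """Keep only words that start inside the new (non-context) audio.

    `context_seconds` is the duration of the previous audio that was
    prepended to the current recording.
    """
    return [w for w in words if w.start >= context_seconds]


words = [
    Word("hello", 0.10, 0.45),
    Word("world", 0.50, 0.90),
    Word("again", 1.20, 1.60),
]
new_words = split_new_words(words, context_seconds=1.0)
print(" ".join(w.text for w in new_words))  # again
```

Cutting on the start time drops a word that straddles the boundary; comparing the word midpoint against `context_seconds` is an alternative if such words should be kept.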
This works well with NemoConformerTdt (Parakeet) because _decoding() returns a frame index for each token, which is converted to a time in seconds by multiplying it by window_step * subsampling_factor.
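The conversion amounts to a linear scaling of frame indices. A sketch, assuming the decoder yields one encoder-frame index per token; `frames_to_seconds` is a hypothetical helper, and the values 0.01 s and 8 are illustrative (the real ones come from the model's preprocessor and encoder config):

```python
def frames_to_seconds(
    frame_indices: list[int], window_step: float, subsampling_factor: int
) -> list[float]:
    # Each encoder output frame covers window_step * subsampling_factor
    # seconds of input audio, so a frame index maps linearly to a time.
    frame_duration = window_step * subsampling_factor
    return [t * frame_duration for t in frame_indices]


print([round(s, 3) for s in frames_to_seconds([0, 5, 12], window_step=0.01, subsampling_factor=8)])
# [0.0, 0.4, 0.96]
```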
Problem
NemoConformerAED (Canary 1B v2) always returns None for timestamps:
# nemo.py line 228
def _decoding(self, ...) -> Iterator[tuple[Iterable[int], None, Iterable[float]]]:
The _transcribe_input includes <|notimestamp|> (line 169), but the Canary model vocabulary also contains a <|timestamp|> token, suggesting the model itself supports timestamps.
Request
Would it be possible to:
- Replace <|notimestamp|> with <|timestamp|> when timestamps are requested (e.g., via with_timestamps())
- Parse timestamp tokens from the AED decoder output and return them as indices in _decoding()
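A rough sketch of both steps, under loudly stated assumptions: the prompt is treated as a plain list of token strings, and Canary timestamp tokens are assumed to look like <|12|>, where the number is an encoder frame index. Neither assumption is confirmed by onnx-asr, and the token format should be checked against the actual Canary vocabulary. `enable_timestamps` and `split_timestamp_tokens` are hypothetical helpers:

```python
import re

# ASSUMPTION: timestamp tokens look like "<|12|>", where the number is an
# encoder frame index. Verify against the actual Canary vocabulary.
TIMESTAMP_TOKEN = re.compile(r"^<\|(\d+)\|>$")


def enable_timestamps(prompt: list[str]) -> list[str]:
    # Swap the task flag in a (hypothetical) list of prompt token strings.
    return ["<|timestamp|>" if tok == "<|notimestamp|>" else tok for tok in prompt]


def split_timestamp_tokens(tokens: list[str]) -> tuple[list[str], list[int]]:
    """Separate text tokens from timestamp tokens.

    Each text token is assigned the frame index of the most recent
    timestamp token seen before it (0 if none has been seen yet).
    """
    text_tokens: list[str] = []
    frame_indices: list[int] = []
    current_frame = 0
    for tok in tokens:
        match = TIMESTAMP_TOKEN.match(tok)
        if match:
            current_frame = int(match.group(1))
        else:
            text_tokens.append(tok)
            frame_indices.append(current_frame)
    return text_tokens, frame_indices


print(enable_timestamps(["<|en|>", "<|notimestamp|>"]))
# ['<|en|>', '<|timestamp|>']
print(split_timestamp_tokens(["<|3|>", "▁hello", "<|9|>", "<|10|>", "▁world", "<|15|>"]))
# (['▁hello', '▁world'], [3, 10])
```

The resulting frame indices could then go through the same window_step * subsampling_factor scaling used for TDT models. Assigning each token the most recent timestamp is a simple heuristic; if the model emits start/end pairs around each word, pairing them explicitly would be more precise.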
This would enable the TimestampedResultsAsrAdapter to work with Canary models, similar to how it works with TDT models.
Environment
- onnx-asr: 0.10.3.dev50
- Model: nemo-canary-1b-v2
model.with_timestamps().recognize() returns TimestampedResult(text=..., timestamps=None, tokens=[...])