
Support timestamps for AED (Canary) models #115

@rcspam

Context

I'm building an audio context buffer for speech recognition: previous audio is prepended to the current recording, the combined audio is sent to the ASR model, and word-level timestamps are used to split the result so that only the text for the new audio is kept.
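A minimal sketch of the split step, assuming each recognized word comes with a start time in seconds measured from the beginning of the combined audio. `Word`, `keep_new_words`, and the `tolerance` parameter are hypothetical names for illustration, not part of onnx-asr:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds, relative to the start of the combined audio


def keep_new_words(words: list[Word], context_duration: float, tolerance: float = 0.0) -> str:
    """Drop words that start inside the prepended context audio.

    context_duration: length in seconds of the previous audio prepended to the recording.
    tolerance: hypothetical slack that moves the cut earlier, so a word straddling
    the boundary is kept instead of lost.
    """
    threshold = context_duration - tolerance
    return " ".join(w.text for w in words if w.start >= threshold)


if __name__ == "__main__":
    words = [Word("hello", 0.0), Word("world", 0.6), Word("how", 1.2), Word("are", 1.5), Word("you", 1.7)]
    # 1.0 s of context was prepended, so only words starting at or after 1.0 s are new.
    print(keep_new_words(words, context_duration=1.0))
```

Words that start before the end of the prepended context are treated as already transcribed; the whole approach depends on the model returning usable timestamps, which is exactly what is missing for Canary.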

This works well with NemoConformerTdt (Parakeet), since its _decoding() returns an index for each token, which is converted to a timestamp via index * window_step * subsampling_factor.
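For reference, that conversion amounts to scaling each index by the duration of one encoder frame. The helper name and the example values (a 10 ms `window_step` and a `subsampling_factor` of 8) are assumptions for illustration, not values read from the model config:

```python
def frames_to_seconds(frame_indices: list[int], window_step: float, subsampling_factor: int) -> list[float]:
    """Convert encoder frame indices to times in seconds."""
    # One encoder frame spans subsampling_factor feature frames, each window_step seconds apart.
    frame_duration = window_step * subsampling_factor
    return [index * frame_duration for index in frame_indices]


if __name__ == "__main__":
    # Assumed example values: 10 ms feature hop and 8x subsampling, i.e. 80 ms per encoder frame,
    # so index 25 corresponds to 25 * 0.08 = 2.0 seconds.
    print(frames_to_seconds([0, 1, 25], window_step=0.01, subsampling_factor=8))
```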

Problem

NemoConformerAED (Canary 1B v2) always returns None for timestamps:

```python
# nemo.py line 228
def _decoding(self, ...) -> Iterator[tuple[Iterable[int], None, Iterable[float]]]:
```

_transcribe_input includes the <|notimestamp|> token (line 169), but the Canary model vocabulary also contains a <|timestamp|> token, which suggests the model itself can emit timestamps.
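Switching the flag could be as small as swapping one prompt token. A sketch, assuming the prompt is handled as a list of token strings (the real code presumably works with vocabulary IDs) and using an illustrative prompt rather than the exact one from nemo.py; `set_timestamp_flag` is a hypothetical name:

```python
TIMESTAMP_FLAGS = ("<|timestamp|>", "<|notimestamp|>")


def set_timestamp_flag(prompt_tokens: list[str], with_timestamps: bool) -> list[str]:
    """Return a copy of the decoder prompt with the timestamp flag set as requested."""
    wanted = "<|timestamp|>" if with_timestamps else "<|notimestamp|>"
    # Replace whichever flag is present; all other prompt tokens are kept in order.
    return [wanted if token in TIMESTAMP_FLAGS else token for token in prompt_tokens]


if __name__ == "__main__":
    # Illustrative prompt only (hypothetical order); the real Canary prompt is built in nemo.py.
    prompt = ["<|startoftranscript|>", "<|en|>", "<|en|>", "<|pnc|>", "<|notimestamp|>"]
    print(set_timestamp_flag(prompt, with_timestamps=True))
```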

Request

Would it be possible to:

  1. Replace <|notimestamp|> with <|timestamp|> when timestamps are requested (e.g., via with_timestamps())
  2. Parse timestamp tokens from the AED decoder output and return them as indices in _decoding()
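For step 2, a sketch of separating timestamp tokens from text tokens. It assumes, without having checked the Canary tokenizer, that timestamp tokens are emitted as integer frame offsets in the `<|N|>` form and that a timestamp applies to the text tokens that follow it. The helper name is hypothetical, and it operates on token strings rather than IDs for readability:

```python
import re

# Assumption: a timestamp token is an integer wrapped in Canary's <|...|> delimiters, e.g. "<|12|>".
TIMESTAMP_RE = re.compile(r"<\|(\d+)\|>")


def split_timestamp_tokens(tokens: list[str]) -> tuple[list[str], list[int]]:
    """Separate timestamp tokens from text tokens.

    Returns the text tokens and, for each one, the frame index of the most recent
    timestamp token seen before it (0 if none has been seen yet).
    """
    text_tokens: list[str] = []
    indices: list[int] = []
    current = 0
    for token in tokens:
        match = TIMESTAMP_RE.fullmatch(token)
        if match:
            # Timestamp token: remember its frame index but do not emit it as text.
            current = int(match.group(1))
            continue
        text_tokens.append(token)
        indices.append(current)
    return text_tokens, indices


if __name__ == "__main__":
    decoded = ["<|0|>", "▁hello", "<|5|>", "<|6|>", "▁wor", "ld", "<|12|>"]
    print(split_timestamp_tokens(decoded))
```

The returned indices would play the same role as the TDT indices, so they could be scaled to seconds in the same way (with whatever frame duration the Canary encoder actually uses). Non-numeric special tokens such as language tags pass through unchanged and would still need the existing filtering.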

This would enable the TimestampedResultsAsrAdapter to work with Canary models, similar to how it works with TDT models.

Environment

  • onnx-asr: 0.10.3.dev50
  • Model: nemo-canary-1b-v2
  • model.with_timestamps().recognize() returns TimestampedResult(text=..., timestamps=None, tokens=[...])

Metadata

Labels

enhancement (New feature or request)