
Add Cohere Transcribe support; bump sherpa-onnx to v1.12.38 (0.7.0)#131

Open
1R053 wants to merge 1 commit into thewh1teagle:main from 1R053:feat/cohere-transcribe-v1.12.38

Conversation


@1R053 1R053 commented Apr 16, 2026

Summary

  • New cohere_transcribe module wrapping the SherpaOnnxOfflineCohereTranscribeModelConfig C API added upstream in sherpa-onnx v1.12.x — 14-language ASR with native punctuation + ITN toggles.
  • Bundled sherpa-onnx bumped v1.12.15 → v1.12.38 (submodule, dist.json, checksum.txt).
  • First cargo-runnable integration tests in the repo (previously only examples/), with optional auto-download of models.
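A minimal shape sketch of the new module. The `CohereTranscribeConfig` / `CohereTranscribeRecognizer` type names come from this PR; the struct below is a hypothetical stand-in and its field names are illustrative assumptions, not the crate's actual API:

```rust
// Hypothetical mirror of sherpa_rs::cohere_transcribe::CohereTranscribeConfig;
// the field names here are illustrative assumptions, not the real API surface.
#[derive(Default)]
struct CohereTranscribeConfig {
    model: String,            // path to the ONNX model file (assumption)
    enable_punctuation: bool, // native punctuation toggle
    enable_itn: bool,         // inverse-text-normalization toggle
    num_threads: i32,
}

fn main() {
    let config = CohereTranscribeConfig {
        model: "cohere-transcribe-int8.onnx".into(),
        enable_punctuation: true,
        enable_itn: true,
        ..Default::default()
    };
    println!(
        "model={} punctuation={} itn={}",
        config.model, config.enable_punctuation, config.enable_itn
    );
}
```

The real recognizer would then be built from this config (`CohereTranscribeRecognizer`) and fed decoded samples, as the integration tests below do end-to-end.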

Breaking changes — 0.7.0

  • ZipVoiceTtsConfig drops flow_matching_model / text_model / pinyin_dict (removed upstream) and adds encoder / decoder / lexicon (added upstream) to match the v1.12.38 C struct layout. Callers must rename field usage.
  • Minimum bundled sherpa-onnx is v1.12.38.
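The rename amounts to the following migration. The struct here is a hypothetical stand-in; only the added field names (`encoder` / `decoder` / `lexicon`) and the removed ones (`flow_matching_model` / `text_model` / `pinyin_dict`) come from this PR, and the mapping between old and new fields is not one-to-one:

```rust
// Hypothetical stand-in for the 0.7.0 ZipVoiceTtsConfig layout. The three
// fields below replace the removed flow_matching_model / text_model /
// pinyin_dict trio; everything else about this struct is illustrative.
#[derive(Default)]
struct ZipVoiceTtsConfig {
    encoder: String,
    decoder: String,
    lexicon: String,
}

fn main() {
    // Callers on 0.7.0 rename their field usage like so:
    let cfg = ZipVoiceTtsConfig {
        encoder: "zipvoice-encoder.onnx".into(),
        decoder: "zipvoice-decoder.onnx".into(),
        lexicon: "lexicon.txt".into(),
    };
    println!("encoder={}", cfg.encoder);
}
```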

Apple Silicon performance fix

dist.json for aarch64-apple-darwin now pulls sherpa-onnx-{tag}-onnxruntime-1.24.4-osx-arm64-shared.tar.bz2 (the full-optimization 35 MB onnxruntime build, bit-identical to the Python pip wheel's) instead of sherpa-onnx-{tag}-osx-arm64-shared.tar.bz2 (a smaller variant missing graph-optimization paths that blocked the post-first-inference kernel-cache warmup).

Net effect on a spot check: roughly a 2.6x speedup on warm inferences, bringing warm-path parity with the Python pip wheel. Concretely, we saw Cohere Transcribe int8 warm-run RTF drop from ~0.13 to ~0.05 on Apple Silicon.
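For reference, real-time factor (RTF) is processing time divided by audio duration, so lower is faster. The arithmetic behind the numbers above:

```rust
// Real-time factor (RTF) = processing time / audio duration; lower is faster.
// RTF 0.05 means a 10-second clip decodes in 0.5 seconds.
fn rtf(processing_secs: f64, audio_secs: f64) -> f64 {
    processing_secs / audio_secs
}

fn main() {
    // Numbers from the spot check above: warm-run RTF ~0.13 -> ~0.05.
    let before = 0.13;
    let after = 0.05;
    println!("warm-run speedup: {:.1}x", before / after); // ~2.6x
}
```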

Also:

  • Split the universal2 fat-binary entries into native single-arch tarballs for aarch64/x86_64 Darwin (smaller downloads, no lipo-imposed compiler compromises).
  • Renamed the Windows asset to the new *-MD-Release.tar.bz2 naming upstream adopted.

Test suite

tests/offline_recognizers.rs + tests/test_utils.rs:

  • Exercises both the updated Whisper path (regression guard for the v1.12.38 bump) and the new Cohere module end-to-end against real audio.
  • An ensure_model(&ModelArchive) / ensure_motivation_wav() helper pair resolves a cache dir from SHERPA_TEST_MODELS or workspace-root test_data/.
  • Skips gracefully when files are missing (CI-friendly default); auto-downloads when SHERPA_DOWNLOAD_MODELS=1.
  • Downloads serialised via std::sync::Once for parallel-runner safety.
  • Adding further model archives is a one-const entry.
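The helper pattern above can be sketched roughly like this; function and variable names are illustrative, not the exact tests/test_utils.rs API:

```rust
use std::env;
use std::path::PathBuf;
use std::sync::Once;

// Sketch of the test-helper pattern: resolve a cache dir from
// SHERPA_TEST_MODELS (falling back to test_data/), skip when the archive is
// missing, and serialise downloads through std::sync::Once.
static DOWNLOAD: Once = Once::new();

fn models_dir() -> PathBuf {
    env::var("SHERPA_TEST_MODELS")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from("test_data"))
}

// Returns Some(path) when the archive is usable, None to let the test skip.
fn ensure_model(archive: &str) -> Option<PathBuf> {
    let path = models_dir().join(archive);
    if path.exists() {
        return Some(path);
    }
    let download = env::var("SHERPA_DOWNLOAD_MODELS").map_or(false, |v| v == "1");
    if !download {
        return None; // CI-friendly default: skip instead of failing
    }
    // Once runs its closure a single time per process, so parallel test
    // threads cannot race the download.
    DOWNLOAD.call_once(|| {
        // fetch + extract the release asset into models_dir() (omitted)
    });
    if path.exists() { Some(path) } else { None }
}

fn main() {
    match ensure_model("no-such-archive.tar.bz2") {
        Some(p) => println!("model ready at {}", p.display()),
        None => println!("skipping: model missing and downloads disabled"),
    }
}
```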

Safe-field-addition refactor

Several recognizer constructors (moonshine, whisper, zipformer, transducer, sense_voice, paraformer, dolphin, tts/vits, tts/kokoro, tts/matcha, tts/kitten) switched their SherpaOnnxOfflineModelConfig / SherpaOnnxOfflineTtsModelConfig literals to ..Default::default(). This covers the new upstream fields (cohere_transcribe, fire_red_asr_ctc, funasr_nano, merged_decoder, enable_segment_timestamps, enable_token_timestamps, pocket, supertonic) without field-by-field churn and keeps future field additions non-breaking.
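The pattern, on a hypothetical mirror of the wrapped C struct (only the field names mentioned in this PR are real; the layout is illustrative):

```rust
// Hypothetical mirror of the wrapped SherpaOnnxOfflineModelConfig; only the
// field names mentioned in this PR are real, the layout is illustrative.
#[derive(Default)]
struct OfflineModelConfig {
    whisper_encoder: String,
    whisper_decoder: String,
    num_threads: i32,
    // v1.12.38 additions that older constructors never mention:
    cohere_transcribe: String,
    funasr_nano: String,
    enable_token_timestamps: bool,
}

// A constructor names only the fields it cares about; ..Default::default()
// zero-fills everything else, including fields upstream adds later, so the
// literal never has to change when the struct grows.
fn whisper_config(encoder: &str, decoder: &str) -> OfflineModelConfig {
    OfflineModelConfig {
        whisper_encoder: encoder.into(),
        whisper_decoder: decoder.into(),
        num_threads: 2,
        ..Default::default()
    }
}

fn main() {
    let cfg = whisper_config("tiny.en-encoder.onnx", "tiny.en-decoder.onnx");
    assert!(cfg.cohere_transcribe.is_empty()); // new field, safely defaulted
    println!("threads={}", cfg.num_threads);
}
```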

Test plan

  • cargo test -p sherpa-rs --test offline_recognizers — skips cleanly without models (6 pass, no failures)
  • SHERPA_DOWNLOAD_MODELS=1 cargo test -p sherpa-rs --test offline_recognizers --release -- --nocapture — downloads and runs whisper + cohere, both produce correct English transcripts (6/6 pass)
  • Warm-run RTF matches Python sherpa_onnx pip wheel on Apple Silicon after the onnxruntime-1.24.4-osx-arm64-shared switch
  • Non-macOS CI — Linux/Windows assets unchanged in mechanism, just renamed where upstream renamed them; needs validation by maintainers with those targets

Wraps the SherpaOnnxOfflineCohereTranscribeModelConfig C API added
upstream in sherpa-onnx v1.12.38 via a new cohere_transcribe module
(CohereTranscribeRecognizer / CohereTranscribeConfig). 14-language
ASR with native punctuation and inverse-text-normalization toggles.

Upgrades the bundled sherpa-onnx from v1.12.15 to v1.12.38 (submodule,
dist.json tag, checksum.txt). The v1.12.38 C API has new fields on
several existing structs (cohere_transcribe, fire_red_asr_ctc,
funasr_nano, merged_decoder, enable_*_timestamps, pocket, supertonic);
switched the affected module constructors to ..Default::default() so
future field additions are no-break.

ZipVoiceTtsConfig: removes the now-absent flow_matching_model /
text_model / pinyin_dict fields and adds the new encoder / decoder /
lexicon fields to match the upstream layout. Breaking change -> 0.7.0.

Apple Silicon performance fix: dist.json now pulls
sherpa-onnx-{tag}-onnxruntime-1.24.4-osx-arm64-shared.tar.bz2 (the
full-optimization 35 MB onnxruntime build, same binary Python pip
ships) instead of sherpa-onnx-{tag}-osx-arm64-shared.tar.bz2 (a
smaller variant missing graph-optimization paths that blocked the
post-first-inference kernel-cache warmup). Net effect on spot-check
with Cohere Transcribe int8: multi-x speedup on warm inferences,
bringing warm-path parity with the Python pip wheel. Also split the
universal2 fat binary entries into native single-arch tarballs for
aarch64/x86_64 Darwin, and renamed the Windows asset to the new
MD-Release naming upstream adopted.

First cargo-runnable tests in the repo (previously only examples/).
tests/offline_recognizers.rs exercises both the updated Whisper path
(regression guard for the v1.12.38 bump) and the new Cohere module
end-to-end against real audio. tests/test_utils.rs is a reusable
helper exposing ensure_model(&ModelArchive) / ensure_motivation_wav()
- resolves a cache dir from SHERPA_TEST_MODELS or workspace-root
test_data/, skips gracefully when files are missing (CI-friendly
default), and auto-downloads k2-fsa/sherpa-onnx release assets when
SHERPA_DOWNLOAD_MODELS=1. Downloads serialised via std::sync::Once
for parallel-runner safety. Adding further model archives is a one-
const entry.
