Add Cohere Transcribe support; bump sherpa-onnx to v1.12.38 (0.7.0) #131
Open
1R053 wants to merge 1 commit into
Wraps the SherpaOnnxOfflineCohereTranscribeModelConfig C API added
upstream in sherpa-onnx v1.12.38 via a new cohere_transcribe module
(CohereTranscribeRecognizer / CohereTranscribeConfig). 14-language
ASR with native punctuation and inverse-text-normalization toggles.
Upgrades the bundled sherpa-onnx from v1.12.15 to v1.12.38 (submodule,
dist.json tag, checksum.txt). The v1.12.38 C API has new fields on
several existing structs (cohere_transcribe, fire_red_asr_ctc,
funasr_nano, merged_decoder, enable_*_timestamps, pocket, supertonic);
switched the affected module constructors to ..Default::default() so
future field additions are non-breaking.
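The struct-update pattern described above can be sketched in isolation. This is a minimal illustration, not the crate's code: `ModelConfig` and `build_config` are hypothetical stand-ins for the bindgen-generated structs and module constructors, and the field types are assumptions. It only presumes that the generated structs implement `Default`, which the PR's use of `..Default::default()` implies.

```rust
// Hypothetical stand-in for a bindgen-generated C config struct. The field
// names echo the PR text, but the types and layout are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ModelConfig {
    pub num_threads: i32,
    pub debug: i32,
    pub provider: String,
    // Imagine these two arrived with a later upstream release.
    pub enable_token_timestamps: i32,
    pub enable_segment_timestamps: i32,
}

/// Builds a config by naming only the fields the caller cares about.
/// Every other field, including ones added later, takes its default value,
/// so a new upstream field does not break this constructor.
pub fn build_config(num_threads: i32, provider: &str) -> ModelConfig {
    ModelConfig {
        num_threads,
        provider: provider.to_string(),
        ..Default::default()
    }
}
```

Had the literal listed every field explicitly, adding `enable_token_timestamps` upstream would have turned into a compile error in each constructor; with the update syntax the constructor compiles unchanged.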
ZipVoiceTtsConfig: removes the now-absent flow_matching_model /
text_model / pinyin_dict fields and adds the new encoder / decoder /
lexicon fields to match the upstream layout. Breaking change -> 0.7.0.
Apple Silicon performance fix: dist.json now pulls
sherpa-onnx-{tag}-onnxruntime-1.24.4-osx-arm64-shared.tar.bz2 (the
full-optimization 35 MB onnxruntime build, same binary Python pip
ships) instead of sherpa-onnx-{tag}-osx-arm64-shared.tar.bz2 (a
smaller variant missing graph-optimization paths that blocked the
post-first-inference kernel-cache warmup). Net effect on spot-check
with Cohere Transcribe int8: multi-x speedup on warm inferences,
bringing warm-path parity with the Python pip wheel. Also split the
universal2 fat binary entries into native single-arch tarballs for
aarch64/x86_64 Darwin, and renamed the Windows asset to the new
MD-Release naming upstream adopted.
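As a rough sketch of how a `{tag}` template like the one above could be expanded into a concrete asset name: the function names below are hypothetical, and how the build script actually reads dist.json is not shown in this PR text. Only the aarch64-apple-darwin template is taken from the message; other targets are deliberately left out.

```rust
/// Expands the `{tag}` placeholder in an asset-name template.
pub fn expand_asset_name(template: &str, tag: &str) -> String {
    template.replace("{tag}", tag)
}

/// Returns the archive template for a target triple.
/// Only the Apple Silicon entry quoted in the commit message is listed;
/// any other target yields `None` in this sketch.
pub fn template_for_target(target: &str) -> Option<&'static str> {
    match target {
        "aarch64-apple-darwin" => {
            Some("sherpa-onnx-{tag}-onnxruntime-1.24.4-osx-arm64-shared.tar.bz2")
        }
        _ => None,
    }
}
```

With tag `v1.12.38`, the Apple Silicon template expands to `sherpa-onnx-v1.12.38-onnxruntime-1.24.4-osx-arm64-shared.tar.bz2`.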
First cargo-runnable tests in the repo (previously only examples/).
tests/offline_recognizers.rs exercises both the updated Whisper path
(regression guard for the v1.12.38 bump) and the new Cohere module
end-to-end against real audio. tests/test_utils.rs is a reusable
helper exposing ensure_model(&ModelArchive) / ensure_motivation_wav().
It resolves a cache dir from SHERPA_TEST_MODELS or workspace-root
test_data/, skips gracefully when files are missing (CI-friendly
default), and auto-downloads k2-fsa/sherpa-onnx release assets when
SHERPA_DOWNLOAD_MODELS=1. Downloads are serialised via std::sync::Once
for parallel-runner safety. Adding further model archives takes a
single const entry.
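The helper behaviour described above might look roughly like the following. This is a sketch under assumptions, not the contents of tests/test_utils.rs: `resolve_cache_dir` and `ensure_file` are hypothetical names, the download step is passed in as a closure instead of being implemented, and a single process-wide `Once` is used for brevity.

```rust
use std::env;
use std::path::{Path, PathBuf};
use std::sync::Once;

// One process-wide guard: the fetch closure runs at most once, even when
// several test threads ask for files at the same time. Downloading several
// archives independently would need one guard per archive.
static DOWNLOAD: Once = Once::new();

/// Returns the model cache directory: the override (e.g. the value of
/// SHERPA_TEST_MODELS) if present and non-empty, else `<workspace_root>/test_data`.
pub fn resolve_cache_dir(override_dir: Option<&str>, workspace_root: &Path) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => workspace_root.join("test_data"),
    }
}

/// Returns the path of `file_name` inside the cache, or `None` so the caller
/// can skip the test. `fetch` is only invoked when the file is missing and
/// SHERPA_DOWNLOAD_MODELS=1 is set.
pub fn ensure_file(
    cache_dir: &Path,
    file_name: &str,
    fetch: impl FnOnce(&Path),
) -> Option<PathBuf> {
    let path = cache_dir.join(file_name);
    let download_enabled = env::var("SHERPA_DOWNLOAD_MODELS")
        .map(|v| v == "1")
        .unwrap_or(false);
    if !path.exists() && download_enabled {
        DOWNLOAD.call_once(|| fetch(cache_dir));
    }
    if path.exists() {
        Some(path)
    } else {
        None
    }
}
```

A test would typically call `resolve_cache_dir(env::var("SHERPA_TEST_MODELS").ok().as_deref(), root)` and return early when `ensure_file` yields `None`, which is what makes the default run pass without any models present.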
Summary
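- New cohere_transcribe module wrapping the SherpaOnnxOfflineCohereTranscribeModelConfig C API added upstream in sherpa-onnx v1.12.38: 14-language ASR with native punctuation and ITN toggles.
- Bundled sherpa-onnx upgraded from v1.12.15 to v1.12.38 (submodule, dist.json, checksum.txt).
- First cargo-runnable tests in the repo (previously only examples/), with optional auto-download of models.
Breaking changes (0.7.0)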
ZipVoiceTtsConfig drops flow_matching_model / text_model / pinyin_dict (removed upstream) and adds encoder / decoder / lexicon (added upstream) to match the v1.12.38 C struct layout. Callers must update any code that sets these fields.
Apple Silicon performance fix
dist.json for aarch64-apple-darwin now pulls sherpa-onnx-{tag}-onnxruntime-1.24.4-osx-arm64-shared.tar.bz2 (the full-optimization 35 MB onnxruntime build, bit-identical to the Python pip wheel's) instead of sherpa-onnx-{tag}-osx-arm64-shared.tar.bz2 (a smaller variant missing graph-optimization paths that blocked the post-first-inference kernel-cache warmup).
Net effect on spot-check: a multi-fold speedup on warm inferences, bringing the warm path to parity with the Python pip wheel. Concretely, Cohere Transcribe int8 warm-run RTF dropped from ~0.13 to ~0.05 on Apple Silicon.
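For readers unfamiliar with the metric: real-time factor (RTF) is processing time divided by audio duration, so lower is faster. The small sketch below (function names are illustrative, not part of the crate) shows the arithmetic behind the quoted figures, which work out to a speedup of roughly 2.6x given that both RTF values are approximate.

```rust
/// Real-time factor: seconds of compute per second of audio (lower is faster).
pub fn real_time_factor(processing_secs: f64, audio_secs: f64) -> f64 {
    processing_secs / audio_secs
}

/// Speedup implied by moving from one RTF to another on the same audio.
pub fn speedup(rtf_before: f64, rtf_after: f64) -> f64 {
    rtf_before / rtf_after
}
```

For example, `speedup(0.13, 0.05)` evaluates to about 2.6, and a 10-second clip at RTF 0.05 corresponds to about half a second of processing.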
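Also:
- Split the universal2 fat binary entries into native single-arch tarballs for aarch64/x86_64 Darwin.
- Renamed the Windows asset to the new *-MD-Release.tar.bz2 naming upstream adopted.
Test suite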
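tests/offline_recognizers.rs + tests/test_utils.rs:
- ensure_model(&ModelArchive) / ensure_motivation_wav() helper resolves a cache dir from SHERPA_TEST_MODELS or workspace-root test_data/.
- Skips gracefully when files are missing (CI-friendly default); auto-downloads k2-fsa/sherpa-onnx release assets when SHERPA_DOWNLOAD_MODELS=1.
- Downloads are serialised via std::sync::Once for parallel-runner safety.
- Adding further model archives takes a single const entry.
Safe-field-addition refactor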
Several recognizer and TTS constructors (moonshine, whisper, zipformer, transducer, sense_voice, paraformer, dolphin, tts/vits, tts/kokoro, tts/matcha, tts/kitten) switched their SherpaOnnxOfflineModelConfig / SherpaOnnxOfflineTtsModelConfig literals to ..Default::default(). This covers the new upstream fields (cohere_transcribe, fire_red_asr_ctc, funasr_nano, merged_decoder, enable_segment_timestamps, enable_token_timestamps, pocket, supertonic) without field-by-field churn and keeps future additions non-breaking.
Test plan
- cargo test -p sherpa-rs --test offline_recognizers: skips cleanly without models (6 pass, no failures)
- SHERPA_DOWNLOAD_MODELS=1 cargo test -p sherpa-rs --test offline_recognizers --release -- --nocapture: downloads and runs whisper + cohere, both produce correct English transcripts (6/6 pass)
- Warm-path parity with the Python sherpa_onnx pip wheel on Apple Silicon after the onnxruntime-1.24.4-osx-arm64-shared switch