…custom vocab log probs API

- Resolved conflicts in offline-stream.cc to support both ys_log_probs (upstream) and token_log_probs (custom)
- Updated pubspec.yaml to use local path for development
- Preserved custom SherpaOnnxVocabLogProbs API for full vocabulary log probabilities
- Maintained compatibility with upstream's ys_log_probs field
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Adds token- and vocabulary-level log-probability support across the codebase: new C API struct/functions for vocab log-probs, Dart FFI bindings and stream methods to fetch them, result types extended with token/ys/vocab probs, decoders updated to compute log-softmax distributions, and JSON/serialization wiring updated.
Sequence Diagram(s)

sequenceDiagram
participant App as Dart App
participant DartStream as Dart Online/Offline Stream
participant Bindings as SherpaOnnxBindings
participant Native as Native Lib (C/C++)
participant Heap as Native Heap
App->>DartStream: getVocabLogProbs()
DartStream->>Bindings: check get*VocabLogProbs ptr
Bindings-->>DartStream: function ptr
DartStream->>Native: call get*VocabLogProbs(stream)
activate Native
Native->>Heap: allocate SherpaOnnxVocabLogProbs + flattened array
Native-->>DartStream: Pointer<SherpaOnnxVocabLogProbs>
deactivate Native
DartStream->>DartStream: read numTokens, vocabSize, flatten -> Map<String,List<double>>
DartStream->>Bindings: destroyVocabLogProbs(ptr)
Bindings->>Native: free array + struct
Native-->>Bindings: done
DartStream-->>App: return Map or null
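The ownership contract in the diagram (the native side allocates the struct plus a flattened array; the Dart side must call the destroy function) can be illustrated with a minimal C++ sketch. This is an illustration only, not the actual c-api.cc code: VocabLogProbs, GetVocabLogProbs, and DestroyVocabLogProbs are stand-in names mirroring the API shape described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the C-API struct shape described in the walkthrough.
struct VocabLogProbs {
  const float *log_probs;  // flattened [num_tokens][vocab_size]
  int32_t num_tokens;
  int32_t vocab_size;
};

// Flatten a per-token distribution into one heap block the caller owns.
const VocabLogProbs *GetVocabLogProbs(
    const std::vector<std::vector<float>> &src) {
  if (src.empty()) return nullptr;
  int32_t num_tokens = static_cast<int32_t>(src.size());
  int32_t vocab_size = static_cast<int32_t>(src[0].size());
  float *flat = new float[static_cast<size_t>(num_tokens) * vocab_size];
  for (int32_t t = 0; t < num_tokens; ++t) {
    std::copy(src[t].begin(), src[t].end(), flat + t * vocab_size);
  }
  return new VocabLogProbs{flat, num_tokens, vocab_size};
}

// The free path must mirror the allocation: delete[] the array, delete the struct.
void DestroyVocabLogProbs(const VocabLogProbs *p) {
  if (p == nullptr) return;
  delete[] p->log_probs;
  delete p;
}
```

Every getter call must be paired with exactly one destroy call, which is why the Dart wrappers invoke the destroy function before returning.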
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Summary of Changes

Hello @Dokotela, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement by exposing the full vocabulary log probabilities for each decoded token position across both online and offline speech recognition streams. This new capability is crucial for advanced applications such as confidence scoring, uncertainty estimation, and hypothesis fusion algorithms, providing a deeper insight into model predictions beyond just the most likely token.
Code Review
This pull request introduces a new API to access vocabulary log probabilities, which is a valuable feature for advanced use cases like model fusion and uncertainty estimation. The changes span across the C-API, core C++ logic, and Dart/Flutter bindings.
My review has identified a few areas for improvement:

- There are some inconsistencies in propagating the new vocab_log_probs data in different recognizer implementations.
- A critical copy-paste error was found in one of the decoders.
- Some opportunities for code cleanup and performance optimization were also noted.
Overall, this is a great addition. Addressing the feedback will improve the robustness and consistency of the new API.
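As background for the "model fusion" use case mentioned above: full per-token distributions allow two models' outputs over the same vocabulary to be interpolated in the probability domain. A generic sketch, not part of this PR; FuseLogProbs is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Linear interpolation of two log-prob distributions over the same vocab:
// p = w * p_a + (1 - w) * p_b, converted back to log space per entry.
std::vector<float> FuseLogProbs(const std::vector<float> &a,
                                const std::vector<float> &b, float w) {
  std::vector<float> out(a.size());
  for (size_t i = 0; i < a.size(); ++i) {
    double p = w * std::exp(a[i]) + (1.0 - w) * std::exp(b[i]);
    out[i] = static_cast<float>(std::log(p));
  }
  return out;
}
```

This kind of fusion is only possible when the API exposes the full distribution rather than just the top token's log probability.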
sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.cc
Actionable comments posted: 6
🧹 Nitpick comments (9)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (2)
92-96: Consider removing the "ADD THIS" comment.

The inline comment // ADD THIS appears to be a development artifact and should be removed before merging.

- // Store full vocabulary distribution (already log-softmaxed)
+ // Store full vocabulary log-probability distribution
  std::vector<float> full_vocab_probs(vocab_size);
119-120: Remove development comment.

The comment // ADD THIS on line 120 should be removed.

  ans.token_log_probs = std::move(token_log_probs);
- ans.vocab_log_probs = std::move(vocab_log_probs);  // ADD THIS
+ ans.vocab_log_probs = std::move(vocab_log_probs);

sherpa-onnx/csrc/offline-moonshine-decoder.h (1)
17-19: Consider adding documentation for vocab_log_probs.

The token_log_probs field has a descriptive comment. Consider adding a similar comment for vocab_log_probs to document its shape and semantics.

  /// Token-level log probabilities (confidence scores)
  std::vector<float> token_log_probs;
+ /// Full vocabulary log-probability distribution per token [num_tokens][vocab_size]
  std::vector<std::vector<float>> vocab_log_probs;
676-689: Add documentation for the new API functions.

The struct definition has good inline documentation, but the three new functions lack documentation comments explaining:

- Return value semantics (caller owns the pointer; returns NULL if no vocab log probs are available)
- Memory management (must call SherpaOnnxDestroyVocabLogProbs to avoid a memory leak)

 SHERPA_ONNX_API typedef struct SherpaOnnxVocabLogProbs {
   const float *log_probs;  // Flattened 2D array [num_tokens][vocab_size]
   int32_t num_tokens;
   int32_t vocab_size;
 } SherpaOnnxVocabLogProbs;

+/// Get vocabulary log probabilities for an online stream.
+///
+/// @param stream A pointer returned by SherpaOnnxCreateOnlineStream().
+/// @return A pointer to vocab log probs, or NULL if not available.
+///         The user must invoke SherpaOnnxDestroyVocabLogProbs() to free
+///         the returned pointer to avoid memory leak.
 SHERPA_ONNX_API const SherpaOnnxVocabLogProbs *
 SherpaOnnxOnlineStreamGetVocabLogProbs(const SherpaOnnxOnlineStream *stream);

+/// Get vocabulary log probabilities for an offline stream.
+///
+/// @param stream A pointer returned by SherpaOnnxCreateOfflineStream().
+/// @return A pointer to vocab log probs, or NULL if not available.
+///         The user must invoke SherpaOnnxDestroyVocabLogProbs() to free
+///         the returned pointer to avoid memory leak.
 SHERPA_ONNX_API const SherpaOnnxVocabLogProbs *
 SherpaOnnxOfflineStreamGetVocabLogProbs(const SherpaOnnxOfflineStream *stream);

+/// Free a pointer returned by SherpaOnnxOnlineStreamGetVocabLogProbs() or
+/// SherpaOnnxOfflineStreamGetVocabLogProbs().
+///
+/// @param log_probs A pointer returned by the getter functions.
 SHERPA_ONNX_API void SherpaOnnxDestroyVocabLogProbs(
     const SherpaOnnxVocabLogProbs *log_probs);
445-473: Suggest consistent precision for log probability fields.

The serialization uses different precisions for log probabilities:

- ys_log_probs: precision 6 (line 454)
- token_log_probs: precision 4 (line 469)

Since both represent log probabilities, consider using consistent precision (e.g., 6 for both) to maintain uniformity in the JSON output.
Apply this diff for consistency:
  for (auto prob : token_log_probs) {
-   os << sep << std::fixed << std::setprecision(4) << prob;
+   os << sep << std::fixed << std::setprecision(6) << prob;
    sep = ", ";
  }
472-486: Trailing comma assumes words is always serialized.

Line 472 adds a trailing comma after token_log_probs, which is fine because words is always serialized next (lines 477-486). However, if in the future words serialization becomes conditional, this would produce invalid JSON when token_log_probs is present but words is empty.

Consider this pattern if words ever becomes optional:

  // Don't add trailing comma after token_log_probs
  os << "]";

  // Later, add comma before words if token_log_probs was present
  if (!token_log_probs.empty() && !words.empty()) {
    os << ", ";
  }
40-73: Consider using List<List<double>> instead of Map<String, List<double>> for better performance and semantics.

The current implementation has two concerns:

1. Data structure choice: Using a Map with string keys ("token_0", "token_1", etc.) for inherently sequential data is unusual. A List<List<double>> would be more natural, efficient, and consistent with the underlying C array structure.

2. Performance: The nested loop with per-element access is O(numTokens × vocabSize). For large vocabularies, using asTypedList for bulk copy would be significantly faster.

- Map<String, List<double>>? getVocabLogProbs() {
+ List<List<double>>? getVocabLogProbs() {
    final getFunc = SherpaOnnxBindings.getOnlineStreamVocabLogProbs;
    final destroyFunc = SherpaOnnxBindings.destroyVocabLogProbs;

    if (getFunc == null || destroyFunc == null) {
      return null;
    }

    final ptr = getFunc(this.ptr);
    if (ptr == nullptr) {
      return null;
    }

    final vocabLogProbs = ptr.ref;
    final numTokens = vocabLogProbs.numTokens;
    final vocabSize = vocabLogProbs.vocabSize;

-   final Map<String, List<double>> result = {};
-
-   for (int tokenIdx = 0; tokenIdx < numTokens; tokenIdx++) {
-     final List<double> tokenProbs = [];
-
-     for (int vocabIdx = 0; vocabIdx < vocabSize; vocabIdx++) {
-       final index = tokenIdx * vocabSize + vocabIdx;
-       final logProb = vocabLogProbs.logProbs[index];
-       tokenProbs.add(logProb);
-     }
-
-     result['token_$tokenIdx'] = tokenProbs;
-   }
+   final List<List<double>> result = [];
+   final allProbs = vocabLogProbs.logProbs.asTypedList(numTokens * vocabSize);
+
+   for (int tokenIdx = 0; tokenIdx < numTokens; tokenIdx++) {
+     final start = tokenIdx * vocabSize;
+     result.add(allProbs.sublist(start, start + vocabSize).toList());
+   }

    destroyFunc(ptr);
    return result;
  }
156-168: Consider reusing CalculateLogProb logic to reduce duplication.

The manual log-softmax computation here (lines 159-167) duplicates logic from the CalculateLogProb helper function. Consider extracting a shared helper to compute the full log-softmax distribution.

For example:

// Add a helper function to compute full log-softmax
static void ComputeLogSoftmax(const float *logits, int32_t vocab_size,
                              std::vector<float> &out) {
  out.resize(vocab_size);
  float max_logit = *std::max_element(logits, logits + vocab_size);
  double sum_exp = 0.0;
  for (int32_t i = 0; i < vocab_size; ++i) {
    sum_exp += std::exp(logits[i] - max_logit);
  }
  float log_sum = max_logit + std::log(sum_exp);
  for (int32_t i = 0; i < vocab_size; ++i) {
    out[i] = logits[i] - log_sum;
  }
}

// Then in the loop:
std::vector<float> full_vocab_probs;
ComputeLogSoftmax(current_logits, vocab_size, full_vocab_probs);
predicted_vocab_log_probs.push_back(std::move(full_vocab_probs));
171-209: Consider optimizing to reduce redundant passes.

The method makes multiple passes over the logits array:

- Line 179: find max_logit
- Lines 183-185: compute sum_exp
- Lines 191-193: fill full_distribution (if requested)
- Lines 200-206: find max token

Consider combining passes for better cache locality and performance:

std::pair<int32_t, float> GetMaxTokenIdWithConfidence(
    Ort::Value *logits, std::vector<float> *full_distribution = nullptr) const {
  auto meta = model_->GetModelMetadata();
  const float *p_logits = logits->GetTensorData<float>();

  // Find max for numerical stability
  float max_logit = *std::max_element(p_logits, p_logits + meta.vocab_size);

  // Single pass: compute sum_exp and find max token
  float sum_exp = 0.0f;
  int32_t max_token_id = 0;
  float max_log_prob = -std::numeric_limits<float>::infinity();

  for (int32_t i = 0; i < meta.vocab_size; ++i) {
    float exp_val = std::exp(p_logits[i] - max_logit);
    sum_exp += exp_val;
    float log_prob = p_logits[i] - max_logit;  // will subtract log_sum later
    if (log_prob > max_log_prob) {
      max_log_prob = log_prob;
      max_token_id = i;
    }
  }

  float log_sum = max_logit + std::log(sum_exp);
  max_log_prob = max_log_prob + max_logit - log_sum;  // correct the log_prob

  // Fill distribution if requested
  if (full_distribution != nullptr) {
    full_distribution->resize(meta.vocab_size);
    for (int32_t i = 0; i < meta.vocab_size; ++i) {
      (*full_distribution)[i] = p_logits[i] - log_sum;
    }
  }

  return {max_token_id, max_log_prob};
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
- flutter/sherpa_onnx/lib/src/offline_recognizer.dart (6 hunks)
- flutter/sherpa_onnx/lib/src/offline_stream.dart (1 hunks)
- flutter/sherpa_onnx/lib/src/online_recognizer.dart (3 hunks)
- flutter/sherpa_onnx/lib/src/online_stream.dart (1 hunks)
- flutter/sherpa_onnx/lib/src/sherpa_onnx_bindings.dart (4 hunks)
- flutter/sherpa_onnx/pubspec.yaml (1 hunks)
- sherpa-onnx/c-api/c-api.cc (1 hunks)
- sherpa-onnx/c-api/c-api.h (1 hunks)
- sherpa-onnx/csrc/offline-ctc-decoder.h (1 hunks)
- sherpa-onnx/csrc/offline-moonshine-decoder.h (1 hunks)
- sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (4 hunks)
- sherpa-onnx/csrc/offline-recognizer-canary-impl.h (9 hunks)
- sherpa-onnx/csrc/offline-recognizer-ctc-impl.h (2 hunks)
- sherpa-onnx/csrc/offline-recognizer-moonshine-impl.h (1 hunks)
- sherpa-onnx/csrc/offline-recognizer-whisper-impl.h (2 hunks)
- sherpa-onnx/csrc/offline-stream.cc (1 hunks)
- sherpa-onnx/csrc/offline-stream.h (1 hunks)
- sherpa-onnx/csrc/offline-whisper-decoder.h (1 hunks)
- sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (4 hunks)
- sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h (2 hunks)
- sherpa-onnx/csrc/online-recognizer.h (1 hunks)
- sherpa-onnx/csrc/online-transducer-decoder.cc (2 hunks)
- sherpa-onnx/csrc/online-transducer-decoder.h (1 hunks)
- sherpa-onnx/csrc/online-transducer-greedy-search-decoder.cc (2 hunks)
- sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.cc (3 hunks)
- sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.h (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
- flutter/sherpa_onnx/pubspec.yaml
- flutter/sherpa_onnx/lib/src/sherpa_onnx_bindings.dart
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.
Applied to files:
- flutter/sherpa_onnx/pubspec.yaml
- flutter/sherpa_onnx/lib/src/sherpa_onnx_bindings.dart
🧬 Code graph analysis (11)
sherpa-onnx/csrc/online-recognizer.h (1)
sherpa-onnx/csrc/math.h (1)
float(52-72)
sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.cc (2)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1)
full_vocab_probs (92-92)

sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1)

full_vocab_probs (158-158)
sherpa-onnx/csrc/offline-recognizer-moonshine-impl.h (1)
sherpa-onnx/csrc/offline-stream.cc (2)
r (257-257), r (257-257)
sherpa-onnx/c-api/c-api.h (1)
sherpa-onnx/c-api/c-api.cc (6)
SherpaOnnxOnlineStreamGetVocabLogProbs (757-780)
SherpaOnnxOnlineStreamGetVocabLogProbs (757-758)
SherpaOnnxOfflineStreamGetVocabLogProbs (782-806)
SherpaOnnxOfflineStreamGetVocabLogProbs (782-783)
SherpaOnnxDestroyVocabLogProbs (808-814)
SherpaOnnxDestroyVocabLogProbs (808-809)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1)
sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1)
full_vocab_probs(158-158)
sherpa-onnx/csrc/offline-recognizer-ctc-impl.h (1)
sherpa-onnx/csrc/offline-stream.cc (2)
r (257-257), r (257-257)
sherpa-onnx/csrc/online-transducer-decoder.h (1)
sherpa-onnx/csrc/math.h (1)
float(52-72)
sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1)
full_vocab_probs(92-92)
sherpa-onnx/csrc/online-transducer-greedy-search-decoder.cc (2)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1)
full_vocab_probs (92-92)

sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1)

full_vocab_probs (158-158)
sherpa-onnx/csrc/offline-stream.h (1)
sherpa-onnx/csrc/math.h (1)
float(52-72)
sherpa-onnx/csrc/offline-recognizer-canary-impl.h (4)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1)
full_vocab_probs (92-92)

sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1)

full_vocab_probs (158-158)

sherpa-onnx/csrc/onnx-utils.cc (2)

View (188-227), View (188-188)

sherpa-onnx/csrc/offline-stream.cc (2)

r (257-257), r (257-257)
🪛 Cppcheck (2.18.0)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc
[warning] 72-72: Invalid std (invalidFunctionArg)
sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc
[warning] 44-44: Invalid std (invalidFunctionArg)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: ubuntu-24.04 3.11
- GitHub Check: ubuntu-24.04 3.9
- GitHub Check: Release shared tts-ON
- GitHub Check: ubuntu-24.04 3.12
- GitHub Check: ubuntu-24.04 3.10
- GitHub Check: Release static tts-ON
- GitHub Check: Release shared tts-OFF
- GitHub Check: Debug static tts-ON
- GitHub Check: Release shared-OFF tts-OFF
- GitHub Check: Release shared-OFF tts-ON
- GitHub Check: ubuntu-24.04 3.8
- GitHub Check: Debug shared tts-OFF
- GitHub Check: Debug shared-OFF tts-ON
- GitHub Check: rknn shared ON
- GitHub Check: ubuntu-24.04 3.13
- GitHub Check: Release shared-ON tts-OFF
- GitHub Check: Debug shared-ON tts-OFF
- GitHub Check: Debug shared-OFF tts-OFF
- GitHub Check: Debug shared-ON tts-ON
- GitHub Check: Release shared-ON tts-ON
🔇 Additional comments (28)
sherpa-onnx/csrc/online-transducer-decoder.cc (1)
43-43: LGTM!

The copy and move assignment operators correctly propagate vocab_log_probs following the same pattern used for ys_probs, lm_probs, and context_scores.

Also applies to: 71-71
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1)
65-83: Log-softmax implementation is correct but has minor redundancy.

The numerical stability approach (subtracting max_logit) is appropriate. However, the loop at lines 77-83 recomputes p[j] - log_sum, which was already computed for p[0] on line 75. Consider combining the max-finding with the log-prob computation in a single pass after computing log_sum.

The static analysis warning about std::log(sum_exp) at line 72 is a false positive since sum_exp is always positive when vocab_size > 0 (as exp() returns positive values).

sherpa-onnx/c-api/c-api.cc (2)
808-814: LGTM!

The destructor correctly handles the null case and properly frees both the inner array (log_probs) with delete[] and the struct itself with delete, matching the allocation pattern in the getter functions.
757-780: No robustness issue here.

The vocab_log_probs data structure is populated by decoders where each inner vector is created with exactly vocab_size elements from model output, which is a fixed property per model. Since all tokens use the same model with the same output shape, all inner vectors are guaranteed to have identical sizes by design. The std::copy operation is safe: if any vector had a different size, it would indicate data corruption in the decoder itself, which would be a more fundamental problem than a buffer overflow in this code.
30-30: LGTM!

The assignment correctly propagates token_log_probs from the decoder result to the recognition result, consistent with how other fields are copied.
sherpa-onnx/csrc/online-recognizer.h (1)
45-45: LGTM!

The vocab_log_probs field addition enables per-token vocabulary distribution access for online recognition. The 2D structure (tokens × vocab_size) is appropriate for storing full distributions.
sherpa-onnx/csrc/online-transducer-decoder.h (1)
33-36: LGTM!

The vocab_log_probs field addition is well-documented with clear shape specification and usage conditions. The 2D vector structure appropriately stores per-token vocabulary distributions.
sherpa-onnx/csrc/offline-stream.h (1)
52-54: LGTM!

The addition of token_log_probs (per-token confidence) and vocab_log_probs (full vocabulary distributions) to OfflineRecognitionResult enables comprehensive confidence scoring and uncertainty estimation as described in the PR objectives.
flutter/sherpa_onnx/lib/src/offline_stream.dart (1)
36-69: LGTM!

The implementation correctly:
- Validates function pointers and returned data
- Calculates flattened array indices (line 59)
- Cleans up native resources (line 67)
- Handles null cases gracefully
The Map<String, List> structure with "token_i" keys provides clear token identification.
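The flattened-index arithmetic referenced above (index = tokenIdx * vocabSize + vocabIdx) is plain row-major addressing and can be checked with a tiny sketch; FlatIndex is an illustrative helper, not project code.

```cpp
#include <cassert>
#include <cstdint>

// Row-major flattening of a [num_tokens][vocab_size] matrix into one array:
// token rows are laid out back-to-back, so a (token, vocab) pair maps to
// token_idx * vocab_size + vocab_idx.
int32_t FlatIndex(int32_t token_idx, int32_t vocab_idx, int32_t vocab_size) {
  return token_idx * vocab_size + vocab_idx;
}
```

The same arithmetic appears on both sides of the FFI boundary: the C++ side writes row t at offset t * vocab_size, and the Dart side reads it back with the identical formula.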
sherpa-onnx/csrc/offline-recognizer-ctc-impl.h (1)
34-62: LGTM!

The implementation correctly maintains alignment between r.tokens and r.token_log_probs. When tokens are filtered (e.g., SIL tokens at line 41), both the token and its corresponding log probability are skipped together, preserving the one-to-one correspondence.
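The skip-together invariant described here can be sketched in isolation (hypothetical names, not the actual ctc-impl code): whenever a token is dropped, its log-prob must be dropped in the same iteration so indices stay aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Drop filtered tokens (e.g. a SIL symbol) and their log-probs together so
// the two output vectors stay index-aligned.
void FilterTokens(const std::vector<std::string> &tokens,
                  const std::vector<float> &log_probs,
                  const std::string &sil,
                  std::vector<std::string> *out_tokens,
                  std::vector<float> *out_log_probs) {
  for (size_t i = 0; i < tokens.size(); ++i) {
    if (tokens[i] == sil) continue;  // skip the token AND its log-prob
    out_tokens->push_back(tokens[i]);
    if (i < log_probs.size()) out_log_probs->push_back(log_probs[i]);
  }
}
```

Copying log_probs wholesale and then filtering only tokens would break this invariant, which is exactly the misalignment flagged elsewhere in this review for the canary implementation.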
157-172: The token/log_prob alignment is correctly preserved through filtering.

The loop correctly maintains 1:1 correspondence between r.tokens and r.token_log_probs: when a token is filtered by !sym_table.Contains(i), the loop continues without appending to either vector, so both are skipped symmetrically. The guard on line 169 is defensive but unnecessary given the decoder's guarantee of equal-sized source vectors. No alignment issue exists.
16-24: LGTM!

The new token_log_probs and vocab_log_probs fields are correctly typed and appropriately placed in the result struct to capture per-token and full-vocabulary log probability distributions.
52-59: LGTM!

The temperature_scale parameter is correctly plumbed from config_ to the decoder constructor, consistent with the updated OnlineTransducerGreedySearchNeMoDecoder signature.
72-79: LGTM!

The templated constructor path correctly mirrors the primary constructor's handling of temperature_scale.
142-160: LGTM!

The per-token log probability export and full vocabulary distribution storage are correctly implemented with a single pass through temperature scaling and LogSoftmax. This serves as the correct pattern that other decoders (like the NeMo decoder) should follow.

flutter/sherpa_onnx/lib/src/offline_recognizer.dart (3)
610-613: LGTM!

The JSON parsing for token_log_probs correctly handles null values with a safe default empty list and properly converts numeric values to doubles.
843-869: LGTM!

The getResult method correctly handles tokenLogProbs in both the error path (empty result) and the success path (parsed from JSON), ensuring consistency.
597-604: All instantiations of OfflineRecognizerResult already provide the tokenLogProbs parameter. The three usages in the file (fromJson factory at line 607, empty result at line 848, and parsedJson construction at line 862) all include this parameter. No breaking changes were introduced.

flutter/sherpa_onnx/lib/src/online_recognizer.dart (2)
329-330: LGTM!

The ysProbs parameter is optional with a sensible default value, which avoids breaking existing code while supporting the new feature.
508-510: LGTM!

The parsing of ysProbs in getResult correctly handles null values with a safe default empty list.
18-45: LGTM with a note on the static analysis hint.

The CalculateLogProb function correctly implements log-softmax using the log-sum-exp trick for numerical stability.

Note: the static analysis tool flags line 44 as "Invalid std", but this appears to be a false positive. std::log is a valid function from <cmath> (included on line 8).
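For reference, the log-sum-exp trick relied on here can be shown as a standalone sketch (LogSoftmax below is illustrative, not the decoder's actual CalculateLogProb): subtracting the max logit before exponentiating keeps std::exp from overflowing while leaving the result mathematically unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Numerically stable log-softmax: log p_i = x_i - (max + log sum exp(x_j - max)).
std::vector<float> LogSoftmax(const std::vector<float> &logits) {
  float max_logit = *std::max_element(logits.begin(), logits.end());
  double sum_exp = 0.0;
  for (float v : logits) sum_exp += std::exp(v - max_logit);
  float log_sum = max_logit + static_cast<float>(std::log(sum_exp));
  std::vector<float> out(logits.size());
  for (size_t i = 0; i < logits.size(); ++i) out[i] = logits[i] - log_sum;
  return out;
}
```

Without the max subtraction, logits around 1000 would make std::exp return infinity; with it, the same inputs produce finite, correctly normalized log probabilities.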
204-208: LGTM!

The addition of the IsMultiLingual() guard before accessing id2lang is good defensive programming and prevents potential issues with non-multilingual models.
60-64: LGTM!

The early return for empty encoder output is good defensive programming and prevents potential crashes downstream.
146-152: LGTM!

The Convert method signature update to include token_log_probs is consistent with changes in other files, and the assignment is straightforward.
218-220: LGTM!

The addition of an early return when the feature vector is empty prevents potential issues downstream and aligns with the empty check in DecodeStream.

Also applies to: 235-237
flutter/sherpa_onnx/lib/src/sherpa_onnx_bindings.dart (3)
680-689: LGTM!

The SherpaOnnxVocabLogProbs struct is correctly defined for FFI with appropriate types. The flattened 2D array design (via Pointer<Float>) is the right approach for C interop.
1433-1445: LGTM!

The typedefs for vocab log probs functions are correctly defined with appropriate Native and Dart versions.
2325-2341: LGTM!

The initialization of vocab log probs bindings follows the existing pattern in the file and correctly uses lookupFunction to bind the native functions.
sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.cc
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sherpa-onnx/csrc/offline-recognizer-canary-impl.h (1)
150-173: Critical: Token log probabilities misalignment when tokens are skipped.

The method assigns token_log_probs wholesale at line 155, but then iterates through tokens and skips some (line 162: continue when !symbol_table_.Contains(token_id)). This breaks the correspondence between r.tokens and r.token_log_probs indices. The same issue affects vocab_log_probs alignment (set at line 131).

Apply this diff to fix the alignment:
  OfflineRecognitionResult Convert(
      const std::vector<int32_t> &tokens,
      const std::vector<float> &token_log_probs) const {
    OfflineRecognitionResult r;
    r.tokens.reserve(tokens.size());
-   r.token_log_probs = token_log_probs;

    std::string text;
    for (size_t idx = 0; idx < tokens.size(); ++idx) {
      int32_t token_id = tokens[idx];
      if (!symbol_table_.Contains(token_id)) {
        continue;
      }

      const auto &s = symbol_table_[token_id];
      text += s;
      r.tokens.push_back(s);
+
+     if (idx < token_log_probs.size()) {
+       r.token_log_probs.push_back(token_log_probs[idx]);
+     }
    }

    r.text = std::move(text);
    return r;
  }
Note: You'll also need to filter vocab_log_probs similarly in DecodeStream before assigning to r.vocab_log_probs at line 131, or ensure tokens are never skipped by guaranteeing all generated tokens exist in the symbol table.
🧹 Nitpick comments (4)
sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1)
121-145: Log-softmax implementation is correct and addresses previous review feedback.

The current implementation computes log-softmax once per iteration and then extracts the log probability for the selected token, which addresses the redundant computation concern raised in the previous review.
Optional refactoring opportunities:
1. Max-finding consistency: lines 123-128 manually find the max logit, while lines 101-102 use std::max_element. Consider using std::max_element here as well for consistency:

   float max_logit = *std::max_element(current_logits, current_logits + vocab_size);

2. Allocation efficiency: full_vocab_probs is allocated in every loop iteration (line 122). For large vocabularies, consider pre-allocating it outside the loop:

   std::vector<float> full_vocab_probs(vocab_size);
   for (int32_t i = 0; i < num_possible_tokens; ++i) {
     // ... reuse full_vocab_probs in each iteration

Note on static analysis: the Cppcheck warning on line 134 about "Invalid std" is a false positive; std::log is a valid function from <cmath>.
65-96: Consider optimizing the duplicate log-softmax computation.

The log-sum-exp implementation is correct and numerically stable. However, the log-softmax is computed twice: once while finding the max token (lines 68-83) and again while storing the full vocabulary distribution (lines 92-96).
Apply this diff to compute log-softmax once:
- // Compute log-softmax and find max token with confidence
- float max_logit = *std::max_element(p, p + vocab_size);
-
- float sum_exp = 0.0f;
- for (int32_t j = 0; j < vocab_size; ++j) {
-   sum_exp += std::exp(p[j] - max_logit);
- }
- float log_sum = max_logit + std::log(sum_exp);
-
- int32_t max_token_id = 0;
- float max_log_prob = p[0] - log_sum;
-
- for (int32_t j = 1; j < vocab_size; ++j) {
-   float log_prob = p[j] - log_sum;
-   if (log_prob > max_log_prob) {
-     max_log_prob = log_prob;
-     max_token_id = j;
-   }
- }
-
- if (max_token_id == eos) {
-   break;
- }
- tokens.push_back(max_token_id);
- token_log_probs.push_back(max_log_prob);
-
- // Store full vocabulary distribution (already log-softmaxed)
- std::vector<float> full_vocab_probs(vocab_size);
- for (int32_t j = 0; j < vocab_size; ++j) {
-   full_vocab_probs[j] = p[j] - log_sum;
- }
- vocab_log_probs.push_back(std::move(full_vocab_probs));
+ // Compute log-softmax once for both max selection and storage
+ float max_logit = *std::max_element(p, p + vocab_size);
+
+ float sum_exp = 0.0f;
+ for (int32_t j = 0; j < vocab_size; ++j) {
+   sum_exp += std::exp(p[j] - max_logit);
+ }
+ float log_sum = max_logit + std::log(sum_exp);
+
+ // Compute log-softmax for all tokens and find max in single pass
+ std::vector<float> full_vocab_probs(vocab_size);
+ int32_t max_token_id = 0;
+ float max_log_prob = p[0] - log_sum;
+ full_vocab_probs[0] = max_log_prob;
+
+ for (int32_t j = 1; j < vocab_size; ++j) {
+   float log_prob = p[j] - log_sum;
+   full_vocab_probs[j] = log_prob;
+   if (log_prob > max_log_prob) {
+     max_log_prob = log_prob;
+     max_token_id = j;
+   }
+ }
+
+ if (max_token_id == eos) {
+   break;
+ }
+ tokens.push_back(max_token_id);
+ token_log_probs.push_back(max_log_prob);
+ vocab_log_probs.push_back(std::move(full_vocab_probs));
53-66: Consider adding defensive validation for native values.

The method reads numTokens and vocabSize from native code and immediately uses them in nested loops without validation. If native code returns negative or unexpectedly large values, this could cause issues.

Apply this diff to add defensive checks:

 final vocabLogProbs = vocabPtr.ref;
 final numTokens = vocabLogProbs.numTokens;
 final vocabSize = vocabLogProbs.vocabSize;
+
+if (numTokens < 0 || vocabSize < 0 || numTokens > 10000 || vocabSize > 100000) {
+  destroyFunc(vocabPtr);
+  return null;
+}

 final Map<String, List<double>> result = {};
101-119: Remove unused variable `tokens_generated`.

The variable `tokens_generated` is declared and incremented but never read. It serves no purpose in the current implementation.

Apply this diff:

```diff
 if (max_token_id != eos) {
-  int32_t tokens_generated = 0;
   for (int32_t i = decoder_input.size() + 1;
        i <= decoder_input.size() + num_tokens; ++i) {
     if (tokens.back() == eos) {
       break;
     }

     std::tie(logits, decoder_states) =
         RunDecoder(tokens.back(), i, std::move(decoder_states),
                    View(&enc_states), View(&enc_mask));

     std::vector<float> next_full_vocab_probs;
     auto [next_token_id, next_confidence] =
         GetMaxTokenIdWithConfidence(&logits, &next_full_vocab_probs);

     tokens.push_back(next_token_id);
     token_log_probs.push_back(next_confidence);
     vocab_log_probs.push_back(std::move(next_full_vocab_probs));
-    tokens_generated++;
   }
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- flutter/sherpa_onnx/lib/src/online_recognizer.dart (3 hunks)
- flutter/sherpa_onnx/lib/src/online_stream.dart (1 hunks)
- flutter/sherpa_onnx/pubspec.yaml (1 hunks)
- sherpa-onnx/csrc/offline-ctc-decoder.h (1 hunks)
- sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (4 hunks)
- sherpa-onnx/csrc/offline-recognizer-canary-impl.h (9 hunks)
- sherpa-onnx/csrc/offline-recognizer-moonshine-impl.h (1 hunks)
- sherpa-onnx/csrc/offline-recognizer-whisper-impl.h (2 hunks)
- sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (3 hunks)
- sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.cc (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- sherpa-onnx/csrc/offline-recognizer-moonshine-impl.h
- sherpa-onnx/csrc/offline-ctc-decoder.h
- flutter/sherpa_onnx/lib/src/online_recognizer.dart
- sherpa-onnx/csrc/offline-recognizer-whisper-impl.h
- flutter/sherpa_onnx/pubspec.yaml
🧰 Additional context used
🧬 Code graph analysis (4)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1)
- sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1): `full_vocab_probs` (122-122)
sherpa-onnx/csrc/offline-recognizer-canary-impl.h (3)
- sherpa-onnx/csrc/offline-canary-model.cc (2): `tokens` (83-117), `tokens` (83-85)
- sherpa-onnx/csrc/offline-recognizer-ctc-impl.h (2): `Convert` (26-323), `` (193-199)
- sherpa-onnx/csrc/offline-stream.cc (2): `r` (257-257), `r` (257-257)
sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1)
- sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1): `full_vocab_probs` (92-92)
sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.cc (3)
- sherpa-onnx/csrc/online-stream.cc (8): `r` (68-68), `r` (68-68), `r` (72-74), `r` (72-72), `r` (93-93), `r` (93-93), `r` (95-97), `r` (95-95)
- sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (1): `full_vocab_probs` (122-122)
- sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (1): `full_vocab_probs` (92-92)
🪛 Cppcheck (2.18.0)
sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc
[warning] 72-72: Invalid std (invalidFunctionArg)
sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc
[warning] 134-134: Invalid std (invalidFunctionArg)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Release shared tts-OFF
- GitHub Check: Debug static tts-OFF
- GitHub Check: Release shared tts-ON
- GitHub Check: Debug shared-OFF tts-ON
- GitHub Check: Debug shared-ON tts-ON
- GitHub Check: Debug shared-OFF tts-OFF
- GitHub Check: Release shared-ON tts-OFF
- GitHub Check: Debug shared-ON tts-OFF
- GitHub Check: Release shared-ON tts-ON
- GitHub Check: Release shared-OFF tts-OFF
- GitHub Check: Release shared-OFF tts-ON
- GitHub Check: swift (macos-latest)
- GitHub Check: rknn shared ON
- GitHub Check: rknn shared OFF
- GitHub Check: ubuntu-24.04 3.13
- GitHub Check: ubuntu-24.04 3.12
- GitHub Check: ubuntu-24.04 3.11
- GitHub Check: ubuntu-24.04 3.9
- GitHub Check: ubuntu-24.04 3.8
- GitHub Check: ubuntu-24.04 3.10
🔇 Additional comments (16)
sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.cc (2)
90-99: Looks good: single-pass temperature + LogSoftmax, and full-vocab log-probs capture.
This also appears to address the previously-reported duplicate “export per-token log scores” block (only one remains now).
138-139: Call site threading `temperature_scale_` into `DecodeOne()` is consistent.

Just ensure `temperature_scale_` is validated at config boundaries too (so all decoders behave consistently).

sherpa-onnx/csrc/offline-whisper-greedy-search-decoder.cc (5)
8-9: LGTM!

The new includes are necessary for the log-softmax computation (`std::log`, `std::exp`) and max logit initialization (`std::numeric_limits`).
92-103: LGTM!

The variable renaming (`logits_tensor`, `current_token_id`) improves code clarity, and the initialization logic correctly identifies the first token using `std::max_element`.
108-110: LGTM!

The new data structures for tracking per-token log probabilities and full vocabulary distributions are properly declared and will be populated in the decoding loop.
147-176: LGTM!

The loop continuation logic correctly:
- Creates the next token input tensor with the selected token
- Forwards through the decoder with proper state propagation
- Updates the offset and checks context boundaries
- Computes the next token for the subsequent iteration
178-189: LGTM!

The result assembly correctly:

- Guards language lookup with `IsMultiLingual()` check
- Populates the predicted tokens
- Populates the new `token_log_probs` and `vocab_log_probs` fields using `std::move` for efficiency

sherpa-onnx/csrc/offline-moonshine-greedy-search-decoder.cc (4)
8-8: LGTM!

The `<cmath>` include is necessary for `std::log` and `std::exp` used in the log-softmax computation.
42-43: LGTM!

The vector declarations appropriately support the PR's objective of exposing per-token log-probabilities and full vocabulary distributions.
119-120: LGTM!

The result population correctly uses `std::move` to efficiently transfer the computed log-probability data to the result structure.
66-72: The cppcheck warning "Invalid std (invalidFunctionArg)" at line 72 is a false positive.

The `std::log(sum_exp)` call is valid: `sum_exp` is guaranteed to be strictly positive as it's a sum of `std::exp()` values. This log-sum-exp pattern is used consistently throughout the codebase without issues. No action is required.

sherpa-onnx/csrc/offline-recognizer-canary-impl.h (5)
60-64: LGTM! Good defensive guard.

The early return for empty encoder output prevents potential crashes or undefined behavior in downstream processing.
79-92: LGTM! Vocab log probs now properly collected.

This addresses the previous review feedback by capturing the full vocabulary distribution for the first token. The three parallel vectors (tokens, token_log_probs, vocab_log_probs) are correctly initialized.
123-128: LGTM! Proper cleanup of EOS token.

Correctly removes the EOS token and its associated confidence and vocab distribution from all three parallel vectors, maintaining alignment.
175-213: LGTM! Correct log-softmax implementation.

The method correctly computes log-softmax using the log-sum-exp trick for numerical stability. The optional full distribution capture and max token identification are both implemented correctly.
222-224: LGTM! Good defensive guard.

The early return for empty feature vectors prevents potential downstream errors and is consistent with the empty encoder output guard in `DecodeStream`.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @dart-api-examples/non-streaming-asr/pubspec.yaml:
- Around line 13-14: The pubspec.yaml currently uses a local path override for
the sherpa_onnx dependency which will break consumers; replace the local path
entry (sherpa_onnx: path: ../../flutter/sherpa_onnx) with the published
versioned dependency (e.g., sherpa_onnx: ^X.Y.Z) or otherwise remove the path
override so the package resolves from pub.dev, and if the new API isn’t
published yet, add a TODO to revert this path change once the updated
sherpa_onnx is released.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dart-api-examples/non-streaming-asr/pubspec.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
dart-api-examples/non-streaming-asr/pubspec.yaml
CodeRabbit correctly identified that using a local path dependency would break for users cloning the repository. Reverted to ^1.12.20. Added TODO comment explaining that:

- This example uses MedASR API not yet published to pub.dev
- Users can temporarily use local path for testing
- Or wait until the updated package is released

The example will have errors until MedASR support is published.
- Updated all version numbers to 1.12.21
- Resolved conflicts in pubspec.yaml files
- Kept helpful TODO comment for MedASR example
- Kept descriptive comments for local development paths
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @dart-api-examples/non-streaming-asr/pubspec.yaml:
- Around line 13-15: The TODO comment incorrectly references "MedASR API";
update the comment to refer to the vocab log-probs functionality introduced in
this PR (e.g., getVocabLogProbs(), SherpaOnnxVocabLogProbs) or clarify if MedASR
is actually intended; change the text so it consistently mentions the
"Vocabulary Log Probabilities API" and notes the same instruction about
reverting to a local path for local testing or waiting for the package release.
🧹 Nitpick comments (2)
sherpa-onnx/c-api/c-api.cc (2)
782-805: Consider validating consistent inner vector dimensions.

The code assumes all inner vectors in `vocab_log_probs` have the same size as the first entry. If this invariant is violated, `std::copy` at Line 798 could write out-of-bounds or leave uninitialized data.

While decoders should always produce consistent dimensions, a defensive validation would prevent subtle memory corruption bugs.
♻️ Optional: Add size validation
```diff
 auto vocab_probs = new SherpaOnnxVocabLogProbs;
 vocab_probs->num_tokens = result.vocab_log_probs.size();
 vocab_probs->vocab_size = result.vocab_log_probs[0].size();
+
+// Validate all inner vectors have consistent size
+for (int32_t i = 1; i < vocab_probs->num_tokens; ++i) {
+  if (static_cast<int32_t>(result.vocab_log_probs[i].size()) !=
+      vocab_probs->vocab_size) {
+    delete vocab_probs;
+    return nullptr;
+  }
+}
+
 // Flatten the 2D vector into a 1D array
```
807-831: Same inner vector size assumption applies here.

This function has the same assumption as the online variant regarding consistent inner vector dimensions. Consider applying the same validation if added to `SherpaOnnxOnlineStreamGetVocabLogProbs`.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- dart-api-examples/non-streaming-asr/pubspec.yaml
- flutter/sherpa_onnx/pubspec.yaml
- sherpa-onnx/c-api/c-api.cc
- sherpa-onnx/c-api/c-api.h
🚧 Files skipped from review as they are similar to previous changes (1)
- sherpa-onnx/c-api/c-api.h
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
flutter/sherpa_onnx/pubspec.yaml
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.
Applied to files:
flutter/sherpa_onnx/pubspec.yaml
🔇 Additional comments (2)
flutter/sherpa_onnx/pubspec.yaml (1)
34-56: LGTM! Helpful local development guidance.

The added comments consistently document how to switch to local path dependencies for testing across all platform packages. This is a good developer experience improvement.
sherpa-onnx/c-api/c-api.cc (1)
833-839: LGTM!

The destroy function correctly handles null input and properly frees both the inner array and the struct, matching the allocation pattern in the getter functions.
CodeRabbit correctly identified that the TODO comment should reference the vocab log probs API (getVocabLogProbs, SherpaOnnxVocabLogProbs) introduced in this PR, not other unrelated APIs.
Actionable comments posted: 0
🧹 Nitpick comments (1)
dart-api-examples/non-streaming-asr/pubspec.yaml (1)
13-18: LGTM - Helpful TODO comment for unpublished API.

The comment clearly explains the situation with the Vocabulary Log Probabilities API. Consider showing the exact local path syntax inline to make it more actionable for developers:
💡 Optional: Include the exact syntax for local testing
```diff
 # TODO: This example directory includes examples that use the Vocabulary Log
 # Probabilities API (getVocabLogProbs, SherpaOnnxVocabLogProbs) which is not
 # yet published to pub.dev.
-# Revert to local path (../../flutter/sherpa_onnx) for local testing,
+# For local testing, replace the sherpa_onnx line below with:
+#   sherpa_onnx:
+#     path: ../../flutter/sherpa_onnx
 # or wait until the updated package is released.
 sherpa_onnx: ^1.12.21
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dart-api-examples/non-streaming-asr/pubspec.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Release shared-ON tts-OFF
- GitHub Check: Release shared-OFF tts-OFF
- GitHub Check: Debug shared-ON tts-ON
- GitHub Check: swift (macos-15-intel)
- GitHub Check: ubuntu-24.04 3.8
- GitHub Check: ubuntu-24.04 3.9
- GitHub Check: ubuntu-24.04 3.13
- GitHub Check: ubuntu-24.04 3.12
- GitHub Check: ubuntu-24.04 3.11
- GitHub Check: ubuntu-24.04 3.10
- GitHub Check: rknn shared ON
- GitHub Check: rknn shared OFF
I think it's ready to merge. The failing tests don't seem to have anything to do with my PR (as far as I can tell), and I'm not sure why on this one instance the style_check didn't complete.
Resolves merge conflicts in 5 files, preserving our custom per-token log probability additions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Captures the max log probability when emitting non-blank, non-repeat tokens in the CTC decoder. This enables real per-word confidence scores for SenseVoice and MedASR (NeMo CTC), completing log prob support across all decoder types in our fork. Also propagates token_log_probs through ConvertSenseVoiceResult() so SenseVoice results include confidence scores in the final OfflineRecognitionResult. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flutter/sherpa_onnx/lib/src/offline_recognizer.dart (1)
816-831: ⚠️ Potential issue | 🟠 Major

Free native config on all failure paths in the `OfflineRecognizer` factory.

`convertConfig()` allocates many native strings, but the current throw paths skip `freeConfig(c)` (both when bindings are uninitialized and when native creation returns `nullptr`), causing leaks on failure/retry paths.

💡 Proposed fix
```diff
 factory OfflineRecognizer(OfflineRecognizerConfig config) {
-  final c = convertConfig(config);
-
   if (SherpaOnnxBindings.createOfflineRecognizer == null) {
     throw Exception("Please initialize sherpa-onnx first");
   }
+  final c = convertConfig(config);
-  final ptr = SherpaOnnxBindings.createOfflineRecognizer?.call(c) ?? nullptr;
-
-  if (ptr == nullptr) {
-    throw Exception(
-        "Failed to create offline recognizer. Please check your config");
-  }
-
-  freeConfig(c);
-
-  return OfflineRecognizer._(ptr: ptr, config: config);
+  try {
+    final ptr = SherpaOnnxBindings.createOfflineRecognizer!.call(c);
+    if (ptr == nullptr) {
+      throw Exception(
+          "Failed to create offline recognizer. Please check your config");
+    }
+    return OfflineRecognizer._(ptr: ptr, config: config);
+  } finally {
+    freeConfig(c);
+  }
 }
```
Verify each finding against the current code and only fix it if needed. In `@flutter/sherpa_onnx/lib/src/offline_recognizer.dart` around lines 816 - 831, The factory OfflineRecognizer currently calls convertConfig() to allocate native resources but throws before calling freeConfig(c) on failure paths; wrap the native calls in a try/finally (or ensure freeConfig(c) is invoked before every throw) so that freeConfig(c) is always called regardless of errors; specifically, after obtaining c from convertConfig(), check SherpaOnnxBindings.createOfflineRecognizer and the ptr result inside a try block and call freeConfig(c) in the finally block (or call freeConfig(c) immediately before each Exception throw), referencing OfflineRecognizer, convertConfig, freeConfig, SherpaOnnxBindings.createOfflineRecognizer and ptr to locate the code to change.
🧹 Nitpick comments (1)
flutter/sherpa_onnx/lib/src/offline_recognizer.dart (1)
1060-1070: Prefer a single parsing path for `OfflineRecognizerResult`.

`getResult()` re-implements parsing logic already present in `OfflineRecognizerResult.fromJson()`. Reusing `fromJson()` avoids drift in defaults/conversions.

♻️ Proposed refactor
```diff
-  return OfflineRecognizerResult(
-      text: parsedJson['text'],
-      tokens: List<String>.from(parsedJson['tokens']),
-      tokenLogProbs: (parsedJson['token_log_probs'] as List?)
-              ?.map((e) => (e as num).toDouble())
-              .toList() ??
-          [],
-      timestamps: List<double>.from(parsedJson['timestamps']),
-      lang: parsedJson['lang'],
-      emotion: parsedJson['emotion'],
-      event: parsedJson['event']);
+  return OfflineRecognizerResult.fromJson(
+    parsedJson as Map<String, dynamic>,
+  );
```
Verify each finding against the current code and only fix it if needed. In `@flutter/sherpa_onnx/lib/src/offline_recognizer.dart` around lines 1060 - 1070, getResult() duplicates JSON parsing logic already implemented by OfflineRecognizerResult.fromJson(), risking drift; update getResult() to call OfflineRecognizerResult.fromJson(parsedJson) (or the static factory on OfflineRecognizerResult) and return that instance instead of reconstructing fields inline, ensuring you preserve any existing null/default handling in OfflineRecognizerResult.fromJson() and remove the manual List/String/num conversions in getResult().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@flutter/sherpa_onnx/lib/src/offline_recognizer.dart`:
- Around line 816-831: The factory OfflineRecognizer currently calls
convertConfig() to allocate native resources but throws before calling
freeConfig(c) on failure paths; wrap the native calls in a try/finally (or
ensure freeConfig(c) is invoked before every throw) so that freeConfig(c) is
always called regardless of errors; specifically, after obtaining c from
convertConfig(), check SherpaOnnxBindings.createOfflineRecognizer and the ptr
result inside a try block and call freeConfig(c) in the finally block (or call
freeConfig(c) immediately before each Exception throw), referencing
OfflineRecognizer, convertConfig, freeConfig,
SherpaOnnxBindings.createOfflineRecognizer and ptr to locate the code to change.
---
Nitpick comments:
In `@flutter/sherpa_onnx/lib/src/offline_recognizer.dart`:
- Around line 1060-1070: getResult() duplicates JSON parsing logic already
implemented by OfflineRecognizerResult.fromJson(), risking drift; update
getResult() to call OfflineRecognizerResult.fromJson(parsedJson) (or the static
factory on OfflineRecognizerResult) and return that instance instead of
reconstructing fields inline, ensuring you preserve any existing null/default
handling in OfflineRecognizerResult.fromJson() and remove the manual
List/String/num conversions in getResult().
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d16afe0b-5002-48fe-8770-c94d4fb5b1c6
📒 Files selected for processing (2)
- flutter/sherpa_onnx/lib/src/offline_recognizer.dart
- flutter/sherpa_onnx/lib/src/sherpa_onnx_bindings.dart
- Move binding null-check before convertConfig() and wrap native call in try/finally so freeConfig() always runs on failure paths
- Replace inline JSON parsing in getResult() with fromJson() to avoid drift and improve null safety

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Revert whitespace-only changes in sherpa_onnx_bindings.dart and offline_recognizer.dart, re-applying only actual logic changes
- Fix tab characters in online_recognizer.dart
- Strip trailing whitespace from C++ files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add defensive bounds validation to offline_stream getVocabLogProbs() matching the online version
- Filter vocab_log_probs per-token in Whisper Convert() to stay aligned with tokens after timestamp token filtering
- Propagate vocab_log_probs through online transducer Convert() so the field is no longer dead on OnlineRecognizerResult
- Use double accumulator in Moonshine log-softmax to match Canary and avoid precision loss with large vocabularies
- Restore direct returns in Canary ForwardEncoder/ForwardDecoder for guaranteed copy elision

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
During previous upstream merges, conflict resolution accidentally modified files unrelated to our vocab_log_probs feature (JNI, Java examples, Kotlin, CLI tools, and several C++ files). Reset all 33 files to match upstream/master so our branch only contains the per-token log probability additions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Restore blank line between variable declarations (whisper decoder)
- Restore original ForwardDecoder call formatting (canary)
- Restore original LOGE message text (canary)
- Remove navigation comment that wasn't in upstream (whisper decoder.h)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Canary decoder: revert loop index to i=1 (matching upstream and the reference Python script). The position resets for the generation phase; starting at decoder_input.size()+1 fed wrong positional encodings to the model.
2. Standard transducer: guard against temperature_scale_ <= 0 to prevent division by zero, matching the NeMo decoder's existing guard.
3. Moonshine: filter token_log_probs and vocab_log_probs inside the token loop (like Canary/Whisper do) so they stay aligned with r.tokens when unknown symbols are skipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Remove misleading std::move on const ref for vocab_log_probs in transducer Convert(), which was silently copying anyway.
2. Restore tokens_buf handling in NeMo transducer Manager constructor that was lost during a merge.
3. Make OnlineRecognizerResult.fromJson null-safe for text, tokens, and timestamps (matching the offline version's pattern).
4. Wrap setConfig FFI call in try/finally to ensure freeConfig runs on exception (matching the factory constructor fix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Restore hotwords_buf parameter in OnlineRecognizerConfig constructor
- Restore hotwords_buf handling in transducer Manager constructor
- Restore feature_dim from config instead of hardcoded 80
- Remove chunk_shift addition that wasn't in upstream
- Revert variable rename (tokens_input back to tokens) in whisper decoder
- Revert IsMultiLingual() guard addition in whisper decoder
- Restore unconditional ys_log_probs serialization (matching upstream)
- Restore removed comment in whisper timestamp tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match the offline recognizer pattern by using the existing fromJson factory instead of manual field extraction, ensuring consistent null-safe parsing across both online and offline paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
I don't know what a style check is, but otherwise I think this PR is ready.
csukuangfj left a comment
You have a lot of redundant code. Please try to use existing functions to simplify it.
```cpp
double sum_exp = 0.0;
for (int32_t j = 0; j < vocab_size; ++j) {
  sum_exp += std::exp(p[j] - max_logit);
}
float log_sum = max_logit + static_cast<float>(std::log(sum_exp));

// Compute log-softmax for all tokens and find max in single pass
std::vector<float> full_vocab_probs(vocab_size);
int32_t max_token_id = 0;
float max_log_prob = p[0] - log_sum;
full_vocab_probs[0] = max_log_prob;

for (int32_t j = 1; j < vocab_size; ++j) {
  float log_prob = p[j] - log_sum;
  full_vocab_probs[j] = log_prob;
  if (log_prob > max_log_prob) {
    max_log_prob = log_prob;
    max_token_id = j;
  }
}
```
Too complicated. Please use existing functions to simplify it.
```cpp
OfflineRecognitionResult empty_result;
s->SetResult(empty_result);
```
Just return is enough. Every stream has a default initialized result.
Vocabulary Log Probabilities API
Summary
I needed this for fusion from two different models, and it's worked OK for me so far. I think there was also another PR/issue asking about this. I tried to add an API for accessing full vocabulary log probabilities, for use with advanced confidence scoring, entropy-based uncertainty estimation, and hypothesis fusion algorithms (e.g., weighted ROVER). This exposes the complete probability distribution over the entire vocabulary for each decoded token position.
Changes
Core API (`sherpa-onnx/c-api/`)

New C API Functions:

- `SherpaOnnxOnlineStreamGetVocabLogProbs()` - full vocab log probs for online streams
- `SherpaOnnxOfflineStreamGetVocabLogProbs()` - full vocab log probs for offline streams
- `SherpaOnnxDestroyVocabLogProbs()` - memory management for vocab log probs

New Data Structure:
Model Support
Online Models:
Offline Models:
Implementation Details
Core Changes:
- Added `vocab_log_probs` field to `OfflineRecognitionResult` and `OnlineRecognizerResult` structures
- `offline-whisper-greedy-search-decoder.cc` - captures joiner output logits
- `offline-moonshine-greedy-search-decoder.cc` - captures model output distributions
- `online-transducer-greedy-search-decoder.cc` - captures joiner logits per frame
- `online-transducer-greedy-search-nemo-decoder.cc` - NeMo-specific implementation
- `offline-recognizer-canary-impl.h` - Canary model support
- `offline-recognizer-ctc-impl.h` - CTC model support

Flutter/Dart Bindings:
- Added `getVocabLogProbs()` methods to `OnlineStream` and `OfflineStream`
- FFI bindings in `sherpa_onnx_bindings.dart`

JSON Serialization:
- Added `token_log_probs` field to JSON output (for backward compatibility)
- Maintained compatibility with upstream's `ys_log_probs` field

Use - I'm using it for Confidence-Weighted ROVER Fusion mostly (but it can also be used for Entropy-Based Uncertainty Quantification).
Testing - I'm not sure how you run tests on this repo; I couldn't really find a test suite, so I'm not sure if I broke anything (in my local testing it didn't seem to, but I certainly didn't test it with all of the platforms and programming languages that you use).
Summary by CodeRabbit