📝 Walkthrough

This PR introduces comprehensive Axera backend support for online speech recognition in Sherpa-ONNX. It adds Axera-specific implementations of the Zipformer transducer model, online streaming, greedy- and beam-search decoders, and an end-to-end recognition pipeline. Additionally, it extends the configuration system with a Zipformer meta-configuration and augments ParseOptions to handle vector-valued options.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Recognizer as OnlineRecognizerTransducerAxeraImpl
    participant Stream as OnlineStreamAxera
    participant Model as OnlineZipformerTransducerModelAxera
    participant Decoder as OnlineTransducerDecoderAxera

    Client->>Recognizer: CreateStream()
    Recognizer->>Stream: new OnlineStreamAxera()
    Stream-->>Recognizer: stream
    Recognizer-->>Client: stream

    Client->>Recognizer: DecodeStreams(stream)
    Recognizer->>Recognizer: DecodeStream(stream)
    Recognizer->>Model: RunEncoder(features, states)
    Model-->>Recognizer: encoder_out, next_states
    Recognizer->>Stream: SetZipformerEncoderStates(next_states)

    loop For each encoder frame
        Recognizer->>Decoder: Decode(encoder_out, result)
        Decoder->>Model: RunJoiner(encoder_out, decoder_out)
        Model-->>Decoder: joiner_logits
        Decoder->>Decoder: select_token(logits)
        alt token is non-blank
            Decoder->>Model: RunDecoder(decoder_input)
            Model-->>Decoder: new_decoder_out
        end
        Decoder-->>Recognizer: updated_result
    end

    Recognizer->>Stream: SetZipformerResult(result)
    Client->>Recognizer: GetResult(stream)
    Recognizer->>Stream: GetZipformerResult()
    Stream-->>Recognizer: result
    Recognizer->>Recognizer: Convert(result)
    Recognizer-->>Client: OnlineRecognizerResult
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed

- ❌ Failed checks (1 warning)
- ✅ Passed checks (2 passed)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the

Highlights
Code Review
This pull request introduces support for Zipformer models on the Axera backend, which is a significant enhancement. A security audit identified a high-severity buffer overflow vulnerability in the model execution logic and two medium-severity denial-of-service (crash) vulnerabilities in the decoder logic, primarily caused by unsafe assumptions about tensor sizes and missing validation. Additionally, minor code quality issues were noted, such as a copy-paste error in a log message, a self-include, and dead code.
```cpp
size_t out0_elems = out0.nSize / sizeof(float);
std::vector<float> encoder_out(out0_elems);
std::memcpy(encoder_out.data(), out0_buf.pVirAddr, out0.nSize);
```
The code calculates the number of elements for a `std::vector<float>` by dividing the byte size (`nSize`) by `sizeof(float)`. If `nSize` is not a multiple of `sizeof(float)`, the resulting element count is truncated by integer division. The subsequent `std::memcpy` call still copies the full `nSize` bytes into the undersized vector, leading to a buffer overflow.

This occurs in three locations:

- In `RunEncoder` (lines 154-156)
- In `RunDecoder` (lines 212-214)
- In `RunJoiner` (lines 251-253)
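A minimal sketch of the safe pattern, as a standalone helper (the name `CopyTensorToFloats` is hypothetical; in the real code the checks would live inline in `RunEncoder`/`RunDecoder`/`RunJoiner`): validate divisibility first, then derive the copy length from the element count so the copy can never outrun the vector.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper illustrating the safe copy pattern.
std::vector<float> CopyTensorToFloats(const void *src, size_t num_bytes) {
  if (num_bytes % sizeof(float) != 0) {
    // Fail fast instead of truncating the element count and overflowing.
    std::fprintf(stderr,
                 "Tensor byte size %zu is not a multiple of sizeof(float)\n",
                 num_bytes);
    std::exit(-1);
  }
  size_t num_elems = num_bytes / sizeof(float);
  std::vector<float> out(num_elems);
  // num_elems * sizeof(float) == num_bytes by construction, so the
  // destination buffer can never be undersized relative to the copy.
  std::memcpy(out.data(), src, num_elems * sizeof(float));
  return out;
}
```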
```cpp
#include <vector>

#include "sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h"
#include "sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h"
```
```cpp
  for (const auto &d : decoder_out) {
    auto joiner_out = model->RunJoiner(p_encoder_out, d.data());
    LogSoftmax(joiner_out.data(), joiner_out.size());
  }
}
// ...
auto topk = TopkIndex(log_probs, max_active_paths_);
```
```cpp
std::unique_ptr<OnlineStream> CreateStream(
    const std::string &hotwords) const override {
  SHERPA_ONNX_LOGE("Hotwords for RKNN is not supported now.");
```
The log message mentions RKNN, but this file is for the Axera backend. This appears to be a copy-paste error and could be confusing. Please update it to mention Axera.
```diff
-  SHERPA_ONNX_LOGE("Hotwords for RKNN is not supported now.");
+  SHERPA_ONNX_LOGE("Hotwords for Axera is not supported now.");
 }
```
```cpp
// reset encoder states
// reinterpret_cast<OnlineStreamAxera*>(s)->SetZipformerEncoderStates(model_->GetEncoderInitStates());
```
```cpp
explicit OnlineTransducerModifiedBeamSearchDecoderAxera(
    OnlineZipformerTransducerModelAxera *model, int32_t max_active_paths,
    int32_t unk_id = 2, float blank_penalty = 0.0)
    : model_(model),
      max_active_paths_(max_active_paths),
      unk_id_(unk_id),
      blank_penalty_(blank_penalty) {}
```
The `blank_penalty` parameter is not used in the implementation and should be removed from the constructor. The corresponding member variable `blank_penalty_` on line 37 should also be removed.
```diff
-  explicit OnlineTransducerModifiedBeamSearchDecoderAxera(
-      OnlineZipformerTransducerModelAxera *model, int32_t max_active_paths,
-      int32_t unk_id = 2, float blank_penalty = 0.0)
-      : model_(model),
-        max_active_paths_(max_active_paths),
-        unk_id_(unk_id),
-        blank_penalty_(blank_penalty) {}
+  explicit OnlineTransducerModifiedBeamSearchDecoderAxera(
+      OnlineZipformerTransducerModelAxera *model, int32_t max_active_paths,
+      int32_t unk_id = 2)
+      : model_(model),
+        max_active_paths_(max_active_paths),
+        unk_id_(unk_id) {}
```
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sherpa-onnx/csrc/online-model-config.h (1)

58-83: ⚠️ Potential issue | 🔴 Critical

Remove the `zipformer_meta(zipformer_meta)` initializer on line 75.

This constructor has no `zipformer_meta` parameter, so `zipformer_meta(zipformer_meta)` initializes the member from its own uninitialized value. Either add a constructor parameter for `zipformer_meta` or remove this initializer and let the default member initialization run.

Minimal safe fix

```diff
 : transducer(transducer),
   paraformer(paraformer),
   wenet_ctc(wenet_ctc),
   zipformer2_ctc(zipformer2_ctc),
   nemo_ctc(nemo_ctc),
   t_one_ctc(t_one_ctc),
-  zipformer_meta(zipformer_meta),
   provider_config(provider_config),
   tokens(tokens),
   num_threads(num_threads),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sherpa-onnx/csrc/online-model-config.h` around lines 58 - 83, The OnlineModelConfig constructor is initializing zipformer_meta with itself (zipformer_meta(zipformer_meta)) which reads an uninitialized member; fix by removing the zipformer_meta(zipformer_meta) entry from the member initializer list in OnlineModelConfig or alternatively add a corresponding constructor parameter (e.g., const ZipformerMetaType &zipformer_meta) and use that parameter in the initializer; update the initializer list for OnlineModelConfig accordingly and ensure the class member zipformer_meta is initialized either by the new parameter or by its default constructor.
🧹 Nitpick comments (3)
sherpa-onnx/csrc/vec-to-string.h (1)
10-22: Make this header self-contained.

Line 22 exposes `int32_t`, but this header never includes `<cstdint>`. Right now it compiles only when some earlier include happens to provide that typedef.

Suggested fix

```diff
+#include <cstdint>
 #include <iomanip>
 #include <sstream>
 #include <string>
 #include <vector>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sherpa-onnx/csrc/vec-to-string.h` around lines 10 - 22, The header is not self-contained because VecToString uses int32_t in its signature without including <cstdint>; add an `#include <cstdint>` (so int32_t is defined) at the top of the header alongside the existing includes, or alternatively change the parameter to a standard type (e.g., int) and update VecToString accordingly; ensure the function signature for VecToString and any other references to int32_t compile without relying on transitive includes.

sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h (1)
10-12: Remove the self-include on line 11.

Including the current header again is harmless because of the include guard, but it is redundant and can mask a real missing dependency later. Please also double-check that removing it does not expose a missing direct include elsewhere.
♻️ Proposed cleanup

```diff
 #include "sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h"
-#include "sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h"
 #include "sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.h"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h` around lines 10 - 12, Remove the redundant self-include of "online-transducer-greedy-search-decoder-axera.h" from the top of the header that currently includes itself; keep the other headers (e.g., online-transducer-decoder-axera.h and online-zipformer-transducer-model-axera.h). After removing the self-include, rebuild and run the header's consumers to ensure no direct dependency was masked—if compilation fails, add the missing specific header(s) that the failing translation unit needs rather than re-adding the self-include.

sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h (1)
151-154: Make unsupported hotwords explicit.

Lines 151-154 log a message and return a plain stream anyway, so callers cannot distinguish "hotwords applied" from "hotwords ignored." If Axera does not support contextual biasing yet, fail fast here or expose that capability explicitly; at minimum the message should say `Axera` rather than `RKNN`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h` around lines 151 - 154, The CreateStream(const std::string &hotwords) const override currently logs the wrong backend ("RKNN") and returns a normal stream even when hotwords are provided; update the OnlineRecognizerTransducerAxeraImpl::CreateStream(hotwords) implementation to make unsupported hotwords explicit by either returning nullptr (or an empty unique_ptr) or failing fast (throwing a clear exception) when hotwords is non-empty, and change the log message to reference "Axera" and include the hotwords value for clarity; ensure callers can detect the unsupported case by checking for nullptr or catching the exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h`:
- Around line 220-230: The current code only copies trailing tokens into
r.tokens, but modified beam-search
(OnlineTransducerModifiedBeamSearchDecoderAxera::Decode) uses
OnlineTransducerDecoderResultAxera::hyps and previous_decoder_out2, so rebuild
and preserve the carried hypothesis state instead of just tokens: fetch
last_result via OnlineStreamAxera::GetZipformerResult(), trim its hyps to the
trailing context_size (preserving each hypothesis' tokens and scores), also
preserve previous_decoder_out2 (or trim/clone it consistently with the carried
hyps), assign these into r.hyps and r.previous_decoder_out2 (not just r.tokens),
then call OnlineStreamAxera::SetZipformerResult(std::move(r)) so Reset() and the
subsequent Decode resume from the preserved beam-search state.
In
`@sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc`:
- Around line 155-157: Decode() currently updates result->hyps,
result->frame_offset, and result->previous_decoder_out2 but leaves the
zipformer/context info stale; copy the winning hypothesis's zipformer result (at
least num_trailing_blanks and tokens) from the chosen entry inside cur into
result's Zipformer result so GetZipformerResult() and Reset() see the updated
values. Locate where result->hyps is set (the block with result->hyps =
std::move(cur); result->frame_offset += num_frames;
result->previous_decoder_out2 = std::move(decoder_out);) and after that assign
the winning hypothesis' zipformer result fields (the tokens vector and
num_trailing_blanks) into result (so GetZipformerResult().tokens and
.num_trailing_blanks reflect the selected hypothesis used by IsEndpoint and
Reset()).
In `@sherpa-onnx/csrc/online-model-config.h`:
- Line 27: The new OnlineZipformerMetaConfig field zipformer_meta was added to
OnlineModelConfig but not exposed/initialized in bindings; update the Python
binding to accept a zipformer_meta parameter in the constructor and add
def_readwrite("zipformer_meta", &OnlineModelConfig::zipformer_meta) so
Python-created configs include it, and update the JNI GetOnlineModelConfig()
implementation to populate the OnlineModelConfig::zipformer_meta
(construct/convert into OnlineZipformerMetaConfig) when building the native
config so Java-created configs also pass Axera transducer validation.
In `@sherpa-onnx/csrc/online-zipformer-meta-config.cc`:
- Around line 31-47: In OnlineZipformerMetaConfig::Validate(), in addition to
the existing emptiness and size-equality checks for encoder_dims,
attention_dims, num_encoder_layers, cnn_module_kernels, and left_context_len,
iterate over each stage index i and reject configurations where encoder_dims[i]
<= 0, attention_dims[i] <= 0, num_encoder_layers[i] <= 0, or
cnn_module_kernels[i] <= 0 (and ensure left_context_len[i] is not negative if
you allow zero), so invalid per-stage values like 384,-1 or 2,0 are caught
early; keep the existing scalar checks for T, decode_chunk_len, and context_size
as-is.
In `@sherpa-onnx/csrc/parse-options.cc`:
- Around line 579-585: The int-vector branch currently ignores whether the
original arg had an '=' and drops empty tokens, so inputs like bare "--foo" or
"1,,2" are silently normalized; change the logic in the branch handling
int_vec_map_ (the code that calls SplitStringToIntegers) to first require that
has_equal_sign is true and reject the option if not, and then split while
preserving empty tokens (call SplitStringToIntegers with
omit_empty_strings=false or otherwise detect empty tokens) and treat any empty
token or a failed parse as an error: log via SHERPA_ONNX_LOGE (including the
offending value) and exit(-1). Ensure you set *(int_vec_map_[key]) only after
validation succeeds.
---
Outside diff comments:
In `@sherpa-onnx/csrc/online-model-config.h`:
- Around line 58-83: The OnlineModelConfig constructor is initializing
zipformer_meta with itself (zipformer_meta(zipformer_meta)) which reads an
uninitialized member; fix by removing the zipformer_meta(zipformer_meta) entry
from the member initializer list in OnlineModelConfig or alternatively add a
corresponding constructor parameter (e.g., const ZipformerMetaType
&zipformer_meta) and use that parameter in the initializer; update the
initializer list for OnlineModelConfig accordingly and ensure the class member
zipformer_meta is initialized either by the new parameter or by its default
constructor.
---
Nitpick comments:
In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h`:
- Around line 151-154: The CreateStream(const std::string &hotwords) const
override currently logs the wrong backend ("RKNN") and returns a normal stream
even when hotwords are provided; update the
OnlineRecognizerTransducerAxeraImpl::CreateStream(hotwords) implementation to
make unsupported hotwords explicit by either returning nullptr (or an empty
unique_ptr) or failing fast (throwing a clear exception) when hotwords is
non-empty, and change the log message to reference "Axera" and include the
hotwords value for clarity; ensure callers can detect the unsupported case by
checking for nullptr or catching the exception.
In `@sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h`:
- Around line 10-12: Remove the redundant self-include of
"online-transducer-greedy-search-decoder-axera.h" from the top of the header
that currently includes itself; keep the other headers (e.g.,
online-transducer-decoder-axera.h and
online-zipformer-transducer-model-axera.h). After removing the self-include,
rebuild and run the header's consumers to ensure no direct dependency was
masked—if compilation fails, add the missing specific header(s) that the failing
translation unit needs rather than re-adding the self-include.
In `@sherpa-onnx/csrc/vec-to-string.h`:
- Around line 10-22: The header is not self-contained because VecToString uses
int32_t in its signature without including <cstdint>; add an `#include` <cstdint>
(so int32_t is defined) at the top of the header alongside the existing
includes, or alternatively change the parameter to a standard type (e.g., int)
and update VecToString accordingly; ensure the function signature for
VecToString and any other references to int32_t compile without relying on
transitive includes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ad223c9-557a-4911-ac47-31e6447c4412
📒 Files selected for processing (18)
- sherpa-onnx/csrc/CMakeLists.txt
- sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h
- sherpa-onnx/csrc/axera/online-stream-axera.cc
- sherpa-onnx/csrc/axera/online-stream-axera.h
- sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h
- sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.cc
- sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h
- sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc
- sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.h
- sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.cc
- sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.h
- sherpa-onnx/csrc/online-model-config.cc
- sherpa-onnx/csrc/online-model-config.h
- sherpa-onnx/csrc/online-zipformer-meta-config.cc
- sherpa-onnx/csrc/online-zipformer-meta-config.h
- sherpa-onnx/csrc/parse-options.cc
- sherpa-onnx/csrc/parse-options.h
- sherpa-onnx/csrc/vec-to-string.h
```cpp
auto r = decoder_->GetEmptyResult();
auto last_result =
    reinterpret_cast<OnlineStreamAxera *>(s)->GetZipformerResult();

// if last result is not empty, then
// preserve last tokens as the context for next result
if (static_cast<int32_t>(last_result.tokens.size()) > context_size) {
  r.tokens = {last_result.tokens.end() - context_size,
              last_result.tokens.end()};
}
reinterpret_cast<OnlineStreamAxera *>(s)->SetZipformerResult(std::move(r));
```
Preserve beam-search context in hyps, not just tokens.
Lines 226-229 only copy the trailing context into r.tokens, but sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoderAxera::Decode consumes OnlineTransducerDecoderResultAxera::hyps and OnlineTransducerDecoderResultAxera::previous_decoder_out2 instead of tokens in sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc, Lines 106-112. After Reset(), the modified beam-search path restarts from the blank hypothesis instead of the preserved context. Please rebuild the carried hypothesis for that decoding mode.
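As a toy illustration of the intended carry-over (the `Hyp` struct and `TrimContext` helper below are hypothetical stand-ins for the decoder's hypothesis type; this is a sketch, not the real fix), the reset path would keep the trailing `context_size` tokens of each carried hypothesis, with its score, rather than discarding the whole beam:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the decoder's hypothesis type.
struct Hyp {
  std::vector<int32_t> tokens;
  float log_prob;
};

// Trim each carried hypothesis to its trailing context_size tokens,
// preserving its score, so modified beam search can resume from it.
std::vector<Hyp> TrimContext(const std::vector<Hyp> &hyps,
                             int32_t context_size) {
  std::vector<Hyp> out;
  out.reserve(hyps.size());
  for (const auto &h : hyps) {
    Hyp t;
    t.log_prob = h.log_prob;
    if (static_cast<int32_t>(h.tokens.size()) > context_size) {
      t.tokens.assign(h.tokens.end() - context_size, h.tokens.end());
    } else {
      t.tokens = h.tokens;  // short hypotheses are kept as-is
    }
    out.push_back(std::move(t));
  }
  return out;
}
```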
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h` around
lines 220 - 230, The current code only copies trailing tokens into r.tokens, but
modified beam-search (OnlineTransducerModifiedBeamSearchDecoderAxera::Decode)
uses OnlineTransducerDecoderResultAxera::hyps and previous_decoder_out2, so
rebuild and preserve the carried hypothesis state instead of just tokens: fetch
last_result via OnlineStreamAxera::GetZipformerResult(), trim its hyps to the
trailing context_size (preserving each hypothesis' tokens and scores), also
preserve previous_decoder_out2 (or trim/clone it consistently with the carried
hyps), assign these into r.hyps and r.previous_decoder_out2 (not just r.tokens),
then call OnlineStreamAxera::SetZipformerResult(std::move(r)) so Reset() and the
subsequent Decode resume from the preserved beam-search state.
```cpp
class OnlineStreamAxera : public OnlineStream {
 public:
  explicit OnlineStreamAxera(const FeatureExtractorConfig &config = {},
                             ContextGraphPtr context_graph = nullptr);

  ~OnlineStreamAxera();

  void SetZipformerEncoderStates(
      std::vector<std::vector<uint8_t>> states) const;

  std::vector<std::vector<uint8_t>> &GetZipformerEncoderStates() const;

  void SetZipformerResult(OnlineTransducerDecoderResultAxera r) const;

  OnlineTransducerDecoderResultAxera &GetZipformerResult() const;

 private:
  class Impl;
  std::unique_ptr<Impl> impl_;
};
```
Reset() does not cover the new Axera stream state.
OnlineStream already exposes a concrete non-virtual Reset(), but this subclass adds separate encoder/result state that is outside that path. Reusing a stream after reset will carry stale encoder caches, tokens, timestamps, frame_offset, and cached decoder outputs into the next utterance. Because streams are returned as std::unique_ptr<OnlineStream>, a derived-only helper would not fix normal callers. Please make sure the public reset path reinitializes these Axera-specific members too.
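One possible shape of the fix, sketched with hypothetical names (`StreamBase`, `ResetDerivedState`, and the toy members below are not the real sherpa-onnx classes): give the base class a virtual hook that the public `Reset()` always calls, so the Axera subclass can clear its own caches without callers needing to know the concrete type:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Minimal sketch: the public Reset() also clears subclass-specific state
// through a protected virtual hook.
class StreamBase {
 public:
  virtual ~StreamBase() = default;
  void Reset() {
    num_processed_frames_ = 0;
    ResetDerivedState();  // hypothetical hook for backend-specific members
  }
  int32_t NumProcessedFrames() const { return num_processed_frames_; }

 protected:
  virtual void ResetDerivedState() {}
  int32_t num_processed_frames_ = 42;  // pretend some audio was processed
};

class StreamAxeraSketch : public StreamBase {
 public:
  void SetEncoderStates(std::vector<std::vector<uint8_t>> s) {
    encoder_states_ = std::move(s);
  }
  bool HasEncoderStates() const { return !encoder_states_.empty(); }

 protected:
  void ResetDerivedState() override {
    // Drop stale caches so the next utterance starts fresh.
    encoder_states_.clear();
  }

 private:
  std::vector<std::vector<uint8_t>> encoder_states_;
};
```

An equivalent alternative, if changing the base class is off the table, is to have the recognizer's own `Reset(OnlineStream *)` path downcast and clear the Axera members explicitly, as it already does for the decoder result.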
```cpp
result->hyps = std::move(cur);
result->frame_offset += num_frames;
result->previous_decoder_out2 = std::move(decoder_out);
```
Synchronize the winning hypothesis back into result.
At the end of Decode(), only hyps, frame_offset, and previous_decoder_out2 are updated. sherpa_onnx::OnlineRecognizerTransducerAxeraImpl::IsEndpoint reads GetZipformerResult().num_trailing_blanks in sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h, Lines 195-200, and Reset() slices GetZipformerResult().tokens in Lines 221-229, so modified beam search currently leaves both values stale. That breaks endpoint detection and gives Reset() the wrong context window.
Suggested fix

```diff
-  result->hyps = std::move(cur);
+  auto best = cur.GetMostProbable(true);
+  result->hyps = std::move(cur);
+  result->tokens = std::move(best.ys);  // keep full context; GetResult() strips it later
+  result->timestamps = std::move(best.timestamps);
+  result->num_trailing_blanks = best.num_trailing_blanks;
   result->frame_offset += num_frames;
   result->previous_decoder_out2 = std::move(decoder_out);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
auto best = cur.GetMostProbable(true);
result->hyps = std::move(cur);
result->tokens = std::move(best.ys);  // keep full context; GetResult() strips it later
result->timestamps = std::move(best.timestamps);
result->num_trailing_blanks = best.num_trailing_blanks;
result->frame_offset += num_frames;
result->previous_decoder_out2 = std::move(decoder_out);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc`
around lines 155 - 157, Decode() currently updates result->hyps,
result->frame_offset, and result->previous_decoder_out2 but leaves the
zipformer/context info stale; copy the winning hypothesis's zipformer result (at
least num_trailing_blanks and tokens) from the chosen entry inside cur into
result's Zipformer result so GetZipformerResult() and Reset() see the updated
values. Locate where result->hyps is set (the block with result->hyps =
std::move(cur); result->frame_offset += num_frames;
result->previous_decoder_out2 = std::move(decoder_out);) and after that assign
the winning hypothesis' zipformer result fields (the tokens vector and
num_trailing_blanks) into result (so GetZipformerResult().tokens and
.num_trailing_blanks reflect the selected hypothesis used by IsEndpoint and
Reset()).
```cpp
OnlineZipformer2CtcModelConfig zipformer2_ctc;
OnlineNeMoCtcModelConfig nemo_ctc;
OnlineToneCtcModelConfig t_one_ctc;
OnlineZipformerMetaConfig zipformer_meta;
```
Expose zipformer_meta in the language bindings too.
This new field becomes required for Axera transducer validation in this PR, but the provided Python binding snippet still omits it from the constructor and def_readwrite, and the JNI GetOnlineModelConfig() snippet never populates it. Axera configs created from Python/Java won't be able to pass validation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/online-model-config.h` at line 27, The new
OnlineZipformerMetaConfig field zipformer_meta was added to OnlineModelConfig
but not exposed/initialized in bindings; update the Python binding to accept a
zipformer_meta parameter in the constructor and add
def_readwrite("zipformer_meta", &OnlineModelConfig::zipformer_meta) so
Python-created configs include it, and update the JNI GetOnlineModelConfig()
implementation to populate the OnlineModelConfig::zipformer_meta
(construct/convert into OnlineZipformerMetaConfig) when building the native
config so Java-created configs also pass Axera transducer validation.
```cpp
bool OnlineZipformerMetaConfig::Validate() const {
  if (encoder_dims.empty() || attention_dims.empty() ||
      num_encoder_layers.empty() || cnn_module_kernels.empty() ||
      left_context_len.empty()) {
    return false;
  }
  if (T <= 0 || decode_chunk_len <= 0 || context_size <= 0) {
    return false;
  }

  size_t n = encoder_dims.size();
  if (attention_dims.size() != n || num_encoder_layers.size() != n ||
      cnn_module_kernels.size() != n || left_context_len.size() != n) {
    return false;
  }

  return true;
```
Reject invalid per-stage values here, not only mismatched lengths.
This accepts vectors like encoder_dims=384,-1 or num_encoder_layers=2,0 as long as the lists are non-empty and have matching sizes. That pushes a bad config into the Axera loader instead of failing fast in Validate().
Possible fix

```diff
 bool OnlineZipformerMetaConfig::Validate() const {
   if (encoder_dims.empty() || attention_dims.empty() ||
       num_encoder_layers.empty() || cnn_module_kernels.empty() ||
       left_context_len.empty()) {
     return false;
   }
   if (T <= 0 || decode_chunk_len <= 0 || context_size <= 0) {
     return false;
   }
+
+  auto all_positive = [](const std::vector<int32_t> &v) {
+    for (auto x : v) {
+      if (x <= 0) return false;
+    }
+    return true;
+  };
+
+  auto all_non_negative = [](const std::vector<int32_t> &v) {
+    for (auto x : v) {
+      if (x < 0) return false;
+    }
+    return true;
+  };
+
   size_t n = encoder_dims.size();
   if (attention_dims.size() != n || num_encoder_layers.size() != n ||
       cnn_module_kernels.size() != n || left_context_len.size() != n) {
     return false;
   }
+
+  if (!all_positive(encoder_dims) || !all_positive(attention_dims) ||
+      !all_positive(num_encoder_layers) || !all_positive(cnn_module_kernels) ||
+      !all_non_negative(left_context_len)) {
+    return false;
+  }
   return true;
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
bool OnlineZipformerMetaConfig::Validate() const {
  if (encoder_dims.empty() || attention_dims.empty() ||
      num_encoder_layers.empty() || cnn_module_kernels.empty() ||
      left_context_len.empty()) {
    return false;
  }
  if (T <= 0 || decode_chunk_len <= 0 || context_size <= 0) {
    return false;
  }

  auto all_positive = [](const std::vector<int32_t> &v) {
    for (auto x : v) {
      if (x <= 0) return false;
    }
    return true;
  };

  auto all_non_negative = [](const std::vector<int32_t> &v) {
    for (auto x : v) {
      if (x < 0) return false;
    }
    return true;
  };

  size_t n = encoder_dims.size();
  if (attention_dims.size() != n || num_encoder_layers.size() != n ||
      cnn_module_kernels.size() != n || left_context_len.size() != n) {
    return false;
  }

  if (!all_positive(encoder_dims) || !all_positive(attention_dims) ||
      !all_positive(num_encoder_layers) || !all_positive(cnn_module_kernels) ||
      !all_non_negative(left_context_len)) {
    return false;
  }

  return true;
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/online-zipformer-meta-config.cc` around lines 31 - 47, In
OnlineZipformerMetaConfig::Validate(), in addition to the existing emptiness and
size-equality checks for encoder_dims, attention_dims, num_encoder_layers,
cnn_module_kernels, and left_context_len, iterate over each stage index i and
reject configurations where encoder_dims[i] <= 0, attention_dims[i] <= 0,
num_encoder_layers[i] <= 0, or cnn_module_kernels[i] <= 0 (and ensure
left_context_len[i] is not negative if you allow zero), so invalid per-stage
values like 384,-1 or 2,0 are caught early; keep the existing scalar checks for
T, decode_chunk_len, and context_size as-is.
```cpp
} else if (int_vec_map_.end() != int_vec_map_.find(key)) {
  std::vector<int32_t> v;
  if (!SplitStringToIntegers(value, ",", /*omit_empty_strings=*/true, &v)) {
    SHERPA_ONNX_LOGE("Invalid int vector option \"%s\"", value.c_str());
    exit(-1);
  }
  *(int_vec_map_[key]) = std::move(v);
```
Reject malformed int-vector syntax instead of silently normalizing it.
This branch accepts a bare `--foo` as an empty vector and collapses `1,,2` / `1,2,` because `has_equal_sign` is ignored and `omit_empty_strings=true`. For topology options like the Zipformer stage lists, that turns typos into different configs instead of surfacing an error.
Suggested fix

```diff
 } else if (int_vec_map_.end() != int_vec_map_.find(key)) {
+  if (!has_equal_sign) {
+    SHERPA_ONNX_LOGE("Invalid option --%s (option format is --x=y).",
+                     key.c_str());
+    exit(-1);
+  }
   std::vector<int32_t> v;
-  if (!SplitStringToIntegers(value, ",", /*omit_empty_strings=*/true, &v)) {
+  if (!SplitStringToIntegers(value, ",", /*omit_empty_strings=*/false, &v)) {
     SHERPA_ONNX_LOGE("Invalid int vector option \"%s\"", value.c_str());
     exit(-1);
   }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
} else if (int_vec_map_.end() != int_vec_map_.find(key)) {
  if (!has_equal_sign) {
    SHERPA_ONNX_LOGE("Invalid option --%s (option format is --x=y).",
                     key.c_str());
    exit(-1);
  }
  std::vector<int32_t> v;
  if (!SplitStringToIntegers(value, ",", /*omit_empty_strings=*/false, &v)) {
    SHERPA_ONNX_LOGE("Invalid int vector option \"%s\"", value.c_str());
    exit(-1);
  }
  *(int_vec_map_[key]) = std::move(v);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/parse-options.cc` around lines 579 - 585, The int-vector
branch currently ignores whether the original arg had an '=' and drops empty
tokens, so inputs like bare "--foo" or "1,,2" are silently normalized; change
the logic in the branch handling int_vec_map_ (the code that calls
SplitStringToIntegers) to first require that has_equal_sign is true and reject
the option if not, and then split while preserving empty tokens (call
SplitStringToIntegers with omit_empty_strings=false or otherwise detect empty
tokens) and treat any empty token or a failed parse as an error: log via
SHERPA_ONNX_LOGE (including the offending value) and exit(-1). Ensure you set
*(int_vec_map_[key]) only after validation succeeds.
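To make the failure mode concrete, here is a standalone strict parser (independent of the real `SplitStringToIntegers`, whose signature is only known from the snippet above; names here are illustrative) that rejects empty tokens and trailing garbage instead of silently normalizing them:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Standalone sketch of strict comma-separated integer parsing: any empty
// token ("1,,2", "1,2,", or "") or non-numeric token makes it return false.
bool ParseIntVectorStrict(const std::string &s, std::vector<int32_t> *out) {
  out->clear();
  std::stringstream ss(s);
  std::string tok;
  bool saw_any = false;
  while (std::getline(ss, tok, ',')) {
    saw_any = true;
    if (tok.empty()) return false;  // "1,,2" is an error, not {1, 2}
    size_t pos = 0;
    int32_t v;
    try {
      v = std::stoi(tok, &pos);
    } catch (...) {
      return false;  // non-numeric token
    }
    if (pos != tok.size()) return false;  // trailing garbage like "3x"
    out->push_back(v);
  }
  // "1,2," leaves a trailing empty field that getline never yields; detect it.
  if (!s.empty() && s.back() == ',') return false;
  return saw_any;  // an empty string parses to nothing and is rejected
}
```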
Summary by CodeRabbit
New Features
Build/Infrastructure