
zipformer supported axera backend #3271

Open
Abandon-ht wants to merge 1 commit into k2-fsa:master from Abandon-ht:zipformer_axera

Conversation

@Abandon-ht
Contributor

@Abandon-ht Abandon-ht commented Mar 9, 2026

Summary by CodeRabbit

  • New Features

    • Added Axera platform support for online speech recognition with configurable decoding strategies (greedy search and beam search algorithms)
    • Implemented online stream management with encoder state tracking for Axera-based models
    • Added Zipformer model configuration and metadata support for enhanced model flexibility
  • Build/Infrastructure

    • Extended CMakeLists.txt to include new Axera-specific source files
    • Enhanced configuration system to support vector-based parameter registration

@dosubot dosubot bot added the `size:XXL` label (This PR changes 1000+ lines, ignoring generated files.) on Mar 9, 2026

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

This PR introduces comprehensive Axera backend support for online speech recognition in Sherpa-ONNX. It adds Axera-specific implementations for the Zipformer transducer model, online streaming, greedy and beam-search decoders, and end-to-end recognition pipeline. Additionally, it extends the configuration system with Zipformer meta-configuration and augments ParseOptions to handle vector options.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Build Configuration**<br>`sherpa-onnx/csrc/CMakeLists.txt` | Added `online-zipformer-meta-config.cc` to core sources and four Axera-specific sources (`online-stream-axera.cc`, `online-transducer-greedy-search-decoder-axera.cc`, `online-transducer-modified-beam-search-decoder-axera.cc`, `online-zipformer-transducer-model-axera.cc`) under the AXERA feature flag. |
| **Axera Core Model**<br>`sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.h`, `...axera.cc` | Introduced the `OnlineZipformerTransducerModelAxera` class with encoder, decoder, and joiner inference APIs; manages AX engine handles and IO data; supports both direct-config and manager-assisted construction. |
| **Axera Stream**<br>`sherpa-onnx/csrc/axera/online-stream-axera.h`, `...axera.cc` | Added `OnlineStreamAxera` extending `OnlineStream` with the PIMPL pattern; provides APIs for managing encoder states and Zipformer decoder results. |
| **Axera Decoder Interface**<br>`sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h` | Introduced the abstract `OnlineTransducerDecoderAxera` interface and the `OnlineTransducerDecoderResultAxera` result struct with frame offset, tokens, timestamps, and decoder state tracking. |
| **Axera Greedy Search Decoder**<br>`sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h`, `...axera.cc` | Implemented a greedy decoding strategy with blank-penalty support; decodes encoder output frame by frame, emitting tokens when non-blank, and updates decoder state on new tokens. |
| **Axera Modified Beam Search Decoder**<br>`sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.h`, `...axera.cc` | Implemented a beam-search decoder managing multiple active hypotheses; iterates over frames, computes joiner logits, extends hypotheses via top-k, and caches decoder outputs across frames for efficiency. |
| **Axera Recognizer**<br>`sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h` | Added `OnlineRecognizerTransducerAxeraImpl` and a `Convert` helper; coordinates stream creation, decoding, and result extraction, and provides selectable greedy/beam-search strategies with symbol-table integration. |
| **Zipformer Meta-Configuration**<br>`sherpa-onnx/csrc/online-zipformer-meta-config.h`, `...meta-config.cc` | Introduced the `OnlineZipformerMetaConfig` struct holding encoder/attention dimensions, layer counts, CNN kernels, context lengths, and transducer parameters; provides registration, validation, and string representation. |
| **Configuration Integration**<br>`sherpa-onnx/csrc/online-model-config.h`, `...model-config.cc` | Added a `zipformer_meta` field to `OnlineModelConfig`; integrated meta-validation for Axera transducers; enforces complete metadata requirements for the Axera provider. |
| **Option Parsing Extensions**<br>`sherpa-onnx/csrc/parse-options.h`, `...options.cc` | Extended `ParseOptions` to support `vector<int32_t>` registration, storage, and comma-separated parsing; added a new public `Register` overload and a private `RegisterSpecific` overload. |
| **Vector Stringification Utility**<br>`sherpa-onnx/csrc/vec-to-string.h` | Introduced a `VecToString` template function for formatting vectors as comma-separated bracketed strings with optional precision; includes a specialization for `std::string` vectors. |
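As an aside for reviewers, the VecToString utility summarized in the last row could be sketched roughly like this; the exact signature, defaults, and the `std::string` specialization in the real header may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of a VecToString-style helper: formats a vector as
// "[a, b, c]", with optional fixed precision for floating-point values.
template <typename T>
std::string VecToString(const std::vector<T> &v, int32_t precision = 0) {
  std::ostringstream os;
  if (precision > 0) {
    os << std::fixed << std::setprecision(precision);
  }
  os << "[";
  for (size_t i = 0; i != v.size(); ++i) {
    if (i != 0) os << ", ";
    os << v[i];
  }
  os << "]";
  return os.str();
}
```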

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Recognizer as OnlineRecognizerTransducerAxeraImpl
    participant Stream as OnlineStreamAxera
    participant Model as OnlineZipformerTransducerModelAxera
    participant Decoder as OnlineTransducerDecoderAxera

    Client->>Recognizer: CreateStream()
    Recognizer->>Stream: new OnlineStreamAxera()
    Stream-->>Recognizer: stream
    Recognizer-->>Client: stream

    Client->>Recognizer: DecodeStreams(stream)
    Recognizer->>Recognizer: DecodeStream(stream)

    Recognizer->>Model: RunEncoder(features, states)
    Model-->>Recognizer: encoder_out, next_states

    Recognizer->>Stream: SetZipformerEncoderStates(next_states)

    loop For each encoder frame
        Recognizer->>Decoder: Decode(encoder_out, result)
        Decoder->>Model: RunJoiner(encoder_out, decoder_out)
        Model-->>Decoder: joiner_logits
        Decoder->>Decoder: select_token(logits)

        alt token is non-blank
            Decoder->>Model: RunDecoder(decoder_input)
            Model-->>Decoder: new_decoder_out
        end

        Decoder-->>Recognizer: updated_result
    end

    Recognizer->>Stream: SetZipformerResult(result)

    Client->>Recognizer: GetResult(stream)
    Recognizer->>Stream: GetZipformerResult()
    Stream-->>Recognizer: result
    Recognizer->>Recognizer: Convert(result)
    Recognizer-->>Client: OnlineRecognizerResult
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • k2-fsa/sherpa-onnx#2849: Adds complementary Axera backend infrastructure components for model loading and initialization, building the foundation this PR extends with online recognition capabilities.

Suggested labels

size:XL

Suggested reviewers

  • csukuangfj

Poem

🐰 Hop-hop, the Axera forest grows!
Streams and decoders in neat rows,
Zipformer whispers through the code,
Recognition blooms along the road,
Beam-search dreams and tokens hop,
What a wonderful change, don't stop! 🌿✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions that are missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding Zipformer support for the AXERA backend. It directly reflects the core objective of the PR. |


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the sherpa-onnx project's hardware compatibility by integrating a new backend for Axera platforms. The changes introduce a dedicated set of C++ classes and configurations that enable online speech recognition using Zipformer transducer models on Axera hardware. This integration ensures that models can leverage Axera's specialized AI accelerators, potentially leading to improved performance and efficiency for embedded and edge computing applications. The update also includes robust validation for Axera-specific model configurations.

Highlights

  • Axera Backend Integration: Introduced comprehensive support for the Axera hardware backend, enabling Zipformer transducer models to run efficiently on Axera platforms.
  • New Model and Decoder Implementations: Added Axera-specific implementations for online streams, transducer decoders (greedy search and modified beam search), and the core Zipformer transducer model, tailored for Axera's execution environment.
  • Dynamic Model Metadata Configuration: Implemented OnlineZipformerMetaConfig to allow detailed specification of Zipformer model parameters (e.g., encoder dimensions, chunk sizes) via command-line options, crucial for Axera model compatibility.
  • Enhanced Command-Line Argument Parsing: Extended the ParseOptions utility to support parsing std::vector<int32_t> from command-line arguments, facilitating flexible configuration of model metadata.
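A minimal sketch of the kind of comma-separated parsing described in the last highlight; the helper name and error handling here are illustrative, not the actual ParseOptions code:

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative sketch of parsing a comma-separated option value such as
// "--encoder-dims=192,256,384,512,384,256" into a std::vector<int32_t>.
// Returns false on an empty input, an empty token, or a malformed number.
bool SplitStringToInt32s(const std::string &s, std::vector<int32_t> *out) {
  out->clear();
  std::istringstream is(s);
  std::string token;
  while (std::getline(is, token, ',')) {
    if (token.empty()) return false;  // reject "1,,2"
    try {
      size_t pos = 0;
      int32_t v = static_cast<int32_t>(std::stol(token, &pos));
      if (pos != token.size()) return false;  // reject "12x"
      out->push_back(v);
    } catch (const std::exception &) {
      return false;  // reject "abc" or out-of-range values
    }
  }
  return !out->empty();
}
```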
Changelog
  • sherpa-onnx/csrc/CMakeLists.txt
    • Added new source files for Axera backend components, including online stream, decoder, and Zipformer model implementations.
    • Included online-zipformer-meta-config.cc for Zipformer model metadata handling.
  • sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h
    • Implemented the OnlineRecognizerTransducerAxeraImpl class, providing the core logic for online recognition with Axera-specific Zipformer transducer models.
  • sherpa-onnx/csrc/axera/online-stream-axera.cc
    • Implemented the OnlineStreamAxera class, managing encoder states and decoder results for Axera-specific online streams.
  • sherpa-onnx/csrc/axera/online-stream-axera.h
    • Defined the OnlineStreamAxera class, inheriting from OnlineStream and adding methods to manage Zipformer encoder states and decoder results for the Axera backend.
  • sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h
    • Defined the base OnlineTransducerDecoderAxera class and OnlineTransducerDecoderResultAxera struct, providing an interface for Axera-specific transducer decoders and their results.
  • sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.cc
    • Implemented the greedy search decoding strategy for Axera-specific transducer models.
  • sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h
    • Declared the OnlineTransducerGreedySearchDecoderAxera class for greedy search decoding on Axera.
  • sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc
    • Implemented the modified beam search decoding strategy for Axera-specific transducer models.
  • sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.h
    • Declared the OnlineTransducerModifiedBeamSearchDecoderAxera class for modified beam search decoding on Axera.
  • sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.cc
    • Implemented the OnlineZipformerTransducerModelAxera class, handling model loading, encoder, decoder, and joiner execution for Axera.
  • sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.h
    • Declared the OnlineZipformerTransducerModelAxera class, providing an interface for Axera-specific Zipformer transducer models.
  • sherpa-onnx/csrc/online-model-config.cc
    • Updated to register OnlineZipformerMetaConfig.
    • Added validation logic for the Axera provider, ensuring correct model file types and meta-configuration.
  • sherpa-onnx/csrc/online-model-config.h
    • Included online-zipformer-meta-config.h.
    • Added OnlineZipformerMetaConfig as a member to OnlineModelConfig.
  • sherpa-onnx/csrc/online-zipformer-meta-config.cc
    • Implemented OnlineZipformerMetaConfig for registering and validating Zipformer model metadata.
  • sherpa-onnx/csrc/online-zipformer-meta-config.h
    • Defined the OnlineZipformerMetaConfig struct to hold Zipformer model metadata.
  • sherpa-onnx/csrc/parse-options.cc
    • Extended ParseOptions to support registering and parsing std::vector<int32_t> command-line arguments.
  • sherpa-onnx/csrc/parse-options.h
    • Added declarations for std::vector<int32_t> support in ParseOptions.
  • sherpa-onnx/csrc/vec-to-string.h
    • Added a utility header for converting std::vector<T> to string representations.
Activity
  • The pull request was initiated by Abandon-ht to introduce Axera backend support.
  • New C++ files were added to implement Axera-specific online speech recognition components.
  • The build system (CMakeLists.txt) was modified to include the newly added Axera source files.
  • Configuration parsing logic was updated to handle Axera-specific model metadata and validate model paths.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for Zipformer models on the Axera backend, which is a significant enhancement. A security audit identified a high-severity buffer overflow vulnerability in the model execution logic and two medium-severity denial-of-service (crash) vulnerabilities in the decoder logic, primarily caused by unsafe assumptions about tensor sizes and missing validation. Additionally, minor code quality issues such as a copy-paste error in a log message, a self-include, and dead code were noted. All original comments have been retained as they do not contradict the provided rules.

Comment on lines +154 to +156

```cpp
size_t out0_elems = out0.nSize / sizeof(float);
std::vector<float> encoder_out(out0_elems);
std::memcpy(encoder_out.data(), out0_buf.pVirAddr, out0.nSize);
```

security-high

The code calculates the number of elements for a std::vector<float> by dividing the byte size (nSize) by sizeof(float). If nSize is not a multiple of sizeof(float), the resulting element count will be truncated due to integer division. The subsequent std::memcpy call uses the full nSize to copy data into the undersized vector, leading to a buffer overflow.

This occurs in three locations:

  1. In RunEncoder (lines 154-156)
  2. In RunDecoder (lines 212-214)
  3. In RunJoiner (lines 251-253)
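A hedged sketch of one way to guard these copies; the function and parameter names below are stand-ins for the snippet's `out0.nSize` and `out0_buf.pVirAddr`, not the PR's actual code:

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Guarded copy: verify that the reported byte size is an exact multiple
// of sizeof(float) before sizing the vector, so the memcpy of n_size
// bytes can never overflow it.
std::vector<float> CopyFloatOutput(const void *src, size_t n_size) {
  if (n_size % sizeof(float) != 0) {
    throw std::runtime_error(
        "output byte size is not a multiple of sizeof(float)");
  }
  std::vector<float> out(n_size / sizeof(float));
  std::memcpy(out.data(), src, n_size);
  return out;
}
```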

```cpp
#include <vector>

#include "sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h"
#include "sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h"
```

high

This is a self-include, which is unnecessary and can sometimes cause issues with include guards or compilation cycles. It should be removed.


```cpp
for (const auto &d : decoder_out) {
  auto joiner_out = model->RunJoiner(p_encoder_out, d.data());
  LogSoftmax(joiner_out.data(), joiner_out.size());
```

security-medium

If the model reports a vocab_size of 0, GetJoinerOutLogSoftmax calls LogSoftmax with an empty buffer. Inside LogSoftmax (in math.h), std::max_element is called on the empty range and its result is dereferenced, leading to a crash (DoS).
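One possible guard, sketched under the assumption that LogSoftmax operates in place on a raw float buffer; the real signature in math.h may differ:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Log-softmax that returns early on an empty buffer instead of
// dereferencing the result of std::max_element on an empty range.
void LogSoftmaxSafe(float *p, int32_t n) {
  if (p == nullptr || n <= 0) return;  // guards the vocab_size == 0 case
  float max_v = *std::max_element(p, p + n);
  float sum = 0.0f;
  for (int32_t i = 0; i != n; ++i) sum += std::exp(p[i] - max_v);
  float log_sum = std::log(sum) + max_v;
  for (int32_t i = 0; i != n; ++i) p[i] -= log_sum;
}
```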

```cpp
  }
}

auto topk = TopkIndex(log_probs, max_active_paths_);
```

security-medium

The call to TopkIndex can lead to a crash if log_probs is empty. This can happen if max_active_paths_ is set to 0 or if the hypothesis list becomes empty. Inside TopkIndex (in math.h), vec[0] is accessed without checking if vec is empty, leading to a crash (DoS).
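A sketch of a top-k index selection that tolerates these degenerate inputs; this is illustrative only, and the actual TopkIndex in math.h may differ in signature and tie-breaking:

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// k is clamped to the input size, so an empty log_probs vector or
// max_active_paths == 0 yields an empty result instead of reading vec[0].
std::vector<int32_t> TopkIndexSafe(const std::vector<float> &vec, int32_t k) {
  int32_t n = static_cast<int32_t>(vec.size());
  if (k > n) k = n;
  if (k <= 0) return {};
  std::vector<int32_t> idx(n);
  std::iota(idx.begin(), idx.end(), 0);
  // Partially sort indices so the first k refer to the largest values.
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&vec](int32_t a, int32_t b) { return vec[a] > vec[b]; });
  idx.resize(k);
  return idx;
}
```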


```cpp
std::unique_ptr<OnlineStream> CreateStream(
    const std::string &hotwords) const override {
  SHERPA_ONNX_LOGE("Hotwords for RKNN is not supported now.");
```

medium

The log message mentions RKNN, but this file is for the Axera backend. This appears to be a copy-paste error and could be confusing. Please update it to mention Axera.

Suggested change

```diff
-SHERPA_ONNX_LOGE("Hotwords for RKNN is not supported now.");
+SHERPA_ONNX_LOGE("Hotwords for Axera is not supported now.");
```

```cpp
}

// reset encoder states
// reinterpret_cast<OnlineStreamAxera*>(s)->SetZipformerEncoderStates(model_->GetEncoderInitStates());
```

medium

This commented-out line in the Reset method is confusing. If resetting encoder states is not needed here, this line should be removed to improve code clarity.

Comment on lines +18 to +24

```cpp
explicit OnlineTransducerModifiedBeamSearchDecoderAxera(
    OnlineZipformerTransducerModelAxera *model, int32_t max_active_paths,
    int32_t unk_id = 2, float blank_penalty = 0.0)
    : model_(model),
      max_active_paths_(max_active_paths),
      unk_id_(unk_id),
      blank_penalty_(blank_penalty) {}
```

medium

The blank_penalty parameter is not used in the implementation. It should be removed from the constructor. The corresponding member variable blank_penalty_ on line 37 should also be removed.

Suggested change

```diff
-explicit OnlineTransducerModifiedBeamSearchDecoderAxera(
-    OnlineZipformerTransducerModelAxera *model, int32_t max_active_paths,
-    int32_t unk_id = 2, float blank_penalty = 0.0)
-    : model_(model),
-      max_active_paths_(max_active_paths),
-      unk_id_(unk_id),
-      blank_penalty_(blank_penalty) {}
+explicit OnlineTransducerModifiedBeamSearchDecoderAxera(
+    OnlineZipformerTransducerModelAxera *model, int32_t max_active_paths,
+    int32_t unk_id = 2)
+    : model_(model),
+      max_active_paths_(max_active_paths),
+      unk_id_(unk_id) {}
```

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sherpa-onnx/csrc/online-model-config.h (1)

58-83: ⚠️ Potential issue | 🔴 Critical

Remove the zipformer_meta(zipformer_meta) initializer on line 75.

This constructor has no zipformer_meta parameter, so attempting to initialize zipformer_meta(zipformer_meta) reads an uninitialized member variable. Either add a constructor parameter for zipformer_meta or remove this initializer and let the default member initialization run.

Minimal safe fix

```diff
       : transducer(transducer),
         paraformer(paraformer),
         wenet_ctc(wenet_ctc),
         zipformer2_ctc(zipformer2_ctc),
         nemo_ctc(nemo_ctc),
         t_one_ctc(t_one_ctc),
-        zipformer_meta(zipformer_meta),
         provider_config(provider_config),
         tokens(tokens),
         num_threads(num_threads),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/online-model-config.h` around lines 58 - 83, The
OnlineModelConfig constructor is initializing zipformer_meta with itself
(zipformer_meta(zipformer_meta)) which reads an uninitialized member; fix by
removing the zipformer_meta(zipformer_meta) entry from the member initializer
list in OnlineModelConfig or alternatively add a corresponding constructor
parameter (e.g., const ZipformerMetaType &zipformer_meta) and use that parameter
in the initializer; update the initializer list for OnlineModelConfig
accordingly and ensure the class member zipformer_meta is initialized either by
the new parameter or by its default constructor.
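To make the failure mode concrete, here is a minimal, hypothetical struct showing why the self-referential initializer is dangerous and what the fixed constructor looks like; the field names are simplified stand-ins, not the actual OnlineModelConfig members:

```cpp
#include <string>

// When a constructor has no parameter named `meta`, an initializer
// written as `meta(meta)` binds the member to itself and reads an
// indeterminate value (undefined behavior). The fix is to omit the
// initializer so the in-class default applies, or add a real parameter.
struct ConfigFixed {
  std::string tokens;
  std::string meta = "default";  // simplified stand-in for zipformer_meta

  // Correct: `meta` is left out of the initializer list, so its
  // in-class default initializer runs.
  explicit ConfigFixed(const std::string &tokens) : tokens(tokens) {}
};
```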
🧹 Nitpick comments (3)
sherpa-onnx/csrc/vec-to-string.h (1)

10-22: Make this header self-contained.

Line 22 exposes int32_t, but this header never includes <cstdint>. Right now it only works when some earlier include happens to provide that typedef.

Suggested fix
```diff
+#include <cstdint>
 #include <iomanip>
 #include <sstream>
 #include <string>
 #include <vector>
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/vec-to-string.h` around lines 10-22: The header is not
self-contained because VecToString uses int32_t in its signature without
including <cstdint>; add `#include <cstdint>` (so int32_t is defined) at the
top of the header alongside the existing includes, or alternatively change the
parameter to a standard type (e.g., int) and update VecToString accordingly;
ensure the function signature for VecToString and any other references to
int32_t compile without relying on transitive includes.
sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h (1)

10-12: Remove the self-include on Line 11.

Including the current header again is harmless because of the guard, but it is redundant and can mask a real missing dependency later. Please also double-check that removing it does not expose a missing direct include elsewhere.

♻️ Proposed cleanup
```diff
 #include "sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h"
-#include "sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h"
 #include "sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.h"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h`
around lines 10 - 12, Remove the redundant self-include of
"online-transducer-greedy-search-decoder-axera.h" from the top of the header
that currently includes itself; keep the other headers (e.g.,
online-transducer-decoder-axera.h and
online-zipformer-transducer-model-axera.h). After removing the self-include,
rebuild and run the header's consumers to ensure no direct dependency was
masked—if compilation fails, add the missing specific header(s) that the failing
translation unit needs rather than re-adding the self-include.
sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h (1)

151-154: Make unsupported hotwords explicit.

Lines 151-154 log and return a plain stream anyway, so callers cannot distinguish “hotwords applied” from “hotwords ignored.” If Axera does not support contextual biasing yet, fail fast here or expose that capability explicitly; at minimum the message should say Axera rather than RKNN.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h` around
lines 151 - 154, The CreateStream(const std::string &hotwords) const override
currently logs the wrong backend ("RKNN") and returns a normal stream even when
hotwords are provided; update the
OnlineRecognizerTransducerAxeraImpl::CreateStream(hotwords) implementation to
make unsupported hotwords explicit by either returning nullptr (or an empty
unique_ptr) or failing fast (throwing a clear exception) when hotwords is
non-empty, and change the log message to reference "Axera" and include the
hotwords value for clarity; ensure callers can detect the unsupported case by
checking for nullptr or catching the exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h`:
- Around line 220-230: The current code only copies trailing tokens into
r.tokens, but modified beam-search
(OnlineTransducerModifiedBeamSearchDecoderAxera::Decode) uses
OnlineTransducerDecoderResultAxera::hyps and previous_decoder_out2, so rebuild
and preserve the carried hypothesis state instead of just tokens: fetch
last_result via OnlineStreamAxera::GetZipformerResult(), trim its hyps to the
trailing context_size (preserving each hypothesis' tokens and scores), also
preserve previous_decoder_out2 (or trim/clone it consistently with the carried
hyps), assign these into r.hyps and r.previous_decoder_out2 (not just r.tokens),
then call OnlineStreamAxera::SetZipformerResult(std::move(r)) so Reset() and the
subsequent Decode resume from the preserved beam-search state.

In
`@sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc`:
- Around line 155-157: Decode() currently updates result->hyps,
result->frame_offset, and result->previous_decoder_out2 but leaves the
zipformer/context info stale; copy the winning hypothesis's zipformer result (at
least num_trailing_blanks and tokens) from the chosen entry inside cur into
result's Zipformer result so GetZipformerResult() and Reset() see the updated
values. Locate where result->hyps is set (the block with result->hyps =
std::move(cur); result->frame_offset += num_frames;
result->previous_decoder_out2 = std::move(decoder_out);) and after that assign
the winning hypothesis' zipformer result fields (the tokens vector and
num_trailing_blanks) into result (so GetZipformerResult().tokens and
.num_trailing_blanks reflect the selected hypothesis used by IsEndpoint and
Reset()).

In `@sherpa-onnx/csrc/online-model-config.h`:
- Line 27: The new OnlineZipformerMetaConfig field zipformer_meta was added to
OnlineModelConfig but not exposed/initialized in bindings; update the Python
binding to accept a zipformer_meta parameter in the constructor and add
def_readwrite("zipformer_meta", &OnlineModelConfig::zipformer_meta) so
Python-created configs include it, and update the JNI GetOnlineModelConfig()
implementation to populate the OnlineModelConfig::zipformer_meta
(construct/convert into OnlineZipformerMetaConfig) when building the native
config so Java-created configs also pass Axera transducer validation.

In `@sherpa-onnx/csrc/online-zipformer-meta-config.cc`:
- Around line 31-47: In OnlineZipformerMetaConfig::Validate(), in addition to
the existing emptiness and size-equality checks for encoder_dims,
attention_dims, num_encoder_layers, cnn_module_kernels, and left_context_len,
iterate over each stage index i and reject configurations where encoder_dims[i]
<= 0, attention_dims[i] <= 0, num_encoder_layers[i] <= 0, or
cnn_module_kernels[i] <= 0 (and ensure left_context_len[i] is not negative if
you allow zero), so invalid per-stage values like 384,-1 or 2,0 are caught
early; keep the existing scalar checks for T, decode_chunk_len, and context_size
as-is.

In `@sherpa-onnx/csrc/parse-options.cc`:
- Around line 579-585: The int-vector branch currently ignores whether the
original arg had an '=' and drops empty tokens, so inputs like bare "--foo" or
"1,,2" are silently normalized; change the logic in the branch handling
int_vec_map_ (the code that calls SplitStringToIntegers) to first require that
has_equal_sign is true and reject the option if not, and then split while
preserving empty tokens (call SplitStringToIntegers with
omit_empty_strings=false or otherwise detect empty tokens) and treat any empty
token or a failed parse as an error: log via SHERPA_ONNX_LOGE (including the
offending value) and exit(-1). Ensure you set *(int_vec_map_[key]) only after
validation succeeds.
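The per-stage validation sketched in the preceding prompt for `online-zipformer-meta-config.cc` could look roughly like this; field names are assumed from the PR summary, and the real `OnlineZipformerMetaConfig::Validate()` may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Every stage-wise vector must be non-empty and equal in length, stage
// values must be strictly positive, and left_context_len may be zero
// but not negative.
struct ZipformerMetaSketch {
  std::vector<int32_t> encoder_dims;
  std::vector<int32_t> attention_dims;
  std::vector<int32_t> num_encoder_layers;
  std::vector<int32_t> cnn_module_kernels;
  std::vector<int32_t> left_context_len;

  bool Validate() const {
    size_t n = encoder_dims.size();
    if (n == 0 || attention_dims.size() != n ||
        num_encoder_layers.size() != n || cnn_module_kernels.size() != n ||
        left_context_len.size() != n) {
      return false;
    }
    for (size_t i = 0; i != n; ++i) {
      if (encoder_dims[i] <= 0 || attention_dims[i] <= 0 ||
          num_encoder_layers[i] <= 0 || cnn_module_kernels[i] <= 0 ||
          left_context_len[i] < 0) {
        return false;
      }
    }
    return true;
  }
};
```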

---

Outside diff comments:
In `@sherpa-onnx/csrc/online-model-config.h`:
- Around line 58-83: The OnlineModelConfig constructor is initializing
zipformer_meta with itself (zipformer_meta(zipformer_meta)) which reads an
uninitialized member; fix by removing the zipformer_meta(zipformer_meta) entry
from the member initializer list in OnlineModelConfig or alternatively add a
corresponding constructor parameter (e.g., const ZipformerMetaType
&zipformer_meta) and use that parameter in the initializer; update the
initializer list for OnlineModelConfig accordingly and ensure the class member
zipformer_meta is initialized either by the new parameter or by its default
constructor.

---

Nitpick comments:
In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h`:
- Around line 151-154: The CreateStream(const std::string &hotwords) const
override currently logs the wrong backend ("RKNN") and returns a normal stream
even when hotwords are provided; update the
OnlineRecognizerTransducerAxeraImpl::CreateStream(hotwords) implementation to
make unsupported hotwords explicit by either returning nullptr (or an empty
unique_ptr) or failing fast (throwing a clear exception) when hotwords is
non-empty, and change the log message to reference "Axera" and include the
hotwords value for clarity; ensure callers can detect the unsupported case by
checking for nullptr or catching the exception.

In `@sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h`:
- Around line 10-12: Remove the redundant self-include of
"online-transducer-greedy-search-decoder-axera.h" from the top of the header
that currently includes itself; keep the other headers (e.g.,
online-transducer-decoder-axera.h and
online-zipformer-transducer-model-axera.h). After removing the self-include,
rebuild and run the header's consumers to ensure no direct dependency was
masked—if compilation fails, add the missing specific header(s) that the failing
translation unit needs rather than re-adding the self-include.

In `@sherpa-onnx/csrc/vec-to-string.h`:
- Around line 10-22: The header is not self-contained because VecToString uses
int32_t in its signature without including <cstdint>; add `#include <cstdint>`
(so int32_t is defined) at the top of the header alongside the existing
includes, or alternatively change the parameter to a standard type (e.g., int)
and update VecToString accordingly; ensure the function signature for
VecToString and any other references to int32_t compile without relying on
transitive includes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2ad223c9-557a-4911-ac47-31e6447c4412

📥 Commits

Reviewing files that changed from the base of the PR and between 9b2fc65 and 0d2a7fc.

📒 Files selected for processing (18)
  • sherpa-onnx/csrc/CMakeLists.txt
  • sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h
  • sherpa-onnx/csrc/axera/online-stream-axera.cc
  • sherpa-onnx/csrc/axera/online-stream-axera.h
  • sherpa-onnx/csrc/axera/online-transducer-decoder-axera.h
  • sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.cc
  • sherpa-onnx/csrc/axera/online-transducer-greedy-search-decoder-axera.h
  • sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc
  • sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.h
  • sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.cc
  • sherpa-onnx/csrc/axera/online-zipformer-transducer-model-axera.h
  • sherpa-onnx/csrc/online-model-config.cc
  • sherpa-onnx/csrc/online-model-config.h
  • sherpa-onnx/csrc/online-zipformer-meta-config.cc
  • sherpa-onnx/csrc/online-zipformer-meta-config.h
  • sherpa-onnx/csrc/parse-options.cc
  • sherpa-onnx/csrc/parse-options.h
  • sherpa-onnx/csrc/vec-to-string.h

Comment on lines +220 to +230
  auto r = decoder_->GetEmptyResult();
  auto last_result =
      reinterpret_cast<OnlineStreamAxera *>(s)->GetZipformerResult();

  // if last result is not empty, then
  // preserve last tokens as the context for next result
  if (static_cast<int32_t>(last_result.tokens.size()) > context_size) {
    r.tokens = {last_result.tokens.end() - context_size,
                last_result.tokens.end()};
  }
  reinterpret_cast<OnlineStreamAxera *>(s)->SetZipformerResult(std::move(r));

⚠️ Potential issue | 🟠 Major

Preserve beam-search context in hyps, not just tokens.

Lines 226-229 only copy the trailing context into r.tokens, but sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoderAxera::Decode consumes OnlineTransducerDecoderResultAxera::hyps and OnlineTransducerDecoderResultAxera::previous_decoder_out2 instead of tokens in sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc, Lines 106-112. After Reset(), the modified beam-search path restarts from the blank hypothesis instead of the preserved context. Please rebuild the carried hypothesis for that decoding mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h` around
lines 220 - 230, The current code only copies trailing tokens into r.tokens, but
modified beam-search (OnlineTransducerModifiedBeamSearchDecoderAxera::Decode)
uses OnlineTransducerDecoderResultAxera::hyps and previous_decoder_out2, so
rebuild and preserve the carried hypothesis state instead of just tokens: fetch
last_result via OnlineStreamAxera::GetZipformerResult(), trim its hyps to the
trailing context_size (preserving each hypothesis' tokens and scores), also
preserve previous_decoder_out2 (or trim/clone it consistently with the carried
hyps), assign these into r.hyps and r.previous_decoder_out2 (not just r.tokens),
then call OnlineStreamAxera::SetZipformerResult(std::move(r)) so Reset() and the
subsequent Decode resume from the preserved beam-search state.
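The carried-context rebuild the bot asks for can be sketched with simplified stand-ins for the sherpa-onnx types (Hypothesis, DecoderResult, and MakeResetResult are illustrative names, not the real API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins for sherpa-onnx's hypothesis/result types.
struct Hypothesis {
  std::vector<int64_t> ys;  // token ids
  float log_prob = 0.0f;
};

struct DecoderResult {
  std::vector<int64_t> tokens;
  std::vector<Hypothesis> hyps;  // beam-search state
};

// Sketch: after Reset(), seed BOTH tokens and the beam-search hypothesis
// with the trailing context, so modified beam search does not restart
// from the blank hypothesis.
DecoderResult MakeResetResult(const DecoderResult &last, int32_t context_size) {
  DecoderResult r;
  if (static_cast<int32_t>(last.tokens.size()) > context_size) {
    r.tokens.assign(last.tokens.end() - context_size, last.tokens.end());
  }
  Hypothesis h;
  h.ys = r.tokens;  // carried context becomes the sole live hypothesis
  h.log_prob = 0.0f;
  r.hyps = {h};
  return r;
}
```

Handling previous_decoder_out2 (clearing it or recomputing it from the carried hypothesis) would follow the same pattern but depends on the Axera tensor layout, so it is omitted here.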

Comment on lines +14 to +33
class OnlineStreamAxera : public OnlineStream {
 public:
  explicit OnlineStreamAxera(const FeatureExtractorConfig &config = {},
                             ContextGraphPtr context_graph = nullptr);

  ~OnlineStreamAxera();

  void SetZipformerEncoderStates(
      std::vector<std::vector<uint8_t>> states) const;

  std::vector<std::vector<uint8_t>> &GetZipformerEncoderStates() const;

  void SetZipformerResult(OnlineTransducerDecoderResultAxera r) const;

  OnlineTransducerDecoderResultAxera &GetZipformerResult() const;

 private:
  class Impl;
  std::unique_ptr<Impl> impl_;
};

⚠️ Potential issue | 🔴 Critical

Reset() does not cover the new Axera stream state.

OnlineStream already exposes a concrete non-virtual Reset(), but this subclass adds separate encoder/result state that is outside that path. Reusing a stream after reset will carry stale encoder caches, tokens, timestamps, frame_offset, and cached decoder outputs into the next utterance. Because streams are returned as std::unique_ptr<OnlineStream>, a derived-only helper would not fix normal callers. Please make sure the public reset path reinitializes these Axera-specific members too.
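One common way to make the public reset path cover subclass state is the template-method pattern: keep a non-virtual public Reset() that delegates to a protected virtual hook. A minimal sketch, using simplified stand-ins for the real classes:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified base class: Reset() stays public and non-virtual, but calls a
// protected virtual hook that subclasses can extend. Names are illustrative.
class OnlineStream {
 public:
  virtual ~OnlineStream() = default;
  void Reset() {
    // ... reset base feature-extractor state here ...
    ResetDerived();  // let subclasses clear their own state
  }

 protected:
  virtual void ResetDerived() {}
};

class OnlineStreamAxera : public OnlineStream {
 public:
  const std::vector<std::vector<uint8_t>> &states() const { return states_; }
  void AddState() { states_.push_back({1, 2, 3}); }

 protected:
  void ResetDerived() override {
    states_.clear();  // stale encoder caches must not leak into the next utterance
  }

 private:
  std::vector<std::vector<uint8_t>> states_;
};
```

With this shape, callers holding a std::unique_ptr<OnlineStream> get the Axera-specific cleanup for free when they call Reset().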

Comment on lines +155 to +157
  result->hyps = std::move(cur);
  result->frame_offset += num_frames;
  result->previous_decoder_out2 = std::move(decoder_out);

⚠️ Potential issue | 🔴 Critical

Synchronize the winning hypothesis back into result.

At the end of Decode(), only hyps, frame_offset, and previous_decoder_out2 are updated. sherpa_onnx::OnlineRecognizerTransducerAxeraImpl::IsEndpoint reads GetZipformerResult().num_trailing_blanks in sherpa-onnx/csrc/axera/online-recognizer-transducer-axera-impl.h, Lines 195-200, and Reset() slices GetZipformerResult().tokens in Lines 221-229, so modified beam search currently leaves both values stale. That breaks endpoint detection and gives Reset() the wrong context window.

Suggested fix
-  result->hyps = std::move(cur);
+  auto best = cur.GetMostProbable(true);
+  result->hyps = std::move(cur);
+  result->tokens = std::move(best.ys);  // keep full context; GetResult() strips it later
+  result->timestamps = std::move(best.timestamps);
+  result->num_trailing_blanks = best.num_trailing_blanks;
   result->frame_offset += num_frames;
   result->previous_decoder_out2 = std::move(decoder_out);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sherpa-onnx/csrc/axera/online-transducer-modified-beam-search-decoder-axera.cc`
around lines 155 - 157, Decode() currently updates result->hyps,
result->frame_offset, and result->previous_decoder_out2 but leaves the
zipformer/context info stale; copy the winning hypothesis's zipformer result (at
least num_trailing_blanks and tokens) from the chosen entry inside cur into
result's Zipformer result so GetZipformerResult() and Reset() see the updated
values. Locate where result->hyps is set (the block with result->hyps =
std::move(cur); result->frame_offset += num_frames;
result->previous_decoder_out2 = std::move(decoder_out);) and after that assign
the winning hypothesis' zipformer result fields (the tokens vector and
num_trailing_blanks) into result (so GetZipformerResult().tokens and
.num_trailing_blanks reflect the selected hypothesis used by IsEndpoint and
Reset()).

OnlineZipformer2CtcModelConfig zipformer2_ctc;
OnlineNeMoCtcModelConfig nemo_ctc;
OnlineToneCtcModelConfig t_one_ctc;
OnlineZipformerMetaConfig zipformer_meta;

⚠️ Potential issue | 🟠 Major

Expose zipformer_meta in the language bindings too.

This new field becomes required for Axera transducer validation in this PR, but the provided Python binding snippet still omits it from the constructor and def_readwrite, and the JNI GetOnlineModelConfig() snippet never populates it. Axera configs created from Python/Java won't be able to pass validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/online-model-config.h` at line 27, The new
OnlineZipformerMetaConfig field zipformer_meta was added to OnlineModelConfig
but not exposed/initialized in bindings; update the Python binding to accept a
zipformer_meta parameter in the constructor and add
def_readwrite("zipformer_meta", &OnlineModelConfig::zipformer_meta) so
Python-created configs include it, and update the JNI GetOnlineModelConfig()
implementation to populate the OnlineModelConfig::zipformer_meta
(construct/convert into OnlineZipformerMetaConfig) when building the native
config so Java-created configs also pass Axera transducer validation.

Comment on lines +31 to +47
bool OnlineZipformerMetaConfig::Validate() const {
  if (encoder_dims.empty() || attention_dims.empty() ||
      num_encoder_layers.empty() || cnn_module_kernels.empty() ||
      left_context_len.empty()) {
    return false;
  }
  if (T <= 0 || decode_chunk_len <= 0 || context_size <= 0) {
    return false;
  }

  size_t n = encoder_dims.size();
  if (attention_dims.size() != n || num_encoder_layers.size() != n ||
      cnn_module_kernels.size() != n || left_context_len.size() != n) {
    return false;
  }

  return true;

⚠️ Potential issue | 🟠 Major

Reject invalid per-stage values here, not only mismatched lengths.

This accepts vectors like encoder_dims=384,-1 or num_encoder_layers=2,0 as long as the lists are non-empty and have matching sizes. That pushes a bad config into the Axera loader instead of failing fast in Validate().

Possible fix
 bool OnlineZipformerMetaConfig::Validate() const {
   if (encoder_dims.empty() || attention_dims.empty() ||
       num_encoder_layers.empty() || cnn_module_kernels.empty() ||
       left_context_len.empty()) {
     return false;
   }
   if (T <= 0 || decode_chunk_len <= 0 || context_size <= 0) {
     return false;
   }
+
+  auto all_positive = [](const std::vector<int32_t> &v) {
+    for (auto x : v) {
+      if (x <= 0) return false;
+    }
+    return true;
+  };
+
+  auto all_non_negative = [](const std::vector<int32_t> &v) {
+    for (auto x : v) {
+      if (x < 0) return false;
+    }
+    return true;
+  };
 
   size_t n = encoder_dims.size();
   if (attention_dims.size() != n || num_encoder_layers.size() != n ||
       cnn_module_kernels.size() != n || left_context_len.size() != n) {
     return false;
   }
+
+  if (!all_positive(encoder_dims) || !all_positive(attention_dims) ||
+      !all_positive(num_encoder_layers) || !all_positive(cnn_module_kernels) ||
+      !all_non_negative(left_context_len)) {
+    return false;
+  }
 
   return true;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/online-zipformer-meta-config.cc` around lines 31 - 47, In
OnlineZipformerMetaConfig::Validate(), in addition to the existing emptiness and
size-equality checks for encoder_dims, attention_dims, num_encoder_layers,
cnn_module_kernels, and left_context_len, iterate over each stage index i and
reject configurations where encoder_dims[i] <= 0, attention_dims[i] <= 0,
num_encoder_layers[i] <= 0, or cnn_module_kernels[i] <= 0 (and ensure
left_context_len[i] is not negative if you allow zero), so invalid per-stage
values like 384,-1 or 2,0 are caught early; keep the existing scalar checks for
T, decode_chunk_len, and context_size as-is.

Comment on lines +579 to +585
  } else if (int_vec_map_.end() != int_vec_map_.find(key)) {
    std::vector<int32_t> v;
    if (!SplitStringToIntegers(value, ",", /*omit_empty_strings=*/true, &v)) {
      SHERPA_ONNX_LOGE("Invalid int vector option \"%s\"", value.c_str());
      exit(-1);
    }
    *(int_vec_map_[key]) = std::move(v);

⚠️ Potential issue | 🟠 Major

Reject malformed int-vector syntax instead of silently normalizing it.

This branch accepts bare --foo as an empty vector and collapses 1,,2 / 1,2, because has_equal_sign is ignored and omit_empty_strings=true. For topology options like the Zipformer stage lists, that turns typos into different configs instead of surfacing an error.

Suggested fix
   } else if (int_vec_map_.end() != int_vec_map_.find(key)) {
+    if (!has_equal_sign) {
+      SHERPA_ONNX_LOGE("Invalid option --%s (option format is --x=y).",
+                       key.c_str());
+      exit(-1);
+    }
     std::vector<int32_t> v;
-    if (!SplitStringToIntegers(value, ",", /*omit_empty_strings=*/true, &v)) {
+    if (!SplitStringToIntegers(value, ",", /*omit_empty_strings=*/false, &v)) {
       SHERPA_ONNX_LOGE("Invalid int vector option \"%s\"", value.c_str());
       exit(-1);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/parse-options.cc` around lines 579 - 585, The int-vector
branch currently ignores whether the original arg had an '=' and drops empty
tokens, so inputs like bare "--foo" or "1,,2" are silently normalized; change
the logic in the branch handling int_vec_map_ (the code that calls
SplitStringToIntegers) to first require that has_equal_sign is true and reject
the option if not, and then split while preserving empty tokens (call
SplitStringToIntegers with omit_empty_strings=false or otherwise detect empty
tokens) and treat any empty token or a failed parse as an error: log via
SHERPA_ONNX_LOGE (including the offending value) and exit(-1). Ensure you set
*(int_vec_map_[key]) only after validation succeeds.
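The behavior difference is easy to demonstrate with a small standalone parser that rejects empty fields and trailing commas; ParseIntVectorStrict below is a hypothetical illustration of the requested semantics, not the real SplitStringToIntegers:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Strict comma-separated integer parsing: reject empty values ("--foo="),
// empty fields ("1,,2"), trailing commas ("1,2,"), and non-numeric tokens,
// instead of silently normalizing them into a different config.
bool ParseIntVectorStrict(const std::string &s, std::vector<int32_t> *out) {
  out->clear();
  if (s.empty()) return false;            // bare/empty value is an error
  if (s.back() == ',') return false;      // "1,2," has a trailing comma
  std::stringstream ss(s);
  std::string tok;
  while (std::getline(ss, tok, ',')) {
    if (tok.empty()) return false;        // "1,,2" has an empty field
    size_t pos = 0;
    int32_t v = 0;
    try {
      v = std::stoi(tok, &pos);
    } catch (...) {
      return false;                       // not an integer at all
    }
    if (pos != tok.size()) return false;  // "2x" has trailing garbage
    out->push_back(v);
  }
  return true;
}
```

For topology options like the Zipformer stage lists, failing on "2,0," style typos at parse time is much cheaper than debugging a mis-shaped model later.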


Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.
