Harden inference error handling, reset semantics, and warmup safety #2

Merged
axeldelafosse merged 3 commits into main from fix_inference_reset_warmup on Feb 7, 2026

@axeldelafosse

Summary

  • return failure from ONNX inference when output tensor extraction fails (instead of returning success with invalid output)
  • harden inference queue epoch/reset behavior to avoid stale-slot blockage after transport resets
  • make inference thread drop failed requests instead of publishing them as processed
  • add bounded warmup wait in prepareToPlay to prevent indefinite blocking
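
The first and third points can be illustrated with a minimal sketch. It does not use the ONNX Runtime API: `FakeBackend`, `Request`, and `processRequest` are hypothetical stand-ins, and only the `runInference` name and the ready/processed flags come from the PR text. The idea shown is that a failed output extraction returns `false`, and the caller drops the request rather than marking it processed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an inference backend (not the ONNX Runtime API).
struct FakeBackend {
    const float* output = nullptr;  // nullptr simulates a failed tensor extraction
    std::size_t outputSize = 0;
};

// Returns false when no output data is available, instead of reporting
// success with an empty or invalid result.
bool runInference(const FakeBackend& backend, std::vector<float>& out) {
    if (backend.output == nullptr || backend.outputSize == 0) {
        return false;  // the caller must treat this request as failed
    }
    out.assign(backend.output, backend.output + backend.outputSize);
    return true;
}

// Hypothetical queue entry with the two flags named in the PR description.
struct Request {
    bool ready = true;       // submitted and waiting for inference
    bool processed = false;  // result is valid and visible to the consumer
    std::vector<float> result;
};

// Processes one request. A failed request is dropped: it is never
// published as processed, so the consumer cannot read invalid output.
void processRequest(const FakeBackend& backend, Request& req) {
    if (runInference(backend, req.result)) {
        req.processed = true;
    } else {
        req.processed = false;
        req.result.clear();
    }
    req.ready = false;  // the slot is released in both cases
}
```

In this sketch a dropped request simply leaves a gap in the output; how the real plugin fills that gap is not described in this PR.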

Validation

  • cmake --build build
  • ctest --test-dir build --output-on-failure
  • result: 65/65 tests passed

@gemini-code-assist

Summary of Changes

Hello @axeldelafosse, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness and reliability of the ONNX inference system by addressing several critical areas. It ensures that inference errors are properly handled and reported, prevents the inference queue from getting stuck due to stale requests after a reset, and adds a safety mechanism to the warmup process to avoid indefinite blocking. These changes collectively enhance the stability and predictability of the plugin's real-time audio processing capabilities.

Highlights

  • Hardened ONNX Inference Error Handling: The runInference method now explicitly returns false and logs an error if output tensor data cannot be extracted, preventing the system from proceeding with invalid output.
  • Improved Inference Queue Reset Semantics: The inference queue's reset behavior has been enhanced to prevent stale-slot blockage. This includes tracking an epochStartIdx_ to allow the inference thread to jump past old, non-ready requests after a reset, and clearing stale flags for previous epoch slots.
  • Inference Thread Drops Failed Requests: The inference thread now explicitly drops requests that fail during inference or become stale due to a reset, rather than publishing them as processed. This ensures only valid results are propagated.
  • Bounded Warmup Wait in prepareToPlay: The prepareToPlay function now includes a bounded wait for the ONNX runtime warmup process, with a 2-second timeout. This prevents indefinite blocking if the inference thread stalls during warmup.
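
The epoch bookkeeping described above can be modeled roughly as follows. This is a single-threaded sketch under stated assumptions, not the plugin's `InferenceQueue`: the real queue uses atomics and a separate inference thread, which are omitted here, and the class name, slot count, and method names are all hypothetical. Only the roles of the write, read, consume, and epoch-start indices follow the PR text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-threaded model of the index bookkeeping; all names are illustrative.
class EpochQueueModel {
public:
    static constexpr std::size_t kSlots = 8;

    // Producer: claims the next slot and tags it with the current epoch.
    bool push() {
        if (writeIdx_ - consumeIdx_ >= kSlots) return false;  // queue full
        Slot& s = slots_[writeIdx_ % kSlots];
        s.epoch = epoch_;
        s.ready = true;
        s.processed = false;
        ++writeIdx_;
        return true;
    }

    // Transport reset: a new epoch starts at the current write index, and the
    // consumer skips straight to it, discarding old-epoch outputs.
    void reset() {
        epochStartIdx_ = writeIdx_;
        consumeIdx_ = writeIdx_;
        ++epoch_;
    }

    // Worker: handles at most one pending request; returns true if it did.
    bool workerStep() {
        if (workerEpoch_ != epoch_) {
            workerEpoch_ = epoch_;
            for (Slot& s : slots_) {      // clear flags left by older epochs
                if (s.epoch != epoch_) {
                    s.ready = false;
                    s.processed = false;
                }
            }
            readIdx_ = epochStartIdx_;    // jump past old, unprocessed requests
        }
        if (readIdx_ >= writeIdx_) return false;
        Slot& s = slots_[readIdx_ % kSlots];
        s.ready = false;
        s.processed = true;               // inference itself is elided here
        ++readIdx_;
        return true;
    }

    // Consumer: returns the index of the next current-epoch result, skipping
    // anything stale or dropped.
    bool pop(std::size_t& outIdx) {
        while (consumeIdx_ < readIdx_) {
            Slot& s = slots_[consumeIdx_ % kSlots];
            const std::size_t idx = consumeIdx_++;
            const bool valid = s.processed && s.epoch == epoch_;
            s.processed = false;
            if (valid) {
                outIdx = idx;
                return true;
            }
        }
        return false;
    }

private:
    struct Slot {
        bool ready = false;
        bool processed = false;
        std::uint32_t epoch = 0;
    };
    std::array<Slot, kSlots> slots_{};
    std::size_t writeIdx_ = 0, readIdx_ = 0, consumeIdx_ = 0, epochStartIdx_ = 0;
    std::uint32_t epoch_ = 0, workerEpoch_ = 0;
};
```

After `reset()`, requests submitted before the reset never reach the consumer, and the worker does not have to walk through them one by one before it can serve the new epoch.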
Changelog
  • plugin/include/StemgenRT/InferenceQueue.h
    • Added #include <cstdint> for integer types.
    • Introduced epochStartIdx_ atomic variable to track the starting write index for new epochs, aiding in reset synchronization.
  • plugin/source/InferenceQueue.cpp
    • Modified getWriteSlot() to reclaim stale processed slots from previous epochs, preventing them from blocking new requests.
    • Updated getOutputSlot() to skip stale results from previous epochs and continue scanning for valid outputs.
    • Enhanced reset() to capture the current writeIdx_ as epochStartIdx_ and immediately advance consumeIdx_ to this point, effectively discarding old-epoch outputs.
    • Modified fullReset() to also reset epochStartIdx_ to 0.
    • In inferenceThreadFunc(), the readIdx_ now jumps to epochStartIdx_ on an epoch change, avoiding scanning old holes.
    • Added logic in inferenceThreadFunc() to clear ready and processed flags for slots belonging to previous epochs, ensuring proper reuse.
    • Implemented error handling in inferenceThreadFunc(): if runInference fails, the request is marked as not ready/processed and the readIdx_ is advanced, effectively dropping the failed request.
    • Ensured that if a reset occurs during inference, the processed flag is also cleared for the request.
  • plugin/source/OnnxRuntime.cpp
    • Modified runInference() to return false if GetTensorMutableData fails or returns a null pointer, indicating an inference output extraction error.
    • Added debug logging for cases where output tensor data cannot be retrieved.
  • plugin/source/PluginProcessor.cpp
    • Added #include <chrono> for time-related functionalities.
    • Updated comments in prepareToPlay to clarify the warmup logic.
    • Implemented a 2-second timeout for the ONNX runtime warmup loop in prepareToPlay to prevent indefinite blocking.
    • Added debug logging to indicate whether the warmup completed successfully or timed out.
    • Ensured inferenceQueue_.reset() is always called after the warmup attempt to invalidate any late warmup results and re-synchronize queue indices.
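
The bounded warmup wait listed for PluginProcessor.cpp can be sketched with `std::chrono::steady_clock`. This is an assumption about the general shape, not the code from the PR: `waitForWarmup`, the `done` flag, and the polling interval are hypothetical, and only the 2-second timeout comes from the changelog.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Polls `done` until it becomes true or `timeout` elapses.
// Returns true if warmup completed, false if the wait timed out.
bool waitForWarmup(const std::atomic<bool>& done,
                   std::chrono::milliseconds timeout,
                   std::chrono::milliseconds pollInterval =
                       std::chrono::milliseconds(1)) {
    // steady_clock is monotonic, so wall-clock adjustments cannot
    // shorten or extend the deadline.
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // give up instead of blocking indefinitely
        }
        std::this_thread::sleep_for(pollInterval);
    }
    return true;
}
```

A caller in the spirit of the changelog would invoke this with `std::chrono::seconds(2)`, log whether it completed or timed out, and then reset the queue in both cases so that a late warmup result cannot be mistaken for real output.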
Activity
  • The author has implemented changes to harden error handling, reset semantics, and warmup safety.
  • The pull request includes validation steps, confirming that cmake --build build and ctest --test-dir build --output-on-failure resulted in 65/65 tests passed.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request hardens the inference queue's error handling, reset semantics, and warmup safety: ONNX inference now returns failure on tensor extraction errors, the queue reset mechanism is more robust, and warmup waits time out. However, two medium-severity issues were identified. First, a potential division by zero when handling silent input could inject NaN/Inf into the audio stream. Second, a logic error in the epoch reset handling might cause temporary stalls in the real-time inference thread after a transport reset. Addressing both will further improve the plugin's stability and safety.
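
The review summary does not show the code behind the division-by-zero finding. Assuming the division is a normalization by the input's peak level, a guard could look like the sketch below; `normalizePeak` and its `epsilon` threshold are hypothetical and not taken from the plugin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: scales `samples` to unit peak and returns the gain
// applied. Silent input is left untouched so that no division by zero occurs
// and no NaN/Inf can enter the audio stream.
float normalizePeak(std::vector<float>& samples, float epsilon = 1e-8f) {
    float peak = 0.0f;
    for (float s : samples) {
        peak = std::max(peak, std::fabs(s));
    }
    if (peak <= epsilon) {
        return 1.0f;  // silent (or near-silent) buffer: skip normalization
    }
    const float gain = 1.0f / peak;
    for (float& s : samples) {
        s *= gain;
    }
    return gain;
}
```

The returned gain lets a caller undo the scaling on the model output; for silent input the gain is 1, so that inverse step is also safe.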

Comment thread plugin/source/OnnxRuntime.cpp
Comment thread plugin/source/InferenceQueue.cpp Outdated
Comment thread plugin/source/InferenceQueue.cpp Outdated
@axeldelafosse merged commit 9d8d641 into main on Feb 7, 2026
3 checks passed