Skip to content

add android tts engine for zipvoice tts#2806

Open
rhjdvsgsgks wants to merge 9 commits intok2-fsa:masterfrom
rhjdvsgsgks:zipvoice
Open

add android tts engine for zipvoice tts#2806
rhjdvsgsgks wants to merge 9 commits intok2-fsa:masterfrom
rhjdvsgsgks:zipvoice

Conversation

@rhjdvsgsgks
Copy link

@rhjdvsgsgks rhjdvsgsgks commented Nov 19, 2025

also a java & kotlin api (and corresponding demo/test) for zipvoice

to compatible with zero shot tts. i added a new generateWithPrompt api

fix callback been ignored in zipvoice

todo:

Summary by CodeRabbit

  • New Features

    • Zipvoice prompt-based TTS: prompt-assisted generation, new Zipvoice model config, Java/Kotlin APIs, WAV prompt handling, WaveReader utility, and a bilingual example program.
  • Build & Testing

    • Android builds simplified to single-arch debug APKs; APK artifacts uploaded unconditionally. CI matrix reduced and signing/publishing steps removed. New run script for non-streaming Zipvoice example added.
  • Bug Fixes

    • English phoneme generation fixed to use en-us voice.
  • Other

    • Ignore list updated.

✏️ Tip: You can customize this high-level summary in your review settings.

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Nov 19, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 19, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Zipvoice (prompt-based TTS) across native, JNI, Java/Kotlin APIs, Android apps, build scripts, examples, and CI; introduces prompt-aware generation methods and configs, a WaveReader utility, ZIPVoice wiring in Android/TtsEngine/TtsService, JNI/native prompt methods, example runner, and simplifies APK CI matrix/signing/publishing.

Changes

Cohort / File(s) Change Summary
CI workflows
\.github/workflows/apk-tts-engine.yaml, \.github/workflows/apk-tts.yaml, \.github/workflows/run-java-test.yaml
Reduced job matrices, removed repository-owner guard, added checkout fetch-depth, added Java/ccache/NDK/Python/setup steps, removed APK signing & HuggingFace publish, made artifact upload unconditional, and added Zipvoice test invocation + cleanup.
Android app — UI & engine
App: android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt, android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt
Engine: android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt, .../TtsService.kt, .../MainActivity.kt, .../WaveReader.kt
Add Zipvoice fields (isZipvoice, textModel, flowMatchingModel, pinyinDict), include WaveReader, pass Zipvoice config into OfflineTts, and branch runtime generation to prompt-based generateWithPromptWithCallback when isZipvoice is true.
Kotlin API
sherpa-onnx/kotlin-api/Tts.kt
New data class OfflineTtsZipvoiceModelConfig, extend OfflineTtsModelConfig with zipvoice, extend getOfflineTtsConfig signature with isZipvoice + textModel/flowMatchingModel/pinyinDict, create zipvoice config when requested, and add generateWithPrompt / generateWithPromptWithCallback native bindings.
Java API & examples
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java, .../OfflineTtsModelConfig.java, .../OfflineTtsZipvoiceModelConfig.java, sherpa-onnx/java-api/Makefile, java-api-examples/NonStreamingTtsZipvoiceZhEn.java, java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh
Add prompt-based overloads and native declarations in OfflineTts, add OfflineTtsZipvoiceModelConfig and wire into OfflineTtsModelConfig and Makefile, and add a Zipvoice Java example plus runner script that loads a WAV prompt.
JNI & native bridge
sherpa-onnx/jni/offline-tts.cc
Parse zipvoice config in GetOfflineTtsConfig and add JNI natives generateWithPromptImpl and generateWithPromptWithCallbackImpl (prompt_samples marshalling and callback wrapper).
C++ backend & frontend
sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc, sherpa-onnx/csrc/offline-tts-zipvoice-impl.h
Frontend: force English tokenization voice to "en-us" for English text. Impl: capture Process(...) result into a local GeneratedAudio, optionally invoke callback with samples before returning.
APK build tooling & scripts
scripts/apk/build-apk-tts.sh.in, scripts/apk/build-apk-tts-engine.sh.in, scripts/apk/generate-tts-apk-script.py
Limit builds to arm64-v8a, add Zipvoice template fields (text_model, flow_matching_model, pinyin_dict, is_zipvoice, cmd), conditional ONNX cleanup and vocoder fetch logic, wire Zipvoice fields into Android sources, change to assembleDebug, and add get_zipvoice_models() plus new TtsModel fields.
Other repo files
\.gitignore
Add sherpa-onnx-zipvoice-distill-zh-en-emilia to ignore list.

Sequence Diagram(s)

sequenceDiagram
    participant App as Android App
    participant Service as TtsService
    participant Wave as WaveReader
    participant JNI as JNI Bridge
    participant Core as OfflineTts (native)

    alt isZipvoice = true
        App->>Service: synthesize(text)
        Service->>Wave: load prompt.wav
        Wave-->>Service: samples, sampleRate
        Service->>JNI: generateWithPromptWithCallback(text, prompt_text, samples, sampleRate, callback)
        JNI->>Core: generateWithPrompt(...)
        Core-->>JNI: audio chunks
        JNI->>Service: callback(chunk)
        Service-->>App: play audio
    else
        App->>Service: synthesize(text)
        Service->>JNI: generateWithCallback(text, speakerId, speed, callback)
        JNI->>Core: generate(...)
        Core-->>JNI: audio chunks
        JNI->>Service: callback(chunk)
        Service-->>App: play audio
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50–70 minutes

Areas needing extra attention:

  • sherpa-onnx/jni/offline-tts.cc — JNI lifetimes, prompt_samples marshalling, and callback wrapping.
  • Cross-language API consistency (Kotlin ↔ Java ↔ JNI ↔ C++) — method signatures and data shapes.
  • Build/script templating and APK packaging changes in scripts/apk/*.
  • C++ frontend change forcing English tokenization to "en-us".

Possibly related PRs

Suggested reviewers

  • csukuangfj

Poem

🐰 I nibble WAV prompts in twilight code,
I hop through JNI and zipvoice mode.
From Kotlin to C++, I carry the song,
Small CI, big voice — we hum all day long.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.70% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main addition: Android TTS engine support for ZipVoice TTS, which aligns with the primary changes across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85f45f8 and e8ffd0f.

📒 Files selected for processing (13)
  • .github/workflows/apk-tts-engine.yaml (1 hunks)
  • .github/workflows/apk-tts.yaml (1 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (6 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt (1 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (3 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (5 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (2 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt (1 hunks)
  • scripts/apk/build-apk-tts-engine.sh.in (7 hunks)
  • scripts/apk/build-apk-tts.sh.in (6 hunks)
  • scripts/apk/generate-tts-apk-script.py (4 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.

Applied to files:

  • scripts/apk/build-apk-tts-engine.sh.in
🧬 Code graph analysis (1)
sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1)
sherpa-onnx/csrc/offline-tts-matcha-impl.h (1)
  • Process (425-430)
🔇 Additional comments (10)
sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1)

89-95: Callback wiring from ZipVoice Generate to caller now looks correct

Capturing the Process() result into ans, invoking callback(ans.samples.data(), ans.samples.size(), 1.0), then returning ans preserves semantics and finally honors the callback for ZipVoice generation. No issues here.

scripts/apk/generate-tts-apk-script.py (1)

35-53: ZipVoice model metadata and helper are consistent with templating

Extending TtsModel with vocoder, text_model, flow_matching_model, pinyin_dict, is_zipvoice, and cmd, plus get_zipvoice_models() that sets data_dir and is_zipvoice, lines up cleanly with how the shell templates consume these fields (text_model, flow_matching_model, pinyin_dict, is_zipvoice, cmd). Using two entries with the same model_dir to represent fp32 vs int8 variants is reasonable given the per‑model cleanup command.

Also applies to: 548-591

scripts/apk/build-apk-tts-engine.sh.in (1)

36-58: ZipVoice fields are wired correctly into assets and TtsEngine

The engine build script now:

  • Exposes text_model, flow_matching_model, and pinyin_dict in the templated variables.
  • Skips downloading a separate vocoder when tts_model.is_zipvoice is true.
  • Injects isZipvoice, textModel, flowMatchingModel, and pinyinDict into TtsEngine.kt only when configured.

This matches the new ZipVoice model metadata and Kotlin/Java config changes and should let the engine distinguish ZipVoice models cleanly.

Also applies to: 76-78, 96-107, 141-155

.github/workflows/apk-tts-engine.yaml (1)

24-27: CI workflow matches new shard count and build scripts

Reducing the matrix to total=["2"], index=["0","1"] and wiring in the new “generate build script → build-apk-tts-engine.sh” flow is consistent with the updated generate-tts-apk-script.py and engine build script. The added setup steps (Java 21, ccache, NDK introspection, Python deps) should make the build more deterministic.

Also applies to: 33-77, 78-110

.github/workflows/apk-tts.yaml (1)

24-27: TTS APK workflow simplification is consistent with engine workflow

The apk-tts workflow now mirrors the engine workflow: 2-way sharding, explicit version update, Java 21 + ccache setup, NDK inspection, Python deps, scripted generation of build-apk-tts.sh, and a single build/upload path. This matches the updated scripts and should be easier to maintain than the previous 40-way matrix with signing/publishing.

Also applies to: 33-77, 78-110

scripts/apk/build-apk-tts.sh.in (1)

36-57: App build script correctly integrates ZipVoice metadata

For the demo app:

  • text_model, flow_matching_model, and pinyin_dict are exposed and injected into MainActivity.kt only when present.
  • is_zipvoice controls both vocoder download (skipped for ZipVoice) and isZipvoice toggling in MainActivity.kt.
  • Arch loop is simplified to arm64‑v8a, aligned with the native build phase.

This mirrors the engine script and matches the new Kotlin configuration surface, so the wiring looks sound.

Also applies to: 82-93, 120-123, 135-155

android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (4)

192-195: LGTM: Variable declarations follow existing patterns.

The new Zipvoice-related variables (textModel, flowMatchingModel, pinyinDict, isZipvoice) follow the same declaration pattern as the existing model configuration variables. Note that isZipvoice should become a class member property (addressed in the comment on line 143).

Also applies to: 204-204


223-228: LGTM: Initialization block is consistent.

The Zipvoice initialization block follows the same pattern as the existing model types (VITS, Matcha, Kokoro), making the code structure consistent and easy to understand.


314-332: LGTM: Zipvoice examples are clear and well-structured.

Examples 12 and 13 provide clear guidance on configuring Zipvoice models with both full precision and int8 quantized variants. The examples correctly set isZipvoice = true and follow the established documentation pattern.


339-355: LGTM: Config construction properly integrates Zipvoice parameters.

The new Zipvoice-related parameters (textModel, flowMatchingModel, pinyinDict, isZipvoice) are correctly passed to getOfflineTtsConfig following the same pattern as existing optional parameters.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link

Summary of Changes

Hello @rhjdvsgsgks, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's Text-to-Speech capabilities by integrating the Zipvoice TTS engine, specifically for Android. It provides comprehensive Java and Kotlin APIs for interacting with Zipvoice, including a novel generateWithPrompt function that facilitates zero-shot TTS by allowing users to provide an audio prompt. Additionally, a critical fix has been applied to ensure that callbacks are correctly processed within the Zipvoice framework, improving the reliability and responsiveness of the TTS service.

Highlights

  • Zipvoice TTS Engine Integration: The pull request introduces the Zipvoice Text-to-Speech (TTS) engine for Android, expanding the available TTS capabilities within the application.
  • Java & Kotlin API for Zipvoice: New Java and Kotlin APIs have been added to support the Zipvoice TTS engine, allowing developers to easily integrate and utilize its features.
  • Zero-Shot TTS with generateWithPrompt: A new generateWithPrompt API has been implemented, enabling zero-shot TTS functionality where a prompt audio can influence the generated speech.
  • Callback Fix for Zipvoice: A bug where callbacks were previously ignored in the Zipvoice implementation has been fixed, ensuring proper handling of audio generation progress and completion.
Ignored Files
  • Ignored by pattern: .github/workflows/** (3)
    • .github/workflows/apk-tts-engine.yaml
    • .github/workflows/apk-tts.yaml
    • .github/workflows/run-java-test.yaml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for the zipvoice TTS model to the Android and Java/Kotlin APIs, including a new generateWithPrompt API for zero-shot TTS. The changes are extensive, covering Android demo apps, the TTS engine, Java/Kotlin APIs, JNI bindings, and build scripts. Overall, the implementation is good, but there are a few areas for improvement. I've identified a critical bug in the Android demo app where a feature flag is hardcoded, making the new functionality unreachable. Additionally, there's significant code duplication with hardcoded values for testing zipvoice across multiple files, which should be refactored. I've also left some comments on minor issues regarding naming conventions, file paths, and misleading comments.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (1)

150-160: Hardcoded prompt text and audio path should be configurable.

The same hardcoded prompt text and audio file path from MainActivity.kt are duplicated here. This violates DRY principles and has the same indentation issues.

Apply this diff to fix indentation:

-        if (TtsEngine.isZipvoice) {
-		    val prompt_text = "周日被我射熄火了,所以今天是周一。"
-
-    val reader = WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav")
-
-		    tts.generateWithPromptWithCallback(text, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ttsCallback)} else tts.generateWithCallback(
+        if (TtsEngine.isZipvoice) {
+            val prompt_text = "周日被我射熄火了,所以今天是周一。"
+            val reader = WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav")
+            tts.generateWithPromptWithCallback(text, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ttsCallback)
+        } else tts.generateWithCallback(

Consider centralizing these Zipvoice configuration values (prompt text and audio path) in TtsEngine.kt to avoid duplication across MainActivity.kt and TtsService.kt. This aligns with the PR TODO: "Allow customizing prompt audio/text in Android TTS."

♻️ Duplicate comments (1)
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt (1)

1-1: Same symlinked WaveReader.kt pattern as in TTS engine module

This file is also a symlink to sherpa-onnx/kotlin-api/WaveReader.kt, with the same portability and tooling concerns mentioned for android/SherpaOnnxTtsEngine/.../WaveReader.kt. If you decide to switch to a shared Kotlin module or a copied source there, it would be good to apply the same approach here for consistency.

🧹 Nitpick comments (6)
scripts/apk/generate-tts-apk-script.py (1)

547-572: Clarify duplicated model_dir entries in get_zipvoice_models()

Both ZipVoice entries use the same model_dir but differ only in text_model/flow_matching_model (float vs int8). If any downstream logic or templates key by model_dir alone (e.g., directory creation, download steps, display labels), this duplication could cause confusing behavior or redundant work.

If you intend to ship both precision variants side by side, consider either:

  • Making the distinction explicit in metadata (e.g., a variant field), or
  • Verifying templates only rely on the per-field filenames and do not assume model_dir uniqueness.
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt (1)

1-1: Symlinked WaveReader.kt may be fragile across platforms and tooling

This file appears to be a Git symlink pointing back to sherpa-onnx/kotlin-api/WaveReader.kt. That works on POSIX systems, but can be brittle on:

  • Windows dev environments where symlink support is limited or disabled.
  • Tools/editors/IDEs that don’t resolve symlinked Kotlin sources well.
  • Future directory restructurings, since the long ../ chain is easy to break.

If you expect contributors/builds on mixed platforms, consider instead:

  • Moving WaveReader into a small shared Kotlin module and adding a Gradle dependency from this engine module, or
  • Copying the file here and keeping it synchronized via a simple script.
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)

143-150: Hardcoded prompt text and audio path.

The prompt text and audio file path are hardcoded in the generation logic. While this works for demonstration purposes, consider:

  1. Making the prompt text and audio path configurable via UI or settings
  2. Adding error handling for missing prompt audio files

This aligns with the TODO mentioned in the PR description: "Allow customizing prompt audio/text in Android TTS."

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)

61-63: Inconsistent getter naming: gettShift.

The getter method is named gettShift() instead of following the standard JavaBean convention getTShift(). While this works functionally, it's inconsistent with naming conventions and the corresponding setter settShift().

Consider renaming to getTShift() for consistency, unless there's a specific reason for the lowercase 't'.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)

33-46: Consider adding parameter validation.

The new generateWithPrompt and generateWithPromptWithCallback methods don't validate parameters like prompt_text, prompt_samples, or sample_rate. Consider adding validation to:

  1. Check for null/empty prompt_text
  2. Check for null/empty prompt_samples array
  3. Validate sample_rate is positive

This would provide clearer error messages compared to potential JNI crashes.

Example validation:

 public GeneratedAudio generateWithPrompt(String text, String prompt_text, float[] prompt_samples, int sample_rate, float speed, int num_step) {
+    if (prompt_text == null || prompt_text.isEmpty()) {
+        throw new IllegalArgumentException("prompt_text cannot be null or empty");
+    }
+    if (prompt_samples == null || prompt_samples.length == 0) {
+        throw new IllegalArgumentException("prompt_samples cannot be null or empty");
+    }
+    if (sample_rate <= 0) {
+        throw new IllegalArgumentException("sample_rate must be positive");
+    }
     Object[] arr = generateWithPromptImpl(ptr, text, prompt_text, prompt_samples, sample_rate, speed, num_step);

Also applies to: 63-76

java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)

6-11: Consider adding a comment about model download location.

For consistency with other examples (e.g., NonStreamingTtsPiperEn.java lines 7-9), consider adding a comment indicating where users can download the Zipvoice models.

For example:

+    // please visit
+    // https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+    // to download model files
     String textModel = "sherpa-onnx-zipvoice-distill-zh-en-emilia/text_encoder.onnx";
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fcde7d and 9f24b71.

📒 Files selected for processing (23)
  • .github/workflows/apk-tts-engine.yaml (1 hunks)
  • .github/workflows/apk-tts.yaml (1 hunks)
  • .github/workflows/run-java-test.yaml (1 hunks)
  • .gitignore (1 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (6 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt (1 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (3 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (5 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (2 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt (1 hunks)
  • java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1 hunks)
  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh (1 hunks)
  • scripts/apk/build-apk-tts-engine.sh.in (7 hunks)
  • scripts/apk/build-apk-tts.sh.in (6 hunks)
  • scripts/apk/generate-tts-apk-script.py (4 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1 hunks)
  • sherpa-onnx/java-api/Makefile (1 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (3 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (5 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1 hunks)
  • sherpa-onnx/jni/offline-tts.cc (3 hunks)
  • sherpa-onnx/kotlin-api/Tts.kt (8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.

Applied to files:

  • .gitignore
  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh
  • sherpa-onnx/jni/offline-tts.cc
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.

Applied to files:

  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh
  • scripts/apk/build-apk-tts-engine.sh.in
  • sherpa-onnx/jni/offline-tts.cc
🧬 Code graph analysis (8)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (2)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (2)
  • Builder (73-139)
  • OfflineTtsZipvoiceModelConfig (3-140)
sherpa-onnx/kotlin-api/Tts.kt (1)
  • vits (60-69)
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (2)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/LibraryUtils.java (1)
  • LibraryUtils (43-246)
java-api-examples/NonStreamingTtsPiperEn.java (2)
  • NonStreamingTtsPiperEn (7-58)
  • main (8-57)
sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1)
sherpa-onnx/csrc/offline-tts-matcha-impl.h (1)
  • Process (425-430)
sherpa-onnx/jni/offline-tts.cc (4)
sherpa-onnx/kotlin-api/Tts.kt (2)
  • tokens (47-58)
  • samples (79-91)
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/Tts.kt (2)
  • tokens (47-58)
  • samples (79-91)
sherpa-onnx/jni/jni.cc (2)
  • NewInteger (17-21)
  • NewInteger (17-17)
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (2)
  • callback (98-106)
  • tts (22-399)
sherpa-onnx/kotlin-api/Tts.kt (4)
scripts/go/sherpa_onnx.go (7)
  • OfflineTtsVitsModelConfig (921-930)
  • OfflineTtsMatchaModelConfig (932-941)
  • OfflineTtsKokoroModelConfig (943-952)
  • OfflineTtsKittenModelConfig (954-960)
  • OfflineTtsZipvoiceModelConfig (962-974)
  • GeneratedAudio (1001-1006)
  • OfflineTtsModelConfig (976-991)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
  • OfflineTtsZipvoiceModelConfig (3-140)
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/Tts.kt (2)
  • generateWithPromptImpl (223-231)
  • generateWithPromptWithCallbackImpl (241-250)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
  • OfflineTtsModelConfig (5-104)
scripts/apk/generate-tts-apk-script.py (1)
scripts/flutter/generate-tts.py (1)
  • TtsModel (28-36)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
sherpa-onnx/kotlin-api/Tts.kt (7)
  • generateWithPrompt (123-136)
  • generateWithPromptImpl (223-231)
  • samples (79-91)
  • sampleRate (107-107)
  • generateWithPromptWithCallback (157-171)
  • generateWithPromptWithCallbackImpl (241-250)
  • generateWithCallbackImpl (233-239)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (2)
sherpa-onnx/kotlin-api/Tts.kt (1)
  • tokens (47-58)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
  • Builder (50-103)
🔇 Additional comments (33)
.gitignore (1)

161-161: Good—follows repository conventions for model exclusions.

Adding the ZipVoice model artifact to .gitignore is appropriate and aligns with how other model directories are already excluded in the repository.

sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc (1)

345-345: Now I need to verify what TokenizeEn does to confirm if hard-coding "en-us" is actually the right approach or a genuine bug.

Verify review comment: Hard-coded "en-us" affects English text tokenization in ZipVoice.

Based on verification:

  1. Issue Confirmed: Different English accents (identified by BCP 47 language identifiers) produce different phonemic representations, so hard-coding "en-us" does affect tokenization output.

  2. However, Suggested Fix is Incorrect: ZipVoice uses a zero-shot approach that requires prompt_wav (audio file) and prompt_transcription as inputs, not voice name parameters like "en-us" or "en-gb". The voice parameter passed to ConvertTextToTokenIds is architecturally incompatible with ZipVoice's design—simply substituting it for the hard-coded "en-us" would not work.

  3. Root Cause: The voice parameter in the ZipVoice frontend signature appears to be for interface consistency with other TTS systems (Kokoro, Matcha, VITS), but ZipVoice doesn't actually support voice name selection. The "en-us" hard-code may be a stub awaiting proper architectural integration.

Proper resolution requires determining the intended behavior: Should English tokenization detect the accent from metadata/context, use only "en-us" (current behavior), or be made user-configurable through a different mechanism? The suggested diff would not function as intended because the voice parameter is incompatible with ZipVoice's architecture.

Likely an incorrect or invalid review comment.

scripts/apk/generate-tts-apk-script.py (1)

35-51: New ZipVoice fields on TtsModel look consistent

Extending TtsModel with vocoder, text_model, flow_matching_model, pinyin_dict, and is_zipvoice is consistent with how other model families (matcha, kitten, kokoro) are represented. As long as the Jinja templates are updated to read these new fields only when is_zipvoice is true, this shape should work well.

sherpa-onnx/java-api/Makefile (1)

50-52: Java API Makefile wiring for ZipVoice config looks correct

Adding OfflineTtsZipvoiceModelConfig.java alongside the other OfflineTts* config classes ensures the new model config is compiled and included in sherpa-onnx.jar. This matches the existing pattern in this Makefile.

sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1)

89-95: Callback now correctly invoked for ZipVoice TTS results

Capturing the GeneratedAudio as ans, invoking callback(ans.samples.data(), ans.samples.size(), 1.0), and then returning ans fixes the earlier issue where the callback was ignored for ZipVoice while maintaining the original return semantics.

Just make sure all callback implementations treat the samples pointer as valid only during the call (no storing it beyond the callback), which matches how other TTS paths usually behave.

.github/workflows/run-java-test.yaml (2)

324-324: LGTM! New Zipvoice test added.

The addition of the Zipvoice TTS test script is properly integrated into the CI pipeline.


332-332: LGTM! Cleanup added for Zipvoice model.

The cleanup step appropriately removes the downloaded Zipvoice model to save space, consistent with other model cleanup steps in the workflow.

android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (2)

45-45: LGTM! WaveReader import added.

The import supports the new Zipvoice prompt-based generation path.


117-117: LGTM! Conditional speaker handling for Zipvoice.

The logic appropriately returns 1 speaker for Zipvoice models, bypassing the multi-speaker UI when not applicable.

java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh (1)

1-37: LGTM! Well-structured build and run script.

The script follows best practices with:

  • Conditional builds to skip already-completed steps
  • Proper error handling via set -ex
  • Correct library paths and classpath configuration
  • Model download with appropriate checks
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (1)

9-9: LGTM! WaveReader import added.

The import supports the new Zipvoice prompt-based generation path.

android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (4)

47-49: LGTM! New Zipvoice model fields added.

The new fields support the Zipvoice TTS model configuration.


59-59: LGTM! Zipvoice flag added.

The public flag enables conditional Zipvoice behavior throughout the TTS engine.


79-84: LGTM! Zipvoice fields initialized.

Proper initialization of Zipvoice-specific fields in the init block.


223-225: LGTM! Zipvoice configuration properly passed through.

The new Zipvoice fields are correctly passed to the configuration builder.

Also applies to: 232-232

.github/workflows/apk-tts-engine.yaml (1)

25-26: Revert or clarify intent of temporary debug commit.

The matrix reduction from 40 to 2 jobs is confirmed, but the commit message "drop me: build only zipvoice" indicates this is temporary debug work. This same commit also removes signing and publishing steps. Either revert this commit before merging or clarify whether this reduced matrix and disabled signing/publishing are intentional for the final state.

.github/workflows/apk-tts.yaml (1)

25-26: No action required—coverage is maintained by design.

The build matrix reduction from 40 to 2 jobs does not impact coverage. The generate-tts-apk-script.py script dynamically divides all 245 TTS models proportionally among the specified number of runners. With total: 2 and index: [0, 1], each job processes ~122–123 models (245 ÷ 2), ensuring all models are built. The script validates num_per_runner > 0 to prevent invalid configurations. This is a legitimate CI optimization—fewer parallel jobs, identical coverage.

scripts/apk/build-apk-tts.sh.in (3)

52-62: LGTM: Conditional model cleanup logic.

The conditional cleanup correctly removes unused ONNX variants based on the selected text_model, and skips vocoder download for zipvoice models.


159-159: Verify debug build for production.

The build has been changed from assembleRelease to assembleDebug. Debug builds are not optimized and use debug signing, which may not be suitable for production APKs.

Is this change intentional for testing, or should production builds use assembleRelease?


25-30: Verify whether TTS builds intentionally support arm64-v8a only.

The TTS build scripts (build-apk-tts.sh.in and build-apk-tts-engine.sh.in) restrict to arm64-v8a only, while all other build scripts in the repository support multi-architecture builds (arm64-v8a, armeabi-v7a, x86_64, x86). This appears intentional but is inconsistent.

Confirm whether this single-architecture restriction is required for zipvoice TTS integration or if multi-architecture support should be restored.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)

10-10: LGTM: Zipvoice configuration integration.

The zipvoice field and corresponding getter/setter follow the established pattern for other TTS model configurations (vits, matcha, kokoro, kitten). The implementation is consistent and correct.

Also applies to: 20-20, 46-48, 55-55, 84-87

scripts/apk/build-apk-tts-engine.sh.in (2)

25-30: Verify single-architecture restriction.

Similar to the TTS app build script, this engine build is also restricted to arm64-v8a only. Ensure this is intentional and not a temporary limitation.


165-165: Verify debug build for production.

The build uses assembleDebug instead of assembleRelease. Confirm this is intentional for the TTS engine APK.

android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)

143-143: isZipvoice flag is hardcoded to false.

The isZipvoice flag is hardcoded to false on lines 143 and 204. This means the zipvoice generation path will never be executed in the current code.

Based on the PR description and CI pattern comments, this appears to be intentional for CI substitution. Confirm this is the expected behavior.

Also applies to: 204-204

sherpa-onnx/jni/offline-tts.cc (3)

121-147: LGTM: Zipvoice configuration parsing.

The zipvoice configuration parsing follows the established pattern used for other TTS models (vits, matcha, kokoro, kitten) and correctly reads all required fields.


273-301: LGTM: Prompt-based generation JNI implementation.

The JNI method properly manages resource lifetimes:

  • String UTF chars are released after use
  • Float array elements are released with JNI_ABORT (correct for read-only access)
  • Return array is properly constructed

The implementation is consistent with the existing generateImpl method.


365-432: LGTM: Callback-based prompt generation.

The callback wrapper correctly:

  • Captures the JNI environment and callback object
  • Invokes the Kotlin/Java callback with proper type conversion
  • Returns the callback result to control generation continuation

The resource management mirrors the non-callback variant and is correct.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)

1-140: LGTM: Zipvoice model configuration implementation.

The builder pattern implementation is consistent with other model configs (Vits, Matcha, Kokoro, Kitten). Default values match the Kotlin implementation and appear reasonable for the zipvoice model.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)

33-46: LGTM: generateWithPrompt overloads.

The overload chain correctly provides default values (speed=1.0f, num_step=4) and follows the pattern established by the existing generate methods.

sherpa-onnx/kotlin-api/Tts.kt (4)

47-58: LGTM: Zipvoice model configuration.

The data class definition matches the Java implementation and includes appropriate default values for all parameters.


123-136: LGTM: Prompt-based generation methods.

The new methods properly delegate to native implementations and construct GeneratedAudio from the returned array. The API is clean and consistent with existing methods.

Also applies to: 157-171


297-299: Higher thread count for zipvoice models.

Zipvoice models use 8 threads compared to 4 for Kokoro/Kitten and 2 for other models. Ensure this higher thread count is intentional and has been performance-tested.

Higher thread counts can improve throughput but may increase memory usage and context switching overhead on lower-end devices.


369-380: LGTM: Zipvoice configuration creation.

The zipvoice configuration is created conditionally and follows the same pattern as other model types. File paths are correctly constructed relative to modelDir.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (7)
scripts/apk/generate-tts-apk-script.py (1)

589-591: Critical: all_model_list overwrite discards all non-Zipvoice models.

Line 591 overwrites the accumulated model list with only Zipvoice models, discarding all VITS, Piper, Mimic3, Coqui, Matcha, Kokoro, and Kitten models. This was already flagged in previous reviews and should be removed or moved to a separate testing configuration.

android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (2)

59-59: isZipvoice should have restricted visibility.

The isZipvoice property is public and mutable, allowing modification from anywhere. This was already flagged in a previous review suggesting it should be internal to prevent unintended changes.


191-199: Incorrect comment in Example 12.

Line 199's comment says "isKitten = true" but this example is for Zipvoice and should say "isZipvoice = true". This was already flagged in previous reviews.

scripts/apk/build-apk-tts-engine.sh.in (1)

168-168: Wildcard APK path could match multiple files.

The wildcard pattern android/SherpaOnnxTtsEngine/app/build/outputs/apk/*/*.apk could match multiple APK files (debug/release, different flavors). This was already flagged in a previous review as potentially problematic.

scripts/apk/build-apk-tts.sh.in (1)

141-145: Tighten APK selection instead of using a broad wildcard.

mv android/SherpaOnnxTts/app/build/outputs/apk/*/*.apk ./apks/...apk can match multiple APKs (e.g., multiple flavors or splits). With more than one match, mv to a single filename will fail.

Prefer selecting a single APK explicitly (e.g., the debug variant path or latest file via find/sorting) or fail fast if >1 candidate is found.

Also applies to: 159-162

android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)

140-151: Wire isZipvoice through activity state instead of a local flag.

Here var isZipvoice = false is a local variable, so the zipvoice branch only becomes active if an external sed-based script rewrites this line. For anyone configuring Example 12 manually in initTts(), it’s easy to set isZipvoice = true there but forget to update this local copy, leaving the prompt-based path unused.

Consider promoting isZipvoice to a private var on MainActivity, setting it in initTts() based on the chosen model, and referencing that field in onClickGenerate(). That keeps configuration and runtime behavior in sync and avoids sed depending on multiple identical literals.

Also, the hardcoded prompt_text and prompt asset path are duplicated across the app and engine modules; factoring them into a shared helper or constant would reduce maintenance overhead and aligns with the TODO in the PR description.

sherpa-onnx/kotlin-api/Tts.kt (1)

307-317: Still missing Kotlin-side validation for Zipvoice-specific model paths.

When isZipvoice is true, the earlier check now skips the “Please specify a TTS model” guard, but there is no corresponding validation that textModel, flowMatchingModel, and vocoder are non-empty before constructing OfflineTtsZipvoiceModelConfig. That allows building configs with empty paths (e.g., "$modelDir/"), pushing errors down into the native/C++ layer instead of failing fast on the Kotlin side, which was already called out in a previous review.

Consider adding explicit validation, e.g.:

     if (acousticModelName.isNotEmpty() && vocoder.isEmpty()) {
         throw IllegalArgumentException("Please provide vocoder for Matcha TTS")
     }
+
+    if (isZipvoice) {
+        if (textModel.isEmpty()) {
+            throw IllegalArgumentException("Please provide textModel for Zipvoice TTS")
+        }
+        if (flowMatchingModel.isEmpty()) {
+            throw IllegalArgumentException("Please provide flowMatchingModel for Zipvoice TTS")
+        }
+        if (vocoder.isEmpty()) {
+            throw IllegalArgumentException("Please provide vocoder for Zipvoice TTS")
+        }
+    }

This keeps the API behavior symmetric with other TTS modes and avoids confusing runtime failures when paths are missing.

Also applies to: 369-380

🧹 Nitpick comments (6)
scripts/apk/build-apk-tts.sh.in (2)

52-60: Make zipvoice cleanup more robust under set -e.

The rm commands inside the is_zipvoice block will abort the whole script if any of the expected ONNX files are missing (because of set -e). That’s fragile if model packaging ever changes.

Consider using rm -f (and/or checking for file existence) so adding/removing variants in the tarball doesn’t break APK builds.


88-99: Keep MainActivity defaults in sync with sed patterns.

The new sed injections for textModel, flowMatchingModel, pinyinDict, and isZipvoice rely on the exact literal defaults (textModel = null, flowMatchingModel = null, pinyinDict = null, isZipvoice = false) being present in MainActivity.kt.

If those defaults change (e.g., refactoring to a property or different initializer), these substitutions will silently stop working. If you keep this templating approach, document the dependency near the Kotlin fields or centralize the placeholders to avoid breakage during refactors.

Also applies to: 126-128

java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)

3-60: Zipvoice Java example is coherent and correctly wired.

The example configures OfflineTtsZipvoiceModelConfig, uses generateWithPrompt(...), and saves output with sensible logging. Class name and output filename now match the file and bilingual content.

If you later expose steps/threads as per the PR TODOs, you might add an overload here to demonstrate those knobs, but the current sample is fine as-is.

android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)

192-205: Zipvoice config wiring looks correct but is tightly coupled to comments/templates.

The additions for textModel, flowMatchingModel, pinyinDict, and isZipvoice in initTts() and the getOfflineTtsConfig(...) call are consistent with the new zipvoice model config (Java/Kotlin).

Just be aware that:

  • Example 12’s commented values now serve as the only in-code documentation for how to configure zipvoice on Android.
  • The build script’s sed-based replacement relies on these variable names and their initial values.

If you plan to support more zipvoice variants, it may be worth extracting a small helper or data class for “selected TTS model + flags” instead of growing this initializer further.

Also applies to: 223-229, 314-323, 329-345

sherpa-onnx/kotlin-api/Tts.kt (2)

47-58: Zipvoice config wiring looks consistent with other platforms; tuning fields are currently fixed.

The OfflineTtsZipvoiceModelConfig data class and its use in getOfflineTtsConfig look coherent: required paths (textModel, flowMatchingModel, vocoder, pinyinDict, tokens, dataDir) are passed through, and defaults for featScale, tShift, targetRms, and guidanceScale match the Java/Go shapes. If you want full cross-language parity later, consider adding parameters (or a builder-style API) on the Kotlin side to let callers override those tuning fields instead of hard-coding the defaults here.

Also applies to: 65-66, 369-380, 388-389


123-136: Prompt-based generate APIs are consistent with existing methods; consider minor polish.

The new generateWithPrompt / generateWithPromptWithCallback methods mirror the existing generate / generateWithCallback patterns and wire cleanly to the new native entry points (same parameter order and return handling). From an API standpoint, you might optionally:

  • Use Kotlin-style camelCase names (promptText, promptSamples, sampleRate, numStep) on the public methods while keeping the JNI parameter names as-is; and/or
  • Add lightweight preconditions for obviously invalid values (e.g., require(sample_rate > 0) and require(num_step > 0)) if you want errors earlier than the native layer.

These are cosmetic/ergonomic tweaks and not blockers.

Also applies to: 157-171, 223-231, 241-250

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f24b71 and 199bd99.

📒 Files selected for processing (23)
  • .github/workflows/apk-tts-engine.yaml (1 hunks)
  • .github/workflows/apk-tts.yaml (1 hunks)
  • .github/workflows/run-java-test.yaml (1 hunks)
  • .gitignore (1 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (6 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt (1 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (3 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (5 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (2 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt (1 hunks)
  • java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1 hunks)
  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh (1 hunks)
  • scripts/apk/build-apk-tts-engine.sh.in (7 hunks)
  • scripts/apk/build-apk-tts.sh.in (6 hunks)
  • scripts/apk/generate-tts-apk-script.py (4 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1 hunks)
  • sherpa-onnx/java-api/Makefile (1 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (3 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (5 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1 hunks)
  • sherpa-onnx/jni/offline-tts.cc (3 hunks)
  • sherpa-onnx/kotlin-api/Tts.kt (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • sherpa-onnx/csrc/offline-tts-zipvoice-impl.h
  • .gitignore
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt
  • sherpa-onnx/java-api/Makefile
  • sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.

Applied to files:

  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh
  • scripts/apk/build-apk-tts-engine.sh.in
  • sherpa-onnx/jni/offline-tts.cc
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.

Applied to files:

  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh
  • sherpa-onnx/jni/offline-tts.cc
🧬 Code graph analysis (6)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
sherpa-onnx/kotlin-api/Tts.kt (7)
  • generateWithPrompt (123-136)
  • generateWithPromptImpl (223-231)
  • samples (79-91)
  • sampleRate (107-107)
  • generateWithPromptWithCallback (157-171)
  • generateWithPromptWithCallbackImpl (241-250)
  • generateWithCallbackImpl (233-239)
sherpa-onnx/kotlin-api/Tts.kt (3)
scripts/go/sherpa_onnx.go (7)
  • OfflineTtsVitsModelConfig (921-930)
  • OfflineTtsMatchaModelConfig (932-941)
  • OfflineTtsKokoroModelConfig (943-952)
  • OfflineTtsKittenModelConfig (954-960)
  • OfflineTtsZipvoiceModelConfig (962-974)
  • GeneratedAudio (1001-1006)
  • OfflineTtsModelConfig (976-991)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
  • OfflineTtsZipvoiceModelConfig (3-140)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
  • OfflineTtsModelConfig (5-104)
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/LibraryUtils.java (1)
  • LibraryUtils (43-246)
sherpa-onnx/jni/offline-tts.cc (3)
sherpa-onnx/kotlin-api/Tts.kt (2)
  • tokens (47-58)
  • samples (79-91)
sherpa-onnx/jni/jni.cc (2)
  • NewInteger (17-21)
  • NewInteger (17-17)
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (2)
  • callback (98-106)
  • tts (22-399)
scripts/apk/generate-tts-apk-script.py (1)
scripts/flutter/generate-tts.py (1)
  • TtsModel (28-36)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (4)
sherpa-onnx/kotlin-api/Tts.kt (1)
  • tokens (47-58)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
  • Builder (50-103)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsVitsModelConfig.java (1)
  • OfflineTtsVitsModelConfig (5-116)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsKokoroModelConfig.java (1)
  • OfflineTtsKokoroModelConfig (4-104)
🔇 Additional comments (27)
.github/workflows/run-java-test.yaml (2)

324-324: LGTM! Zipvoice test added correctly.

The new test script invocation follows the established pattern for other TTS model tests in this workflow.


332-332: Appropriate cleanup for zipvoice artifacts.

The cleanup step correctly removes the downloaded zipvoice model directory to save disk space, consistent with other model cleanup steps in the workflow.

scripts/apk/generate-tts-apk-script.py (2)

35-51: LGTM! Zipvoice fields added to TtsModel.

The new fields (text_model, flow_matching_model, pinyin_dict, is_zipvoice) appropriately extend the TtsModel dataclass to support Zipvoice-specific configuration.


547-572: Zipvoice model factory function looks good.

The get_zipvoice_models() function correctly constructs Zipvoice model configurations with appropriate field initialization and data_dir setup.

sherpa-onnx/jni/offline-tts.cc (3)

122-147: LGTM! Zipvoice configuration parsing implemented correctly.

The JNI code properly reads all Zipvoice-specific fields (tokens, textModel, flowMatchingModel, vocoder, dataDir, pinyinDict, and float parameters) using the standard SHERPA_ONNX_JNI_READ_* macros, consistent with other model configurations in this file.


273-301: Proper JNI resource management in generateWithPromptImpl.

The implementation correctly:

  • Releases JNI strings with ReleaseStringUTFChars for both text and prompt_text
  • Releases float array elements with ReleaseFloatArrayElements using JNI_ABORT (no need to copy back)
  • Constructs the return object array with audio samples and sample rate

365-432: Callback wrapper and resource management look correct.

The generateWithPromptWithCallbackImpl function properly:

  • Creates a C++ callback wrapper that invokes the Java callback method
  • Handles resource cleanup for strings and float arrays
  • Returns audio samples and sample rate in the expected format

The callback pattern matches the existing generateWithCallbackImpl implementation.

.github/workflows/apk-tts.yaml (1)

25-26: Verify: Matrix reduced from 40 to 2 jobs - is this intentional?

Same as in apk-tts-engine.yaml, the build matrix has been reduced from 40 to 2 jobs. This change should be coordinated with the model list modifications in generate-tts-apk-script.py. If you intend to build APKs for all TTS families (not just Zipvoice), you'll need more than 2 jobs.

java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh (4)

5-20: JNI build logic is correct.

The conditional build checks for both macOS (.dylib) and Linux (.so) library files and builds only if neither exists. The CMake configuration and build steps match other test scripts in the repository.


22-26: Java API build check is correct.

The script properly checks for the Java API jar file and builds it only if missing, avoiding unnecessary rebuilds.


28-32: Model download and extraction handled correctly.

The script checks for a specific model file (fm_decoder.onnx) to determine if the model has already been downloaded, then downloads, extracts, and cleans up the archive appropriately.


34-37: Java execution command is correct.

The java command properly sets the library path and classpath to run the NonStreamingTtsZipvoiceZhEn test.

android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (3)

47-49: LGTM! Zipvoice-specific fields added.

The private fields (textModel, flowMatchingModel, pinyinDict) appropriately extend TtsEngine to support Zipvoice configuration.


79-84: Proper initialization of Zipvoice fields.

The Zipvoice-specific fields are correctly initialized to null in the init block, consistent with other optional model configurations.


223-232: Zipvoice configuration properly passed to OfflineTts.

The getOfflineTtsConfig call correctly includes all Zipvoice-specific parameters (textModel, flowMatchingModel, pinyinDict, isZipvoice) alongside existing TTS configuration fields.

scripts/apk/build-apk-tts-engine.sh.in (8)

25-30: Verify: Multi-arch builds disabled - is this intentional?

The script now only builds for arm64-v8a, with armv7-eabi, x86-64, and x86 builds commented out. This dramatically reduces APK variants from 4 architectures to 1. If this is temporary for Zipvoice testing, please add a comment. Otherwise, this will break support for 32-bit ARM devices and x86 emulators/devices.


43-45: Zipvoice model fields properly extracted.

The script correctly extracts text_model, flow_matching_model, and pinyin_dict from the TtsModel template variables.


54-60: Smart cleanup for Zipvoice model variants.

The conditional cleanup removes unused int8/fp32 model variants based on which text_model is selected, reducing APK size. This is a good optimization.


62-64: Correct vocoder fetch logic for Zipvoice.

The script properly skips vocoder download for Zipvoice models (which bundle their own vocoder) while still downloading vocos for Matcha models. The condition not tts_model.is_zipvoice correctly distinguishes the two cases.


82-84: isZipvoice flag properly set via sed.

The Jinja2 conditional correctly updates TtsEngine.kt to set isZipvoice = true when processing Zipvoice models.


102-112: Zipvoice model configuration correctly injected.

The sed commands properly update TtsEngine.kt with textModel, flowMatchingModel, and pinyinDict values when these fields are present in the TtsModel.


147-147: Single-arch build loop matches disabled multi-arch builds.

The for loop now only iterates over arm64-v8a, consistent with the commented-out multi-arch build steps. See earlier comment about verifying if this is intentional.


165-165: Verify: assembleDebug instead of assembleRelease?

The build now uses assembleDebug instead of assembleRelease. Debug builds:

  • Are larger (no ProGuard/R8 optimization)
  • Include debugging symbols
  • Are not signed with release keys
  • May have different performance characteristics

If this is temporary for testing, please add a comment. For production APKs, you should use assembleRelease.

.github/workflows/apk-tts-engine.yaml (1)

25-26: Matrix reduction to 2 jobs is intentional and correctly configured for Zipvoice testing

The verification confirms this change is deliberate. The script aggregates all 245 models across 8 TTS families (lines 582–589), but line 591 explicitly overrides all_model_list with only the 2 Zipvoice models. The workflow matrix total: ["2"] with index: ["0", "1"] is correctly sized for this testing configuration. When all families need to be restored, both the line 591 override and the workflow matrix will need to be updated.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)

3-140: Zipvoice model config class is consistent with existing config types.

The immutable fields, builder, defaults, and accessor names align with the other OfflineTts*ModelConfig classes and the Kotlin OfflineTtsZipvoiceModelConfig. This should integrate cleanly with OfflineTtsModelConfig and the JNI layer.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)

33-47: Prompt-based OfflineTts APIs are well-structured and align with existing patterns.

The new generateWithPrompt and generateWithPromptWithCallback overloads mirror the existing generate* and generateWithCallback patterns, with sensible defaults for speed and num_step and clean wrapping of the native Object[] return into GeneratedAudio. The native declarations also match the usage sites.

No issues from the Java side.

Also applies to: 63-76, 99-104

sherpa-onnx/kotlin-api/Tts.kt (1)

262-279: The review comment concern does not apply—both call sites use named arguments, which are position-independent in Kotlin.

Both call sites in the codebase (TtsEngine.kt:217 and MainActivity.kt:329) use named arguments when calling getOfflineTtsConfig(). In Kotlin, named arguments are position-independent, so changing the parameter order in the function definition does not break compatibility. All new parameters (textModel, flowMatchingModel, pinyinDict, isZipvoice) have been correctly provided in both calls, and optional parameters use their defaults.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (6)
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (2)

203-208: Hardcoded prompt and indentation issues flagged in previous review.

This code segment has already been reviewed with feedback about hardcoded prompt text, hardcoded asset path, and inconsistent indentation (tabs vs spaces).


206-208: Add error handling for WaveReader.readWave to prevent crashes.

WaveReader.readWave could fail if the asset doesn't exist or is malformed, which would cause a crash when accessing reader.samples or reader.sampleRate on line 208. Wrap the asset loading in a try-catch block and provide a graceful fallback or user notification.

-    val reader = WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav")
-
-		    TtsEngine.tts!!.generateWithPromptWithCallback(testText, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ::callback)} else 
+                                                        val reader = try {
+                                                            WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav")
+                                                        } catch (e: Exception) {
+                                                            Log.e(TAG, "Failed to load prompt audio: ${e.message}")
+                                                            withContext(Dispatchers.Main) {
+                                                                Toast.makeText(
+                                                                    applicationContext,
+                                                                    "Failed to load prompt audio",
+                                                                    Toast.LENGTH_SHORT
+                                                                ).show()
+                                                                startEnabled = true
+                                                            }
+                                                            return@launch
+                                                        }
+                                                        TtsEngine.tts!!.generateWithPromptWithCallback(testText, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ::callback)
+                                                    } else
scripts/apk/build-apk-tts.sh.in (1)

141-165: Two ZipVoice variants generate identically named APKs, and wildcard mv is still fragile

There are two ZipVoice TtsModel entries with the same model_dir and lang, but different {text,flow} pairs. When both flow through this script:

mv android/SherpaOnnxTts/app/build/outputs/apk/*/*.apk \
   ./apks/sherpa-onnx-${SHERPA_ONNX_VERSION}-$arch-$lang-tts-$model_dir.apk
  • Both builds target the same destination filename, so the second run overwrites the first; you end up with only one ZipVoice APK per arch even though two models are defined.
  • The broad */*.apk wildcard can still match multiple APKs (e.g., multiple flavors or debug+other variants), which can make mv fail or behave unexpectedly — this was already raised in a previous review.

Consider:

  • Adding a variant suffix for ZipVoice based on text_model (e.g. -fp16 vs -int8) so filenames are unique, and
  • Narrowing the source glob or explicitly selecting a single APK (e.g. by flavor or using a small bash check to ensure exactly one match) before moving.

For example (conceptually):

variant_suffix=""
{% if tts_model.is_zipvoice and tts_model.text_model == "text_encoder_int8.onnx" %}
variant_suffix="-int8"
{% endif %}

mv android/SherpaOnnxTts/app/build/outputs/apk/debug/*.apk \
   ./apks/sherpa-onnx-${SHERPA_ONNX_VERSION}-$arch-$lang-tts-$model_dir${variant_suffix}.apk
scripts/apk/build-apk-tts-engine.sh.in (1)

147-171: Two ZipVoice engine variants share an APK name, and wildcard mv remains brittle

For engine builds you again have two ZipVoice models with the same model_dir and lang. The mv command:

mv android/SherpaOnnxTtsEngine/app/build/outputs/apk/*/*.apk \
   ./apks/sherpa-onnx-${SHERPA_ONNX_VERSION}-$arch-$lang-tts-engine-$model_dir.apk
  • Produces the same destination filename for both runs, so the second ZipVoice engine APK overwrites the first; you only retain one variant.
  • Uses a broad */*.apk glob, which can match multiple APKs and was already flagged as a potential issue in earlier reviews.

It would be safer to:

  • Include a variant suffix for ZipVoice based on the model (e.g. -int8 vs -fp16) so filenames are unique, and
  • Narrow the glob to a specific variant directory (e.g. debug) or explicitly select a single APK before moving.

Also applies to: 168-168

scripts/apk/generate-tts-apk-script.py (1)

547-572: ZipVoice model wiring is correct, but overwriting all_model_list changes script semantics

  • get_zipvoice_models() defines two variants of the same model_dir with different {text, flow_matching}_model values and marks them is_zipvoice = True; that matches how the APK templates consume ZipVoice-specific fields.
  • After aggregating VITS/Piper/Mimic3/Coqui/Matcha/Kokoro/Kitten models, all_model_list is first extended with ZipVoice, then immediately reset to get_zipvoice_models(). This discards all previously-added models and makes the script “ZipVoice-only”.

This is consistent with the current CI setup (matrix total=2; two ZipVoice models), but:

  • Anyone expecting this script to generate APKs for all models will now only see ZipVoice.
  • If --total is later increased without adding more ZipVoice entries, sharding will fail (num_per_runner <= 0).

If the intent is temporary ZipVoice-only builds, consider:

  • Removing the overwrite and leaving the aggregated list; or
  • Guarding it behind a flag (e.g. --zipvoice-only), plus a comment; and
  • Optionally asserting total <= len(get_zipvoice_models()) when in ZipVoice-only mode.

Also applies to: 582-592

android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)

143-150: Zipvoice branch in onClickGenerate depends on a local flag; consider promoting isZipvoice to shared state.

Here isZipvoice is a local variable initialized to false, and the Zipvoice path (prompt-based generateWithPromptWithCallback plus WaveReader.readWave) is only taken when it’s true. Separately, initTts() has its own local isZipvoice used when building the config.

That split makes it easy for callers (or future edits/scripts) to flip Zipvoice on in initTts() but forget to do the same in onClickGenerate, leading to a Zipvoice-configured model still going through the non-prompt generateWithCallback branch.

To avoid this divergence, consider:

  • Defining private var isZipvoice: Boolean = false as a MainActivity property.
  • Setting it in initTts() alongside the Zipvoice model selection.
  • Using that same property in onClickGenerate instead of a new local var isZipvoice.

This keeps model selection and runtime path selection in sync.

🧹 Nitpick comments (11)
.github/workflows/apk-tts.yaml (1)

21-27: Matrix total is now tightly coupled to the number of ZipVoice models

With total: ["2"] and index: ["0","1"], generate-tts-apk-script.py will shard assuming exactly 2 models (after the ZipVoice-only overwrite). If you later add/remove ZipVoice variants or re-enable other model families, this hard-coded total can cause num_per_runner <= 0 or uneven sharding.

Consider either:

  • Deriving total from the model list (or asserting total <= len(get_zipvoice_models())), or
  • Adding a short comment in the workflow noting that total must match the ZipVoice model count.
scripts/apk/generate-tts-apk-script.py (1)

31-52: New ZipVoice fields look consistent; consider documenting their use

The added fields (vocoder, text_model, flow_matching_model, pinyin_dict, is_zipvoice) integrate cleanly with existing code thanks to defaults. They’re only consumed where needed (APK templates), so they don’t break older models.

You might want to:

  • Add a brief class-level docstring or comment explaining which backends use each field (VITS/Matcha/Kokoro/ZipVoice), to avoid future confusion.
  • Optionally align this TtsModel shape with the Flutter generator’s TtsModel (scripts/flutter/generate-tts.py) to keep configs in sync.
.github/workflows/apk-tts-engine.yaml (1)

21-27: Engine workflow now assumes exactly 2 models; keep in sync with generator

Like apk-tts.yaml, this matrix (total: ["2"], index: ["0","1"]) assumes generate-tts-apk-script.py is operating over exactly 2 models (currently the two ZipVoice entries). If you later change the model set or revert to all models, remember to update total or make the script derive/check it, otherwise sharding will break.

The addition of fetch-depth: 0 is appropriate for versioning.

Also applies to: 29-32

scripts/apk/build-apk-tts.sh.in (2)

23-30: Single‑arch arm64-v8a + assembleDebug is consistent but narrows output scope

You’ve effectively reduced the build to:

  • Only arm64-v8a (other ABIs commented out), and
  • Debug builds via ./gradlew assembleDebug.

That’s good for faster CI and unsigned artifacts, but it means:

  • No multi‑ABI coverage from this script.
  • Produced APKs are debug-only (no release configuration, ProGuard/minify, etc.).

If you intend these artifacts purely for testing, this is fine; otherwise, consider documenting that this path is “debug/arm64-only” or leaving a TODO to reintroduce release or additional ABIs later.

Also applies to: 141-155, 159-160


36-46: ZipVoice asset handling and vocoder download guard look reasonable

  • The new text_model, flow_matching_model, and pinyin_dict variables and corresponding sed updates correctly propagate ZipVoice config into MainActivity.kt only when non-empty.
  • The is_zipvoice flag and conditional vocoder download
{% if tts_model.vocoder and not tts_model.is_zipvoice %}
  curl ...
{% endif %}

prevent downloading an external vocoder when ZipVoice bundles its own, which avoids redundant assets.

The rm block that deletes the unused {text,flow} pair for ZipVoice variants is logically correct, but tightly couples the script to exact filenames. If you expect these filenames to evolve, you might want to either:

  • Switch to rm -f to avoid hard failures when variants change, or
  • Keep it strict but add a short comment noting that the ZipVoice tarball must always contain both int8 and non‑int8 pairs.

Also applies to: 52-61, 88-99, 126-128

scripts/apk/build-apk-tts-engine.sh.in (1)

23-30: Engine build is now arm64‑only and debug‑only

As with the TTS APK script, this one now:

  • Builds only arm64-v8a, with other ABIs commented out.
  • Uses ./gradlew assembleDebug rather than a release target.

That’s fine if these engine APKs are intended for CI/testing only. If you later want release or multi‑ABI coverage, it may be worth adding a brief comment or TODO here to avoid confusion.

Also applies to: 147-156, 163-166

sherpa-onnx/kotlin-api/Tts.kt (2)

47-58: Zipvoice model config wiring is consistent; advanced params currently fixed to defaults.

The new OfflineTtsZipvoiceModelConfig plus the zipvoice field in OfflineTtsModelConfig and its construction in getOfflineTtsConfig look consistent with the Java/Go structs (tokens, textModel, flowMatchingModel, vocoder, dataDir, pinyinDict, featScale, tShift, targetRms, guidanceScale). Right now, getOfflineTtsConfig only sets paths and relies on the default numeric values; that’s fine, but if you later need to tune featScale/tShift/targetRms/guidanceScale from Android, you’ll need to thread those through this helper as well.

Also applies to: 65-69, 381-392, 395-401


259-279: Zipvoice validation covers required fields; consider enforcing “one active model type” for configs.

The updated validation (model presence check relaxed for isZipvoice, plus checks for vocoder, textModel, flowMatchingModel, and pinyinDict when isZipvoice is true) aligns well with the Zipvoice requirements and avoids late failures in native code.

One remaining edge case is that callers can now accidentally enable Zipvoice and another model type (e.g., VITS via modelName, Matcha via acousticModelName, or Kokoro/Kitten via voices) at the same time. To make misconfiguration fail fast, consider adding a small guard like:

val enabledModelTypes = listOf(
    modelName.isNotEmpty(),          // VITS
    acousticModelName.isNotEmpty(),  // Matcha
    voices.isNotEmpty(),             // Kokoro/Kitten
    isZipvoice                      // Zipvoice
).count { it }

if (enabledModelTypes == 0) {
    throw IllegalArgumentException("Please specify exactly one TTS model type")
}
if (enabledModelTypes > 1) {
    throw IllegalArgumentException("Please specify only one TTS model type (VITS, Matcha, Kokoro/Kitten, or Zipvoice)")
}

This would prevent ambiguous configs while keeping the Zipvoice path clean.

Also applies to: 289-305, 307-329

java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)

1-60: Example wiring for Zipvoice looks correct; optionally guard RTF math for empty output.

The example correctly:

  • Builds OfflineTtsZipvoiceModelConfig with paths for text model, flow-matching model, vocoder, pinyin dict, tokens, and espeak-ng-data.
  • Wraps it in OfflineTtsModelConfig and OfflineTtsConfig.
  • Uses WaveReader for the prompt WAV and calls generateWithPrompt(text, prompt_text, samples, sampleRate).

One small robustness tweak: if generation fails and returns zero samples, audioDuration becomes 0 and real_time_factor turns into Infinity/NaN. You could defensively handle that case:

float audioDuration = audio.getSamples().length / (float) audio.getSampleRate();
float real_time_factor = audioDuration > 0 ? timeElapsedSeconds / audioDuration : 0.0f;

This keeps the example stable even in failure modes.

android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)

189-205: Zipvoice config variables and Example 12 comments line up with Kotlin helper; minor naming nit in comment.

The added Zipvoice variables (textModel, flowMatchingModel, pinyinDict, isZipvoice) and their use in the getOfflineTtsConfig call (textModel = textModel ?: "", etc., plus isZipvoice = isZipvoice) correctly plumb the Android-side settings into the Kotlin helper and onward into OfflineTtsZipvoiceModelConfig.

The Example 12 comment block is now consistent with the Zipvoice path (// isZipvoice = true). One tiny nit: the comment line // pinyinDict = "pinyinDict" is a bit generic and doesn’t match the actual example path used elsewhere (pinyin.raw); if you want to reduce confusion for users copying these snippets, you might update it to the concrete filename you expect (e.g., "pinyin.raw").

Also applies to: 223-229, 314-323, 329-345

sherpa-onnx/jni/offline-tts.cc (1)

377-412: Extract duplicated callback wrapper logic.

The callback wrapper logic here (lines 377-412) is nearly identical to the code in generateWithCallbackImpl (lines 310-345). This duplication reduces maintainability and increases the risk of inconsistent bug fixes.

Consider extracting the common callback wrapper creation into a helper function:

// Add helper function in the file
static std::function<int32_t(const float *, int32_t, float)>
CreateCallbackWrapper(JNIEnv *env, jobject callback) {
  return [env, callback](const float *samples, int32_t n,
                        float /*progress*/) -> int {
    jclass cls = env->GetObjectClass(callback);
    jmethodID mid = env->GetMethodID(cls, "invoke", "([F)Ljava/lang/Integer;");
    if (mid == nullptr) {
      SHERPA_ONNX_LOGE("Failed to get the callback. Ignore it.");
      return 1;
    }

    jfloatArray samples_arr = env->NewFloatArray(n);
    env->SetFloatArrayRegion(samples_arr, 0, n, samples);

    jobject should_continue = env->CallObjectMethod(callback, mid, samples_arr);
    jclass jklass = env->GetObjectClass(should_continue);
    jmethodID int_value_mid = env->GetMethodID(jklass, "intValue", "()I");
    return env->CallIntMethod(should_continue, int_value_mid);
  };
}

Then use it in both functions:

auto callback_wrapper = CreateCallbackWrapper(env, callback);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 199bd99 and 85f45f8.

📒 Files selected for processing (23)
  • .github/workflows/apk-tts-engine.yaml (1 hunks)
  • .github/workflows/apk-tts.yaml (1 hunks)
  • .github/workflows/run-java-test.yaml (1 hunks)
  • .gitignore (1 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (6 hunks)
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt (1 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (3 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (5 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (2 hunks)
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt (1 hunks)
  • java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1 hunks)
  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh (1 hunks)
  • scripts/apk/build-apk-tts-engine.sh.in (7 hunks)
  • scripts/apk/build-apk-tts.sh.in (6 hunks)
  • scripts/apk/generate-tts-apk-script.py (4 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1 hunks)
  • sherpa-onnx/java-api/Makefile (1 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (3 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (5 hunks)
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1 hunks)
  • sherpa-onnx/jni/offline-tts.cc (3 hunks)
  • sherpa-onnx/kotlin-api/Tts.kt (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • sherpa-onnx/csrc/offline-tts-zipvoice-impl.h
  • java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh
  • .github/workflows/run-java-test.yaml
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt
  • android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt
  • sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt
  • android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.

Applied to files:

  • scripts/apk/build-apk-tts-engine.sh.in
  • scripts/apk/build-apk-tts.sh.in
  • sherpa-onnx/jni/offline-tts.cc
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.

Applied to files:

  • .gitignore
  • sherpa-onnx/jni/offline-tts.cc
🧬 Code graph analysis (6)
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/LibraryUtils.java (1)
  • LibraryUtils (43-246)
sherpa-onnx/kotlin-api/Tts.kt (4)
scripts/go/sherpa_onnx.go (7)
  • OfflineTtsVitsModelConfig (921-930)
  • OfflineTtsMatchaModelConfig (932-941)
  • OfflineTtsKokoroModelConfig (943-952)
  • OfflineTtsKittenModelConfig (954-960)
  • OfflineTtsZipvoiceModelConfig (962-974)
  • GeneratedAudio (1001-1006)
  • OfflineTtsModelConfig (976-991)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
  • OfflineTtsZipvoiceModelConfig (3-140)
android/SherpaOnnxAar/sherpa_onnx/src/main/java/com/k2fsa/sherpa/onnx/Tts.kt (2)
  • generateWithPromptImpl (223-231)
  • generateWithPromptWithCallbackImpl (241-250)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
  • OfflineTtsModelConfig (5-104)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
sherpa-onnx/kotlin-api/Tts.kt (7)
  • generateWithPrompt (123-136)
  • generateWithPromptImpl (223-231)
  • samples (79-91)
  • sampleRate (107-107)
  • generateWithPromptWithCallback (157-171)
  • generateWithPromptWithCallbackImpl (241-250)
  • generateWithCallbackImpl (233-239)
sherpa-onnx/jni/offline-tts.cc (2)
sherpa-onnx/kotlin-api/Tts.kt (6)
  • model (6-15)
  • model (28-37)
  • model (39-45)
  • model (71-77)
  • tokens (47-58)
  • samples (79-91)
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (2)
  • callback (98-106)
  • tts (22-399)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (2)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (2)
  • Builder (73-139)
  • OfflineTtsZipvoiceModelConfig (3-140)
sherpa-onnx/kotlin-api/Tts.kt (1)
  • vits (60-69)
scripts/apk/generate-tts-apk-script.py (1)
scripts/flutter/generate-tts.py (1)
  • TtsModel (28-36)
🔇 Additional comments (10)
.gitignore (1)

161-161: LGTM!

The addition correctly ignores the new ZipVoice model artifact directory and follows the existing naming convention for sherpa-onnx model artifacts in the .gitignore file.

android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (1)

117-117: ****

The hardcoding of numSpeakers = 1 for ZipVoice is correct. ZipVoice uses a different synthesis approach (generateWithPromptWithCallback) that relies on prompt-based voice cloning rather than speaker ID selection. Unlike traditional multi-speaker TTS models (VITS, Matcha), ZipVoice doesn't support multiple speakers and instead uses prompt audio samples for voice cloning, making the hardcoded value of 1 the appropriate design choice to suppress the speaker ID selection UI.

scripts/apk/build-apk-tts-engine.sh.in (1)

36-49: ZipVoice + ISO‑639 wiring into TtsEngine.kt looks consistent

  • lang_iso_639_3 / _2 are injected into TtsEngine.kt via sed and are populated in Python by convert_lang_to_iso_639_3, so languages for ZipVoice and others are correctly normalized.
  • New fields (text_model, flow_matching_model, pinyin_dict, isZipvoice) are only written when present, which keeps non‑ZipVoice engines unchanged.
  • The vocoder download is correctly skipped when is_zipvoice is true, relying on the bundled ZipVoice assets instead.

This all lines up with the generator changes and shouldn’t affect existing engines.

Also applies to: 69-80, 82-113

sherpa-onnx/kotlin-api/Tts.kt (1)

123-137: Prompt-based generate APIs mirror existing patterns and JNI signatures.

generateWithPrompt and generateWithPromptWithCallback follow the same pattern as the existing generate/generateWithCallback wrappers: they call the corresponding external functions, expect a 2-element Array<Any> (FloatArray samples + Int sample rate), and wrap it in GeneratedAudio. The external declarations match the Android Tts.kt snippet (ptr, text, prompt_text, FloatArray prompt_samples, Int sample_rate, Float speed, Int num_step), so the Kotlin surface looks coherent for both desktop and Android Zipvoice usage.

Also applies to: 157-171, 223-232, 241-250

sherpa-onnx/java-api/Makefile (1)

50-52: Including OfflineTtsZipvoiceModelConfig.java in the jar build is correct.

Adding the Zipvoice config class to java_files ensures it’s compiled and present in sherpa-onnx.jar, matching the new Java API surface.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)

10-24: Zipvoice field integration in Java model config is consistent and non-breaking.

The new zipvoice field, getter, builder default, and setZipvoice setter align with the Kotlin OfflineTtsModelConfig and the standalone OfflineTtsZipvoiceModelConfig. Defaulting to OfflineTtsZipvoiceModelConfig.builder().build() preserves backward compatibility (non-null field) while allowing callers to override Zipvoice settings explicitly.

Also applies to: 46-48, 50-56, 84-87

sherpa-onnx/jni/offline-tts.cc (1)

121-148: LGTM!

The Zipvoice model configuration parsing follows the established pattern for other model types and correctly maps all fields from the Kotlin/Java API to the C++ configuration struct.

sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (3)

33-46: LGTM!

The generateWithPrompt overload chain follows the established pattern and provides sensible defaults (speed=1.0f, num_step=4) consistent with the existing API design and Kotlin defaults.


63-76: LGTM!

The generateWithPromptWithCallback overload chain is consistent with the existing callback API pattern and uses the same sensible defaults as generateWithPrompt.


99-103: LGTM!

The native method declarations correctly match the JNI function signatures and follow the established pattern for returning audio data.

@csukuangfj
Copy link
Collaborator

Thanks!

Can you use APIs from #3344

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants