add android tts engine for zipvoice tts#2806
add android tts engine for zipvoice tts#2806rhjdvsgsgks wants to merge 9 commits intok2-fsa:masterfrom
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds Zipvoice (prompt-based TTS) across native, JNI, Java/Kotlin APIs, Android apps, build scripts, examples, and CI; introduces prompt-aware generation methods and configs, a WaveReader utility, ZIPVoice wiring in Android/TtsEngine/TtsService, JNI/native prompt methods, example runner, and simplifies APK CI matrix/signing/publishing. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Android App
participant Service as TtsService
participant Wave as WaveReader
participant JNI as JNI Bridge
participant Core as OfflineTts (native)
alt isZipvoice = true
App->>Service: synthesize(text)
Service->>Wave: load prompt.wav
Wave-->>Service: samples, sampleRate
Service->>JNI: generateWithPromptWithCallback(text, prompt_text, samples, sampleRate, callback)
JNI->>Core: generateWithPrompt(...)
Core-->>JNI: audio chunks
JNI->>Service: callback(chunk)
Service-->>App: play audio
else
App->>Service: synthesize(text)
Service->>JNI: generateWithCallback(text, speakerId, speed, callback)
JNI->>Core: generate(...)
Core-->>JNI: audio chunks
JNI->>Service: callback(chunk)
Service-->>App: play audio
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50–70 minutes Areas needing extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (13)
🚧 Files skipped from review as they are similar to previous changes (4)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-08-06T04:18:47.981ZApplied to files:
🧬 Code graph analysis (1)sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1)
🔇 Additional comments (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @rhjdvsgsgks, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the application's Text-to-Speech capabilities by integrating the Zipvoice TTS engine, specifically for Android. It provides comprehensive Java and Kotlin APIs for interacting with Zipvoice, including a novel Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request adds support for the zipvoice TTS model to the Android and Java/Kotlin APIs, including a new generateWithPrompt API for zero-shot TTS. The changes are extensive, covering Android demo apps, the TTS engine, Java/Kotlin APIs, JNI bindings, and build scripts. Overall, the implementation is good, but there are a few areas for improvement. I've identified a critical bug in the Android demo app where a feature flag is hardcoded, making the new functionality unreachable. Additionally, there's significant code duplication with hardcoded values for testing zipvoice across multiple files, which should be refactored. I've also left some comments on minor issues regarding naming conventions, file paths, and misleading comments.
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt
Show resolved
Hide resolved
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt
Show resolved
Hide resolved
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt
Outdated
Show resolved
Hide resolved
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt
Show resolved
Hide resolved
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt
Show resolved
Hide resolved
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java
Outdated
Show resolved
Hide resolved
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (1)
150-160: Hardcoded prompt text and audio path should be configurable.The same hardcoded prompt text and audio file path from MainActivity.kt are duplicated here. This violates DRY principles and has the same indentation issues.
Apply this diff to fix indentation:
- if (TtsEngine.isZipvoice) { - val prompt_text = "周日被我射熄火了,所以今天是周一。" - - val reader = WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav") - - tts.generateWithPromptWithCallback(text, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ttsCallback)} else tts.generateWithCallback( + if (TtsEngine.isZipvoice) { + val prompt_text = "周日被我射熄火了,所以今天是周一。" + val reader = WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav") + tts.generateWithPromptWithCallback(text, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ttsCallback) + } else tts.generateWithCallback(Consider centralizing these Zipvoice configuration values (prompt text and audio path) in TtsEngine.kt to avoid duplication across MainActivity.kt and TtsService.kt. This aligns with the PR TODO: "Allow customizing prompt audio/text in Android TTS."
♻️ Duplicate comments (1)
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt (1)
1-1: Same symlinkedWaveReader.ktpattern as in TTS engine moduleThis file is also a symlink to
sherpa-onnx/kotlin-api/WaveReader.kt, with the same portability and tooling concerns mentioned forandroid/SherpaOnnxTtsEngine/.../WaveReader.kt. If you decide to switch to a shared Kotlin module or a copied source there, it would be good to apply the same approach here for consistency.
🧹 Nitpick comments (6)
scripts/apk/generate-tts-apk-script.py (1)
547-572: Clarify duplicatedmodel_direntries inget_zipvoice_models()Both ZipVoice entries use the same
model_dirbut differ only intext_model/flow_matching_model(float vs int8). If any downstream logic or templates key bymodel_diralone (e.g., directory creation, download steps, display labels), this duplication could cause confusing behavior or redundant work.If you intend to ship both precision variants side by side, consider either:
- Making the distinction explicit in metadata (e.g., a
variantfield), or- Verifying templates only rely on the per-field filenames and do not assume
model_diruniqueness.android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt (1)
1-1: SymlinkedWaveReader.ktmay be fragile across platforms and toolingThis file appears to be a Git symlink pointing back to
sherpa-onnx/kotlin-api/WaveReader.kt. That works on POSIX systems, but can be brittle on:
- Windows dev environments where symlink support is limited or disabled.
- Tools/editors/IDEs that don’t resolve symlinked Kotlin sources well.
- Future directory restructurings, since the long
../chain is easy to break.If you expect contributors/builds on mixed platforms, consider instead:
- Moving
WaveReaderinto a small shared Kotlin module and adding a Gradle dependency from this engine module, or- Copying the file here and keeping it synchronized via a simple script.
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)
143-150: Hardcoded prompt text and audio path.The prompt text and audio file path are hardcoded in the generation logic. While this works for demonstration purposes, consider:
- Making the prompt text and audio path configurable via UI or settings
- Adding error handling for missing prompt audio files
This aligns with the TODO mentioned in the PR description: "Allow customizing prompt audio/text in Android TTS."
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
61-63: Inconsistent getter naming: gettShift.The getter method is named
gettShift()instead of following the standard JavaBean conventiongetTShift(). While this works functionally, it's inconsistent with naming conventions and the corresponding settersettShift().Consider renaming to
getTShift()for consistency, unless there's a specific reason for the lowercase 't'.sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
33-46: Consider adding parameter validation.The new
generateWithPromptandgenerateWithPromptWithCallbackmethods don't validate parameters likeprompt_text,prompt_samples, orsample_rate. Consider adding validation to:
- Check for null/empty prompt_text
- Check for null/empty prompt_samples array
- Validate sample_rate is positive
This would provide clearer error messages compared to potential JNI crashes.
Example validation:
public GeneratedAudio generateWithPrompt(String text, String prompt_text, float[] prompt_samples, int sample_rate, float speed, int num_step) { + if (prompt_text == null || prompt_text.isEmpty()) { + throw new IllegalArgumentException("prompt_text cannot be null or empty"); + } + if (prompt_samples == null || prompt_samples.length == 0) { + throw new IllegalArgumentException("prompt_samples cannot be null or empty"); + } + if (sample_rate <= 0) { + throw new IllegalArgumentException("sample_rate must be positive"); + } Object[] arr = generateWithPromptImpl(ptr, text, prompt_text, prompt_samples, sample_rate, speed, num_step);Also applies to: 63-76
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)
6-11: Consider adding a comment about model download location.For consistency with other examples (e.g.,
NonStreamingTtsPiperEn.javalines 7-9), consider adding a comment indicating where users can download the Zipvoice models.For example:
+ // please visit + // https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models + // to download model files String textModel = "sherpa-onnx-zipvoice-distill-zh-en-emilia/text_encoder.onnx";
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
.github/workflows/apk-tts-engine.yaml(1 hunks).github/workflows/apk-tts.yaml(1 hunks).github/workflows/run-java-test.yaml(1 hunks).gitignore(1 hunks)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt(6 hunks)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt(1 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt(3 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt(5 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt(2 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt(1 hunks)java-api-examples/NonStreamingTtsZipvoiceZhEn.java(1 hunks)java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh(1 hunks)scripts/apk/build-apk-tts-engine.sh.in(7 hunks)scripts/apk/build-apk-tts.sh.in(6 hunks)scripts/apk/generate-tts-apk-script.py(4 hunks)sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc(1 hunks)sherpa-onnx/csrc/offline-tts-zipvoice-impl.h(1 hunks)sherpa-onnx/java-api/Makefile(1 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java(3 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java(5 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java(1 hunks)sherpa-onnx/jni/offline-tts.cc(3 hunks)sherpa-onnx/kotlin-api/Tts.kt(8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
.gitignorejava-api-examples/run-non-streaming-tts-zipvoice-zh-en.shsherpa-onnx/jni/offline-tts.cc
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.
Applied to files:
java-api-examples/run-non-streaming-tts-zipvoice-zh-en.shscripts/apk/build-apk-tts-engine.sh.insherpa-onnx/jni/offline-tts.cc
🧬 Code graph analysis (8)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (2)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (2)
Builder(73-139)OfflineTtsZipvoiceModelConfig(3-140)sherpa-onnx/kotlin-api/Tts.kt (1)
vits(60-69)
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (2)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/LibraryUtils.java (1)
LibraryUtils(43-246)java-api-examples/NonStreamingTtsPiperEn.java (2)
NonStreamingTtsPiperEn(7-58)main(8-57)
sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1)
sherpa-onnx/csrc/offline-tts-matcha-impl.h (1)
Process(425-430)
sherpa-onnx/jni/offline-tts.cc (4)
sherpa-onnx/kotlin-api/Tts.kt (2)
tokens(47-58)samples(79-91)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/Tts.kt (2)
tokens(47-58)samples(79-91)sherpa-onnx/jni/jni.cc (2)
NewInteger(17-21)NewInteger(17-17)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (2)
callback(98-106)tts(22-399)
sherpa-onnx/kotlin-api/Tts.kt (4)
scripts/go/sherpa_onnx.go (7)
OfflineTtsVitsModelConfig(921-930)OfflineTtsMatchaModelConfig(932-941)OfflineTtsKokoroModelConfig(943-952)OfflineTtsKittenModelConfig(954-960)OfflineTtsZipvoiceModelConfig(962-974)GeneratedAudio(1001-1006)OfflineTtsModelConfig(976-991)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
OfflineTtsZipvoiceModelConfig(3-140)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/Tts.kt (2)
generateWithPromptImpl(223-231)generateWithPromptWithCallbackImpl(241-250)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
OfflineTtsModelConfig(5-104)
scripts/apk/generate-tts-apk-script.py (1)
scripts/flutter/generate-tts.py (1)
TtsModel(28-36)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
sherpa-onnx/kotlin-api/Tts.kt (7)
generateWithPrompt(123-136)generateWithPromptImpl(223-231)samples(79-91)sampleRate(107-107)generateWithPromptWithCallback(157-171)generateWithPromptWithCallbackImpl(241-250)generateWithCallbackImpl(233-239)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (2)
sherpa-onnx/kotlin-api/Tts.kt (1)
tokens(47-58)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
Builder(50-103)
🔇 Additional comments (33)
.gitignore (1)
161-161: Good—follows repository conventions for model exclusions.Adding the ZipVoice model artifact to
.gitignoreis appropriate and aligns with how other model directories are already excluded in the repository.sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc (1)
345-345: Now I need to verify what TokenizeEn does to confirm if hard-coding "en-us" is actually the right approach or a genuine bug.Verify review comment: Hard-coded "en-us" affects English text tokenization in ZipVoice.
Based on verification:
Issue Confirmed: Different English accents (identified by BCP 47 language identifiers) produce different phonemic representations, so hard-coding "en-us" does affect tokenization output.
However, Suggested Fix is Incorrect: ZipVoice uses a zero-shot approach that requires prompt_wav (audio file) and prompt_transcription as inputs, not voice name parameters like "en-us" or "en-gb". The
voiceparameter passed toConvertTextToTokenIdsis architecturally incompatible with ZipVoice's design—simply substituting it for the hard-coded "en-us" would not work.Root Cause: The voice parameter in the ZipVoice frontend signature appears to be for interface consistency with other TTS systems (Kokoro, Matcha, VITS), but ZipVoice doesn't actually support voice name selection. The "en-us" hard-code may be a stub awaiting proper architectural integration.
Proper resolution requires determining the intended behavior: Should English tokenization detect the accent from metadata/context, use only "en-us" (current behavior), or be made user-configurable through a different mechanism? The suggested diff would not function as intended because the voice parameter is incompatible with ZipVoice's architecture.
Likely an incorrect or invalid review comment.
scripts/apk/generate-tts-apk-script.py (1)
35-51: New ZipVoice fields onTtsModellook consistentExtending
TtsModelwithvocoder,text_model,flow_matching_model,pinyin_dict, andis_zipvoiceis consistent with how other model families (matcha, kitten, kokoro) are represented. As long as the Jinja templates are updated to read these new fields only whenis_zipvoiceis true, this shape should work well.sherpa-onnx/java-api/Makefile (1)
50-52: Java API Makefile wiring for ZipVoice config looks correctAdding
OfflineTtsZipvoiceModelConfig.javaalongside the other OfflineTts* config classes ensures the new model config is compiled and included insherpa-onnx.jar. This matches the existing pattern in this Makefile.sherpa-onnx/csrc/offline-tts-zipvoice-impl.h (1)
89-95: Callback now correctly invoked for ZipVoice TTS resultsCapturing the
GeneratedAudioasans, invokingcallback(ans.samples.data(), ans.samples.size(), 1.0), and then returningansfixes the earlier issue where the callback was ignored for ZipVoice while maintaining the original return semantics.Just make sure all callback implementations treat the
samplespointer as valid only during the call (no storing it beyond the callback), which matches how other TTS paths usually behave..github/workflows/run-java-test.yaml (2)
324-324: LGTM! New Zipvoice test added.The addition of the Zipvoice TTS test script is properly integrated into the CI pipeline.
332-332: LGTM! Cleanup added for Zipvoice model.The cleanup step appropriately removes the downloaded Zipvoice model to save space, consistent with other model cleanup steps in the workflow.
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (2)
45-45: LGTM! WaveReader import added.The import supports the new Zipvoice prompt-based generation path.
117-117: LGTM! Conditional speaker handling for Zipvoice.The logic appropriately returns 1 speaker for Zipvoice models, bypassing the multi-speaker UI when not applicable.
java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh (1)
1-37: LGTM! Well-structured build and run script.The script follows best practices with:
- Conditional builds to skip already-completed steps
- Proper error handling via
set -ex- Correct library paths and classpath configuration
- Model download with appropriate checks
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt (1)
9-9: LGTM! WaveReader import added.The import supports the new Zipvoice prompt-based generation path.
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (4)
47-49: LGTM! New Zipvoice model fields added.The new fields support the Zipvoice TTS model configuration.
59-59: LGTM! Zipvoice flag added.The public flag enables conditional Zipvoice behavior throughout the TTS engine.
79-84: LGTM! Zipvoice fields initialized.Proper initialization of Zipvoice-specific fields in the init block.
223-225: LGTM! Zipvoice configuration properly passed through.The new Zipvoice fields are correctly passed to the configuration builder.
Also applies to: 232-232
.github/workflows/apk-tts-engine.yaml (1)
25-26: Revert or clarify intent of temporary debug commit.The matrix reduction from 40 to 2 jobs is confirmed, but the commit message "drop me: build only zipvoice" indicates this is temporary debug work. This same commit also removes signing and publishing steps. Either revert this commit before merging or clarify whether this reduced matrix and disabled signing/publishing are intentional for the final state.
.github/workflows/apk-tts.yaml (1)
25-26: No action required—coverage is maintained by design.The build matrix reduction from 40 to 2 jobs does not impact coverage. The
generate-tts-apk-script.pyscript dynamically divides all 245 TTS models proportionally among the specified number of runners. Withtotal: 2andindex: [0, 1], each job processes ~122–123 models (245 ÷ 2), ensuring all models are built. The script validatesnum_per_runner > 0to prevent invalid configurations. This is a legitimate CI optimization—fewer parallel jobs, identical coverage.scripts/apk/build-apk-tts.sh.in (3)
52-62: LGTM: Conditional model cleanup logic.The conditional cleanup correctly removes unused ONNX variants based on the selected text_model, and skips vocoder download for zipvoice models.
159-159: Verify debug build for production.The build has been changed from
assembleReleasetoassembleDebug. Debug builds are not optimized and use debug signing, which may not be suitable for production APKs.Is this change intentional for testing, or should production builds use
assembleRelease?
25-30: Verify whether TTS builds intentionally support arm64-v8a only.The TTS build scripts (build-apk-tts.sh.in and build-apk-tts-engine.sh.in) restrict to arm64-v8a only, while all other build scripts in the repository support multi-architecture builds (arm64-v8a, armeabi-v7a, x86_64, x86). This appears intentional but is inconsistent.
Confirm whether this single-architecture restriction is required for zipvoice TTS integration or if multi-architecture support should be restored.
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
10-10: LGTM: Zipvoice configuration integration.The zipvoice field and corresponding getter/setter follow the established pattern for other TTS model configurations (vits, matcha, kokoro, kitten). The implementation is consistent and correct.
Also applies to: 20-20, 46-48, 55-55, 84-87
scripts/apk/build-apk-tts-engine.sh.in (2)
25-30: Verify single-architecture restriction.Similar to the TTS app build script, this engine build is also restricted to arm64-v8a only. Ensure this is intentional and not a temporary limitation.
165-165: Verify debug build for production.The build uses
assembleDebuginstead ofassembleRelease. Confirm this is intentional for the TTS engine APK.android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)
143-143: isZipvoice flag is hardcoded to false.The
isZipvoiceflag is hardcoded tofalseon lines 143 and 204. This means the zipvoice generation path will never be executed in the current code.Based on the PR description and CI pattern comments, this appears to be intentional for CI substitution. Confirm this is the expected behavior.
Also applies to: 204-204
sherpa-onnx/jni/offline-tts.cc (3)
121-147: LGTM: Zipvoice configuration parsing.The zipvoice configuration parsing follows the established pattern used for other TTS models (vits, matcha, kokoro, kitten) and correctly reads all required fields.
273-301: LGTM: Prompt-based generation JNI implementation.The JNI method properly manages resource lifetimes:
- String UTF chars are released after use
- Float array elements are released with JNI_ABORT (correct for read-only access)
- Return array is properly constructed
The implementation is consistent with the existing
generateImplmethod.
365-432: LGTM: Callback-based prompt generation.The callback wrapper correctly:
- Captures the JNI environment and callback object
- Invokes the Kotlin/Java callback with proper type conversion
- Returns the callback result to control generation continuation
The resource management mirrors the non-callback variant and is correct.
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
1-140: LGTM: Zipvoice model configuration implementation.The builder pattern implementation is consistent with other model configs (Vits, Matcha, Kokoro, Kitten). Default values match the Kotlin implementation and appear reasonable for the zipvoice model.
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
33-46: LGTM: generateWithPrompt overloads.The overload chain correctly provides default values (speed=1.0f, num_step=4) and follows the pattern established by the existing
generatemethods.sherpa-onnx/kotlin-api/Tts.kt (4)
47-58: LGTM: Zipvoice model configuration.The data class definition matches the Java implementation and includes appropriate default values for all parameters.
123-136: LGTM: Prompt-based generation methods.The new methods properly delegate to native implementations and construct GeneratedAudio from the returned array. The API is clean and consistent with existing methods.
Also applies to: 157-171
297-299: Higher thread count for zipvoice models.Zipvoice models use 8 threads compared to 4 for Kokoro/Kitten and 2 for other models. Ensure this higher thread count is intentional and has been performance-tested.
Higher thread counts can improve throughput but may increase memory usage and context switching overhead on lower-end devices.
369-380: LGTM: Zipvoice configuration creation.The zipvoice configuration is created conditionally and follows the same pattern as other model types. File paths are correctly constructed relative to modelDir.
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt
Show resolved
Hide resolved
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt
Outdated
Show resolved
Hide resolved
9f24b71 to
199bd99
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (7)
scripts/apk/generate-tts-apk-script.py (1)
589-591: Critical: all_model_list overwrite discards all non-Zipvoice models.Line 591 overwrites the accumulated model list with only Zipvoice models, discarding all VITS, Piper, Mimic3, Coqui, Matcha, Kokoro, and Kitten models. This was already flagged in previous reviews and should be removed or moved to a separate testing configuration.
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (2)
59-59: isZipvoice should have restricted visibility.The isZipvoice property is public and mutable, allowing modification from anywhere. This was already flagged in a previous review suggesting it should be internal to prevent unintended changes.
191-199: Incorrect comment in Example 12.Line 199's comment says "isKitten = true" but this example is for Zipvoice and should say "isZipvoice = true". This was already flagged in previous reviews.
scripts/apk/build-apk-tts-engine.sh.in (1)
168-168: Wildcard APK path could match multiple files.The wildcard pattern
android/SherpaOnnxTtsEngine/app/build/outputs/apk/*/*.apkcould match multiple APK files (debug/release, different flavors). This was already flagged in a previous review as potentially problematic.scripts/apk/build-apk-tts.sh.in (1)
141-145: Tighten APK selection instead of using a broad wildcard.
mv android/SherpaOnnxTts/app/build/outputs/apk/*/*.apk ./apks/...apkcan match multiple APKs (e.g., multiple flavors or splits). With more than one match,mvto a single filename will fail.Prefer selecting a single APK explicitly (e.g., the debug variant path or latest file via
find/sorting) or fail fast if >1 candidate is found.Also applies to: 159-162
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)
140-151: WireisZipvoicethrough activity state instead of a local flag.Here
var isZipvoice = falseis a local variable, so the zipvoice branch only becomes active if an external sed-based script rewrites this line. For anyone configuring Example 12 manually ininitTts(), it’s easy to setisZipvoice = truethere but forget to update this local copy, leaving the prompt-based path unused.Consider promoting
isZipvoiceto aprivate varonMainActivity, setting it ininitTts()based on the chosen model, and referencing that field inonClickGenerate(). That keeps configuration and runtime behavior in sync and avoids sed depending on multiple identical literals.Also, the hardcoded
prompt_textand prompt asset path are duplicated across the app and engine modules; factoring them into a shared helper or constant would reduce maintenance overhead and aligns with the TODO in the PR description.sherpa-onnx/kotlin-api/Tts.kt (1)
307-317: Still missing Kotlin-side validation for Zipvoice-specific model paths.When
isZipvoiceis true, the earlier check now skips the “Please specify a TTS model” guard, but there is no corresponding validation thattextModel,flowMatchingModel, andvocoderare non-empty before constructingOfflineTtsZipvoiceModelConfig. That allows building configs with empty paths (e.g.,"$modelDir/"), pushing errors down into the native/C++ layer instead of failing fast on the Kotlin side, which was already called out in a previous review.Consider adding explicit validation, e.g.:
if (acousticModelName.isNotEmpty() && vocoder.isEmpty()) { throw IllegalArgumentException("Please provide vocoder for Matcha TTS") } + + if (isZipvoice) { + if (textModel.isEmpty()) { + throw IllegalArgumentException("Please provide textModel for Zipvoice TTS") + } + if (flowMatchingModel.isEmpty()) { + throw IllegalArgumentException("Please provide flowMatchingModel for Zipvoice TTS") + } + if (vocoder.isEmpty()) { + throw IllegalArgumentException("Please provide vocoder for Zipvoice TTS") + } + }This keeps the API behavior symmetric with other TTS modes and avoids confusing runtime failures when paths are missing.
Also applies to: 369-380
🧹 Nitpick comments (6)
scripts/apk/build-apk-tts.sh.in (2)
52-60: Make zipvoice cleanup more robust underset -e.The
rmcommands inside theis_zipvoiceblock will abort the whole script if any of the expected ONNX files are missing (because ofset -e). That’s fragile if model packaging ever changes.Consider using
rm -f(and/or checking for file existence) so adding/removing variants in the tarball doesn’t break APK builds.
88-99: Keep MainActivity defaults in sync with sed patterns.The new
sedinjections fortextModel,flowMatchingModel,pinyinDict, andisZipvoicerely on the exact literal defaults (textModel = null,flowMatchingModel = null,pinyinDict = null,isZipvoice = false) being present inMainActivity.kt.If those defaults change (e.g., refactoring to a property or different initializer), these substitutions will silently stop working. If you keep this templating approach, document the dependency near the Kotlin fields or centralize the placeholders to avoid breakage during refactors.
Also applies to: 126-128
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)
3-60: Zipvoice Java example is coherent and correctly wired.The example configures
OfflineTtsZipvoiceModelConfig, usesgenerateWithPrompt(...), and saves output with sensible logging. Class name and output filename now match the file and bilingual content.If you later expose
steps/threadsas per the PR TODOs, you might add an overload here to demonstrate those knobs, but the current sample is fine as-is.android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)
192-205: Zipvoice config wiring looks correct but is tightly coupled to comments/templates.The additions for
textModel,flowMatchingModel,pinyinDict, andisZipvoiceininitTts()and thegetOfflineTtsConfig(...)call are consistent with the new zipvoice model config (Java/Kotlin).Just be aware that:
- Example 12’s commented values now serve as the only in-code documentation for how to configure zipvoice on Android.
- The build script’s sed-based replacement relies on these variable names and their initial values.
If you plan to support more zipvoice variants, it may be worth extracting a small helper or data class for “selected TTS model + flags” instead of growing this initializer further.
Also applies to: 223-229, 314-323, 329-345
sherpa-onnx/kotlin-api/Tts.kt (2)
47-58: Zipvoice config wiring looks consistent with other platforms; tuning fields are currently fixed.The
OfflineTtsZipvoiceModelConfigdata class and its use ingetOfflineTtsConfiglook coherent: required paths (textModel, flowMatchingModel, vocoder, pinyinDict, tokens, dataDir) are passed through, and defaults forfeatScale,tShift,targetRms, andguidanceScalematch the Java/Go shapes. If you want full cross-language parity later, consider adding parameters (or a builder-style API) on the Kotlin side to let callers override those tuning fields instead of hard-coding the defaults here.Also applies to: 65-66, 369-380, 388-389
123-136: Prompt-based generate APIs are consistent with existing methods; consider minor polish.The new
generateWithPrompt/generateWithPromptWithCallbackmethods mirror the existinggenerate/generateWithCallbackpatterns and wire cleanly to the new native entry points (same parameter order and return handling). From an API standpoint, you might optionally:
- Use Kotlin-style camelCase names (
promptText,promptSamples,sampleRate,numStep) on the public methods while keeping the JNI parameter names as-is; and/or- Add lightweight preconditions for obviously invalid values (e.g.,
require(sample_rate > 0)andrequire(num_step > 0)) if you want errors earlier than the native layer.These are cosmetic/ergonomic tweaks and not blockers.
Also applies to: 157-171, 223-231, 241-250
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
.github/workflows/apk-tts-engine.yaml(1 hunks).github/workflows/apk-tts.yaml(1 hunks).github/workflows/run-java-test.yaml(1 hunks).gitignore(1 hunks)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt(6 hunks)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt(1 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt(3 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt(5 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt(2 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt(1 hunks)java-api-examples/NonStreamingTtsZipvoiceZhEn.java(1 hunks)java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh(1 hunks)scripts/apk/build-apk-tts-engine.sh.in(7 hunks)scripts/apk/build-apk-tts.sh.in(6 hunks)scripts/apk/generate-tts-apk-script.py(4 hunks)sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc(1 hunks)sherpa-onnx/csrc/offline-tts-zipvoice-impl.h(1 hunks)sherpa-onnx/java-api/Makefile(1 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java(3 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java(5 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java(1 hunks)sherpa-onnx/jni/offline-tts.cc(3 hunks)sherpa-onnx/kotlin-api/Tts.kt(8 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- sherpa-onnx/csrc/offline-tts-zipvoice-impl.h
- .gitignore
- android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt
- sherpa-onnx/java-api/Makefile
- sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc
- android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt
- sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java
- android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt
- android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.
Applied to files:
java-api-examples/run-non-streaming-tts-zipvoice-zh-en.shscripts/apk/build-apk-tts-engine.sh.insherpa-onnx/jni/offline-tts.cc
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
java-api-examples/run-non-streaming-tts-zipvoice-zh-en.shsherpa-onnx/jni/offline-tts.cc
🧬 Code graph analysis (6)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
sherpa-onnx/kotlin-api/Tts.kt (7)
generateWithPrompt(123-136)generateWithPromptImpl(223-231)samples(79-91)sampleRate(107-107)generateWithPromptWithCallback(157-171)generateWithPromptWithCallbackImpl(241-250)generateWithCallbackImpl(233-239)
sherpa-onnx/kotlin-api/Tts.kt (3)
scripts/go/sherpa_onnx.go (7)
OfflineTtsVitsModelConfig(921-930)OfflineTtsMatchaModelConfig(932-941)OfflineTtsKokoroModelConfig(943-952)OfflineTtsKittenModelConfig(954-960)OfflineTtsZipvoiceModelConfig(962-974)GeneratedAudio(1001-1006)OfflineTtsModelConfig(976-991)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
OfflineTtsZipvoiceModelConfig(3-140)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
OfflineTtsModelConfig(5-104)
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/LibraryUtils.java (1)
LibraryUtils(43-246)
sherpa-onnx/jni/offline-tts.cc (3)
sherpa-onnx/kotlin-api/Tts.kt (2)
tokens(47-58)samples(79-91)sherpa-onnx/jni/jni.cc (2)
NewInteger(17-21)NewInteger(17-17)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (2)
callback(98-106)tts(22-399)
scripts/apk/generate-tts-apk-script.py (1)
scripts/flutter/generate-tts.py (1)
TtsModel(28-36)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (4)
sherpa-onnx/kotlin-api/Tts.kt (1)
tokens(47-58)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
Builder(50-103)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsVitsModelConfig.java (1)
OfflineTtsVitsModelConfig(5-116)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsKokoroModelConfig.java (1)
OfflineTtsKokoroModelConfig(4-104)
🔇 Additional comments (27)
.github/workflows/run-java-test.yaml (2)
324-324: LGTM! Zipvoice test added correctly.The new test script invocation follows the established pattern for other TTS model tests in this workflow.
332-332: Appropriate cleanup for zipvoice artifacts.The cleanup step correctly removes the downloaded zipvoice model directory to save disk space, consistent with other model cleanup steps in the workflow.
scripts/apk/generate-tts-apk-script.py (2)
35-51: LGTM! Zipvoice fields added to TtsModel.The new fields (text_model, flow_matching_model, pinyin_dict, is_zipvoice) appropriately extend the TtsModel dataclass to support Zipvoice-specific configuration.
547-572: Zipvoice model factory function looks good.The get_zipvoice_models() function correctly constructs Zipvoice model configurations with appropriate field initialization and data_dir setup.
sherpa-onnx/jni/offline-tts.cc (3)
122-147: LGTM! Zipvoice configuration parsing implemented correctly.The JNI code properly reads all Zipvoice-specific fields (tokens, textModel, flowMatchingModel, vocoder, dataDir, pinyinDict, and float parameters) using the standard SHERPA_ONNX_JNI_READ_* macros, consistent with other model configurations in this file.
273-301: Proper JNI resource management in generateWithPromptImpl.The implementation correctly:
- Releases JNI strings with ReleaseStringUTFChars for both text and prompt_text
- Releases float array elements with ReleaseFloatArrayElements using JNI_ABORT (no need to copy back)
- Constructs the return object array with audio samples and sample rate
365-432: Callback wrapper and resource management look correct.The generateWithPromptWithCallbackImpl function properly:
- Creates a C++ callback wrapper that invokes the Java callback method
- Handles resource cleanup for strings and float arrays
- Returns audio samples and sample rate in the expected format
The callback pattern matches the existing generateWithCallbackImpl implementation.
.github/workflows/apk-tts.yaml (1)
25-26: Verify: Matrix reduced from 40 to 2 jobs - is this intentional?Same as in apk-tts-engine.yaml, the build matrix has been reduced from 40 to 2 jobs. This change should be coordinated with the model list modifications in generate-tts-apk-script.py. If you intend to build APKs for all TTS families (not just Zipvoice), you'll need more than 2 jobs.
java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh (4)
5-20: JNI build logic is correct.The conditional build checks for both macOS (.dylib) and Linux (.so) library files and builds only if neither exists. The CMake configuration and build steps match other test scripts in the repository.
22-26: Java API build check is correct.The script properly checks for the Java API jar file and builds it only if missing, avoiding unnecessary rebuilds.
28-32: Model download and extraction handled correctly.The script checks for a specific model file (fm_decoder.onnx) to determine if the model has already been downloaded, then downloads, extracts, and cleans up the archive appropriately.
34-37: Java execution command is correct.The java command properly sets the library path and classpath to run the NonStreamingTtsZipvoiceZhEn test.
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt (3)
47-49: LGTM! Zipvoice-specific fields added.The private fields (textModel, flowMatchingModel, pinyinDict) appropriately extend TtsEngine to support Zipvoice configuration.
79-84: Proper initialization of Zipvoice fields.The Zipvoice-specific fields are correctly initialized to null in the init block, consistent with other optional model configurations.
223-232: Zipvoice configuration properly passed to OfflineTts.The getOfflineTtsConfig call correctly includes all Zipvoice-specific parameters (textModel, flowMatchingModel, pinyinDict, isZipvoice) alongside existing TTS configuration fields.
scripts/apk/build-apk-tts-engine.sh.in (8)
25-30: Verify: Multi-arch builds disabled - is this intentional?The script now only builds for arm64-v8a, with armv7-eabi, x86-64, and x86 builds commented out. This dramatically reduces APK variants from 4 architectures to 1. If this is temporary for Zipvoice testing, please add a comment. Otherwise, this will break support for 32-bit ARM devices and x86 emulators/devices.
43-45: Zipvoice model fields properly extracted.The script correctly extracts text_model, flow_matching_model, and pinyin_dict from the TtsModel template variables.
54-60: Smart cleanup for Zipvoice model variants.The conditional cleanup removes unused int8/fp32 model variants based on which text_model is selected, reducing APK size. This is a good optimization.
62-64: Correct vocoder fetch logic for Zipvoice.The script properly skips vocoder download for Zipvoice models (which bundle their own vocoder) while still downloading vocos for Matcha models. The condition
not tts_model.is_zipvoicecorrectly distinguishes the two cases.
82-84: isZipvoice flag properly set via sed.The Jinja2 conditional correctly updates TtsEngine.kt to set isZipvoice = true when processing Zipvoice models.
102-112: Zipvoice model configuration correctly injected.The sed commands properly update TtsEngine.kt with textModel, flowMatchingModel, and pinyinDict values when these fields are present in the TtsModel.
147-147: Single-arch build loop matches disabled multi-arch builds.The for loop now only iterates over arm64-v8a, consistent with the commented-out multi-arch build steps. See earlier comment about verifying if this is intentional.
165-165: Verify: assembleDebug instead of assembleRelease?The build now uses
assembleDebuginstead ofassembleRelease. Debug builds:
- Are larger (no ProGuard/R8 optimization)
- Include debugging symbols
- Are not signed with release keys
- May have different performance characteristics
If this is temporary for testing, please add a comment. For production APKs, you should use assembleRelease.
.github/workflows/apk-tts-engine.yaml (1)
25-26: Matrix reduction to 2 jobs is intentional and correctly configured for Zipvoice testingThe verification confirms this change is deliberate. The script aggregates all 245 models across 8 TTS families (lines 582–589), but line 591 explicitly overrides
all_model_listwith only the 2 Zipvoice models. The workflow matrixtotal: ["2"]withindex: ["0", "1"]is correctly sized for this testing configuration. When all families need to be restored, both the line 591 override and the workflow matrix will need to be updated.sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
3-140: Zipvoice model config class is consistent with existing config types.The immutable fields, builder, defaults, and accessor names align with the other
OfflineTts*ModelConfigclasses and the KotlinOfflineTtsZipvoiceModelConfig. This should integrate cleanly withOfflineTtsModelConfigand the JNI layer.sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
33-47: Prompt-based OfflineTts APIs are well-structured and align with existing patterns.The new
generateWithPromptandgenerateWithPromptWithCallbackoverloads mirror the existinggenerate*andgenerateWithCallbackpatterns, with sensible defaults forspeedandnum_stepand clean wrapping of the nativeObject[]return intoGeneratedAudio. The native declarations also match the usage sites.No issues from the Java side.
Also applies to: 63-76, 99-104
sherpa-onnx/kotlin-api/Tts.kt (1)
262-279: The review comment concern does not apply—both call sites use named arguments, which are position-independent in Kotlin.Both call sites in the codebase (TtsEngine.kt:217 and MainActivity.kt:329) use named arguments when calling
getOfflineTtsConfig(). In Kotlin, named arguments are position-independent, so changing the parameter order in the function definition does not break compatibility. All new parameters (textModel,flowMatchingModel,pinyinDict,isZipvoice) have been correctly provided in both calls, and optional parameters use their defaults.Likely an incorrect or invalid review comment.
199bd99 to
85f45f8
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (6)
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (2)
203-208: Hardcoded prompt and indentation issues flagged in previous review.This code segment has already been reviewed with feedback about hardcoded prompt text, hardcoded asset path, and inconsistent indentation (tabs vs spaces).
206-208: Add error handling for WaveReader.readWave to prevent crashes.
WaveReader.readWavecould fail if the asset doesn't exist or is malformed, which would cause a crash when accessingreader.samplesorreader.sampleRateon line 208. Wrap the asset loading in a try-catch block and provide a graceful fallback or user notification.- val reader = WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav") - - TtsEngine.tts!!.generateWithPromptWithCallback(testText, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ::callback)} else + val reader = try { + WaveReader.readWave(application.assets, "sherpa-onnx-zipvoice-distill-zh-en-emilia/prompt.wav") + } catch (e: Exception) { + Log.e(TAG, "Failed to load prompt audio: ${e.message}") + withContext(Dispatchers.Main) { + Toast.makeText( + applicationContext, + "Failed to load prompt audio", + Toast.LENGTH_SHORT + ).show() + startEnabled = true + } + return@launch + } + TtsEngine.tts!!.generateWithPromptWithCallback(testText, prompt_text, reader.samples, reader.sampleRate, TtsEngine.speed, callback = ::callback) + } elsescripts/apk/build-apk-tts.sh.in (1)
141-165: Two ZipVoice variants generate identically named APKs, and wildcardmvis still fragileThere are two ZipVoice
TtsModelentries with the samemodel_dirandlang, but different{text,flow}pairs. When both flow through this script:mv android/SherpaOnnxTts/app/build/outputs/apk/*/*.apk \ ./apks/sherpa-onnx-${SHERPA_ONNX_VERSION}-$arch-$lang-tts-$model_dir.apk
- Both builds target the same destination filename, so the second run overwrites the first; you end up with only one ZipVoice APK per arch even though two models are defined.
- The broad
*/*.apkwildcard can still match multiple APKs (e.g., multiple flavors or debug+other variants), which can makemvfail or behave unexpectedly — this was already raised in a previous review.Consider:
- Adding a variant suffix for ZipVoice based on
text_model(e.g.-fp16vs-int8) so filenames are unique, and- Narrowing the source glob or explicitly selecting a single APK (e.g. by flavor or using a small bash check to ensure exactly one match) before moving.
For example (conceptually):
variant_suffix="" {% if tts_model.is_zipvoice and tts_model.text_model == "text_encoder_int8.onnx" %} variant_suffix="-int8" {% endif %} mv android/SherpaOnnxTts/app/build/outputs/apk/debug/*.apk \ ./apks/sherpa-onnx-${SHERPA_ONNX_VERSION}-$arch-$lang-tts-$model_dir${variant_suffix}.apkscripts/apk/build-apk-tts-engine.sh.in (1)
147-171: Two ZipVoice engine variants share an APK name, and wildcardmvremains brittleFor engine builds you again have two ZipVoice models with the same
model_dirandlang. Themvcommand:mv android/SherpaOnnxTtsEngine/app/build/outputs/apk/*/*.apk \ ./apks/sherpa-onnx-${SHERPA_ONNX_VERSION}-$arch-$lang-tts-engine-$model_dir.apk
- Produces the same destination filename for both runs, so the second ZipVoice engine APK overwrites the first; you only retain one variant.
- Uses a broad
*/*.apkglob, which can match multiple APKs and was already flagged as a potential issue in earlier reviews.It would be safer to:
- Include a variant suffix for ZipVoice based on the model (e.g.
-int8vs-fp16) so filenames are unique, and- Narrow the glob to a specific variant directory (e.g.
debug) or explicitly select a single APK before moving.Also applies to: 168-168
scripts/apk/generate-tts-apk-script.py (1)
547-572: ZipVoice model wiring is correct, but overwritingall_model_listchanges script semantics
get_zipvoice_models()defines two variants of the samemodel_dirwith different{text, flow_matching}_modelvalues and marks themis_zipvoice = True; that matches how the APK templates consume ZipVoice-specific fields.- After aggregating VITS/Piper/Mimic3/Coqui/Matcha/Kokoro/Kitten models,
all_model_listis first extended with ZipVoice, then immediately reset toget_zipvoice_models(). This discards all previously-added models and makes the script “ZipVoice-only”.This is consistent with the current CI setup (matrix total=2; two ZipVoice models), but:
- Anyone expecting this script to generate APKs for all models will now only see ZipVoice.
- If
--totalis later increased without adding more ZipVoice entries, sharding will fail (num_per_runner <= 0).If the intent is temporary ZipVoice-only builds, consider:
- Removing the overwrite and leaving the aggregated list; or
- Guarding it behind a flag (e.g.
--zipvoice-only), plus a comment; and- Optionally asserting
total <= len(get_zipvoice_models())when in ZipVoice-only mode.Also applies to: 582-592
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)
143-150: Zipvoice branch inonClickGeneratedepends on a local flag; consider promotingisZipvoiceto shared state.Here
isZipvoiceis a local variable initialized tofalse, and the Zipvoice path (prompt-basedgenerateWithPromptWithCallbackplusWaveReader.readWave) is only taken when it’strue. Separately,initTts()has its own localisZipvoiceused when building the config.That split makes it easy for callers (or future edits/scripts) to flip Zipvoice on in
initTts()but forget to do the same inonClickGenerate, leading to a Zipvoice-configured model still going through the non-promptgenerateWithCallbackbranch.To avoid this divergence, consider:
- Defining
private var isZipvoice: Boolean = falseas aMainActivityproperty.- Setting it in
initTts()alongside the Zipvoice model selection.- Using that same property in
onClickGenerateinstead of a new localvar isZipvoice.This keeps model selection and runtime path selection in sync.
🧹 Nitpick comments (11)
.github/workflows/apk-tts.yaml (1)
21-27: Matrixtotalis now tightly coupled to the number of ZipVoice modelsWith
total: ["2"]andindex: ["0","1"],generate-tts-apk-script.pywill shard assuming exactly 2 models (after the ZipVoice-only overwrite). If you later add/remove ZipVoice variants or re-enable other model families, this hard-codedtotalcan causenum_per_runner <= 0or uneven sharding.Consider either:
- Deriving
totalfrom the model list (or assertingtotal <= len(get_zipvoice_models())), or- Adding a short comment in the workflow noting that
totalmust match the ZipVoice model count.scripts/apk/generate-tts-apk-script.py (1)
31-52: New ZipVoice fields look consistent; consider documenting their useThe added fields (
vocoder,text_model,flow_matching_model,pinyin_dict,is_zipvoice) integrate cleanly with existing code thanks to defaults. They’re only consumed where needed (APK templates), so they don’t break older models.You might want to:
- Add a brief class-level docstring or comment explaining which backends use each field (VITS/Matcha/Kokoro/ZipVoice), to avoid future confusion.
- Optionally align this
TtsModelshape with the Flutter generator’sTtsModel(scripts/flutter/generate-tts.py) to keep configs in sync..github/workflows/apk-tts-engine.yaml (1)
21-27: Engine workflow now assumes exactly 2 models; keep in sync with generatorLike
apk-tts.yaml, this matrix (total: ["2"],index: ["0","1"]) assumesgenerate-tts-apk-script.pyis operating over exactly 2 models (currently the two ZipVoice entries). If you later change the model set or revert to all models, remember to updatetotalor make the script derive/check it, otherwise sharding will break.The addition of
fetch-depth: 0is appropriate for versioning.Also applies to: 29-32
scripts/apk/build-apk-tts.sh.in (2)
23-30: Single‑archarm64-v8a+assembleDebugis consistent but narrows output scopeYou’ve effectively reduced the build to:
- Only
arm64-v8a(other ABIs commented out), and- Debug builds via
./gradlew assembleDebug.That’s good for faster CI and unsigned artifacts, but it means:
- No multi‑ABI coverage from this script.
- Produced APKs are debug-only (no release configuration, ProGuard/minify, etc.).
If you intend these artifacts purely for testing, this is fine; otherwise, consider documenting that this path is “debug/arm64-only” or leaving a TODO to reintroduce release or additional ABIs later.
Also applies to: 141-155, 159-160
36-46: ZipVoice asset handling and vocoder download guard look reasonable
- The new
text_model,flow_matching_model, andpinyin_dictvariables and correspondingsedupdates correctly propagate ZipVoice config intoMainActivity.ktonly when non-empty.- The
is_zipvoiceflag and conditional vocoder download{% if tts_model.vocoder and not tts_model.is_zipvoice %} curl ... {% endif %}prevent downloading an external vocoder when ZipVoice bundles its own, which avoids redundant assets.
The
rmblock that deletes the unused{text,flow}pair for ZipVoice variants is logically correct, but tightly couples the script to exact filenames. If you expect these filenames to evolve, you might want to either:
- Switch to
rm -fto avoid hard failures when variants change, or- Keep it strict but add a short comment noting that the ZipVoice tarball must always contain both int8 and non‑int8 pairs.
Also applies to: 52-61, 88-99, 126-128
scripts/apk/build-apk-tts-engine.sh.in (1)
23-30: Engine build is now arm64‑only and debug‑onlyAs with the TTS APK script, this one now:
- Builds only
arm64-v8a, with other ABIs commented out.- Uses
./gradlew assembleDebugrather than a release target.That’s fine if these engine APKs are intended for CI/testing only. If you later want release or multi‑ABI coverage, it may be worth adding a brief comment or TODO here to avoid confusion.
Also applies to: 147-156, 163-166
sherpa-onnx/kotlin-api/Tts.kt (2)
47-58: Zipvoice model config wiring is consistent; advanced params currently fixed to defaults.The new
OfflineTtsZipvoiceModelConfigplus thezipvoicefield inOfflineTtsModelConfigand its construction ingetOfflineTtsConfiglook consistent with the Java/Go structs (tokens, textModel, flowMatchingModel, vocoder, dataDir, pinyinDict, featScale, tShift, targetRms, guidanceScale). Right now,getOfflineTtsConfigonly sets paths and relies on the default numeric values; that’s fine, but if you later need to tunefeatScale/tShift/targetRms/guidanceScalefrom Android, you’ll need to thread those through this helper as well.Also applies to: 65-69, 381-392, 395-401
259-279: Zipvoice validation covers required fields; consider enforcing “one active model type” for configs.The updated validation (model presence check relaxed for
isZipvoice, plus checks forvocoder,textModel,flowMatchingModel, andpinyinDictwhenisZipvoiceis true) aligns well with the Zipvoice requirements and avoids late failures in native code.One remaining edge case is that callers can now accidentally enable Zipvoice and another model type (e.g., VITS via
modelName, Matcha viaacousticModelName, or Kokoro/Kitten viavoices) at the same time. To make misconfiguration fail fast, consider adding a small guard like:val enabledModelTypes = listOf( modelName.isNotEmpty(), // VITS acousticModelName.isNotEmpty(), // Matcha voices.isNotEmpty(), // Kokoro/Kitten isZipvoice // Zipvoice ).count { it } if (enabledModelTypes == 0) { throw IllegalArgumentException("Please specify exactly one TTS model type") } if (enabledModelTypes > 1) { throw IllegalArgumentException("Please specify only one TTS model type (VITS, Matcha, Kokoro/Kitten, or Zipvoice)") }This would prevent ambiguous configs while keeping the Zipvoice path clean.
Also applies to: 289-305, 307-329
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)
1-60: Example wiring for Zipvoice looks correct; optionally guard RTF math for empty output.The example correctly:
- Builds
OfflineTtsZipvoiceModelConfigwith paths for text model, flow-matching model, vocoder, pinyin dict, tokens, andespeak-ng-data.- Wraps it in
OfflineTtsModelConfigandOfflineTtsConfig.- Uses
WaveReaderfor the prompt WAV and callsgenerateWithPrompt(text, prompt_text, samples, sampleRate).One small robustness tweak: if generation fails and returns zero samples,
audioDurationbecomes 0 andreal_time_factorturns intoInfinity/NaN. You could defensively handle that case:float audioDuration = audio.getSamples().length / (float) audio.getSampleRate(); float real_time_factor = audioDuration > 0 ? timeElapsedSeconds / audioDuration : 0.0f;This keeps the example stable even in failure modes.
android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (1)
189-205: Zipvoice config variables and Example 12 comments line up with Kotlin helper; minor naming nit in comment.The added Zipvoice variables (
textModel,flowMatchingModel,pinyinDict,isZipvoice) and their use in thegetOfflineTtsConfigcall (textModel = textModel ?: "", etc., plusisZipvoice = isZipvoice) correctly plumb the Android-side settings into the Kotlin helper and onward intoOfflineTtsZipvoiceModelConfig.The Example 12 comment block is now consistent with the Zipvoice path (
// isZipvoice = true). One tiny nit: the comment line// pinyinDict = "pinyinDict"is a bit generic and doesn’t match the actual example path used elsewhere (pinyin.raw); if you want to reduce confusion for users copying these snippets, you might update it to the concrete filename you expect (e.g.,"pinyin.raw").Also applies to: 223-229, 314-323, 329-345
sherpa-onnx/jni/offline-tts.cc (1)
377-412: Extract duplicated callback wrapper logic.The callback wrapper logic here (lines 377-412) is nearly identical to the code in
generateWithCallbackImpl(lines 310-345). This duplication reduces maintainability and increases the risk of inconsistent bug fixes.Consider extracting the common callback wrapper creation into a helper function:
// Add helper function in the file static std::function<int32_t(const float *, int32_t, float)> CreateCallbackWrapper(JNIEnv *env, jobject callback) { return [env, callback](const float *samples, int32_t n, float /*progress*/) -> int { jclass cls = env->GetObjectClass(callback); jmethodID mid = env->GetMethodID(cls, "invoke", "([F)Ljava/lang/Integer;"); if (mid == nullptr) { SHERPA_ONNX_LOGE("Failed to get the callback. Ignore it."); return 1; } jfloatArray samples_arr = env->NewFloatArray(n); env->SetFloatArrayRegion(samples_arr, 0, n, samples); jobject should_continue = env->CallObjectMethod(callback, mid, samples_arr); jclass jklass = env->GetObjectClass(should_continue); jmethodID int_value_mid = env->GetMethodID(jklass, "intValue", "()I"); return env->CallIntMethod(should_continue, int_value_mid); }; }Then use it in both functions:
auto callback_wrapper = CreateCallbackWrapper(env, callback);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
.github/workflows/apk-tts-engine.yaml(1 hunks).github/workflows/apk-tts.yaml(1 hunks).github/workflows/run-java-test.yaml(1 hunks).gitignore(1 hunks)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt(6 hunks)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt(1 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt(3 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt(5 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt(2 hunks)android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt(1 hunks)java-api-examples/NonStreamingTtsZipvoiceZhEn.java(1 hunks)java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh(1 hunks)scripts/apk/build-apk-tts-engine.sh.in(7 hunks)scripts/apk/build-apk-tts.sh.in(6 hunks)scripts/apk/generate-tts-apk-script.py(4 hunks)sherpa-onnx/csrc/offline-tts-zipvoice-frontend.cc(1 hunks)sherpa-onnx/csrc/offline-tts-zipvoice-impl.h(1 hunks)sherpa-onnx/java-api/Makefile(1 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java(3 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java(5 hunks)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java(1 hunks)sherpa-onnx/jni/offline-tts.cc(3 hunks)sherpa-onnx/kotlin-api/Tts.kt(8 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- sherpa-onnx/csrc/offline-tts-zipvoice-impl.h
- java-api-examples/run-non-streaming-tts-zipvoice-zh-en.sh
- .github/workflows/run-java-test.yaml
- android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsEngine.kt
- android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/WaveReader.kt
- sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java
- android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/WaveReader.kt
- android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/TtsService.kt
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-06T04:18:47.981Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.
Applied to files:
scripts/apk/build-apk-tts-engine.sh.inscripts/apk/build-apk-tts.sh.insherpa-onnx/jni/offline-tts.cc
📚 Learning: 2025-08-06T04:23:50.237Z
Learnt from: litongjava
Repo: k2-fsa/sherpa-onnx PR: 2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
.gitignoresherpa-onnx/jni/offline-tts.cc
🧬 Code graph analysis (6)
java-api-examples/NonStreamingTtsZipvoiceZhEn.java (1)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/LibraryUtils.java (1)
LibraryUtils(43-246)
sherpa-onnx/kotlin-api/Tts.kt (4)
scripts/go/sherpa_onnx.go (7)
OfflineTtsVitsModelConfig(921-930)OfflineTtsMatchaModelConfig(932-941)OfflineTtsKokoroModelConfig(943-952)OfflineTtsKittenModelConfig(954-960)OfflineTtsZipvoiceModelConfig(962-974)GeneratedAudio(1001-1006)OfflineTtsModelConfig(976-991)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (1)
OfflineTtsZipvoiceModelConfig(3-140)android/SherpaOnnxAar/sherpa_onnx/src/main/java/com/k2fsa/sherpa/onnx/Tts.kt (2)
generateWithPromptImpl(223-231)generateWithPromptWithCallbackImpl(241-250)sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
OfflineTtsModelConfig(5-104)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (1)
sherpa-onnx/kotlin-api/Tts.kt (7)
generateWithPrompt(123-136)generateWithPromptImpl(223-231)samples(79-91)sampleRate(107-107)generateWithPromptWithCallback(157-171)generateWithPromptWithCallbackImpl(241-250)generateWithCallbackImpl(233-239)
sherpa-onnx/jni/offline-tts.cc (2)
sherpa-onnx/kotlin-api/Tts.kt (6)
model(6-15)model(28-37)model(39-45)model(71-77)tokens(47-58)samples(79-91)android/SherpaOnnxTts/app/src/main/java/com/k2fsa/sherpa/onnx/MainActivity.kt (2)
callback(98-106)tts(22-399)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (2)
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsZipvoiceModelConfig.java (2)
Builder(73-139)OfflineTtsZipvoiceModelConfig(3-140)sherpa-onnx/kotlin-api/Tts.kt (1)
vits(60-69)
scripts/apk/generate-tts-apk-script.py (1)
scripts/flutter/generate-tts.py (1)
TtsModel(28-36)
🔇 Additional comments (10)
.gitignore (1)
161-161: LGTM!The addition correctly ignores the new ZipVoice model artifact directory and follows the existing naming convention for sherpa-onnx model artifacts in the
.gitignorefile.android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt (1)
117-117: ****The hardcoding of
numSpeakers = 1for ZipVoice is correct. ZipVoice uses a different synthesis approach (generateWithPromptWithCallback) that relies on prompt-based voice cloning rather than speaker ID selection. Unlike traditional multi-speaker TTS models (VITS, Matcha), ZipVoice doesn't support multiple speakers and instead uses prompt audio samples for voice cloning, making the hardcoded value of 1 the appropriate design choice to suppress the speaker ID selection UI.scripts/apk/build-apk-tts-engine.sh.in (1)
36-49: ZipVoice + ISO‑639 wiring intoTtsEngine.ktlooks consistent
lang_iso_639_3/_2are injected intoTtsEngine.ktviasedand are populated in Python byconvert_lang_to_iso_639_3, so languages for ZipVoice and others are correctly normalized.- New fields (
text_model,flow_matching_model,pinyin_dict,isZipvoice) are only written when present, which keeps non‑ZipVoice engines unchanged.- The vocoder download is correctly skipped when
is_zipvoiceis true, relying on the bundled ZipVoice assets instead.This all lines up with the generator changes and shouldn’t affect existing engines.
Also applies to: 69-80, 82-113
sherpa-onnx/kotlin-api/Tts.kt (1)
123-137: Prompt-based generate APIs mirror existing patterns and JNI signatures.
generateWithPromptandgenerateWithPromptWithCallbackfollow the same pattern as the existinggenerate/generateWithCallbackwrappers: they call the corresponding external functions, expect a 2-elementArray<Any>(FloatArraysamples +Intsample rate), and wrap it inGeneratedAudio. The external declarations match the AndroidTts.ktsnippet (ptr, text, prompt_text,FloatArrayprompt_samples,Intsample_rate,Floatspeed,Intnum_step), so the Kotlin surface looks coherent for both desktop and Android Zipvoice usage.Also applies to: 157-171, 223-232, 241-250
sherpa-onnx/java-api/Makefile (1)
50-52: IncludingOfflineTtsZipvoiceModelConfig.javain the jar build is correct.Adding the Zipvoice config class to
java_filesensures it’s compiled and present insherpa-onnx.jar, matching the new Java API surface.sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTtsModelConfig.java (1)
10-24: Zipvoice field integration in Java model config is consistent and non-breaking.The new
zipvoicefield, getter, builder default, andsetZipvoicesetter align with the KotlinOfflineTtsModelConfigand the standaloneOfflineTtsZipvoiceModelConfig. Defaulting toOfflineTtsZipvoiceModelConfig.builder().build()preserves backward compatibility (non-null field) while allowing callers to override Zipvoice settings explicitly.Also applies to: 46-48, 50-56, 84-87
sherpa-onnx/jni/offline-tts.cc (1)
121-148: LGTM!The Zipvoice model configuration parsing follows the established pattern for other model types and correctly maps all fields from the Kotlin/Java API to the C++ configuration struct.
sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/OfflineTts.java (3)
33-46: LGTM!The
generateWithPromptoverload chain follows the established pattern and provides sensible defaults (speed=1.0f,num_step=4) consistent with the existing API design and Kotlin defaults.
63-76: LGTM!The
generateWithPromptWithCallbackoverload chain is consistent with the existing callback API pattern and uses the same sensible defaults asgenerateWithPrompt.
99-103: LGTM!The native method declarations correctly match the JNI function signatures and follow the established pattern for returning audio data.
85f45f8 to
754550a
Compare
754550a to
e8ffd0f
Compare
|
Thanks! Can you use APIs from #3344 |
also a java & kotlin api (and corresponding demo/test) for zipvoice
to compatible with zero shot tts. i added a new generateWithPrompt api
fix callback been ignored in zipvoice
todo:
Summary by CodeRabbit
New Features
Build & Testing
Bug Fixes
Other
✏️ Tip: You can customize this high-level summary in your review settings.