
Fix Kokoro language pipeline: multi-lang dispatch, null audio guard, warning fixes #123

Open
Naitik120gupta wants to merge 1 commit into sugarlabs:main from Naitik120gupta:fix/kokoro-language-pipeline


@Naitik120gupta

Problem

  • setup_kokoro() hardcoded lang_code='a', so selecting a Japanese, Chinese,
    Spanish, etc. voice still ran the text through the English G2P and produced incorrect output
  • set_kokoro_voice() only updated the voice name and never switched the underlying pipeline
  • audio_chunk was dereferenced without a None check, causing AttributeError on quiet chunks
  • appsrc was referenced in the except block before being assigned → potential NameError
  • LANG_CODES.get(voice, voice) used the full voice name (e.g. jf_alpha) as the dict key;
    the actual key is the first character, voice[0] (here j)

Changes

  • speech.py: add kokoro_pipelines dict and shared kokoro_model; lazy-init per-language
    pipelines in background threads reusing the loaded KModel; dispatch to correct pipeline in
    _stream_kokoro_audio; fix null guard and appsrc init; pass repo_id explicitly
  • kokoro/pipeline.py: change the lookup in the language-mismatch warning to LANG_CODES.get(voice[0], voice)

  speech.py — 5 fixes

  1. Multi-language pipeline architecture (__init__ + setup_kokoro)
  - Added self.kokoro_pipelines = {} (dict: lang_code → KPipeline) and self.kokoro_model = None (shared neural model)
  - setup_kokoro() now passes repo_id='hexgrad/Kokoro-82M' explicitly (suppresses the noisy startup warning) and registers the created pipeline into kokoro_pipelines['a'], storing the shared KModel reference
  - self.kokoro_pipeline is kept as-is for backward compatibility with activity.py code that reads kokoro_pipeline.repo_id
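
A minimal sketch of the registry layout described in fix 1. FakeModel, FakePipeline, and SpeechState are hypothetical stand-ins so the example runs without the kokoro package installed; only the attribute names kokoro_pipelines, kokoro_model, kokoro_pipeline and the repo_id value come from this PR.

```python
class FakeModel:
    """Hypothetical stand-in for the shared neural model (KModel in kokoro)."""


class FakePipeline:
    """Hypothetical stand-in for KPipeline; it only records its arguments."""

    def __init__(self, lang_code, repo_id=None, model=None):
        self.lang_code = lang_code
        self.repo_id = repo_id
        # Create a model only when none is passed in, so later pipelines
        # can reuse the one that is already loaded.
        self.model = model if model is not None else FakeModel()


class SpeechState:
    """Hypothetical container for the attributes named in this PR."""

    def __init__(self):
        self.kokoro_pipelines = {}   # lang_code -> pipeline
        self.kokoro_model = None     # shared model, set by setup_kokoro()
        self.kokoro_pipeline = None  # kept for backward compatibility

    def setup_kokoro(self):
        pipeline = FakePipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M')
        self.kokoro_pipeline = pipeline
        self.kokoro_pipelines['a'] = pipeline
        self.kokoro_model = pipeline.model


state = SpeechState()
state.setup_kokoro()
```

Code that still reads kokoro_pipeline.repo_id keeps working, because the default pipeline is the same object that is registered under 'a'.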

  2. Language-aware lazy pipeline initialization (set_kokoro_voice + new _init_pipeline_for_lang)
  - When a voice from a different language group is selected (e.g. jf_alpha → Japanese, bf_alice → British English), set_kokoro_voice() extracts lang_code = voice_name[0] and spins up a background thread to create a KPipeline for that language, reusing the already-loaded KModel (no redundant model downloads)
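
The lazy initialization in fix 2 could look roughly like the sketch below. FakePipeline and VoiceSwitcher are hypothetical stand-ins, and the lock plus the _pending set are assumptions added here to keep the example safe against duplicate threads; the PR description does not say whether speech.py uses them.

```python
import threading


class FakePipeline:
    """Hypothetical stand-in for KPipeline; it only records its arguments."""

    def __init__(self, lang_code, model=None):
        self.lang_code = lang_code
        self.model = model


class VoiceSwitcher:
    """Hypothetical holder for the state touched by set_kokoro_voice()."""

    def __init__(self, shared_model):
        self.kokoro_model = shared_model
        self.kokoro_pipelines = {}
        self.kokoro_voice = None
        self._lock = threading.Lock()  # assumption: not named in the PR
        self._pending = set()          # assumption: languages being built

    def _init_pipeline_for_lang(self, lang_code):
        # Reuse the already-loaded model instead of loading it again.
        pipeline = FakePipeline(lang_code, model=self.kokoro_model)
        with self._lock:
            self.kokoro_pipelines[lang_code] = pipeline
            self._pending.discard(lang_code)

    def set_kokoro_voice(self, voice_name):
        self.kokoro_voice = voice_name
        lang_code = voice_name[0]  # 'jf_alpha' -> 'j'
        with self._lock:
            if lang_code in self.kokoro_pipelines or lang_code in self._pending:
                return None  # nothing to build
            self._pending.add(lang_code)
        worker = threading.Thread(
            target=self._init_pipeline_for_lang,
            args=(lang_code,),
            daemon=True,
        )
        worker.start()
        return worker


shared_model = object()
switcher = VoiceSwitcher(shared_model)
worker = switcher.set_kokoro_voice('jf_alpha')
worker.join()  # only for the example; real callers would not block here
```

Returning the thread is a convenience for the example; a second call for the same language returns None because the pipeline is already registered.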

  3. Correct pipeline dispatch in _stream_kokoro_audio
  - Now looks up self.kokoro_pipelines.get(lang_code) to route to the right G2P pipeline. Falls back gracefully to the default 'a' pipeline (with a warning) if the language-specific one hasn't finished initializing yet
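
The dispatch with fallback from fix 3 can be sketched as a small helper. select_pipeline and DEFAULT_LANG are hypothetical names used for illustration, and plain strings stand in for pipeline objects.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_LANG = 'a'  # the pipeline created by setup_kokoro()


def select_pipeline(pipelines, voice_name):
    """Return the pipeline for the voice's language, or the default one."""
    lang_code = voice_name[0]
    pipeline = pipelines.get(lang_code)
    if pipeline is None:
        # The language-specific pipeline may still be initializing.
        logger.warning(
            "No pipeline for lang_code %r yet; falling back to %r",
            lang_code, DEFAULT_LANG,
        )
        pipeline = pipelines.get(DEFAULT_LANG)
    return pipeline


pipelines = {'a': 'english-pipeline'}
before_ready = select_pipeline(pipelines, 'jf_alpha')

pipelines['j'] = 'japanese-pipeline'
after_ready = select_pipeline(pipelines, 'jf_alpha')
```

Until the Japanese pipeline is registered the call returns the default pipeline, so speech keeps working (with the English G2P) instead of failing.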

  4. Null-check audio_chunk before .numpy()
  - Added if audio_chunk is None: continue. A None audio chunk is valid (quiet pipeline mode or inference skip), but it previously crashed with AttributeError when .numpy() was called on it
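
A small sketch of the guard in fix 4. FakeTensor and collect_audio are hypothetical; the tuples only mimic the shape of the results the loop iterates over, with the audio chunk as the last element.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor that exposes .numpy()."""

    def __init__(self, samples):
        self._samples = samples

    def numpy(self):
        return list(self._samples)


def collect_audio(results):
    """Convert every non-empty audio chunk, skipping None chunks."""
    chunks = []
    for _graphemes, _phonemes, audio_chunk in results:
        if audio_chunk is None:
            continue  # without this guard, .numpy() raises AttributeError
        chunks.append(audio_chunk.numpy())
    return chunks


results = [
    ('Hello', 'phonemes-1', FakeTensor([0.1, 0.2])),
    ('...', 'phonemes-2', None),
    ('world', 'phonemes-3', FakeTensor([0.3])),
]
chunks = collect_audio(results)
```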

  5. appsrc = None before the try block
  - Moving initialization before try ensures the except block's if appsrc: guard doesn't raise NameError when the exception fires before appsrc is assigned
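
The pattern behind fix 5, sketched with hypothetical names. FakeAppSrc stands in for the GStreamer appsrc element, and its push and end_of_stream methods are invented for this example.

```python
class FakeAppSrc:
    """Hypothetical stand-in for the GStreamer appsrc element."""

    def __init__(self):
        self.pushed = []
        self.ended = False

    def push(self, data):
        self.pushed.append(data)

    def end_of_stream(self):
        self.ended = True


def stream_audio(make_appsrc, produce_audio):
    appsrc = None  # bound before try, so the except block can test it
    try:
        audio = produce_audio()  # may raise before appsrc is created
        appsrc = make_appsrc()
        appsrc.push(audio)
        return True
    except Exception:
        if appsrc:  # safe: appsrc is always bound at this point
            appsrc.end_of_stream()
        return False


def broken_audio():
    raise RuntimeError("synthesis failed")


ok = stream_audio(FakeAppSrc, lambda: b'pcm')
failed = stream_audio(FakeAppSrc, broken_audio)
```

If appsrc = None were missing, the second call would raise NameError inside the except block and hide the original RuntimeError.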

  ---
  kokoro/pipeline.py — 1 fix

  6. Wrong LANG_CODES key in language-mismatch warning
  - LANG_CODES.get(voice, voice) → LANG_CODES.get(voice[0], voice)
  - LANG_CODES is keyed by single characters ('j', 'z', 'a', …). voice is a full name like 'jf_alpha', which is never a key, so the lookup always silently fell back to the raw voice name. Using voice[0] correctly resolves e.g. 'j' → 'Japanese' in the warning message.
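
The difference between the two lookups is easy to see with a shortened table. The dict below is an abbreviated stand-in for kokoro's LANG_CODES; its entries are illustrative and should not be read as the library's full mapping.

```python
# Abbreviated, illustrative stand-in for LANG_CODES in kokoro/pipeline.py.
LANG_CODES = {
    'a': 'American English',
    'b': 'British English',
    'j': 'Japanese',
}

voice = 'jf_alpha'

# Old lookup: the full voice name is never a key, so .get() returns the
# fallback argument, which is the raw voice name.
old_label = LANG_CODES.get(voice, voice)

# New lookup: the first character is the language code.
new_label = LANG_CODES.get(voice[0], voice)
```

Here old_label is 'jf_alpha' and new_label is 'Japanese', which is the name the warning message is meant to show.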
@Naitik120gupta force-pushed the fix/kokoro-language-pipeline branch from d42031b to 9925943 on April 15, 2026 16:44
@Naitik120gupta
Author

Hey @chimosky @mebinthattil,
Please review this PR when you get time.
Thank you!
