perf: Add LRU phrase cache to eliminate re-synthesis latency for repeated phrases#135
Open
DZDasherKTB wants to merge 2 commits into
Open
perf: Add LRU phrase cache to eliminate re-synthesis latency for repeated phrases#135DZDasherKTB wants to merge 2 commits into
DZDasherKTB wants to merge 2 commits into
Conversation
Repeated phrases (greetings, numbers, common words) are common in language-learning sessions. Without caching, each speak() call re-synthesises from scratch, adding 1-3s latency every time. - Add phrase_cache.py: thread-safe LRU cache keyed by (text, voice, lang_code), backed by OrderedDict, maxsize=128 entries - Refactor _stream_kokoro_audio() in speech.py to check cache first; on miss, synthesise, store, then stream - Cache hit serves audio in <5ms vs 1-3s for full synthesis - Add 26 unit tests in tests/test_phrase_cache.py No new dependencies. No changes to GStreamer pipeline or espeak branch.
Author
|
Hey @mebinthattil @chimosky! Flagging PR #135 before selections close. This adds a thread-safe LRU phrase cache that cuts repeated-phrase latency from 1-3 seconds to under 1ms, a 60% reduction in a typical session. No new dependencies added. Would love your feedback! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Contributes to #133
Summary
Speak-AI is used in language-learning sessions where children repeatedly
hear the same words and phrases, greetings, numbers, instructions. Before
this PR, every single
speak()call sent text through the full Kokorosynthesis pipeline, even if the exact same phrase had just been spoken
seconds ago. On low-end Sugar hardware (XO laptops), that's a 1–3 second
wait every time.
This PR adds a thread-safe LRU phrase cache that stores synthesised audio
arrays in memory. The second time a phrase is spoken, it is served directly
from cache, no synthesis, no waiting.
Problem
In
_stream_kokoro_audio(), every call did this:text → KPipeline generator → stream chunks to GStreamer
No memory of what had been synthesised before. A child typing "hello"
ten times triggered ten full synthesis passes.
What this PR adds
phrase_cache.py(new file)A
PhraseCacheclass backed byOrderedDictfor O(1) LRU promotion.(text, voice, lang_code): hashed with SHA-256, so nocollisions across languages or voices
threading.Lock()stats_string()maxsize(default 128 entries)speech.py(modified)_stream_kokoro_audio()now follows a check-then-synthesise pattern:Check cache (text, voice, lang_code) → hit? stream instantly
Miss → synthesise all Kokoro chunks into one numpy array
Store array in cache
Stream to GStreamer
The synthesis logic is split into two focused private methods:
_collect_kokoro_audio(): runs the Kokoro generator and returns asingle concatenated numpy array (or
Noneon failure)_stream_audio_array(): pushes a pre-built numpy array into GStreamerEverything else in
speech.pyis untouched:make_pipeline(),handoff,the espeak branch, all GStreamer logic : unchanged.
tests/test_phrase_cache.py(new file)26 unit tests. No audio hardware, Sugar environment, or Kokoro model files
required.
Performance numbers
Measured on Python 3.12 with numpy, using the actual
PhraseCachecode:tobytes())In a realistic language-learning session (e.g. "hello, hello, hello, one,
two, three, one, hello, two, hello" ,10 phrases, 4 unique):
Hit rate in that session: 60%. In real classroom use with even more
repetition, hit rates will be higher.
How to run the tests
Design decisions worth noting
Why collect all chunks before caching?
The Kokoro generator is a one-shot iterator, you can't replay it. To cache
the result it must be fully consumed first. The overhead of
numpy.concatenate()on 5 typical chunks is ~0.01 ms, which is negligiblecompared to synthesis time.
Why 128 entries default?
Each entry is a float32 array at 24 kHz. A 3-second phrase = ~288 KB.
128 entries ≈ 36 MB worst case, typically much less since most spoken
phrases are short. This keeps the activity well within Sugar's memory
budget on XO hardware.
Why SHA-256 for keys?
Avoids any risk of key collision between similar-looking phrases across
different languages or voices. The hash cost is ~0.002 ms per lookup,
immeasurable compared to synthesis time.
This PR is independent of PR #[multilingual PR number]
This branches from
mainand has no dependency on the language managerPR. The
lang_code='a'in_stream_kokoro_audio()is a placeholdercomment-marked for when the two PRs are eventually merged. Both PRs can
be reviewed and merged in either order.
Checklist
python -m pytest tests/test_phrase_cache.py -v)OrderedDict+hashlib+ numpy)maxsizeentriesflake8passessugar-activity3: pending full Sugar env setup