fix: key-frame cleanup race deletes snapshots mid-analysis; configurable grace + fallback-provider labels #684
Open
dmz006 wants to merge 1 commit into
…ace + fallback labels

Key frames written by MediaProcessor._expose_image at the start of an analysis were deleted by Timeline._cleanup before the matching timeline event was inserted, whenever the analysis ran longer than the hard-coded 10s grace (e.g. stream_analyzer with high max_frames / long duration). A new Timeline() is constructed per service call and per timeline API/card poll, each scheduling _cleanup, so the window was hit reliably -> blank dashboard cards and image-less notifications.

- keyframe_registry: shared hass.data registry that protects an in-flight key frame from write time until its event row is inserted (TTL-bounded so a failed analysis cannot leak files). _cleanup consults it across all Timeline instances, fixing the race regardless of how often cleanup fires.
- Make the cleanup grace configurable: CONF_CLEANUP_GRACE (default 300s) in LLM Vision Settings > Timeline, replacing the hard-coded GRACE_SECONDS = 10.
- Per-call override: keyframe_grace service field on the analyzer services, surfaced as a blueprint input after Max Frames (with guidance to raise it alongside Duration / Max Frames). 0 = use the global setting.
- Fix the fallback-provider dropdown to label options with the config entry title (e.g. "Ollama (host:port)") instead of the bare provider type, so multiple providers of the same type are distinguishable. Extracted to a testable _fallback_provider_options() helper.

Tests: new tests/test_keyframe_registry.py (9), timeline cleanup race/grace regression tests, and fallback-provider label tests. Full suite green on 3.13.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
On installs where an analysis runs longer than ~10 seconds (e.g. stream_analyzer with a high max_frames/duration, or a slower local model), the key-frame snapshot for a timeline event is deleted before the event is saved. The result is blank thumbnails in the LLM Vision Card and image-less notifications, even though the analysis succeeds and the timeline event is created and shows the summary text.
Root cause (diagnosed on v1.7.0)
MediaProcessor._expose_image() writes the key frame to /media/llmvision/snapshots/ at the start of an analysis, but the event that "links" the file (and protects it from cleanup) is only inserted at the end, after the LLM call returns. Meanwhile:

- Timeline() is constructed per service call and per timeline API / card poll (__init__._create_event, api.py, calendar.py), and each construction schedules Timeline._cleanup() as a background task.
- _cleanup() deletes any snapshot in the directory that is not linked to an event and is older than a hard-coded GRACE_SECONDS = 10.

So for any analysis longer than 10s, the freshly-written key frame ages past the grace window while still unlinked, and a cleanup, fired either by the event-creation flow itself or by the card polling the timeline, removes it. The DB event is then left pointing at a missing file.
Reproduced live: a stream_analyzer key frame written at T+0 was consistently deleted at exactly T+10s (the grace boundary) while the analysis was still running. With the fix it persists and links correctly. Instances that only ran fast analyses (few frames) never hit the window, which is why this looked intermittent.
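A minimal sketch of the race described above, for illustration only. The cleanup() and simulate() functions are hypothetical stand-ins, not the integration's actual code; they only model "delete any unlinked snapshot older than the grace" and backdate the cleanup time instead of sleeping.

```python
import os
import tempfile
import time
from pathlib import Path

GRACE_SECONDS = 10  # the hard-coded value named in this PR


def cleanup(snapshot_dir: Path, linked: set[str], grace: float, now: float) -> list[str]:
    """Hypothetical stand-in: delete snapshots that are unlinked and older than `grace`."""
    removed = []
    for path in sorted(snapshot_dir.iterdir()):
        age = now - path.stat().st_mtime
        if path.name not in linked and age > grace:
            path.unlink()
            removed.append(path.name)
    return removed


def simulate(analysis_seconds: float, grace: float = GRACE_SECONDS) -> bool:
    """Return True if the key frame still exists when the event would be inserted."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot_dir = Path(tmp)
        frame = snapshot_dir / "keyframe.jpg"
        frame.write_bytes(b"fake-jpeg")  # written at the start of the analysis
        written_at = time.time()
        os.utime(frame, (written_at, written_at))
        # A cleanup fires near the end of the analysis; the frame is not linked yet.
        cleanup(snapshot_dir, linked=set(), grace=grace, now=written_at + analysis_seconds)
        return frame.exists()


if __name__ == "__main__":
    print("5s analysis, 10s grace:", simulate(5))
    print("30s analysis, 10s grace:", simulate(30))
    print("30s analysis, 300s grace:", simulate(30, grace=300))
```

In this model a longer grace only widens the window; an analysis that outlasts the grace still loses its frame, which is why the fix also protects in-flight frames explicitly.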
Changes
- Shared in-flight key-frame registry (keyframe_registry.py). Protection now lives in hass.data[DOMAIN], so the writer, the inserter and the cleaner (three different Timeline/MediaProcessor instances) all see the same set. _expose_image registers the frame the moment it is written; create_event releases it once the row is inserted; entries are TTL-bounded so a failed analysis cannot leak files. _cleanup consults this across all instances, so the race is fixed regardless of how often cleanup fires.
- Configurable cleanup grace: CONF_CLEANUP_GRACE (default 300s) in LLM Vision Settings → Timeline, replacing the hard-coded GRACE_SECONDS = 10.
- Per-call override: keyframe_grace service field on the analyzer services, surfaced as a blueprint input after Max Frames with guidance to raise it alongside Duration / Max Frames. 0 = use the global setting.
- Fallback-provider dropdown labels now use the config-entry title (e.g. Ollama (host:port)) instead of the bare provider type, so multiple providers of the same type are distinguishable. Extracted to a testable _fallback_provider_options() helper.

Tests
- tests/test_keyframe_registry.py (9 tests).
- _cleanup regression tests: registry-protected frame survives while unlinked then is removed once released; protection is shared across Timeline instances; configurable grace is honored (not hard-coded).
- pytest tests/ -m "not integration" → 423 passed.

Notes
- … pruning genuinely orphaned snapshots from failed runs.
- event_summary.yaml and event_summary_beta.yaml blueprints are updated consistently (new input + analyzer wiring + duration/max_frames guidance).
- keyframe_grace field is backward/forward compatible (ignored by older code).
🤖 Generated with Claude Code