fix: key-frame cleanup race deletes snapshots mid-analysis; configurable grace + fallback-provider labels #684

Open
dmz006 wants to merge 1 commit into valentinfrlch:main from dmz006:fix/keyframe-retention-and-fallback-labels
dmz006 commented Jun 29, 2026

Problem

On installs where an analysis runs longer than ~10 seconds (e.g. stream_analyzer
with a high max_frames/duration, or a slower local model), the key-frame
snapshot for a timeline event is deleted before the event is saved. The result
is blank thumbnails in the LLM Vision Card and image-less notifications — even
though the analysis succeeds and the timeline event is created and shows the
summary text.

Root cause (diagnosed on v1.7.0)

MediaProcessor._expose_image() writes the key frame to
/media/llmvision/snapshots/ at the start of an analysis, but the event that
"links" the file (and protects it from cleanup) is only inserted at the end,
after the LLM call returns. Meanwhile:

  • A new Timeline() is constructed per service call and per timeline API / card
    poll (__init__._create_event, api.py, calendar.py), and each construction
    schedules Timeline._cleanup() as a background task.
  • _cleanup() deletes any snapshot in the directory that is not linked to an
    event and is older than a hard-coded GRACE_SECONDS = 10.

So for any analysis longer than 10s, the freshly-written key frame ages past the
grace window while still unlinked, and a cleanup — fired by the very
event-creation flow, or by the card polling the timeline — removes it. The DB
event is then left pointing at a missing file.
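
To make the timing concrete, here is a minimal sketch of the per-file decision described above. It is not the integration's code: `Snapshot` and `should_delete` are hypothetical stand-ins for whatever `Timeline._cleanup()` does internally, and only `GRACE_SECONDS = 10` is taken from the diagnosis.

```python
from dataclasses import dataclass

GRACE_SECONDS = 10  # the hard-coded value named in the diagnosis


@dataclass
class Snapshot:
    """Hypothetical model of one file in the snapshots directory."""

    name: str
    written_at: float  # seconds since the analysis started
    linked: bool = False  # True once a timeline event references the file


def should_delete(snapshot: Snapshot, now: float, grace: float = GRACE_SECONDS) -> bool:
    """Assumed cleanup rule: delete files that are unlinked and older than the grace."""
    return not snapshot.linked and (now - snapshot.written_at) > grace


frame = Snapshot(name="keyframe.jpg", written_at=0.0)

# Fast analysis: cleanup fires at T+5s, the frame is still inside the grace window.
fast = should_delete(frame, now=5.0)

# Slow analysis: cleanup fires at T+12s, before the event row has been inserted.
slow = should_delete(frame, now=12.0)

# Once the event is inserted the file is linked and cleanup leaves it alone.
frame.linked = True
after_link = should_delete(frame, now=60.0)

print(fast, slow, after_link)  # False True False
```

The middle case is the bug: the frame is only deleted because the event that would link it has not been written yet.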

Reproduced live: a stream_analyzer key frame written at T+0 was consistently
deleted at exactly T+10s (the grace boundary) while the analysis was still
running. With the fix it persists and links correctly. Instances that only ran
fast analyses (few frames) never hit the window, which is why this looked
intermittent.

Changes

  1. Shared in-flight key-frame registry (keyframe_registry.py). Protection
    now lives in hass.data[DOMAIN], so the writer, the inserter and the cleaner
    (three different Timeline/MediaProcessor instances) all see the same set.
    _expose_image registers the frame the moment it is written; create_event
    releases it once the row is inserted; entries are TTL-bounded so a failed
    analysis cannot leak files. _cleanup consults this across all instances, so
    the race is fixed regardless of how often cleanup fires.

  2. Configurable cleanup grace: CONF_CLEANUP_GRACE (default 300s) in
    LLM Vision Settings → Timeline, replacing the hard-coded GRACE_SECONDS = 10.

  3. Per-call override: keyframe_grace service field on the analyzer
    services, surfaced as a blueprint input after Max Frames with guidance to
    raise it alongside Duration / Max Frames. 0 = use the global setting.

  4. Fallback-provider dropdown labels now use the config-entry title
    (e.g. Ollama (host:port)) instead of the bare provider type, so multiple
    providers of the same type are distinguishable. Extracted to a testable
    _fallback_provider_options() helper.
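
For reviewers who want the shape of change 1 without opening the diff, the following is a rough sketch of a TTL-bounded in-flight registry. It assumes a plain dict standing in for `hass.data`; the class name, `REGISTRY_KEY`, `DEFAULT_TTL`, the method names and the `"llmvision"` domain string are all illustrative assumptions, not the contents of keyframe_registry.py.

```python
import time
from typing import Callable

DOMAIN = "llmvision"                # assumed integration domain key
REGISTRY_KEY = "keyframe_registry"  # hypothetical key under hass.data[DOMAIN]
DEFAULT_TTL = 900.0                 # hypothetical upper bound, in seconds


class KeyframeRegistry:
    """Tracks key frames that are written but not yet linked to an event."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: dict[str, float] = {}

    def register(self, path: str, ttl: float = DEFAULT_TTL) -> None:
        # Called by the writer the moment the frame lands on disk.
        self._expires_at[path] = self._clock() + ttl

    def release(self, path: str) -> None:
        # Called by the inserter once the event row exists.
        self._expires_at.pop(path, None)

    def is_protected(self, path: str) -> bool:
        # Called by the cleaner; expired entries are dropped so a failed
        # analysis cannot protect a file forever.
        expires_at = self._expires_at.get(path)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._expires_at[path]
            return False
        return True


def get_registry(hass_data: dict, clock: Callable[[], float] = time.monotonic) -> KeyframeRegistry:
    """Return the single shared registry, creating it on first use."""
    domain_data = hass_data.setdefault(DOMAIN, {})
    if REGISTRY_KEY not in domain_data:
        domain_data[REGISTRY_KEY] = KeyframeRegistry(clock)
    return domain_data[REGISTRY_KEY]


# Usage with a fake clock: writer and cleaner obtain the same object.
now = [0.0]
hass_data: dict = {}
writer = get_registry(hass_data, clock=lambda: now[0])
cleaner = get_registry(hass_data)

frame = "/media/llmvision/snapshots/example.jpg"
writer.register(frame, ttl=60.0)
now[0] = 30.0
protected_mid = cleaner.is_protected(frame)    # still in flight
writer.release(frame)
protected_after = cleaner.is_protected(frame)  # linked, no longer tracked
```

Storing the registry in shared state rather than on a `Timeline` instance is what lets a cleanup triggered by an unrelated card poll see frames registered by a different service call.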

Tests

  • New tests/test_keyframe_registry.py (9 tests).
  • New _cleanup regression tests: registry-protected frame survives while
    unlinked then is removed once released; protection is shared across Timeline
    instances; configurable grace is honored (not hard-coded).
  • New fallback-provider label tests.
  • Full suite green on Python 3.13: pytest tests/ -m "not integration" → 423 passed.
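
As an illustration of what the fallback-provider label tests check, here is a self-contained sketch. The PR names a `_fallback_provider_options()` helper but this text does not show its signature, so the function below, the `FakeEntry` stand-in for a config entry, and the `"provider"` data key are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEntry:
    """Minimal stand-in for a config entry (entry_id, title, data)."""

    entry_id: str
    title: str
    data: dict = field(default_factory=dict)


def fallback_provider_options(entries: list) -> list:
    """Build dropdown options labelled by entry title instead of provider type."""
    options = []
    for entry in entries:
        # Fall back to the provider type, then the id, if a title is missing.
        label = entry.title or entry.data.get("provider", entry.entry_id)
        options.append({"value": entry.entry_id, "label": label})
    return options


entries = [
    FakeEntry("id-1", "Ollama (host-a:11434)", {"provider": "Ollama"}),
    FakeEntry("id-2", "Ollama (host-b:11434)", {"provider": "Ollama"}),
    FakeEntry("id-3", "", {"provider": "OpenAI"}),
]
options = fallback_provider_options(entries)
labels = [option["label"] for option in options]
```

With labels taken from the bare provider type, the first two options would both read "Ollama"; with titles they stay distinguishable.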

Notes

  • Default grace of 300s comfortably covers slow multi-frame analyses while still
    pruning genuinely orphaned snapshots from failed runs.
  • Both event_summary.yaml and event_summary_beta.yaml blueprints are updated
    consistently (new input + analyzer wiring + duration/max_frames guidance).
  • No schema is attached to the analyzer services, so the new keyframe_grace
    field is backward/forward compatible (ignored by older code).
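
The precedence between the per-call field and the global setting can be sketched as below. This is an assumed shape, not the PR's code: `resolve_grace` is a hypothetical helper, and the `"cleanup_grace"` option key is a guess at the string behind `CONF_CLEANUP_GRACE`. Only the `keyframe_grace` field name, the 300s default and the "0 = use the global setting" rule come from the description.

```python
DEFAULT_CLEANUP_GRACE = 300  # default stated in the PR description, in seconds


def resolve_grace(call_data: dict, options: dict) -> int:
    """Pick the grace for one analysis: per-call value if positive, else global."""
    # Older callers omit the field entirely, so a missing key is treated like 0.
    per_call = int(call_data.get("keyframe_grace", 0) or 0)
    if per_call > 0:
        return per_call
    return int(options.get("cleanup_grace", DEFAULT_CLEANUP_GRACE))


no_override = resolve_grace({}, {})                                       # 300
zero_means_global = resolve_grace({"keyframe_grace": 0}, {"cleanup_grace": 120})  # 120
per_call_wins = resolve_grace({"keyframe_grace": 600}, {"cleanup_grace": 120})    # 600
```

Reading the field with `.get()` rather than requiring it is what keeps existing automations and blueprints working unchanged.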

🤖 Generated with Claude Code

…ace + fallback labels

Key frames written by MediaProcessor._expose_image at the start of an analysis
were deleted by Timeline._cleanup before the matching timeline event was
inserted, whenever the analysis ran longer than the hard-coded 10s grace (e.g.
stream_analyzer with high max_frames / long duration). A new Timeline() is
constructed per service call and per timeline API/card poll, each scheduling
_cleanup, so the window was hit reliably -> blank dashboard cards and
image-less notifications.

- keyframe_registry: shared hass.data registry that protects an in-flight key
  frame from write time until its event row is inserted (TTL-bounded so a failed
  analysis cannot leak files). _cleanup consults it across all Timeline
  instances, fixing the race regardless of how often cleanup fires.
- Make the cleanup grace configurable: CONF_CLEANUP_GRACE (default 300s) in
  LLM Vision Settings > Timeline, replacing the hard-coded GRACE_SECONDS = 10.
- Per-call override: keyframe_grace service field on the analyzer services,
  surfaced as a blueprint input after Max Frames (with guidance to raise it
  alongside Duration / Max Frames). 0 = use the global setting.
- Fix the fallback-provider dropdown to label options with the config entry
  title (e.g. "Ollama (host:port)") instead of the bare provider type, so
  multiple providers of the same type are distinguishable. Extracted to a
  testable _fallback_provider_options() helper.

Tests: new tests/test_keyframe_registry.py (9), timeline cleanup race/grace
regression tests, and fallback-provider label tests. Full suite green on 3.13.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>