fix(asset_based): handle scenes with empty narrations list #140
Open
jonathanzhan1975 wants to merge 1 commit into
The `asset_based` pipeline crashes with `IndexError: list index out of
range` when a scene's `narrations` field is an empty list. This is a
legitimate LLM output for cover slides, chapter dividers, or any
purely-visual scene, and is reliably reproducible by uploading a
title slide as the first asset.
Root cause:
The pattern `narrations = scene.get("narrations", [scene.get("narration", "")])`
appears in four places (log preview, all_narrations aggregation, storyboard
frame creation, TTS production). All four assume `len(narrations) >= 1` and
silently degrade or crash when the list is empty:
- L382 (log preview): empty preview, hides the gap
- L448-453 (all_narrations.extend): silently drops scene from context.narrations
- L495-505 (StoryboardFrame): empty subtitle string
- L570-635 (TTS loop): IndexError at `narration_audios[0]` (the user-facing crash)
Fix:
1. Introduce `_coerce_narrations(scene)` helper that normalizes any of
{missing, None, str, list, list-with-empty-strings} into a non-empty
list of stripped non-empty strings, falling back to `[""]` when the
scene is genuinely silent.
2. Replace all four call sites with the helper.
3. In the TTS loop, skip empty narration text and, when a scene produces
zero audio segments, synthesize a 3-second silent track with ffmpeg
so downstream composition still has a valid audio_path.
Tested: 9-slide PPT (slide 1 = cover with no narration) that previously
crashed at "Scene 1 has 0 narration(s)" now completes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The `asset_based` pipeline (`custom_media` in the UI) crashes with `IndexError: list index out of range` when a scene has an empty `narrations` list. This is a legitimate LLM output for cover slides, chapter dividers, or any purely-visual scene, and it is reliably reproducible by uploading a title slide as the first asset.
Reproduction
Observed log:
The pipeline aborts before any video is produced.
Root cause
The same defensive read pattern appears in four places:
`narrations = scene.get("narrations", [scene.get("narration", "")])`
It guards against missing keys and string values, but does not handle `narrations` being an empty list (which happens when the LLM legitimately decides a scene has no narration). All four call sites silently degrade or crash:
- L382 (log preview of `narrations`): empty preview, hides the gap
- L448-453 (`all_narrations.extend`): silently drops the scene from `context.narrations`
- L495-505 (`StoryboardFrame.narration`): empty subtitle string
- L570-635 (TTS loop): `IndexError` at `narration_audios[0]` (L635), the user-facing crash
Only the last one crashes, but the other three create silent data drift that is harder to diagnose later.
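The failure mode can be shown in isolation. The sketch below wraps the read pattern quoted in this PR in a hypothetical `read_narrations` function and uses a list of made-up file names as a stand-in for the TTS loop's output; neither name comes from the pipeline itself.

```python
def read_narrations(scene: dict) -> list:
    # dict.get only uses its default when the key is absent, not when the
    # key is present with an empty list as its value.
    return scene.get("narrations", [scene.get("narration", "")])


# Key missing: the default kicks in and wraps the singular "narration" field.
print(read_narrations({"narration": "Welcome"}))  # ['Welcome']

# Key present but empty (e.g. a cover slide): the default is never consulted.
narrations = read_narrations({"narrations": []})
print(narrations)  # []

# Hypothetical stand-in for the TTS loop: one audio file per narration.
narration_audios = [f"audio_{i}.mp3" for i, _ in enumerate(narrations)]
try:
    first_audio = narration_audios[0]
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```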
Fix
1. Centralize the read with a single `_coerce_narrations(scene)` helper that normalizes any of `{missing, None, str, list, list-with-empty-strings}` into a non-empty list of stripped, non-empty strings, falling back to `[""]` when the scene is genuinely silent. The fallback preserves the existing length-based assumptions (`len(narrations) >= 1`) so callers can still index safely; callers that care about content can detect the gap via the empty string.
2. Replace all four call sites with `narrations = _coerce_narrations(scene)`.
3. In the TTS loop (`produce_assets`):
   - Skip empty narration text.
   - When a scene produces zero audio segments, synthesize a 3-second silent track with ffmpeg (`anullsrc`) so downstream composition still has a valid `audio_path`. This lets cover slides render as a 3-second silent shot instead of crashing.
   - The single-narration and multi-narration concat branches are unchanged.
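One way the silent-track fallback could be invoked is sketched below. The function names, the 44.1 kHz stereo settings, and the use of `subprocess` are assumptions for illustration; only the use of ffmpeg's `anullsrc` source and the 3-second duration come from this PR.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_silence_command(output_path: str, duration: float = 3.0) -> list[str]:
    """Build an ffmpeg command that writes `duration` seconds of silence."""
    return [
        "ffmpeg",
        "-y",                                # overwrite an existing file
        "-f", "lavfi",                       # read from a libavfilter source
        "-i", "anullsrc=r=44100:cl=stereo",  # silent source (assumed settings)
        "-t", str(duration),                 # stop after `duration` seconds
        output_path,
    ]


def synthesize_silence(output_path: str, duration: float = 3.0) -> str:
    """Run ffmpeg and return the path of the silent track (requires ffmpeg)."""
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_silence_command(output_path, duration),
        check=True,            # raise CalledProcessError if ffmpeg fails
        capture_output=True,   # keep ffmpeg's console output out of the logs
    )
    return output_path
```

Splitting command construction from execution keeps the argument list checkable without ffmpeg installed; in this sketch the output container is inferred by ffmpeg from the file extension the caller passes.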
Diff size
`pixelle_video/pipelines/asset_based.py`: +43 / −14 lines, single file.
Verification
Out of scope
- [...] `generate_content` would be one path). This PR only ensures the pipeline does not crash when the LLM legitimately decides a scene has no narration.
- [...] `config.cover_silent_duration`) is a follow-up.
Why no tests
The pipeline depends on an LLM, ffmpeg, and a runtime task directory; there is no existing unit-test scaffold for `produce_assets`. `_coerce_narrations` itself is a pure function and trivially testable if the project later adopts a test harness. Happy to add tests in a follow-up if the maintainer wants.
🤖 Generated with Claude Code