
fix(asset_based): handle scenes with empty narrations list #140

Open

jonathanzhan1975 wants to merge 1 commit into AIDC-AI:main from jonathanzhan1975:fix/asset-based-empty-narrations

@jonathanzhan1975
Contributor

Summary

The asset_based pipeline (custom_media in the UI) crashes with IndexError: list index out of range when a scene has an empty narrations list. An empty list is a legitimate LLM output for cover slides, chapter dividers, or any purely visual scene, and the crash is reliably reproducible by uploading a title slide as the first asset.

Reproduction

  1. Open the Custom Media (自定义素材) pipeline.
  2. Upload 9 PPT slide images. The first slide is a cover with no spoken narration.
  3. Provide an article / intent and click Generate.

Observed log:

INFO  asset_based:initialize_storyboard:530 - ✅ Created storyboard with 9 scenes
INFO  asset_based:produce_assets:544 - 🎬 Producing scene videos...
INFO  asset_based:produce_assets:555 - Producing scene 1/9...
INFO  asset_based:produce_assets:574 - Scene 1 has 0 narration(s)
ERROR linear:handle_exception:161 - Pipeline execution failed: list index out of range

The pipeline aborts before any video is produced.

Root cause

The same defensive read pattern appears in four places:

narrations = scene.get("narrations", [scene.get("narration", "")])
if isinstance(narrations, str):
    narrations = [narrations]

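It guards against a missing key and a bare string value, but not against narrations being an empty list, which happens when the LLM legitimately decides a scene has no narration. dict.get only falls back to its default when the key is absent, so an explicit [] passes straight through. All four call sites then silently degrade or crash: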

| Location | Behavior on empty narrations |
|---|---|
| L382 (log preview) | Empty preview line; hides the gap |
| L448-453 (all_narrations.extend) | Scene silently disappears from context.narrations |
| L495-505 (StoryboardFrame.narration) | Empty subtitle string |
| L570-635 (TTS production) | 💥 IndexError at narration_audios[0] (L635); user-facing crash |

Only the last one crashes, but the other three create silent data drift that is harder to diagnose later.
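The failure mode can be reproduced in isolation. A minimal sketch, assuming scenes are plain dicts; the list comprehension is a hypothetical stand-in for the TTS loop, which produces one audio path per narration:

```python
def read_narrations_original(scene: dict) -> list:
    # The defensive read pattern quoted in the root-cause section, unchanged.
    narrations = scene.get("narrations", [scene.get("narration", "")])
    if isinstance(narrations, str):
        narrations = [narrations]
    return narrations


# Missing "narrations" key: the default kicks in, so the list has one element.
print(read_narrations_original({"narration": "Hello"}))  # ['Hello']

# Key present but empty: dict.get returns [] and the default is never consulted.
cover_scene = {"narrations": []}
narrations = read_narrations_original(cover_scene)
print(narrations)  # []

# Hypothetical stand-in for the TTS loop: one audio path per narration.
narration_audios = [f"narration_{i}.mp3" for i, _ in enumerate(narrations)]
try:
    audio_path = narration_audios[0]
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```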

Fix

  1. Centralize the read in a single _coerce_narrations(scene) helper that normalizes any of {missing, None, str, list, list-with-empty-strings} into a list of stripped, non-empty strings. When nothing survives (the scene is genuinely silent), it falls back to [""], so the result is never empty. The fallback preserves the existing length-based assumption (len(narrations) >= 1) so callers can still index safely; callers that care about content can detect the gap via the empty string.

  2. Replace all four call sites with narrations = _coerce_narrations(scene).

  3. In the TTS loop (produce_assets):

    • Skip empty narration text (don't waste a TTS call).
    • When a scene produces zero audio segments, synthesize a 3-second silent track with ffmpeg (anullsrc) so downstream composition still has a valid audio_path. This lets cover slides render as a 3-second silent shot instead of crashing.

The single-narration and multi-narration concat branches are unchanged.
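The TTS-side change in step 3 can be sketched as follows. This assumes a hypothetical synthesize(text) -> path callable in place of the project's TTS service, and injects subprocess.run as a parameter so the ffmpeg call can be stubbed out. The ffmpeg arguments use the standard lavfi anullsrc source; the stereo layout and 44100 Hz sample rate are assumptions and should match whatever the TTS engine emits. The names build_silence_cmd, produce_scene_audio, and SILENT_FALLBACK_SECONDS are illustrative, not the PR's.

```python
import subprocess
from typing import Callable, List, Sequence

# Hardcoded in the PR; making it configurable is listed as a follow-up.
SILENT_FALLBACK_SECONDS = 3.0


def build_silence_cmd(output_path: str,
                      duration: float = SILENT_FALLBACK_SECONDS,
                      sample_rate: int = 44100) -> List[str]:
    """Build an ffmpeg command that writes `duration` seconds of silence."""
    return [
        "ffmpeg", "-y",                   # overwrite output if it already exists
        "-f", "lavfi",                    # input comes from a libavfilter source
        "-i", f"anullsrc=channel_layout=stereo:sample_rate={sample_rate}",
        "-t", str(duration),              # anullsrc never ends on its own; cap it
        output_path,
    ]


def produce_scene_audio(narrations: Sequence[str],
                        synthesize: Callable[[str], str],
                        silence_path: str,
                        run: Callable[..., object] = subprocess.run) -> List[str]:
    """Return one audio path per non-empty narration, or a silent track if none."""
    audios: List[str] = []
    for text in narrations:
        if not text.strip():
            continue  # skip empty narration text: don't waste a TTS call
        audios.append(synthesize(text))

    if not audios:
        # Zero segments: write silence so composition still has a valid audio path.
        run(build_silence_cmd(silence_path), check=True)
        audios.append(silence_path)

    return audios  # never empty, so indexing [0] is safe
```

Because the returned list always has at least one entry, the caller's existing single-narration and multi-narration concat branches can consume it without modification.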

Diff size

pixelle_video/pipelines/asset_based.py: +43 / −14 lines, single file.

Verification

  • Reproduced the crash with a 9-slide PPT (slide 1 = cover with no narration).
  • After the fix, the pipeline completes; slide 1 plays as a 3-second silent shot, scenes 2–9 narrate normally.
  • Single-narration and multi-narration scenes are unaffected (verified via the existing branches at L623-664).

Out of scope

  • Whether the cover slide should get LLM-generated narration in the first place is a separate question (a cover-detection heuristic in generate_content would be one path). This PR only ensures the pipeline does not crash when the LLM legitimately decides a scene has no narration.
  • The 3-second silent fallback duration is currently hardcoded; making it configurable (e.g. config.cover_silent_duration) is a follow-up.

Why no tests

The pipeline depends on LLM, ffmpeg, and a runtime task directory; there is no existing unit-test scaffold for produce_assets. _coerce_narrations itself is a pure function and trivially testable if the project later adopts a test harness — happy to add tests in a follow-up if the maintainer wants.

🤖 Generated with Claude Code

The `asset_based` pipeline crashes with `IndexError: list index out of
range` when a scene's `narrations` field is an empty list. This is a
legitimate LLM output for cover slides, chapter dividers, or any
purely-visual scene, and is reliably reproducible by uploading a
title slide as the first asset.

Root cause:
The pattern `narrations = scene.get("narrations", [scene.get("narration", "")])`
appears in four places (log preview, all_narrations aggregation, storyboard
frame creation, TTS production). All four assume `len(narrations) >= 1` and
silently degrade or crash when the list is empty:

- L382 (log preview): empty preview, hides the gap
- L448-453 (all_narrations.extend): silently drops scene from context.narrations
- L495-505 (StoryboardFrame): empty subtitle string
- L570-635 (TTS loop): IndexError at `narration_audios[0]` (the user-facing crash)

Fix:
1. Introduce a `_coerce_narrations(scene)` helper that normalizes any of
   {missing, None, str, list, list-with-empty-strings} into a list of
   stripped non-empty strings, falling back to `[""]` when the scene is
   genuinely silent so the result is never empty.
2. Replace all four call sites with the helper.
3. In the TTS loop, skip empty narration text and, when a scene produces
   zero audio segments, synthesize a 3-second silent track with ffmpeg
   so downstream composition still has a valid audio_path.

Tested: 9-slide PPT (slide 1 = cover with no narration) that previously
crashed at "Scene 1 has 0 narration(s)" now completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CLAassistant

CLAassistant commented May 8, 2026

CLA assistant check
All committers have signed the CLA.
