Summary
When the asset_based pipeline generates a script for many assets (e.g. 9 PPT slides) using a Chinese LLM (DeepSeek), the structured-output JSON response is silently truncated by max_tokens, which then surfaces as an unhelpful Failed to parse LLM response as VideoScript: {...first 200 chars...} error. The user has no way to know the real cause is truncation.
Reproduction
- Open the Custom Media (自定义素材) pipeline.
- Upload 9 image assets with full Windows-style absolute paths (e.g. E:\works\study\Pixelle-Video\temp\assets_xxxxx\幻灯片N.PNG, ~80 chars each).
- Provide a Chinese article / intent and click Generate.
- LLM is deepseek-v4-flash (or a similar Chinese-tokenizer model).
Observed log:
DEBUG llm_service:_call_with_structured_output:247 - Structured output response length: 3028 chars
ERROR llm_service:__call__:202 - LLM call error (model=deepseek-v4-flash, ...): Failed to parse LLM response as VideoScript: {
"scenes": [
{
"scene_number": 1,
"asset_path": "E:\\works\\study\\Pixelle-Video\\temp\\assets_16cd17476857\\幻灯片1.PNG",
"narrations": [
"2026年高考强基计划报名已经结束,但真正的战役才刚刚开始。...
ValueError: Failed to parse LLM response as VideoScript: ...
The pipeline aborts before scene 1 is produced.
Root cause
Three independent issues compound here:
1. max_tokens=4000 is too tight for 9-scene Chinese structured output
pixelle_video/pipelines/asset_based.py L342-346:
script: VideoScript = await self.core.llm(
    prompt=prompt,
    response_type=VideoScript,
    temperature=0.8,
    max_tokens=4000
)
For 9 scenes × (full Windows asset path ~80 chars + 1-5 Chinese narrations × 30-100 chars + JSON structure ~50 chars), the output can easily exceed 4000 tokens with DeepSeek's Chinese tokenizer (~1.0-1.5 tokens/char). The reproducer above hit 3028 chars of response, which at that ratio is roughly 3000-4500 tokens: close enough to the ceiling that the JSON was cut off mid-string.
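As a rough sanity check, the per-scene figures quoted above can be multiplied out. This is only a back-of-the-envelope sketch: the function name and its defaults are hypothetical and simply restate the numbers in this issue, and the tokens-per-char ratio is the approximate figure given above, not a measured value.

```python
def estimate_output_tokens(
    n_scenes,
    path_chars=80,              # ~80-char Windows asset path per scene
    narrations=(1, 5),          # min/max narrations per scene
    narration_chars=(30, 100),  # min/max chars per narration
    json_overhead=50,           # ~50 chars of JSON structure per scene
    tokens_per_char=(1.0, 1.5), # assumed ratio for Chinese text
):
    """Return a (low, high) estimate of output tokens for n_scenes."""
    low_chars = n_scenes * (path_chars + narrations[0] * narration_chars[0] + json_overhead)
    high_chars = n_scenes * (path_chars + narrations[1] * narration_chars[1] + json_overhead)
    return low_chars * tokens_per_char[0], high_chars * tokens_per_char[1]


low, high = estimate_output_tokens(9)
print(low, high)  # 1440.0 8505.0
```

With 9 scenes the estimate spans about 1440 to 8505 tokens, so a 4000-token limit sits well inside the range and will be hit by any moderately verbose response.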
2. _call_with_structured_output never checks finish_reason
pixelle_video/services/llm_service.py L238-247:
response = await client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": enhanced_prompt}],
    temperature=temperature,
    max_tokens=max_tokens,
    **kwargs
)
content = response.choices[0].message.content
logger.debug(f"Structured output response length: {len(content)} chars")
# Parse JSON from response content
return self._parse_response_as_model(content, response_type)
OpenAI-compatible APIs return response.choices[0].finish_reason == "length" when the response was truncated by max_tokens. This code never inspects it. As a result, truncation is silently passed to the JSON parser, which then fails for unrelated-looking reasons.
3. _parse_response_as_model error message is misleading
pixelle_video/services/llm_service.py L292-320:
def _parse_response_as_model(self, content: str, response_type: Type[T]) -> T:
    # Try direct JSON parsing first
    try:
        data = json.loads(content)
        return response_type.model_validate(data)
    except json.JSONDecodeError:
        pass
    # Try extracting from markdown code block
    json_pattern = r'```(?:json)?\s*([\s\S]+?)\s*```'
    match = re.search(json_pattern, content, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(1))
            return response_type.model_validate(data)
        except json.JSONDecodeError:
            pass
    # Try to find any JSON object in the text
    brace_start = content.find('{')
    brace_end = content.rfind('}')
    if brace_start != -1 and brace_end > brace_start:
        try:
            json_str = content[brace_start:brace_end + 1]
            data = json.loads(json_str)
            return response_type.model_validate(data)
        except json.JSONDecodeError:
            pass
    raise ValueError(f"Failed to parse LLM response as {response_type.__name__}: {content[:200]}...")
Two further problems with this method:
a. The fallbacks only catch json.JSONDecodeError. If json.loads succeeds but response_type.model_validate(data) raises pydantic.ValidationError, that exception bypasses all three fallbacks and propagates with a different (less helpful) message — masking the real cause.
b. The final error message only shows content[:200]. For truncation bugs, the last 100 characters are far more diagnostic than the first 200, because that's where the truncation cliff lives. The user sees a clean-looking JSON header and is misled into thinking the LLM "returned a bad format" rather than "returned a complete-looking-but-truncated body".
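To illustrate point (b), here is a small stdlib-only sketch of why the tail matters. The helper name describe_parse_failure and the sample payload are made up for this demonstration and are not part of the codebase; the truncation is simulated by slicing a valid JSON string in half.

```python
import json


def describe_parse_failure(content: str, head: int = 200, tail: int = 100):
    """Return None if content is valid JSON, else a message showing both ends."""
    try:
        json.loads(content)
        return None
    except json.JSONDecodeError as e:
        return (
            f"{type(e).__name__} at char {e.pos} of {len(content)}\n"
            f"  first {head}: {content[:head]!r}\n"
            f"  last {tail}: ...{content[-tail:]!r}"
        )


# Build a complete response, then cut it in half to mimic a max_tokens cutoff.
full = json.dumps(
    {"scenes": [{"scene_number": i, "narrations": ["narration text"]} for i in range(1, 10)]}
)
truncated = full[: len(full) // 2]

print(describe_parse_failure(truncated))
```

The first 200 characters of the truncated string are identical to those of the valid one, so only the tail (and the reported error position) reveals that the body stops mid-structure.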
Why this matters
A user uploading 9 reasonable assets (a normal use case for a slide-narration pipeline) currently gets:
- A 30-second LLM call that consumes API quota
- An abort with a cryptic "Failed to parse LLM response" message
- No hint that the cure is to bump max_tokens or reduce the asset path lengths
The fix is short and doesn't require any new dependencies.
Suggested fix
A. Bump max_tokens for asset_based structured output
Single-line change in pixelle_video/pipelines/asset_based.py L346:
max_tokens=8000 # was 4000 — too tight for 9+ scenes with Chinese narrations
A more principled version would scale by asset count: max_tokens=max(4000, 1000 + 800 * len(assets)).
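The scaling rule above can be sketched as a tiny helper. The function name and keyword defaults are hypothetical; only the formula itself comes from this issue. A real implementation might also need to cap the result at the provider's maximum output length, which is not specified here.

```python
def scaled_max_tokens(n_assets: int, floor: int = 4000, base: int = 1000, per_asset: int = 800) -> int:
    """Scale the output budget with the number of assets, never going below floor."""
    return max(floor, base + per_asset * n_assets)


for n in (3, 9, 20):
    print(n, scaled_max_tokens(n))
# 3 4000
# 9 8200
# 20 17000
```

For the 9-asset reproducer this gives 8200, slightly above the fixed 8000 proposed in the one-line change.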
B. Detect finish_reason == "length" and fail loudly
In pixelle_video/services/llm_service.py::_call_with_structured_output, between L245 and L250:
choice = response.choices[0]
content = choice.message.content
finish_reason = getattr(choice, "finish_reason", None)
logger.debug(f"Structured output response length: {len(content)} chars, finish_reason={finish_reason}")
if finish_reason == "length":
    raise ValueError(
        f"LLM response truncated by max_tokens={max_tokens} "
        f"(got {len(content)} chars, finish_reason='length'). "
        f"Increase max_tokens or reduce input size."
    )
This converts the silent truncation into an explicit, actionable error.
C. Catch ValidationError and improve the parse-failure message
In _parse_response_as_model, also catch pydantic.ValidationError in the three fallbacks, and include the tail of the content in the final raise:
from pydantic import ValidationError

def _parse_response_as_model(self, content: str, response_type: Type[T]) -> T:
    parse_errors = []
    for attempt_name, candidate in self._json_candidates(content):
        try:
            data = json.loads(candidate)
            return response_type.model_validate(data)
        except (json.JSONDecodeError, ValidationError) as e:
            parse_errors.append(f"{attempt_name}: {type(e).__name__}: {e}")
    raise ValueError(
        f"Failed to parse LLM response as {response_type.__name__}.\n"
        f" length: {len(content)} chars\n"
        f" first 200: {content[:200]!r}\n"
        f" last 100: ...{content[-100:]!r}\n"
        f" attempts: {' | '.join(parse_errors)}"
    )
(Where _json_candidates is a small helper that yields the three current strategies.)
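One possible shape for that helper, written here as a standalone function for clarity (in the service it would presumably be a method). This is a sketch, not the project's code: the attempt names "direct", "code_block" and "brace_span" are invented labels, and the fence marker is built from single backticks only to keep this example easy to embed.

```python
import re
from typing import Iterator, Tuple

FENCE = "`" * 3  # markdown code-fence marker
CODE_BLOCK_RE = re.compile(FENCE + r"(?:json)?\s*([\s\S]+?)\s*" + FENCE)


def json_candidates(content: str) -> Iterator[Tuple[str, str]]:
    """Yield (attempt_name, candidate_string) pairs, mirroring the three existing strategies."""
    # 1. The raw content as-is
    yield "direct", content
    # 2. The body of the first markdown code block, if any
    match = CODE_BLOCK_RE.search(content)
    if match:
        yield "code_block", match.group(1)
    # 3. Everything between the first '{' and the last '}'
    start, end = content.find("{"), content.rfind("}")
    if start != -1 and end > start:
        yield "brace_span", content[start:end + 1]


wrapped = "Here you go:\n" + FENCE + 'json\n{"a": 1}\n' + FENCE
print([name for name, _ in json_candidates(wrapped)])
# ['direct', 'code_block', 'brace_span']
```

Yielding candidates lazily keeps the parsing loop in _parse_response_as_model short and lets each failed attempt be recorded under its own name.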
Out of scope
- Whether the prompt should ever produce paths in the output at all is a separate design question. Echoing 9 × 80-char Windows paths back into the response is wasteful regardless. A future improvement would be to have the LLM emit scene_index referencing an upstream asset list, instead of repeating the full path.
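For illustration only, the index-based idea might look roughly like the following. Everything here is an assumption: the function name, the dict-shaped scenes, the 0-based scene_index field, and the shortened example paths are hypothetical and do not reflect the current VideoScript schema.

```python
from typing import Dict, List


def resolve_asset_paths(scenes: List[Dict], assets: List[str]) -> List[Dict]:
    """Attach asset_path to each scene by looking up its (assumed 0-based) scene_index."""
    resolved = []
    for scene in scenes:
        idx = scene["scene_index"]
        if not 0 <= idx < len(assets):
            raise ValueError(f"scene_index {idx} out of range for {len(assets)} assets")
        resolved.append({**scene, "asset_path": assets[idx]})
    return resolved


# Hypothetical upstream asset list and LLM output that refers to it by index
assets = ["E:\\temp\\slide1.PNG", "E:\\temp\\slide2.PNG"]
llm_scenes = [
    {"scene_index": 0, "narrations": ["first"]},
    {"scene_index": 1, "narrations": ["second"]},
]
print(resolve_asset_paths(llm_scenes, assets)[1]["asset_path"])
```

With this shape the model emits a short integer per scene instead of an ~80-char path, and an invalid index fails with a clear error rather than a wrong file being used.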
Related