asset_based: LLM structured output silently truncated by max_tokens, surfaces as cryptic JSON parse error

## Summary

When the `asset_based` pipeline generates a script for many assets (e.g. 9 PPT slides) using a Chinese LLM (DeepSeek), the structured-output JSON response is silently truncated by `max_tokens`, which then surfaces as an unhelpful `Failed to parse LLM response as VideoScript: {...first 200 chars...}` error. The user has no way to know the real cause is truncation.

## Reproduction

1. Open the **Custom Media (自定义素材)** pipeline.
2. Upload 9 image assets with full Windows-style absolute paths (e.g. `E:\works\study\Pixelle-Video\temp\assets_xxxxx\幻灯片N.PNG`, ~80 chars each).
3. Provide a Chinese article / intent and click **Generate**.
4. The configured LLM is `deepseek-v4-flash` (or a similar Chinese-tokenizer model).

Observed log:

```
DEBUG llm_service:_call_with_structured_output:247 - Structured output response length: 3028 chars
ERROR llm_service:__call__:202 - LLM call error (model=deepseek-v4-flash, ...): Failed to parse LLM response as VideoScript: {
  "scenes": [
    {
      "scene_number": 1,
      "asset_path": "E:\\works\\study\\Pixelle-Video\\temp\\assets_16cd17476857\\幻灯片1.PNG",
      "narrations": [
        "2026年高考强基计划报名已经结束，但真正的战役才刚刚开始。...
ValueError: Failed to parse LLM response as VideoScript: ...
```

The pipeline aborts before scene 1 is produced.

## Root cause

Three independent issues compound here:

### 1. `max_tokens=4000` is too tight for 9-scene Chinese structured output

`pixelle_video/pipelines/asset_based.py` L342-346:

```python
script: VideoScript = await self.core.llm(
    prompt=prompt,
    response_type=VideoScript,
    temperature=0.8,
    max_tokens=4000
)
```

For 9 scenes × (full Windows asset path ~80 chars + 1-5 Chinese narrations × 30-100 chars + JSON structure ~50 chars), the output easily exceeds 4000 tokens with DeepSeek's Chinese tokenizer (~1.0-1.5 token/char). The reproducer above returned `3028 chars`; at roughly 1.3 tokens/char that is about 4000 tokens, right at the ceiling, which is why the JSON was cut off mid-string.
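
The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope model, not a measurement: the defaults of 3 narrations per scene and 65 chars per narration are assumed midpoints of the ranges quoted above, and the function names are hypothetical.

```python
import math


def estimate_output_chars(
    num_scenes: int,
    path_chars: int = 80,            # full Windows asset path, per the estimate above
    narrations_per_scene: int = 3,   # assumed midpoint of the 1-5 range
    narration_chars: int = 65,       # assumed midpoint of the 30-100 range
    json_overhead_chars: int = 50,   # keys, quotes, brackets per scene
) -> int:
    """Approximate size of the structured response in characters."""
    per_scene = path_chars + narrations_per_scene * narration_chars + json_overhead_chars
    return num_scenes * per_scene


def estimate_output_tokens(num_chars: int, tokens_per_char: float) -> int:
    """Convert a character count to tokens for a given tokens-per-char ratio."""
    return math.ceil(num_chars * tokens_per_char)


chars = estimate_output_chars(9)
low = estimate_output_tokens(chars, 1.0)   # optimistic end of the 1.0-1.5 range
high = estimate_output_tokens(chars, 1.5)  # pessimistic end of the range
print(f"{chars} chars, roughly {low}-{high} tokens")
```

With these assumed midpoints the model lands close to the 3028 chars seen in the log, and the upper end of the token range is already above the 4000-token budget.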

### 2. `_call_with_structured_output` never checks `finish_reason`

`pixelle_video/services/llm_service.py` L238-247:

```python
response = await client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": enhanced_prompt}],
    temperature=temperature,
    max_tokens=max_tokens,
    **kwargs
)
content = response.choices[0].message.content

logger.debug(f"Structured output response length: {len(content)} chars")

# Parse JSON from response content
return self._parse_response_as_model(content, response_type)
```

OpenAI-compatible APIs return `response.choices[0].finish_reason == "length"` when the response was truncated by `max_tokens`. This code never inspects it. As a result, truncation is silently passed to the JSON parser, which then fails for unrelated-looking reasons.
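
To illustrate why the resulting failure looks unrelated, here is a minimal standalone sketch using only the standard library. The payload is made up for illustration and `describe_parse_failure` is a hypothetical helper; the cut-off is simulated by slicing the string rather than by a real `max_tokens` limit.

```python
import json

# A small, well-formed payload shaped like the log excerpt above.
complete = json.dumps(
    {"scenes": [{"scene_number": 1, "narrations": ["first line", "second line"]}]}
)

# Simulate a max_tokens cut-off by slicing in the middle of a string value.
truncated = complete[: complete.index("second") + 3]


def describe_parse_failure(text: str):
    """Return a short description of the JSON error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"{type(exc).__name__}: {exc.msg} (char {exc.pos})"
    return None


print(describe_parse_failure(complete))   # parses, so there is no error description
print(describe_parse_failure(truncated))  # a JSONDecodeError with no mention of max_tokens
print(repr(truncated[:30]))               # the head still looks like clean JSON
```

The parser can only report a syntax problem near the end of the text; nothing in the exception says that the response was cut short, which is the information `finish_reason` carries.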

### 3. `_parse_response_as_model` error message is misleading

`pixelle_video/services/llm_service.py` L292-320:

```python
def _parse_response_as_model(self, content: str, response_type: Type[T]) -> T:
    # Try direct JSON parsing first
    try:
        data = json.loads(content)
        return response_type.model_validate(data)
    except json.JSONDecodeError:
        pass

    # Try extracting from markdown code block
    json_pattern = r'```(?:json)?\s*([\s\S]+?)\s*```'
    match = re.search(json_pattern, content, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(1))
            return response_type.model_validate(data)
        except json.JSONDecodeError:
            pass

    # Try to find any JSON object in the text
    brace_start = content.find('{')
    brace_end = content.rfind('}')
    if brace_start != -1 and brace_end > brace_start:
        try:
            json_str = content[brace_start:brace_end + 1]
            data = json.loads(json_str)
            return response_type.model_validate(data)
        except json.JSONDecodeError:
            pass

    raise ValueError(f"Failed to parse LLM response as {response_type.__name__}: {content[:200]}...")
```

Two further problems with this method:

a. The fallbacks only catch `json.JSONDecodeError`. If `json.loads` succeeds but `response_type.model_validate(data)` raises `pydantic.ValidationError`, that exception bypasses all three fallbacks and propagates with a different (less helpful) message — masking the real cause.

b. The final error message only shows `content[:200]`. For truncation bugs, the **last** 100 characters are far more diagnostic than the first 200, because that's where the truncation cliff lives. The user sees a clean-looking JSON header and is misled into thinking the LLM "returned a bad format" rather than "returned a complete-looking-but-truncated body".

## Why this matters

A user uploading 9 reasonable assets (a normal use case for a slide-narration pipeline) currently gets:

- A 30-second LLM call that consumes API quota
- An abort with a cryptic "Failed to parse LLM response" message
- No hint that the remedy is to raise `max_tokens` or shorten the asset paths

The fix is short and doesn't require any new dependencies.

## Suggested fix

### A. Bump `max_tokens` for asset_based structured output

Single-line change in `pixelle_video/pipelines/asset_based.py` L346:

```python
max_tokens=8000  # was 4000 — too tight for 9+ scenes with Chinese narrations
```

A more principled version would scale by asset count: `max_tokens=max(4000, 1000 + 800 * len(assets))`.
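
As a sketch, the scaling rule could live in a small helper. The name `scaled_max_tokens` and the optional `ceiling` parameter are assumptions added for illustration (a provider may cap output tokens, and that cap is not stated in this issue); the floor, base, and per-asset numbers are the ones from the formula above.

```python
from typing import Optional


def scaled_max_tokens(
    num_assets: int,
    floor: int = 4000,              # never go below the current budget
    base: int = 1000,               # fixed overhead from the formula above
    per_asset: int = 800,           # per-scene allowance from the formula above
    ceiling: Optional[int] = None,  # hypothetical provider-side output cap
) -> int:
    """Token budget that grows with the number of assets."""
    budget = max(floor, base + per_asset * num_assets)
    if ceiling is not None:
        budget = min(budget, ceiling)
    return budget


print(scaled_max_tokens(3))   # small jobs keep the 4000 floor
print(scaled_max_tokens(9))   # the 9-slide reproducer gets 1000 + 800 * 9
```

The call site would then pass `max_tokens=scaled_max_tokens(len(assets))`, assuming the asset list is available as `assets` at that point.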

### B. Detect `finish_reason == "length"` and fail loudly

In `pixelle_video/services/llm_service.py::_call_with_structured_output`, between L245 and L250:

```python
choice = response.choices[0]
content = choice.message.content
finish_reason = getattr(choice, "finish_reason", None)

logger.debug(f"Structured output response length: {len(content)} chars, finish_reason={finish_reason}")

if finish_reason == "length":
    raise ValueError(
        f"LLM response truncated by max_tokens={max_tokens} "
        f"(got {len(content)} chars, finish_reason='length'). "
        f"Increase max_tokens or reduce input size."
    )
```

This converts the silent truncation into an explicit, actionable error.

### C. Catch `ValidationError` and improve the parse-failure message

In `_parse_response_as_model`, also catch `pydantic.ValidationError` in the three fallbacks, and include the **tail** of the content in the final raise:

```python
from pydantic import ValidationError

def _parse_response_as_model(self, content: str, response_type: Type[T]) -> T:
    parse_errors = []
    for attempt_name, candidate in self._json_candidates(content):
        try:
            data = json.loads(candidate)
            return response_type.model_validate(data)
        except (json.JSONDecodeError, ValidationError) as e:
            parse_errors.append(f"{attempt_name}: {type(e).__name__}: {e}")

    raise ValueError(
        f"Failed to parse LLM response as {response_type.__name__}.\n"
        f"  length: {len(content)} chars\n"
        f"  first 200: {content[:200]!r}\n"
        f"  last 100:  ...{content[-100:]!r}\n"
        f"  attempts: {' | '.join(parse_errors)}"
    )
```

(Where `_json_candidates` is a small helper that yields the three current strategies.)
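
One possible shape for that helper, written here as a standalone function so it can be run in isolation. This is a sketch under assumptions: the attempt labels (`direct`, `code_block`, `brace_span`) are made up, and the three strategies are copied from the current `_parse_response_as_model` body.

```python
import re
from typing import Iterator, Tuple

# Build the fence marker programmatically to keep this snippet easy to embed.
FENCE = "`" * 3
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*([\s\S]+?)\s*" + FENCE)


def json_candidates(content: str) -> Iterator[Tuple[str, str]]:
    """Yield (attempt_name, candidate_text) pairs, most direct strategy first."""
    # 1. The whole response as-is.
    yield "direct", content

    # 2. The body of the first markdown code block, if any.
    match = _FENCE_RE.search(content)
    if match:
        yield "code_block", match.group(1)

    # 3. Everything between the first '{' and the last '}'.
    start = content.find("{")
    end = content.rfind("}")
    if start != -1 and end > start:
        yield "brace_span", content[start : end + 1]


reply = "Here is the script:\n" + FENCE + 'json\n{"a": 1}\n' + FENCE
for name, candidate in json_candidates(reply):
    print(name, repr(candidate))
```

Because it is a generator, later candidates are only computed if the earlier ones failed to parse, which matches the behavior of the existing fallbacks.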

## Out of scope

- Whether the prompt should ever produce paths in the output at all is a separate design question. Echoing 9 × 80-char Windows paths back into the response is wasteful regardless. A future improvement would be to have the LLM emit `scene_index` referencing an upstream asset list, instead of repeating the full path.
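
For reference, the index-based idea could look roughly like this. It is only a sketch of the design direction, not a proposed patch: `IndexedScene` and `resolve_asset_path` are hypothetical names, and the 0-based indexing is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexedScene:
    """What the LLM would emit: a short index instead of a full path."""
    scene_index: int       # position in the upstream asset list (0-based, assumed)
    narrations: List[str]


def resolve_asset_path(scene: IndexedScene, asset_paths: List[str]) -> str:
    """Map the index back to the real path on the caller's side."""
    if not 0 <= scene.scene_index < len(asset_paths):
        raise ValueError(
            f"scene_index {scene.scene_index} is out of range for {len(asset_paths)} assets"
        )
    return asset_paths[scene.scene_index]


assets = ["slide1.png", "slide2.png", "slide3.png"]  # placeholder paths
scene = IndexedScene(scene_index=1, narrations=["example narration"])
print(resolve_asset_path(scene, assets))
```

The response then carries a few characters per scene instead of an ~80-char path, and an invalid index fails with a clear message rather than a missing file later in the pipeline.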

## Related

- Discovered while testing the fix in #140 (handle scenes with empty narrations list). The two bugs are independent — #140 fixes a downstream IndexError; this issue is an upstream LLM-call problem that prevents reaching the produce_assets stage at all.

