Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py`:

```
vllm==0.11.0
```
🐛 Describe the bug
When initializing GLM-4.1V with vLLM (even for text-only/image-only inference), the engine core fails to start due to a constraint mismatch in dummy video input generation:
- vLLM automatically generates dummy video inputs during encoder budget calculation (even when no video is provided by the user).
- The fallback logic in `get_num_frames_with_most_features` sets `num_frames=1` (the minimum value) when the video token allocation is insufficient.
- GLM-4.1V's `smart_resize` function enforces a strict constraint: `num_frames > temporal_factor=2` (not `>=`), triggering a `ValueError`.
- This leads to a fatal engine core initialization failure, blocking all inference for GLM-4.1V.
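The mismatch can be illustrated with a minimal sketch. This is not the actual transformers implementation (the exact comparison in `smart_resize` may differ); the error message and `temporal_factor=2` are taken from the traceback below:

```python
# Illustrative sketch of the constraint that trips during dummy-input
# profiling (hypothetical helper, not the real transformers code).
TEMPORAL_FACTOR = 2  # from the traceback: "temporal_factor:2"

def check_num_frames(num_frames: int) -> None:
    # Reject frame counts below the temporal factor, as GLM-4.1V's
    # video processor does for the profiler's num_frames=1 fallback.
    if num_frames < TEMPORAL_FACTOR:
        raise ValueError(
            f"t:{num_frames} must be larger than "
            f"temporal_factor:{TEMPORAL_FACTOR}"
        )

# vLLM's dummy-input fallback uses num_frames=1, so profiling fails:
try:
    check_num_frames(1)
except ValueError as e:
    print(e)  # t:1 must be larger than temporal_factor:2

# num_frames=2 would pass this check:
check_num_frames(2)
```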
Key Context
- The error occurs before any user prompts are processed (during engine initialization).
- No explicit video input is provided—this is an internal vLLM dummy input generation issue.
Minimal reproducible example
```python
from vllm import LLM, SamplingParams

# Initialize GLM-4.1V (triggers dummy video input generation)
llm = LLM(
    model="THUDM/glm-4.1v",
    tensor_parallel_size=2,
    dtype="bfloat16",
    enforce_eager=True,
    max_model_len=4096,
    disable_custom_all_reduce=True,  # NPU-specific
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=32,
)

# Text-only prompt (error occurs during llm.generate call)
outputs = llm.generate(
    ["Describe the image."],
    sampling_params,
)
for output in outputs:
    print(output.outputs[0].text)
```

Observed behavior
The engine core crashes during initialization with the following critical error:

```
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}
```

Full critical error traceback
```
ERROR 12-14 18:32:20 [core.py:708] Traceback (most recent call last):
  File "/home/silas/vllm/vllm/multimodal/processing.py", line 1057, in call_hf_processor
    output = hf_processor(**data,
  File "/home/silas/transformers/src/transformers/models/glm4v/processing_glm4v.py", line 150, in __call__
    videos_inputs = self.video_processor(videos=videos, **output_kwargs["videos_kwargs"])
  File "/home/silas/transformers/src/transformers/video_processing_utils.py", line 206, in __call__
    return self.preprocess(videos, **kwargs)
  File "/home/silas/transformers/src/transformers/video_processing_utils.py", line 387, in preprocess
    preprocessed_videos = self._preprocess(videos=videos, **kwargs)
  File "/home/silas/transformers/src/transformers/models/glm4v/video_processing_glm4v.py", line 177, in _preprocess
    resized_height, resized_width = smart_resize(
  File "/home/silas/transformers/src/transformers/models/glm4v/image_processing_glm4v.py", line 59, in smart_resize
    raise ValueError(f"t:{num_frames} must be larger than temporal_factor:{temporal_factor}")
ValueError: t:1 must be larger than temporal_factor:2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/silas/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
  File "/home/silas/vllm/vllm/v1/engine/core.py", line 498, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/home/silas/vllm/vllm/v1/engine/core.py", line 124, in __init__
    self.scheduler: SchedulerInterface = Scheduler(
  File "/home/silas/vllm/vllm/v1/core/sched/scheduler.py", line 142, in __init__
    encoder_compute_budget, encoder_cache_size = compute_encoder_budget(
  File "/home/silas/vllm/vllm/v1/core/encoder_cache_manager.py", line 264, in compute_encoder_budget
    .get_max_tokens_per_item_by_nonzero_modality(model_config)
  File "/home/silas/vllm/vllm/multimodal/registry.py", line 167, in get_max_tokens_per_item_by_nonzero_modality
    max_tokens_per_item = self.get_max_tokens_per_item_by_modality(
  File "/home/silas/vllm/vllm/multimodal/registry.py", line 143, in get_max_tokens_per_item_by_modality
    return profiler.get_mm_max_contiguous_tokens(
  File "/home/silas/vllm/vllm/multimodal/profiling.py", line 282, in get_mm_max_contiguous_tokens
    return self._get_mm_max_tokens(seq_len,
  File "/home/silas/vllm/vllm/multimodal/profiling.py", line 262, in _get_mm_max_tokens
    mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
  File "/home/silas/vllm/vllm/multimodal/profiling.py", line 173, in _get_dummy_mm_inputs
    return self.processor.apply(
  File "/home/silas/vllm/vllm/multimodal/processing.py", line 1080, in call_hf_processor
    raise ValueError(msg) from exc
ValueError: Failed to apply Glm4vProcessor on data={...}

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}
```
Expected behavior
- vLLM should respect GLM-4.1V's temporal factor constraint (set the fallback to `num_frames=2` instead of `1`).
- Dummy video input generation should not fail for text-only/image-only requests.
- The engine core should initialize successfully without a manual workaround.