[Bug]: GLM-4.1V dummy video input generation fails with ValueError (num_frames=1 < temporal_factor=2) #30638

@Silas-11

Description

Your current environment

The output of `python collect_env.py`:

vllm==0.11.0

🐛 Describe the bug

When initializing GLM-4.1V with vLLM (even for text-only/image-only inference), the engine core fails to start due to a constraint mismatch in dummy video input generation:

  1. vLLM automatically generates dummy video inputs during encoder budget calculation (even when no video is provided by the user).
  2. The fallback logic in get_num_frames_with_most_features sets num_frames=1 (minimum value) when video token allocation is insufficient.
  3. GLM-4.1V's smart_resize function requires num_frames to be at least temporal_factor=2 (the error message reads "must be larger than"), so the dummy value num_frames=1 triggers a ValueError.
  4. This leads to fatal engine core initialization failure, blocking all inference for GLM-4.1V.
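The guard that fires can be illustrated with a minimal sketch (a simplified stand-in for the check inside transformers' smart_resize, not the actual implementation):

```python
def check_temporal(num_frames: int, temporal_factor: int = 2) -> None:
    """Simplified sketch of the temporal-factor guard in GLM-4.1V's
    smart_resize. A dummy video with num_frames=1 trips this check
    during vLLM's encoder-budget profiling, even for text-only requests.
    """
    if num_frames < temporal_factor:
        raise ValueError(
            f"t:{num_frames} must be larger than temporal_factor:{temporal_factor}"
        )

# num_frames=1 reproduces the failure seen during engine init:
try:
    check_temporal(1)
except ValueError as e:
    print(e)  # t:1 must be larger than temporal_factor:2
```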

Key Context

  • The error occurs before any user prompts are processed (during engine initialization).
  • No explicit video input is provided—this is an internal vLLM dummy input generation issue.

Minimal reproducible example

from vllm import LLM, SamplingParams

# Initialize GLM-4.1V (triggers dummy video input generation)
llm = LLM(
    model="THUDM/glm-4.1v",  
    tensor_parallel_size=2,
    dtype="bfloat16",
    enforce_eager=True,
    max_model_len=4096,
    disable_custom_all_reduce=True,  # NPU-specific
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=32,
)

# Text-only prompt (error occurs during llm.generate call)
outputs = llm.generate(
    ["Describe the image."],
    sampling_params,
)

for output in outputs:
    print(output.outputs[0].text)

Observed behavior

The engine core crashes during initialization with the following critical error:

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}

Full critical error traceback

ERROR 12-14 18:32:20 [core.py:708] Traceback (most recent call last):
  File "/home/silas/vllm/vllm/multimodal/processing.py", line 1057, in call_hf_processor
    output = hf_processor(**data,
  File "/home/silas/transformers/src/transformers/models/glm4v/processing_glm4v.py", line 150, in __call__
    videos_inputs = self.video_processor(videos=videos, **output_kwargs["videos_kwargs"])
  File "/home/silas/transformers/src/transformers/video_processing_utils.py", line 206, in __call__
    return self.preprocess(videos, **kwargs)
  File "/home/silas/transformers/src/transformers/video_processing_utils.py", line 387, in preprocess
    preprocessed_videos = self._preprocess(videos=videos, **kwargs)
  File "/home/silas/transformers/src/transformers/models/glm4v/video_processing_glm4v.py", line 177, in _preprocess
    resized_height, resized_width = smart_resize(
  File "/home/silas/transformers/src/transformers/models/glm4v/image_processing_glm4v.py", line 59, in smart_resize
    raise ValueError(f"t:{num_frames} must be larger than temporal_factor:{temporal_factor}")
ValueError: t:1 must be larger than temporal_factor:2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/silas/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
  File "/home/silas/vllm/vllm/v1/engine/core.py", line 498, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/home/silas/vllm/vllm/v1/engine/core.py", line 124, in __init__
    self.scheduler: SchedulerInterface = Scheduler(
  File "/home/silas/vllm/vllm/v1/core/sched/scheduler.py", line 142, in __init__
    encoder_compute_budget, encoder_cache_size = compute_encoder_budget(
  File "/home/silas/vllm/vllm/v1/core/encoder_cache_manager.py", line 264, in compute_encoder_budget
    .get_max_tokens_per_item_by_nonzero_modality(model_config)
  File "/home/silas/vllm/vllm/multimodal/registry.py", line 167, in get_max_tokens_per_item_by_nonzero_modality
    max_tokens_per_item = self.get_max_tokens_per_item_by_modality(
  File "/home/silas/vllm/vllm/multimodal/registry.py", line 143, in get_max_tokens_per_item_by_modality
    return profiler.get_mm_max_contiguous_tokens(
  File "/home/silas/vllm/vllm/multimodal/profiling.py", line 282, in get_mm_max_contiguous_tokens
    return self._get_mm_max_tokens(seq_len,
  File "/home/silas/vllm/vllm/multimodal/profiling.py", line 262, in _get_mm_max_tokens
    mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
  File "/home/silas/vllm/vllm/multimodal/profiling.py", line 173, in _get_dummy_mm_inputs
    return self.processor.apply(
  File "/home/silas/vllm/vllm/multimodal/processing.py", line 1080, in call_hf_processor
    raise ValueError(msg) from exc
ValueError: Failed to apply Glm4vProcessor on data={...}

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}

Expected behavior

  1. vLLM should respect GLM-4.1V's temporal factor constraint (set fallback num_frames=2 instead of 1).
  2. Dummy video input generation should not fail for text-only/image-only requests.
  3. Engine core should initialize successfully without manual workaround.
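Point 1 amounts to clamping the profiler's fallback frame count to the model's temporal factor before generating the dummy video. A sketch of that fix (a hypothetical helper, not vLLM's actual code; the name safe_dummy_num_frames is illustrative):

```python
def safe_dummy_num_frames(requested: int, temporal_factor: int = 2) -> int:
    """Hypothetical helper: clamp the dummy-video frame count so it never
    falls below the model's temporal_factor (2 for GLM-4.1V)."""
    return max(requested, temporal_factor)

print(safe_dummy_num_frames(1))  # 2  (fallback raised to the minimum)
print(safe_dummy_num_frames(8))  # 8  (sufficient counts pass through)
```

Applying such a clamp in get_num_frames_with_most_features would keep the dummy input valid without changing behavior for real video requests.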
