
Conversation

@leejianwoo-collab commented Dec 14, 2025

Fix GLM-4.1V dummy video input generation failure

Problem

During dummy video input generation, GLM-4.1V engine initialization fails with ValueError: t:1 must be larger than temporal_factor:2.

Root Cause

  1. Issue Location: vllm/model_executor/models/glm4_1v.py:995, in the get_num_frames_with_most_features() function
  2. Problem: The function returns max(max_frames_per_video, 1), which can fall back to 1 when the video token allocation is too small
  3. Constraint Violation: GLM-4.1V's smart_resize function requires num_frames > temporal_factor=2 (strict inequality)
  4. Failure: num_frames=1 violates 1 > 2, raising a ValueError that crashes the engine core (see the sketch after the stack trace below)

Error Stack Trace

ValueError: t:1 must be larger than temporal_factor:2
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_DP0': 1}
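
For reference, a minimal sketch of the kind of check that produces this error. It is illustrative only, not copied from glm4_1v.py, so the function name and default are assumptions:

def smart_resize_sketch(t: int, temporal_factor: int = 2) -> None:
    # GLM-4.1V rejects inputs with no more frames than the temporal factor,
    # so a 1-frame dummy video fails before any resizing happens.
    if t <= temporal_factor:
        raise ValueError(f"t:{t} must be larger than temporal_factor:{temporal_factor}")

Because the dummy video falls back to a single frame, this check fires during profiling and the engine core never finishes initializing.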

Solution

Change the minimum fallback value from 1 to 3 in get_num_frames_with_most_features():

# Before (line 995)
return max(max_frames_per_video, 1)

# After (line 995)  
return max(max_frames_per_video, 3)
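
For context, a simplified sketch of how the fallback is reached, modeled on the analogous processing-info helpers elsewhere in vLLM; the GLM-4.1V version may differ in detail, and _MAX_FRAMES_PER_VIDEO plus the derivation of max_total_frames are assumptions:

def get_num_frames_with_most_features_sketch(max_total_frames: int,
                                              max_videos: int) -> int:
    _MAX_FRAMES_PER_VIDEO = 16  # illustrative cap, not the real constant
    # Split the dummy-video frame budget across all videos; with a tight
    # token budget this quotient can drop to 0 or 1.
    max_frames_per_video = min(max_total_frames // max(max_videos, 1),
                               _MAX_FRAMES_PER_VIDEO)
    return max(max_frames_per_video, 3)  # previously max(..., 1)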

Why 3?

  • GLM-4.1V constraint: num_frames > temporal_factor=2
  • Minimum valid value: num_frames >= 3
  • Our fix ensures: 3 > 2 ✓ (constraint satisfied)

Impact

  • Fixes: GLM-4.1V engine initialization failure for text-only/image-only inference
  • Scope: Only affects GLM-4.1V dummy video input generation fallback case
  • Backward Compatible: No breaking changes to existing functionality
  • Performance: Minimal impact; only affects the edge case where the video token allocation is insufficient

Testing

  • Verified fix addresses the specific constraint violation
  • Existing GLM-4.1V tests should continue to pass
  • Created test case to validate the minimum frame count requirement (a sketch follows this list)
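
A minimal pytest sketch of the kind of check intended here. It mirrors the patched fallback expression directly rather than calling the real processing-info class, so it is illustrative and not the actual test added in this PR:

import pytest

TEMPORAL_FACTOR = 2  # GLM-4.1V: num_frames must be strictly greater than this

@pytest.mark.parametrize("max_frames_per_video", [0, 1, 2, 8])
def test_fallback_exceeds_temporal_factor(max_frames_per_video):
    # Same expression as the patched return in get_num_frames_with_most_features().
    num_frames = max(max_frames_per_video, 3)
    assert num_frames > TEMPORAL_FACTOR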

Related Issues

Fixes #30638 ([Bug]: GLM-4.1V dummy video input generation fails with ValueError (num_frames=1 < temporal_factor=2))

Checklist

  • Identified root cause in get_num_frames_with_most_features()
  • Implemented minimal fix changing fallback from 1 to 3
  • Verified fix satisfies GLM-4.1V's num_frames > temporal_factor=2 constraint
  • Created test case for validation
  • Confirmed no impact on other model implementations

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small but essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request correctly addresses a ValueError that occurs during GLM-4.1V engine initialization. The root cause, related to dummy video input generation, has been accurately identified, and the proposed fix of changing the minimum fallback value for frames is effective. My review includes one suggestion to enhance the code's maintainability by programmatically deriving this fallback value instead of using a hardcoded number.
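
One possible shape of that suggestion, deriving the minimum from the processor's temporal patch size rather than hardcoding 3. This is illustrative only; the helper name and the use of temporal_patch_size are assumptions, not the actual vLLM or GLM-4.1V API:

def min_dummy_frames(temporal_patch_size: int) -> int:
    # Smallest frame count satisfying num_frames > temporal_factor, so the
    # fallback stays correct even if the temporal factor ever changes.
    return temporal_patch_size + 1

# In get_num_frames_with_most_features() this could replace the hardcoded 3,
# e.g. return max(max_frames_per_video, min_dummy_frames(temporal_patch_size))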

@DarkLight1337 (Member) left a comment


Which model repo are you using and what is your transformers version? I can't seem to reproduce the problem even after hardcoding this method to return 1.
