
The max_num environment variable does not work when using vllm as the backend #7631


Description

@re-burn

Describe the bug
When running inference with InternVL3-8B, export MAX_NUM=6 can be used to limit the number of dynamic high-resolution sub-images. With the pt backend this setting works as expected and the prompt stays within the length limit, but after setting --infer_backend vllm it has no effect, and the request fails with a token-length error: ValueError: The decoder prompt (length 35191) is longer than the maximum model length of 32768.
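
A minimal reproduction sketch follows, assuming the ms-swift CLI; the swift infer invocation, model ID, and all flags other than --infer_backend are illustrative and may differ between versions:

```bash
# Reproduction sketch (assumed CLI usage). MAX_NUM should cap the number of
# dynamic high-resolution sub-images InternVL3-8B produces per input image.
export MAX_NUM=6

# pt backend: MAX_NUM is honored, so the encoded prompt stays under 32768 tokens.
swift infer \
    --model OpenGVLab/InternVL3-8B \
    --infer_backend pt

# vllm backend: MAX_NUM appears to be ignored, the encoded prompt exceeds the
# maximum model length, and the request fails with:
# ValueError: The decoder prompt (length 35191) is longer than the maximum model length of 32768.
swift infer \
    --model OpenGVLab/InternVL3-8B \
    --infer_backend vllm
```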

Your hardware and system info
This behavior is not related to the CUDA version.
