
[pull] main from NVIDIA:main #371

Merged
pull[bot] merged 4 commits into yingguo-trt:main from NVIDIA:main on Apr 9, 2026



pull[bot] commented on Apr 9, 2026:

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


2ez4bz and others added 4 commits April 9, 2026 05:53
* Why?

The number of multimodal tokens was calculated incorrectly for
newer models that support both audio and vision modalities,
leading to errors at runtime when chunked prefill is enabled.

* What?

This commit fixes this issue, and adds a test for it (verified to
fail without the fix).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
#12659)

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
…p in can_forward blocks KV transfers and overflows CTX memory (#12640)

Signed-off-by: peihengh <259410613+peihu-nv@users.noreply.github.com>
pull[bot] locked and limited conversation to collaborators on Apr 9, 2026
pull[bot] added the ⤵️ pull label on Apr 9, 2026
pull[bot] merged commit 1d24866 into yingguo-trt:main on Apr 9, 2026