Pull requests: vllm-project/vllm
enable flashinfer moe kernel for DP + EP
#36838 opened Mar 12, 2026 by czhu-cohere
3 of 5 tasks
Expand Speculative Decoding Coverage
Labels: speculative-decoding, v1
#36837 opened Mar 12, 2026 by puririshi98 (Draft)
[Feat][Executor] Introduce RayExecutorV2
Labels: ci/build, v1
#36836 opened Mar 12, 2026 by jeffreywang-anyscale (Draft)
5 tasks
Increase Test Coverage for Distributed Comm Patterns
#36832 opened Mar 12, 2026 by puririshi98 (Draft)
Add simple granite4 tool parser
Labels: documentation (Improvements or additions to documentation), tool-calling
#36827 opened Mar 11, 2026 by maxdebayser
[Bugfix] Fix Qwen3.5 LoRA IndexError in packed_modules_mapping
Labels: bug (Something isn't working), qwen (Related to Qwen models)
#36825 opened Mar 11, 2026 by hallerite
2 tasks done
[Model Runner V2] Do not initialize sampler for non-last PP ranks
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), v1
#36824 opened Mar 11, 2026 by WoosukKwon
[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace
Labels: nvidia, torch.compile, vllm-ir (vLLM IR: intermediate representation and kernel registration)
#36823 opened Mar 11, 2026 by ProExpertProg (Draft)
5 tasks
[Model] Add ColPali late interaction model for multi-modal retrieval
Labels: documentation (Improvements or additions to documentation), multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models)
#36818 opened Mar 11, 2026 by Kaonael
4 of 5 tasks
[Model Runner V2] Add Support for XD-RoPE
Labels: nvidia, v1
#36817 opened Mar 11, 2026 by santiramos27
5 tasks
[vLLM IR] 2/N batch-invariant-aware dispatching and rms_norm
Labels: vllm-ir (vLLM IR: intermediate representation and kernel registration)
#36816 opened Mar 11, 2026 by ProExpertProg (Draft)
5 tasks
[Model Runner V2] Introduce num_tokens_for_attn
Labels: nvidia, v1
#36815 opened Mar 11, 2026 by WoosukKwon
[Tests] Skip model weight download for render-only test server
#36813 opened Mar 11, 2026 by sagearc
5 tasks
[Metrics] Temporary band-aid for "Counters can only be incremented by non-negative amounts"
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), v1
#36812 opened Mar 11, 2026 by markmc
[ROCm][Perf] Fused GEMM + static FP8 output quantization
Labels: rocm (Related to AMD ROCm)
#36810 opened Mar 11, 2026 by andyluo7
Support temporal compression for videos
#36808 opened Mar 11, 2026 by collinmccarthy
5 tasks
[Bugfix] Pad Marlin FP8 MoE weight dims to tile alignment under TP > 1
Labels: bug (Something isn't working)
#36807 opened Mar 11, 2026 by ssubhanjali
5 tasks
Only show FP4 Marlin fallback warning for w4a4 models
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
#36806 opened Mar 11, 2026 by mgoin
[Test] E2E Nemotron-3-Super tests
Labels: ci/build, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)
#36803 opened Mar 11, 2026 by roikoren755
5 tasks
[kv_offload] Fix bare Exception types and add FilterReusedOffloadingManager tests
Labels: v1
#36801 opened Mar 11, 2026 by Hongbin10
3 of 5 tasks
[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request
Labels: bug (Something isn't working), qwen (Related to Qwen models)
#36800 opened Mar 11, 2026 by Isotr0py
3 of 5 tasks
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels
Labels: ci/build, nvidia, performance (Performance-related issues)
#36799 opened Mar 11, 2026 by kylesayrs