Pull requests: vllm-project/vllm
Pull requests list
[Renderer] Remove InputPreprocessor
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#38688
opened Apr 1, 2026 by
DarkLight1337
5 tasks
fix(lora): use float32 intermediate buffer in fused MoE LoRA to prevent bf16 precision loss
#38686
opened Apr 1, 2026 by
prsabahrami
Draft
[Perf] DSV3.2 Indexer Fused Weights Projection
deepseek
Related to DeepSeek models
#38684
opened Apr 1, 2026 by
benchislett
[Quantization] Rename mxfp4 quant layer and oracle to gpt_oss_mxfp4
gpt-oss
Related to GPT-OSS models
#38683
opened Apr 1, 2026 by
zyongye
3 of 5 tasks
[XPU] [Quant] rename mxfp8_e4m3_quantize and add xpu backend implementation
intel-gpu
Related to Intel GPU
#38682
opened Apr 1, 2026 by
zufangzhu
[CPU] Fix lscpu NUMA node regex to handle quoted - and null in containers
cpu
Related to CPU backends
#38681
opened Apr 1, 2026 by
Monokaix
3 of 5 tasks
[CI][ROCm] Remove unsupported cases in test_fusion.py
rocm
Related to AMD ROCm
#38680
opened Apr 1, 2026 by
charlifu
Fix llm_request trace context propagation
frontend
v1
#38678
opened Apr 1, 2026 by
will-deines
[CPU] Support head_size 512 in cpu_attn
cpu
Related to CPU backends
documentation
Improvements or additions to documentation
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#38676
opened Apr 1, 2026 by
bigPYJ1151
5 tasks
[Bugfix] Preserve original ImportError in gRPC server entrypoint
bug
Something isn't working
frontend
#38673
opened Apr 1, 2026 by
CatherineSue
3 tasks done
[5/n] Migrate CUTLASS MLA, hadamard, awq, allspark and DSV3 fused a gemm to torch stable ABI
ci/build
nvidia
#38671
opened Apr 1, 2026 by
mikaylagawarecki
Draft
5 tasks
Fix Marlin repack PTX incompatibility on H100/H200 (CUDA 12.8)
ci/build
nvidia
#38669
opened Apr 1, 2026 by
DavidBellamy
[ROCm] Enable dual-stream MoE shared experts and GLM-5 MXFP4 Quark support
rocm
Related to AMD ROCm
v1
#38665
opened Mar 31, 2026 by
ChuanLi1101
4 tasks
[Core][Feat][ safely abort requests where FSM failed to advance
v1
#38663
opened Mar 31, 2026 by
walterbm
3 of 5 tasks
[Kernel] feat: TurboQuant KV cache quantization (PolarQuant + QJL)
ci/build
v1
#38662
opened Mar 31, 2026 by
allaspectsdev
1 of 6 tasks
[2/N] Pass model_config to the Attention constructors
deepseek
Related to DeepSeek models
gpt-oss
Related to GPT-OSS models
llama
Related to Llama models
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
#38661
opened Mar 31, 2026 by
MatthewBonanni
3 of 5 tasks
[compile] Invoke split FX graph by codegen.
#38657
opened Mar 31, 2026 by
zhxchen17
5 tasks
Fix Nano Nemotron VL regressions
multi-modality
Related to multi-modality (#4194)
#38655
opened Mar 31, 2026 by
netanel-haber
[Bugfix] Fix vllm bench serve to count multimodal tokens in "total input tokens"
bug
Something isn't working
performance
Performance-related issues
#38654
opened Mar 31, 2026 by
mgehre-amd