forked from vllm-project/vllm
Pull requests: TomerBN-Nvidia/vllm
[Bugfix] Fix D2H buffer race in routed-experts async pipeline
#19
opened Apr 28, 2026 by
TomerBN-Nvidia
Owner
[Bugfix] Drop worker-side stop predicate in routed-experts extractor
#18
opened Apr 28, 2026 by
TomerBN-Nvidia
Owner
fix: skip w13 swap in FlashInfer CUTLASS path for non-gated MoE
#17
opened Apr 25, 2026 by
djmmoss
3 tasks done
[Core] Add monolithic kernel routing replay and prefix caching sentinel
#12
opened Apr 16, 2026 by
TomerBN-Nvidia
Owner • Draft
2 of 4 tasks
Implement multi-node vLLM health check
#7
opened Mar 25, 2026 by
TomerBN-Nvidia
Owner
5 tasks
[draft] Fix for reload_weights API for ModelOpt MXFP8
#4
opened Mar 16, 2026 by
guyueh1
5 tasks
fix: Use fake mxfp8 quant on intermediate tensor of MoE
#1
opened Dec 1, 2025 by
guyueh1
5 tasks