Commit 46fe1bd

Squashed commit of the following:
commit 843715739b7b555c61dd6190cafb5ab7a44c41f1 Author: Yongye Zhu <zyy1102000@gmail.com> Date: Fri May 22 13:06:31 2026 -0400 [Refactor] Extract DeepSeek V4 sparse MLA impl into model folder (#43149) commit b21f3d56d4a2ab5504b56504e87e0475c6d84eb2 Author: Dao007forever <dao007forever@gmail.com> Date: Fri May 22 09:14:11 2026 -0700 [KV Connector] MooncakeStore: don't co-queue save with load to avoid double delayed-free (#43371) Signed-off-by: Dao Le <Dao007forever@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> commit c7624bea5ebba1c688eb4c216bd4ede7a94f2a82 Author: Zhanda Zhu <49645678+zhandaz@users.noreply.github.com> Date: Fri May 22 12:10:03 2026 -0400 [Bugfix] Source num_qo_heads from Attention layers in Flashinfer/Triton metadata builders (#42650) Signed-off-by: zhanda <zhandazhu@gmail.com> Co-authored-by: Shang Wang <shangw@nvidia.com> commit 91f5b92438a568c89e8b9d6c2c55de5a552291f6 Author: Bugen Zhao <i@bugenzhao.com> Date: Fri May 22 23:22:11 2026 +0800 [Rust Frontend] [Refactor] Extract a newtype for utility call ID (#43405) Signed-off-by: Bugen Zhao <i@bugenzhao.com> commit f0feb15e7fc521544d23c2d23de0e327a509876b Author: Isotr0py <mozf@mail2.sysu.edu.cn> Date: Fri May 22 22:31:00 2026 +0800 [Multimodal] Simplify ViT CUDA graph interfaces (#41234) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> commit fb21d8b4f9027f4642637c7bb0acc08c29dce387 Author: sychen52 <41452870+sychen52@users.noreply.github.com> Date: Fri May 22 07:21:51 2026 -0700 Add NVFP4 MOE support for Deepseek V4. 
(#42209) Signed-off-by: Shiyang Chen <shiychen@nvidia.com> commit a377631d21cc97db678727455d33c4257435f417 Author: haosdent <haosdent@gmail.com> Date: Fri May 22 22:06:24 2026 +0800 [CI] Fix AMD docker build tests (#43329) Signed-off-by: haosdent <haosdent@gmail.com> commit d3a563501bcc6134a348f8458b1a797c94336f1f Author: Ilya Markov <markovilya197@gmail.com> Date: Fri May 22 15:43:27 2026 +0200 [EPLB] Change default EPLB communicator (#43110) Signed-off-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com> commit 15f7cd33dc8bd4d2270b70ba49d511827d2413ff Author: Jee Jee Li <pandaleefree@gmail.com> Date: Fri May 22 21:41:56 2026 +0800 [LoRA] Reduce memory of 2D weights when EP is set (#42737) Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> commit 79ff0ffa98dc8dd14a8651bce36ce6265ff4d35d Author: Keyi Li <94494390+JasonKeyiL@users.noreply.github.com> Date: Fri May 22 05:26:41 2026 -0700 [BugFix] wire make_empty_intermediate_tensors on AyaVision and Voxtral (#43118) Signed-off-by: Keyi Li <likey6688@gmail.com> Co-authored-by: Keyi Li <likey6688@gmail.com> commit 4658bf882b881287fc85797a23037aa91740b7a7 Author: Tobias Wasner <wasnertobias@users.noreply.github.com> Date: Fri May 22 12:54:29 2026 +0200 [Bugfix] Clear P0 mm sender cache on sleep/pause to fix mm_hash desync (#43001) Signed-off-by: Tobias Wasner <wasnertobias@gmail.com> commit b3c7ffcab82c2439726f8cb213800f6f38c023d3 Author: Taneem Ibrahim <taneem.ibrahim@gmail.com> Date: Fri May 22 05:43:33 2026 -0500 [Misc] Replace assert with proper exceptions for security and validation in pooling (#43286) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> commit d3d1cf6972607c53327b5ce1748e56a95fc41c37 Author: Ma Jian <jian1.ma@intel.com> Date: Fri May 22 18:22:45 2026 +0800 [XPU]feat: add XPU fallback for MoE topk routing and MXFP4 backend (#42951) 
Signed-off-by: Ma Jian <jian1.ma@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> commit 7e1b45a09252a5b513cd83116aa7a2f310220c34 Author: wangxiyuan <wangxiyuan1007@gmail.com> Date: Fri May 22 17:13:12 2026 +0800 [Attention] Mamba attention module refactor (#41126) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> commit 65b7a812a2dabd212d78c7b5b8a320b4efb9750d Author: Li, Jiang <jiang1.li@intel.com> Date: Fri May 22 16:48:17 2026 +0800 [CPU] Experimentally enable Triton and MRV2 (#43225) Signed-off-by: jiang1.li <jiang1.li@intel.com> commit 2380bfc2104267914eea36015e2a347b9318c6c0 Author: wang.yuqi <yuqi.wang@daocloud.io> Date: Fri May 22 16:43:14 2026 +0800 [Docs] Note image preprocessing difference between qwen_vl_utils and vllm. (#43393) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> commit a7616977176e12ddb14c0daab00cd2a2161ba37c Author: mrjunwan-lang <mrjunwan@google.com> Date: Fri May 22 01:36:17 2026 -0700 Fix the docker build failure in tpu-inference (#43360) Signed-off-by: mrjunwan-lang <mrjunwan@google.com> commit 694d9a81bbb07977e7a72a597acb44f6a848f774 Author: Nick Hill <nickhill123@gmail.com> Date: Fri May 22 00:25:10 2026 -0700 [BugFix] Fix setuptools-rust dep in requirements files (#43377) Signed-off-by: Nick Hill <nickhill123@gmail.com> commit 6bb8753db1076f498c240fffdd88b1ab983b7f40 Author: Weida Hong <wdhongtw@google.com> Date: Fri May 22 15:21:35 2026 +0800 Correcting the mock classes for MM GC tests (#43321) Signed-off-by: Weida Hong <wdhongtw@google.com> commit 025d4f5cd2617bb767663f9e7d62354039887757 Author: haosdent <haosdent@gmail.com> Date: Fri May 22 15:13:59 2026 +0800 [CI] Fix "test_awq_load[gemma4-moe-*]" failure (#43296) Signed-off-by: haosdent <haosdent@gmail.com> commit 5ea76fa89aa2e307f0d9a2e7fc19d13aed65a82f Author: haosdent <haosdent@gmail.com> Date: Fri May 22 14:24:18 
2026 +0800 [CI] Fix test_lora_with_spec_decode on V2 model runner (#43314) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> commit fa1ff88b3145d1897558408a9001c030c39383b9 Author: tc-mb <157115220+tc-mb@users.noreply.github.com> Date: Fri May 22 13:44:06 2026 +0800 [Model] Fix MiniCPM-V 4.6 vit_merger qkv weight loading (#43213) Signed-off-by: tc-mb <tianchi_cai@icloud.com> commit e746a2eebf09b1f99beb6b3c60a5ba9d2f8c4875 Author: Furkan F <id+git@yufufi.com> Date: Fri May 22 07:28:23 2026 +0200 [Model] Use `AutoWeightsLoader` for Voyage (#42972) Signed-off-by: Furkan Fidan <dev@yufufi.com> commit 1fe3303983e1829fae25edfb0b93e8cbcfad96e6 Author: haosdent <haosdent@gmail.com> Date: Fri May 22 12:15:22 2026 +0800 [CI] De-flake renderers/test_hf.py::test_resolve_content_format_fallbacks[Qwen/Qwen-VL-string] (#43064) Signed-off-by: haosdent <haosdent@gmail.com> commit 8c8b1825eb26c1ffae776baaab16f2eebf92b7d3 Author: Xiaochang Wu <xiaochang.wu@intel.com> Date: Fri May 22 12:02:51 2026 +0800 [XPU] Enable multiple key kernels for sparse attention (#37888) Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com> Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> commit 18a27cc9a3641cc1dd3eae5113b75c7ccc029b5f Author: qizixi <22851944+zixi-qi@users.noreply.github.com> Date: Thu May 21 20:36:22 2026 -0700 [Bugfix] Make CuMemAllocator free callback stream-aware (#43020) Signed-off-by: zixi-qi <zixi@inferact.ai> Co-authored-by: Claude <noreply@anthropic.com> commit 0ddd7dd6564f5e403a15bd7c973c7d358ec82454 Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Thu May 21 23:33:16 2026 -0400 [Frontend] DP Supervisor (#40841) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: robertgshaw2-redhat 
<robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> commit 60af5c16ee64ea3c1c573d67d0773a713c87a22e Author: ruizhang <rza21.bc@gmail.com> Date: Thu May 21 20:32:31 2026 -0700 [Frontend] Add truncation side to OpenAI endpoints (#43260) Signed-off-by: Rui Zhang <rza21.bc@gmail.com> Signed-off-by: Rui Zhang <rui.zhang@globalrelay.net> Co-authored-by: Rui Zhang <rui.zhang@globalrelay.net> commit 35d0141a0b68a188777e277e372f211098419f58 Author: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Date: Thu May 21 23:17:54 2026 -0400 [ROCm][CI] add warmup to mem_util test before measurement (#43236) Signed-off-by: Divakar Verma <divakar.verma@amd.com> commit 86ccef7d4400a54441057773d8ffb1f61a20af94 Author: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com> Date: Fri May 22 05:06:40 2026 +0200 [ROCm] Add XGMI backend for MoRI Connector (#41753) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> commit 2998a047aad7d48bf0399f19b36f1a4d749c59c2 Author: Chengze Fan <fancz2002@gmail.com> Date: Thu May 21 19:43:01 2026 -0700 [Bugfix] Fix DSV4 Base model swiglu limit issue in FP8 path (#42855) Signed-off-by: Chengze Fan <chengze@meta.com> Signed-off-by: Chengze Fan <fancz2002@gmail.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> commit ba369b7eb5a3c6593b55f2005655d6586997fa07 Author: Isotr0py <mozf@mail2.sysu.edu.cn> Date: Fri May 22 10:26:05 2026 +0800 [CI] Fix dockerfile dependency graph failure for pre-commit (#43378) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> commit 39910f2b25aacc09f5e7f166cdf0030b19f8b9e8 Author: Bugen Zhao <i@bugenzhao.com> Date: Fri May 22 08:21:48 2026 +0800 [Rust Frontend] Move code from `vllm-frontend-rs` (#43283) Signed-off-by: Bugen Zhao <i@bugenzhao.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Eric Curtin <eric.curtin@docker.com> Signed-off-by: 
Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com> Signed-off-by: Will.hou <1205157517@qq.com> Signed-off-by: Will.hou <willamhou@ceresman.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Eric Curtin <eric.curtin@docker.com> Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com> Co-authored-by: Will.hou <1205157517@qq.com> Co-authored-by: Will.hou <willamhou@ceresman.com> Please see https://github.com/Inferact/vllm-frontend-rs for full original commit history. commit 39d5fa96a7c687f9ed7e14a5a52064965356cede Author: Lanze Liu <86434077+liulanze@users.noreply.github.com> Date: Thu May 21 15:42:42 2026 -0700 [Bugfix] Zero stale is_prefilling in padded CUDA graph rows for Mamba (#41873) Signed-off-by: Lanze Liu <lanzetech@gmail.com> commit 565b745ec5d28dafd14585f1b695b159ba336a04 Author: Nick Hill <nickhill123@gmail.com> Date: Thu May 21 15:42:20 2026 -0700 [BugFix] Use correct logprobs for `logprob_token_ids` (#43125) Signed-off-by: Nick Hill <nickhill123@gmail.com> commit e26e1f09280b6c54e1bc1d1fbc0118f7e309cb10 Author: fangyuchu <fangyuchu@qq.com> Date: Fri May 22 06:42:07 2026 +0800 [Feature] Add `--cpu-distributed-timeout-seconds` CLI Option for CPU Process Group Timeout (#42968) Signed-off-by: fangyuchu <fangyuchu@qq.com> Signed-off-by: zWaNg3 <389750525@qq.com> Co-authored-by: zWaNg3 <389750525@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 0f66623b0d739dc94afddb67863c37d6f5816579 Author: Nick Hill <nickhill123@gmail.com> Date: Thu May 21 15:36:58 2026 -0700 [Frontend] Rework fastokens integration (#43168) Signed-off-by: Nick Hill <nickhill123@gmail.com> commit 0b59fc45dd475f96f6f46f2c3e699d7bc13b3b04 Author: ylangtsou <149562838+ylangtsou@users.noreply.github.com> Date: Fri May 22 06:00:52 2026 +0800 Disable build isolation to bypass CUDA related deps for vllm-tpu (#43038) Signed-off-by: Ylang Tsou <ylangt@google.com> Co-authored-by: Ylang Tsou <ylangt@google.com> 
Co-authored-by: Michael Goin <mgoin64@gmail.com> commit 17b69828a013acb7af0cd1d16d24ecc8d7582094 Author: Zheng Luo <zheluo@nvidia.com> Date: Thu May 21 13:05:01 2026 -0700 [Core] Add native ModelExpress load format (#43105) Signed-off-by: Zheng Luo <zheluo@nvidia.com> Co-authored-by: OpenAI Codex <codex@openai.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> commit b29cbf06525254693f29d98686e038eaf225be8c Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Thu May 21 16:00:29 2026 -0400 [Perf] `zeros` -> `empty` to remove additional fill (#42988) Signed-off-by: yewentao256 <zhyanwentao@126.com> commit 9b54e50e2c1c61ea3b7def032fbafc56dd3179c1 Author: Michael Goin <mgoin64@gmail.com> Date: Thu May 21 15:51:12 2026 -0400 [Deprecation] Mark env vars covered by --moe-backend / --linear-backend (#43148) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> commit 1c78f76c29a642379ad0ec953a77af9bc44376b6 Author: anish <145943060+anishesg@users.noreply.github.com> Date: Thu May 21 11:07:46 2026 -0400 [Bugfix] Add early validation to reject incompatible runner types for embedding models (#43079) Signed-off-by: anish <anishesg@users.noreply.github.com> Signed-off-by: Your Name <ak8686@princeton.edu> Signed-off-by: anish <145943060+anishesg@users.noreply.github.com> Co-authored-by: anish <anishesg@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> commit 9b9d5dbaab852a1c615fe83a7f92881d353503db Author: haosdent <haosdent@gmail.com> Date: Thu May 21 22:28:34 2026 +0800 [CI] Fix CPU tests failing on `tl.exp2` import (#43311) Signed-off-by: haosdent <haosdent@gmail.com> commit b730c4635288d75da4788bc28d8d26b5e5c3726c Author: Francesco Fusco <ffu@zurich.ibm.com> Date: Thu May 21 13:50:54 2026 +0200 [Perf] [Hybrid] Fused Triton kernel for GPU-side Mamba state postprocessing (#40172) 
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit c68c55d43e504745dbfc2d46b552e80acb74d4b9 Author: velonica0 <47554626+velonica0@users.noreply.github.com> Date: Thu May 21 19:50:49 2026 +0800 [CPU][RISC-V] Add VLEN=256 support to RVV attention kernels (#42943) Signed-off-by: velonica0 <like@mail.nankai.edu.cn> Signed-off-by: velonica0 <47554626+velonica0@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> commit 5ecd8e9c708821916323d25d5f7beddb7f41d22b Author: xiangdong <40376367+zxd1997066@users.noreply.github.com> Date: Thu May 21 18:41:38 2026 +0800 [XPU][CI]Fix Docker image pull-to-run race in Intel GPU CI (#43266) Signed-off-by: zengxian <xiangdong.zeng@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> commit caf69823d61119ac3f4b066f20a910b62078e41c Author: haosdent <haosdent@gmail.com> Date: Thu May 21 18:38:07 2026 +0800 [CI] Pin protoc binary in rust-build stages (#43292) Signed-off-by: haosdent <haosdent@gmail.com> commit 68e07d59161a8d268b773c181fab17994a7c5d0a Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Thu May 21 04:58:09 2026 -0400 [Bug] Fix ci issue `assert output_size is not None` AssertionError (#43261) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Isotr0py <Isotr0py@outlook.com> Co-authored-by: Isotr0py <Isotr0py@outlook.com> commit ebbfb34e3e058bd539db9e5015d0c18b7ce5a5e0 Author: Kevin H. 
Luu <khluu000@gmail.com> Date: Thu May 21 01:57:47 2026 -0700 [Test] Replace zephyr-7b-beta (7B) with SmolLM2-135M in tokenization test (#43085) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> commit edafea35550fab0b185b885711ec048dfd2e1a4d Author: zhangxin81 <115389973+zhangxin81@users.noreply.github.com> Date: Thu May 21 16:17:12 2026 +0800 Fix FlashInfer TRTLLM NvFP4 monolithic MoE routing (#43223) Signed-off-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com> commit b719b1635b4899e2372905def0badf96d4dd242a Author: zexplorerhj <zhjoneson@163.com> Date: Thu May 21 16:16:27 2026 +0800 Update KDA chunk prefill decay to use exp2 semantics (#43195) Signed-off-by: zexplorerhj <19794632+zexplorerhj@users.noreply.github.com> Co-authored-by: zexplorerhj <19794632+zexplorerhj@users.noreply.github.com> commit 0a54df28471be07b3d668ea21c5e411569d3baea Author: Kunshang Ji <kunshang.ji@intel.com> Date: Thu May 21 07:14:13 2026 +0000 [XPU] add setuptools-rust for xpu dependency (#43287) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> commit a950e9447e38727fc956afdc242bc6e3796ccb77 Author: haosdent <haosdent@gmail.com> Date: Thu May 21 14:30:14 2026 +0800 [CI] De-flake test_models for bigscience/bloom-560m (#43197) Signed-off-by: haosdent <haosdent@gmail.com> commit 050611a3dd19271a3c729788ff69b3470ccfb238 Author: Yiyang "Ian" Liu <yiyangliu@microsoft.com> Date: Wed May 20 22:58:59 2026 -0700 [Bugfix] Fix glm4_moe_tool_parser._is_string_type for /v1/responses FunctionTool format (#39601) Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: sfeng33 <4florafeng@gmail.com> commit 905b97adfaf7b08f3cc95b328579e5336ed6d3b6 Author: yzong-rh <yzong@redhat.com> Date: Thu May 21 01:13:15 2026 -0400 [Benchmark] 
Add num-warmup to vllm bench throughput (#43245) Signed-off-by: Yifan Zong <yzong@redhat.com> commit a6682d1d259cca69a9ae737ea5608fbbe7520031 Author: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com> Date: Wed May 20 21:35:08 2026 -0700 [Bugfix] Warn when renderer_num_workers has no effect on offline LLM (#42905) Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com> commit f2ace1d57d28df8d4c5e973dd62d87f47d628cb3 Author: Nick Hill <nickhill123@gmail.com> Date: Wed May 20 21:24:48 2026 -0700 [Frontend][RFC] Rust front-end integration (#40848) Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com> Co-authored-by: Bugen Zhao <i@bugenzhao.com> commit d97ba29fdcf2538359fac5c644c0f07e59bc1988 Author: 손세정 <maze0717@g.skku.edu> Date: Thu May 21 13:24:08 2026 +0900 [ToolParser][Bugfix] Re-land: Fix anyOf/oneOf/$ref type resolution in Qwen3CoderToolParser (#37831) (#38973) Signed-off-by: AAISSJ <maze0717@g.skku.edu> Signed-off-by: <> Signed-off-by: sejung-son <sejung.son@nhn.com> Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: 세덩 <saison@sedeong-ui-MacBookAir.local> Co-authored-by: sejung-son <sejung.son@nhn.com> Co-authored-by: sfeng33 <4florafeng@gmail.com> commit 6441cf4a44856f4eb4dce7d19a51fd69e1b423cf Author: Flora Feng <4florafeng@gmail.com> Date: Thu May 21 00:24:06 2026 -0400 [Refactor] Use shared coerce_to_schema_type in Seed-OSS tool parser (#43140) Signed-off-by: sfeng33 <4florafeng@gmail.com> commit 346cf163a11b55e069aa3143ae2878967393ddc2 Author: Ben Browning <bbrownin@redhat.com> Date: Thu May 21 00:23:47 2026 -0400 [Frontend] Normalize reasoning_content to reasoning for client compatibility (#42664) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 7e5070934ee5f28103c5b95cb776904a12fc36f5 Author: haosdent <haosdent@gmail.com> Date: Thu May 21 12:22:10 2026 +0800 [CI] Fix 
"test_vit_cudagraph_[image|video][step3_vl]" failure (#43082) Signed-off-by: haosdent <haosdent@gmail.com> commit 2b75a73b8e23f5df6de92d01a191e059424487e3 Author: Luciano Martins <22145370+lucianommartins@users.noreply.github.com> Date: Thu May 21 01:22:06 2026 -0300 [Perf][Gemma4] Batch vision encoder calls for image and video processing (#43169) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> commit e45df8c3f77572d03f638feded5b5efbccdbcc05 Author: sonusflow <git@sonusflow.pl> Date: Thu May 21 06:22:01 2026 +0200 [Bugfix] Fix Qwen3.5 GatedDeltaNet in_proj_ba Marlin failure at TP>=2 (#36329) Signed-off-by: Adi McM Sonus Flow <biuro@sonusflow.pl> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> commit ee05e8137ec48b8e7375228a1142b4c5f2e3360c Author: Jee Jee Li <pandaleefree@gmail.com> Date: Thu May 21 12:20:57 2026 +0800 [Minor] Bigger overlap for FI AR (#43103) Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> commit 5d041cc1fe5181daabf39943efc7b678380d57bd Author: Louie Tsai <louie.tsai@intel.com> Date: Wed May 20 20:57:48 2026 -0700 update GPU json file based on h200 recipes (#43262) Signed-off-by: louie-tsai <louie.tsai@intel.com> commit 9640970de20b15ade9eb3859825637f64e81ed8c Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Wed May 20 21:00:30 2026 -0400 [Model Runner V2] Fix lora `Triton Error [CUDA]: device-side assert triggered` (#43139) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> commit 63ea11709bd9e9b14669e3973dff92d2dcea3cb1 Author: Ace Eldeib <alexeldeib@gmail.com> Date: Thu May 21 02:36:16 2026 +0200 [CI] Add composed-schema regression tests for DeepSeek V3.2/V4 parsers (#43255) 
Signed-off-by: Ace Eldeib <aeldeib@coreweave.com> Co-authored-by: Flora Feng <4florafeng@gmail.com> commit bde560ed6e1dc889debf68410ccbcb00b749513b Author: akii96 <aakif.nawaz@amd.com> Date: Thu May 21 01:46:51 2026 +0300 [ROCm] Add QuickReduce min-size override and codec threshold (#41675) Signed-off-by: <> commit 6dc0a71843878ef45e29d4732147290b797b70fd Author: Jiangyun Zhu <riverclouds.zhu@qq.com> Date: Thu May 21 05:19:50 2026 +0800 [Misc] downgrade nvidia-cutlass-dsl to 4.5.0 (#43230) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> commit 5774aad9c5b67c5bb67bb7d306a9652a035ed0aa Author: Michael Goin <mgoin64@gmail.com> Date: Wed May 20 17:13:12 2026 -0400 [Perf][gpt-oss] Downgrade triton_kernels to v3.5.1 (#43135) Signed-off-by: mgoin <mgoin64@gmail.com> commit 452baa860b1169787cc8540a1772c4d96f682c40 Author: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com> Date: Wed May 20 16:10:44 2026 -0500 Add dllehr-amd to CODEOWNERS and committers list (#42772) Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com> commit 2a43b407c5093b1255a172139da6a5151f410b7a Author: Flora Feng <4florafeng@gmail.com> Date: Wed May 20 14:59:12 2026 -0400 [Bugfix][CI] Add missing import of pad_nvfp4_activation_for_cutlass in flashinfer (#43237) Signed-off-by: sfeng33 <4florafeng@gmail.com> commit 53ff50fcd3d2012a406e5053026ea6a46c88b2b6 Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Wed May 20 14:57:42 2026 -0400 [Perf] Optimize `CutlassFP8ScaledMMLinearKernel` when padding needed by pre-weight processing, 13.5% TTFT improvement (#42651) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> commit 363fc84407f8c966c1cee6786e45e9e6ab289684 Author: meena-at-work <80416898+meena-at-work@users.noreply.github.com> Date: Wed May 20 10:21:11 2026 -0700 Integrate flashinfer b12x MoE and FP4 GEMM kernels for SM120/121 (#40082) Signed-off-by: 
Meenakshi Venkataraman <meenakshiv@nvidia.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> commit f2d5e3d3aeac4cb1f6d285e4a567a502ae507777 Author: haosdent <haosdent@gmail.com> Date: Thu May 21 01:00:24 2026 +0800 [CI] Lower granite-4.0-h-tiny gsm8k threshold for Hybrid SSM NixlConnector PD accuracy tests (4 GPUs) (#43186) Signed-off-by: haosdent <haosdent@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com> commit 2d6b3489b9a325988ad52507236409747d2098a7 Author: Aaron Hao <ahao@anyscale.com> Date: Wed May 20 09:07:59 2026 -0700 [R3] Add routed experts to openai entrypoint (#38939) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> commit 9c78c99995b70726f9ea929ff2e535d6303383d6 Author: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com> Date: Wed May 20 19:50:24 2026 +0400 [MISC] Fix symm_mem cap-equal gate; log AR backend selection (#42993) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> commit a10d69116cb25c8137eeb3f320add71d4e04fda9 Author: Flora Feng <4florafeng@gmail.com> Date: Wed May 20 10:21:00 2026 -0400 [Bugfix] Use shared coerce_to_schema_type in DeepSeekV32 tool parser (#43019) Signed-off-by: sfeng33 <4florafeng@gmail.com> commit 644b2a28e7eb3b11191f157416cfedebd2da995b Author: Joel Smith <j.smith9103@outlook.com> Date: Wed May 20 15:10:01 2026 +0100 [Bugfix] Use enable_sm120_family for per-tensor FP8 CUTLASS kernels on SM12.1 (#41215) Signed-off-by: j9smith <j.smith9103@outlook.com> Signed-off-by: Joel Smith <j.smith9103@outlook.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com> commit ded871201a424dd0d28a00aaf74c5786457a18ee Author: rishitdholakia13 <123388671+rishitdholakia13@users.noreply.github.com> Date: Wed May 20 10:08:58 2026 -0400 [Bug][Structured Outputs] Fix bug that leads to unconstrained generations with structural tags (#42452) Signed-off-by: rishitdholakia13 <rishit+github@cohere.com> 
Co-authored-by: Cursor <cursoragent@cursor.com> commit df84fb07a6e57969941841c6363d1efbac1ba1e8 Author: Dipika Sikka <dipikasikka1@gmail.com> Date: Wed May 20 10:01:45 2026 -0400 Remove additional dead code as a follow-up to #42889 (#43144) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> commit 0a508743d42a26786c1432bb7f2e93f8111b6383 Author: Benjamin Chislett <bchislett@nvidia.com> Date: Wed May 20 09:15:52 2026 -0400 [Spec Decode] Support non-MTP speculation for NemotronH (#43130) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> commit 19cf334207ed81d3ed75a473acd1a95c785d9ed3 Author: Kebe <mail@kebe7jun.com> Date: Wed May 20 21:58:30 2026 +0900 [Feature] Support manually enabling the cumem allocator (#33648) Signed-off-by: Kebe <mail@kebe7jun.com> commit 87e31455b056c6ce59bf5dcb3c622155431851db Author: Ray Wang <roguerui6@gmail.com> Date: Wed May 20 02:32:03 2026 -0700 [Doc] Sync CLI guide with actual help modes and launch subcommand (#40326) Signed-off-by: Rui Wang <raygorous@gmail.com> Co-authored-by: Rui Wang <raygorous@gmail.com> commit cb600d1cdbb079ab9432348f128e71c4e2e0a373 Author: hallerite <git@hallerite.com> Date: Wed May 20 10:58:46 2026 +0200 [Frontend] Forward X-data-parallel-rank header on /inference/v1/generate (#42330) Signed-off-by: hallerite <git@hallerite.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> commit 6f21558da1ec7362d2b4f3d012bce2b612a74459 Author: xiangdong <40376367+zxd1997066@users.noreply.github.com> Date: Wed May 20 16:54:58 2026 +0800 [XPU][CI] Add 2 server model test files in Intel GPU CI (#42499) Signed-off-by: zengxian <xiangdong.zeng@intel.com> commit 1cb224430bea0d037b57e24cf91001f47b69ddf3 Author: Artem Perevedentsev <aperevedents@nvidia.com> Date: Wed May 20 11:46:55 2026 +0300 [GDN] Enable FI Blackwell GDN prefill kernel (#40717) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com> commit 9b343dd4f54a9870f3ba1e41f5a5b3f4a1e25340 Author: Harry Mellor 
<19981378+hmellor@users.noreply.github.com> Date: Wed May 20 17:10:00 2026 +0900 Enable mermaid diagrams in the docs (#43192) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> commit 07aeaf9d4df870a76d5a0dc19d6a7e74b4be5d3b Author: Chris Leonard <chleonar@redhat.com> Date: Wed May 20 03:18:12 2026 -0400 [6/n] Migrate activation kernels, gptq, gguf, non cutlass w8a8 to libtorch stable ABI (continued) (#42663) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Signed-off-by: Chris Leonard <chleonar@redhat.com> Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com> commit 40651c020772b80f9ca80272aebe749fe01cd38a Author: Nicolò Lucchesi <nlucches@redhat.com> Date: Wed May 20 09:02:36 2026 +0200 [Docs][PD][NIXL] Bidirectional kv-cache transfer (#43097) Signed-off-by: NickLucche <nlucches@redhat.com> commit 7e4bc2cecb3a8aede2d10c86a3a1a4bd98e26100 Author: Nicolò Lucchesi <nlucches@redhat.com> Date: Wed May 20 08:58:25 2026 +0200 [Docs][PD][NIXL] Lease extension mechanism for blocks on P (#43099) Signed-off-by: NickLucche <nlucches@redhat.com> commit 85959567c3e71a9965616ebebe1853ca48d8d20f Author: Kevin H. Luu <khluu000@gmail.com> Date: Tue May 19 23:01:41 2026 -0700 [ci] Revert model executor test back to L4 (#43188) Signed-off-by: Kevin H. 
Luu <khluu000@gmail.com> commit 4f940896a32c9e2a0eba7f50d521bf5f6b4de458 Author: Ronen Schaffer <ronen.schaffer@ibm.com> Date: Wed May 20 06:32:08 2026 +0300 [KV Offload] Pass `OffloadingSpec` instead of `VllmConfig` to secondary tiers (#43076) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com> commit cd0ff26e7acf2c691a33d4c44276db6980bab24b Author: Michael Goin <mgoin64@gmail.com> Date: Tue May 19 23:21:01 2026 -0400 [CI] Add DSV4-Flash to gsm8k moe-refactor/config-b200.txt (#42111) Signed-off-by: mgoin <mgoin64@gmail.com> commit 2ae910ed88121d7c3acdcb9bab14cd968257b6e6 Author: Izik Golan <47969623+izikgo@users.noreply.github.com> Date: Wed May 20 06:16:07 2026 +0300 [Perf] Avoid forward scan for async output placeholders (#42938) commit fadf5d332c6e9bb6e552c1ca529511bce0f79802 Author: pmaybank <113125070+pmaybank@users.noreply.github.com> Date: Tue May 19 23:16:02 2026 -0400 add enqueue all option to throughput benchmark (#42975) Signed-off-by: Philip Maybank <pmaybank@amd.com> Signed-off-by: pmaybank <113125070+pmaybank@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit c628a93a64fb4929c3c11d8e2c7244c4826b4f76 Author: Benjamin Chislett <bchislett@nvidia.com> Date: Tue May 19 23:15:57 2026 -0400 [Perf][Bugfix] Update dflash aux layer indexing (#40727) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> commit 5774aaed0cbeaa74ca7a75d372c1e8bd4aa11cdb Author: Terrence Zhao <32208165+Terrencezzj@users.noreply.github.com> Date: Tue May 19 22:32:06 2026 -0400 [Cohere] Enable Cohere MoE (#43143) Signed-off-by: Terrencezzj <terrence@cohere.ai> commit 39bba710bed5b6018718af3e0fd7984f6082118e Author: Nick Hill <nickhill123@gmail.com> Date: Tue May 19 19:19:05 2026 -0700 [MRV2][BugFix] Fix default-stream CG capture in P/W LoRA case (#43160) Signed-off-by: Nick Hill <nickhill123@gmail.com> commit 
73dd2f33b7a5a8a237fe7296039cec246e4c68bd Author: Aaron Hao <ahao@anyscale.com> Date: Tue May 19 18:01:29 2026 -0700 [bug] fix WeightTransferConfig.backend to allow for all strings (#43121) Signed-off-by: ahao-anyscale <ahao@anyscale.com> commit be16785998087f80ffac08b980603241e5da16ab Author: Fadi Arafeh <115173828+fadara01@users.noreply.github.com> Date: Wed May 20 00:31:15 2026 +0100 [CPU][DOC] Fix installation commands for Arm CPUs (#43115) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> commit 117afeea4665367a3066c1df58d4082d07fcc946 Author: Max de Bayser <mbayser@br.ibm.com> Date: Tue May 19 17:27:54 2026 -0400 Fix error in Dynamic NTK scaling (#41277) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> commit 12421962955ac28b6f80a0307f554fad939174dd Author: Doğaç Eldenk <dogacel@gmail.com> Date: Tue May 19 15:39:00 2026 -0500 [Model] Support post-norm architecture for EAGLE-3 supeculators (#42764) Signed-off-by: Doğaç Eldenk <dogacel@gmail.com> commit a65093c1a39a8ddd8455365128ecbe259350e22c Author: Kevin H. Luu <khluu000@gmail.com> Date: Tue May 19 11:51:34 2026 -0700 [ci] Move language models tests (hybrid) back to L4 (#43129) Signed-off-by: Kevin H. 
Luu <khluu000@gmail.com> commit 9aaf83ef502fc37bc647f6e474314d48ba36cd1c Author: Wei Zhao <51183510+wzhao18@users.noreply.github.com> Date: Tue May 19 14:44:32 2026 -0400 [CI failure] Temporarily disable using persistent cache for flashinfer autotune (#43119) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> commit f54721bcc3e072d71b0e09c0b0bd6d692eb06161 Author: tomeras91 <57313761+tomeras91@users.noreply.github.com> Date: Tue May 19 21:43:04 2026 +0300 [Bugfix][MoE] FlashInfer one-sided: workspace union across heterogeneous layers (#42976) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> commit aed2eb355a9d9136c8e17690b932983b55fb343f Author: Dao007forever <dao007forever@gmail.com> Date: Tue May 19 11:14:43 2026 -0700 [Docs] Fix MooncakeStoreConnector role in disaggregated example (#42994) Signed-off-by: Dao Le <Dao007forever@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> commit d247a931cc25e7253feccbd6260d48216ff5c081 Author: Dom Brown <3886319+DomBrown@users.noreply.github.com> Date: Tue May 19 17:02:05 2026 +0100 [feat] Add FP8 per-tensor Q scale support to Triton attention backend (#42080) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> commit 8200fbe1ac73f00a46b1cdd6c4c93bdaf2c33022 Author: Jinzhen Lin <jinzhen.ljz@antgroup.com> Date: Tue May 19 23:36:47 2026 +0800 [Misc] add humming to dependencies (#42540) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> commit 42b4f1fdf7269de8aa83755a805555fe78add28b Author: Flora Feng <4florafeng@gmail.com> Date: Tue May 19 11:21:12 2026 -0400 [Refactor] Extract extract_types_from_schema utility from Minimax M2 tool parser (#43025) Signed-off-by: sfeng33 <4florafeng@gmail.com> commit 1c6158083a6fc3aff408660d2defd7602f78f556 Author: Wang Yiwen <121547057+yiwen101@users.noreply.github.com> Date: Tue 
May 19 23:17:42 2026 +0800 [Model] Openvla support (#42654) Signed-off-by: Wang Yiwen <121547057+yiwen101@users.noreply.github.com> commit d740e2c02919cfba5a86a40d1c12439d03f5ac07 Author: Xinyu Chen <xinyu1.chen@intel.com> Date: Tue May 19 23:09:07 2026 +0800 [XPU] update xpu graph usage (#43043) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com> commit b82e908b4c65a1f162e2d35a8106f09d95d8aa02 Author: Nick Hill <nickhill123@gmail.com> Date: Tue May 19 07:35:54 2026 -0700 [Perf][4/n] Eliminate various GPU<->CPU syncs (#42347) Signed-off-by: Nick Hill <nickhill123@gmail.com> commit a78b842d0e85d287176031334f4721cd96b6e47d Author: Sage <80211083+sagearc@users.noreply.github.com> Date: Tue May 19 13:21:49 2026 +0300 [Bugfix] Fix top logprobs token placeholders in `/inference/v1/generate` (#42887) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> commit 129019f3342f1b7346ed8f4c1ac9fdefd8fe6ef8 Author: zhanqiuhu <49648934+ZhanqiuHu@users.noreply.github.com> Date: Tue May 19 05:44:33 2026 -0400 [CI] Add MTP + PD disagg test for Qwen3.5 (#42677) Signed-off-by: ZhanqiuHu <zhu@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> commit ef54a4d604ef3725bd52aa2893f71d671bf5329a Author: Shanshan Shen <467638484@qq.com> Date: Tue May 19 16:43:16 2026 +0800 [Misc][MM] Remove redundant code in CLIPAttention (#43046) Signed-off-by: shen-shanshan <467638484@qq.com> commit 07beaed8422d2df34a20e8ebd22b7924d563a566 Author: Woosuk Kwon <woosuk.kwon@berkeley.edu> Date: Tue May 19 01:12:46 2026 -0700 [Model Refactoring] Rename deepseek_v4.py to model.py [4/N] (#43077) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> commit 056bc2e16646599a96ac94e761c953e680e6fba9 Author: Yifan Qiao <yifanqiao@inferact.ai> Date: Tue May 19 01:07:46 2026 -0700 [KVConnector][DSV4] HMA support for Mooncake store connector (#42828) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai> commit f34623bf3cac5b33451a761e802c9531e83d1c68 Author: Aaron Hao <ahao@anyscale.com> Date: Tue May 19 01:06:21 2026 -0700 
[bug] AsyncScheduler drops first post-resume token after pause_generation + clear_cache (#42117) Signed-off-by: hao-aaron <ahao@anyscale.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit b14be81c1f63b70668d26d65a377b6383fbca936 Author: Woosuk Kwon <woosuk.kwon@berkeley.edu> Date: Tue May 19 00:52:54 2026 -0700 [Model Refactoring] Move deepseek_v4_ops to models/deepseek_v4 [3/N] (#43073) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> commit 301d986473a0ffc1df563422e01eac4a1efd59e0 Author: wang.yuqi <yuqi.wang@daocloud.io> Date: Tue May 19 15:37:40 2026 +0800 [Frontend] Consolidate beam search by BeamSearchMixin. (#42946) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> commit 257af77bc2b612d5ebd0aecea777139036543af3 Author: wang.yuqi <yuqi.wang@daocloud.io> Date: Tue May 19 14:43:18 2026 +0800 [Docs] Reorganize online serving docs. (#41907) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> commit 4a4fdabe28f3e2c8f9d05bcc80c4bf6d656b1ead Author: Taneem Ibrahim <taneem.ibrahim@gmail.com> Date: Tue May 19 01:16:42 2026 -0500 [Misc] Aligning tokwise pooler heads for consistency (#43041) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> commit f1e3f0e6d685082bdb313c20914099ac5ede5f14 Author: Chaojun Zhang <chaojun.zhang@intel.com> Date: Tue May 19 14:14:59 2026 +0800 [XPU] Use custom op collective behavior (#41354) Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> commit 9fd8487d2f56468aeec8154123641eb7c2eeacdf Author: Gracie Guo (UX) <114208705+gracie-guo@users.noreply.github.com> Date: Tue May 19 13:50:38 2026 +0800 [Docs] 
Add SVG images for pooling models. (#42626) Signed-off-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> commit 27f4ba94811ef14bd45bcdc0c0b8e288a7cc6bc6 Author: Junyan Xu <junyanxu5513@gmail.com> Date: Mon May 18 22:29:04 2026 -0700 fix: use keyword arguments for shard_id and expert_id in weight_loade… (#42671) Signed-off-by: junyanxu <junyanxu5513@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 6e889b582b6a0b11f22b3764be174266faa9ff5e Author: Kevin H. Luu <khluu000@gmail.com> Date: Mon May 18 21:58:36 2026 -0700 [ci] Route 28 gpu_1_queue tests to h200_35gb queue (#43030) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> commit fab07e4d0f7f266643c6ac0dc944f9f433ef2140 Author: Qiuyang Yue <yueqiuyang1389@gmail.com> Date: Mon May 18 21:22:33 2026 -0700 [Bugfix][KV Connector] Fix SimpleCPUOffloadScheduler TOCTOU between Phase A and Phase B (#42289) Signed-off-by: Qiuyang Yue <yueqiuyang1389@gmail.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: gemini-code-assist <noreply@google.com> commit 3ca8db2ef88ec5a6686e62ee3ac899afae85c7af Author: gnovack <gnovack@amazon.com> Date: Mon May 18 21:17:56 2026 -0700 add cutedsl dsv4 indexer fp8 kernel (#42899) Signed-off-by: george <george@inferact.ai> Co-authored-by: george <george@inferact.ai> commit 87b08c5f6460cf487e47872c5fbc2595c97e74ef Author: Woosuk Kwon <woosuk.kwon@berkeley.edu> Date: Mon May 18 21:00:58 2026 -0700 [Model Refactoring] Move DeepSeek V4 layers to `models/deepseek_v4/` [2/N] (#43039) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> commit fba010dd74e2f94e4f7223b164ec9097d1b8a6af Author: Nicolò Lucchesi <nlucches@redhat.com> Date: Tue May 19 05:25:41 2026 +0200 [Bugfix][MRV2] Fix KVCache tensor explicit `kernel_block_size` dim (#42766) Signed-off-by: 
NickLucche <nlucches@redhat.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> commit da03e549b34685c4e63a091e973d907aee48a68c Author: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Date: Tue May 19 11:25:37 2026 +0800 [UX] Add a persistent cache for FlashInfer autotuning (#42537) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> commit 36dcaf25d8e091ea0f47b9ce7dcfca05de56f16d Author: Kunshang Ji <kunshang.ji@intel.com> Date: Tue May 19 03:17:09 2026 +0000 [XPU] add gptq(int4) support (#37844) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> commit 8f16c4a5c0feb01f106e5981f22ae8808a94a28b Author: Ofir Zafrir <ofir.zafrir@intel.com> Date: Tue May 19 06:16:07 2026 +0300 [BugFix][CPU][Spec Decode] Fix Eagle implementation on CPU backend (#42468) Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com> commit afd7b1dce94fed484351fafd5bf5ea6601ac621e Author: Revital Sur <eres@il.ibm.com> Date: Tue May 19 06:12:04 2026 +0300 [Bugfix] Use platform-agnostic device in example_connector load (#42926) Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> commit 287471b99442b44c5a16c4d70b0f3e178dd52732 Author: Woosuk Kwon <woosuk.kwon@berkeley.edu> Date: Mon May 18 19:50:02 2026 -0700 [Model Refactoring] Migrate DeepSeek V4 to vllm/models/ [1/N] (#43004) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> commit 239b5ff30cf46f9196149c888a20be2096fdff03 Author: Michael Goin <mgoin64@gmail.com> Date: Mon May 18 20:22:27 2026 -0400 [Frontend] Add --spec-method/--spec-model/--spec-tokens CLI aliases (#42476) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> commit f85c76d701fc049a722c17b3affd9401380be1bf Author: Artem Perevedentsev <aperevedents@nvidia.com> Date: Tue May 19 02:58:15 2026 +0300 [CI/Build] Bump nvidia-cutlass-dsl to 4.5.1 (#42991) 
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com> commit a171e6b52dff47dc567657e7d51f641bdcb22774 Author: shanjiaz <zsjwpianpian@gmail.com> Date: Mon May 18 19:39:09 2026 -0400 Add parallel drafting to v2 model runner unsupported features (#43010) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com> commit 37ece593c105b5bb818aa94885617b863d390d7f Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Mon May 18 19:38:12 2026 -0400 [Perf] Padded nvfp4 quant kernel to remove additional copy, 2.4%~5.7% e2e performance improvement (#42774) Signed-off-by: yewentao256 <zhyanwentao@126.com> commit 57fef4e0bf0bfaddf117dfdc9367e1fb957b423f Author: Flora Feng <4florafeng@gmail.com> Date: Mon May 18 17:55:39 2026 -0400 [Refactor] Extract shared coerce_to_schema_type utility from Minimax M2 tool parser (#43006) Signed-off-by: sfeng33 <4florafeng@gmail.com> commit 0191354827560fe38f68b4e7207f8824d6152ca3 Author: haosdent <haosdent@gmail.com> Date: Tue May 19 05:29:10 2026 +0800 [Perf][MLA] Enable FULL cudagraph capture for TRITON_MLA decode (#42885) Signed-off-by: haosdent <haosdent@gmail.com> commit cd49a05d5aa3cc296912297b3c2b577efe4183c8 Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Mon May 18 16:41:22 2026 -0400 [Refactor] Remove dead code (#42889) Signed-off-by: yewentao256 <zhyanwentao@126.com> commit 84747489ded65265ee7d43815bfa3373b0d42279 Author: Ronen Schaffer <ronen.schaffer@ibm.com> Date: Mon May 18 22:41:58 2026 +0300 Tier offload followup (#42529) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com> commit 8fc1c284b94668b60c30737e178cb7e6cd651e89 Author: Tuukka Sarvi <tuukka.sarvi@amd.com> Date: Mon May 18 21:56:22 2026 +0300 [ROCm] Guard AITER GDN decode fast path by layout (#42880) Signed-off-by: Tuukka Sarvi <tuukka.sarvi@amd.com> commit ce88f01c9ac4fcde9dd43a983074d4e893cde65d Author: Amit Portnoy <1131991+amitport@users.noreply.github.com> Date: Mon May 18 21:22:56 2026 +0300 [Docs] update attribution to 
reflect EDEN foundation (#41666) Signed-off-by: amitport <1131991+amitport@users.noreply.github.com> commit 00e20e76f775b88f47469ae9fcb0f1ecd7580bb9 Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Mon May 18 14:14:21 2026 -0400 [Refactor] Remove dead cuda kernels (#42767) Signed-off-by: yewentao256 <zhyanwentao@126.com> commit 9758a6e5c5a556275c030db456d5d434ee999d58 Author: czhu-cohere <conway.zhu@cohere.com> Date: Mon May 18 11:12:06 2026 -0700 [BugFix] support PP for Cohere vision model (#42819) Signed-off-by: <conway.zhu@cohere.com> Signed-off-by: root <conway.zhu@cohere.com> commit a2c8fc66573664395f491a94da1882fdf92e034b Author: Bowen Bao <bowenbao@amd.com> Date: Mon May 18 10:46:13 2026 -0700 [ROCm][Quantization][3/N] Refactor quark_moe w4a4 w/ oracle (#41436) Signed-off-by: Bowen Bao <bowenbao@amd.com> commit 6859ca76159fdd403b687c0c296e5a12850ba24e Author: Jinzhen Lin <jinzhen.ljz@antgroup.com> Date: Tue May 19 01:32:26 2026 +0800 [Bugfix] fix swiglu limit issue for humming backend + deepseek v4 (#42541) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> commit 67f58ce23f469e118688a50687ef0fbb14a1c028 Author: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Date: Tue May 19 01:02:01 2026 +0800 [Bugfix] Fix DSV4 MTP after ROCm mHC integration (#42930) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> commit 8c296de63b47664fc5979831e1ae2d2a14a05b1a Author: Wei Zhao <51183510+wzhao18@users.noreply.github.com> Date: Mon May 18 12:12:27 2026 -0400 [Perf] Re-enable flashinfer autotune by default and cleanup (#42857) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> commit b12745e4f31ffacf401cc20a97c592d6a49f3269 Author: Harry Mellor <19981378+hmellor@users.noreply.github.com> Date: Tue May 19 00:56:09 2026 +0900 Fix `--convert` passed without `--runner` on causal models (#42935) Signed-off-by: Harry Mellor 
<19981378+hmellor@users.noreply.github.com> commit e26736973a1981dbb4054dc1ac430e78d8006ef2 Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Date: Mon May 18 11:27:21 2026 -0400 [Model Runner V2] Fix prompt logprobs calculation `Sizes of tensors must match` error (#42778) Signed-off-by: yewentao256 <zhyanwentao@126.com> commit 47829b1159335a010521ea3e5361d51744a36b0a Author: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Date: Mon May 18 18:26:00 2026 +0300 [Bugfix] mamba: run single-token extends as decodes (#42430) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> commit 4a39b4f55374d48ebaa2ca02312e24639db8e0b8 Author: Blanc Swan <85233612+blancsw@users.noreply.github.com> Date: Mon May 18 17:20:04 2026 +0200 [Model] Add Apertus Tool Parser (#41154) Signed-off-by: Blanc <swan.blanc@infomaniak.com> commit 78e7a7b9b0b9c285bf6978c3fc09eeecea3ff230 Author: Siddharth Bedekar <104613085+bedeks@users.noreply.github.com> Date: Mon May 18 08:02:43 2026 -0700 Refactor AWQ Marlin MoE onto modular WNA16 oracle (#42483) Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com> Signed-off-by: Siddharth Bedekar <104613085+bedeks@users.noreply.github.com> Co-authored-by: Robert Shaw <robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: OpenAI Codex <codex@openai.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit f5d3dc7115cf77472ba5e274f6becbbeddbf4bd5 Author: Michael Goin <mgoin64@gmail.com> Date: Mon May 18 10:26:07 2026 -0400 [Model Runner v2] Support update_config (#42783) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> commit 1ac10f159a09897baada01b14b6a0dd6442aefd6 Author: vllm-agent <claw@inferact.ai> Date: Mon May 18 06:02:51 2026 -0700 Revert "[torch.compile] Add patch for fullgraph compilation" (#42686) 
(#42913) Co-authored-by: Luka Govedič <luka.govedic@gmail.com> Co-authored-by: Zhewen Li <zhewenli@inferact.ai> commit e5417657e55ec2f42809816e4aa5c9753f390cdd Author: liranschour <liranschour@users.noreply.github.com> Date: Mon May 18 15:59:42 2026 +0300 [KV Connector][Offloading] Flush all pending jobs on last step (#42611) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 2e40faf08b2cae4ff6e27a255fe10833365de0e8 Author: xiangdong <40376367+zxd1997066@users.noreply.github.com> Date: Mon May 18 20:34:48 2026 +0800 [XPU][CI] Temporarily skip test_moe_lora_align_block_size_mixed_base_and_lora[1] in Intel GPU CI (#42954) Signed-off-by: zengxian <xiangdong.zeng@intel.com> commit 69c91d010a596bb74b553fe157497a1fd6edb47c Author: Nicolò Lucchesi <nlucches@redhat.com> Date: Mon May 18 14:34:16 2026 +0200 [MRv2] Default to MRv1 when a connector is present (#42955) Signed-off-by: NickLucche <nlucches@redhat.com> commit 737bfa3a43ce386bd1894792f3302d9f3f9d73fa Author: roikoren755 <26850796+roikoren755@users.noreply.github.com> Date: Mon May 18 14:54:00 2026 +0300 [Bugfix][Hybrid][NemotronH] Fix mamba_cache_mode=all + speculative decoding crash (#41233) Signed-off-by: Roi Koren <roik@nvidia.com> commit e414e1f1c020108593526b706efaf89e427c05a2 Author: Kfir Toledo <kfir.toledo@ibm.com> Date: Mon May 18 14:36:02 2026 +0300 [Bugfix][KV Offload] count appended GPU blocks in store group_sizes (#42945) Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com> commit df852ed503ac1a79e568271cd6f136a7b2698f5e Author: inisis <desmond.yao@buaa.edu.cn> Date: Mon May 18 18:33:29 2026 +0800 fix: remove unused norm for dpskv4 (#41710) Signed-off-by: inisis <desmond.yao@buaa.edu.cn> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com> commit 88a860d7545aad69661daad7a1c2b04f59c76144 Author: Yuwen Zhou 
<yuwen.zhou@intel.com> Date: Mon May 18 18:04:45 2026 +0800 [CPU] Add MXFP4 W4A16 MoE support (#41922) Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: Yuwen Zhou <yuwen.zhou@intel.com> commit cac81b6eda418fb5ca86b81197914dd02666353e Author: Tianmu Li <tianmu.li@intel.com> Date: Mon May 18 03:04:41 2026 -0700 [CPU Backend] Improve cpu thread utilization (#42666) Signed-off-by: Li, Tianmu <tianmu.li@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit b4601ad43ff7ff2b9e2f52379144481e45bcf6c5 Author: Li, Jiang <jiang1.li@intel.com> Date: Mon May 18 18:04:36 2026 +0800 [CPU] Add fused GDN support for AMX CPU platform (#42707) Signed-off-by: jiang1.li <jiang1.li@intel.com> commit 2267f70070bdee8057b4afae69cba9b847add587 Author: Jee Jee Li <pandaleefree@gmail.com> Date: Mon May 18 18:04:31 2026 +0800 [Kernel] Pack topk id/weights triton kernel (#42527) Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> commit 965d076148326f4511b6b832cbe7d974db74dbe9 Author: Tony Lin <tony.lin@intel.com> Date: Mon May 18 17:38:54 2026 +0800 [CPU] Specify required KV cache layout for CPU attention backend (#42740) Signed-off-by: Tony Lin <tony.lin@intel.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> commit c38bed4248e97e5ed981569777d035d31ace5368 Author: wenjun liu <wenjun.liu@intel.com> Date: Mon May 18 16:36:45 2026 +0800 delete xpu ci (#42582) Signed-off-by: wenjun.liu <wenjun.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> commit 998714b21b413c78db8eb7af7f384dc90c0b10dc Author: Xin Yang <105740670+xyang16@users.noreply.github.com> Date: Mon May 18 01:32:46 2026 -0700 [Perf] Add do_not_specialize in fused FP8 RoPE kernel (#42849) Signed-off-by: Xin Yang <xyangx@amazon.com> commit 9537542537728af9fac418ecf1604ad8e8d9ff93 Author: Harry Mellor <19981378+hmellor@users.noreply.github.com> Date: Mon May 18 17:31:06 2026 +0900 Revert 
checkpoint specific workaround in Transformers modelling backend (#42923) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> commit 5ab6d1b3fd407404cd78488bf6f4cbcde6d912b7 Author: Rishapveer Singh <singhrishapveer@gmail.com> Date: Mon May 18 10:14:36 2026 +0200 [Model] [Perf] Use flatten for Qwen3.5's GDN output projection (#42311) Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com> commit 7d5b033782681acee274f4f379c9fadc557fd7e8 Author: Jee Jee Li <pandaleefree@gmail.com> Date: Mon May 18 15:22:26 2026 +0800 [LoRA] Support 2D and 3D MoE LoRA adapter at the same time (#42242) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io> commit e3aeee5ff8bf7e89fea231d2a965701248eb43c0 Author: Nguyễn Thế Duy <nduy250299@gmail.com> Date: Mon May 18 14:17:53 2026 +0700 [Bugfix] moe lora align kernel grid (#40131) Signed-off-by: TheDuyIT <nduy250299@gmail.com> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> Signed-off-by: dtnguyen <dtnguyen@nvidia.com> Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> commit c1f7854342d1e80f7f2406524d242b8ee5476d6d Author: Harry Mellor <19981378+hmellor@users.noreply.github.com> Date: Mon May 18 15:33:32 2026 +0900 Improve logging when docs build is skipped (#42929) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> commit 23c15acd770cf16ed36c6d3fed8e7d78db7d5282 Author: gaozihao-shy <gaozihao3@huawei.com> Date: Mon May 18 13:07:16 2026 +0800 [BugFix] Kimi-K2.5: skip vision tower dtype conversion when using quantization (#42869) Signed-off-by: gaozihao-shy <gaozihao-shy@users.noreply.github.com> Signed-off-by: gaozihao <gaozihao3@huawei.com> commit b50646e5effd7cb5884cd96fdff4c53c18521198 Author: Andreas Karatzas <akaratza@amd.com> Date: Sun May 17 
22:57:59 2026 -0500 [ROCm][CI] Stabilize ROCm pooling and multimodal CI (#42909) Signed-off-by: Andreas Karatzas <akaratza@amd.com> commit 990f49bdcb8ff51c0ceb1d784c3ca16e6c276927 Author: Soyaazz <523420504@qq.com> Date: Mon May 18 11:19:13 2026 +0800 [MM][CG] Enable encoder Cudagraph for Step3VL (#42224) Signed-off-by: JisoLya <523420504@qq.com> Signed-off-by: Soyaazz <523420504@qq.com> commit 107210442da1bc6985bfa615b55e1e5c2dd98958 Author: Alec <35311602+alec-flowers@users.noreply.github.com> Date: Sun May 17 19:11:46 2026 -0700 [CI] Add NIXL EP import canary (#42567) Signed-off-by: Alec Flowers <aflowers@nvidia.com> Co-authored-by: OpenAI Codex <codex@openai.com> commit 03ddc1c9bc5e448e0da6236268a611d7d001dbae Author: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com> Date: Mon May 18 09:57:04 2026 +0800 [Perf] Wire silu_and_mul_per_block_quant into TritonFP8MoE (MiniMax-M2) (#42497) Signed-off-by: qianlihuang <yiliu.dong@qq.com> Signed-off-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com> Co-authored-by: qianlihuang <yiliu.dong@qq.com> commit 966903eb93a053a908fbf8b931fcebfb28c4741a Author: Luka Govedič <ProExpertProg@users.noreply.github.com> Date: Sun May 17 15:49:16 2026 -0400 [torch.compile] Add patch for fullgraph compilation (#42686) Signed-off-by: Luka Govedič <luka.govedic@gmail.com> commit 599e75f432e5fd7c77e65dc95587f3441201bdbc Author: TJian <tunjian.tan@embeddedllm.com> Date: Mon May 18 00:18:50 2026 +0800 [ROCm] [Bugfix] Fix DeepSeek V4 Functionality and Accuracy (#42810) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> commit 1c8e9c0399f6a6a98f406dce5947a2ad318e195a Author: Taneem Ibrahim <taneem.ibrahim@gmail.com> Date: Sun May 17 09:40:21 2026 -0500 Refactor: Pass num_labels explicitly to PoolerClassify instead of reading from global config (#42851) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> commit 0fa888465e5a30b797bdf2cdcd0f57fc77541cef Author: zofia <110436990+zufangzhu@users.noreply.github.com> Date: 
Sun May 17 16:55:10 2026 +0800 [XPU] fix weight scale shape (#42725) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> commit ff712f6447093d07747c88680b9d006b119f5890 Author: liuzhenwei <zhenweiliu@habana.ai> Date: Sun May 17 12:15:50 2026 +0800 [MRV2][XPU] add Model Runner V2 log (#42710) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> commit 504a26ce2be2415118b73966480b4fc04d9b7bf8 Author: Qi Zhou <qizzzh@google.com> Date: Sat May 16 17:54:58 2026 -0700 Support bf16 for mamba ssm cache (#41680) Signed-off-by: Qi Zhou <qizzzh@google.com> commit a94189295b8b9c1d952be438b49ed5793db59159 Author: weizhoublue <45163302+weizhoublue@users.noreply.github.com> Date: Sun May 17 08:54:27 2026 +0800 Fix Weight loading for Qwen3.5-MTP and Qwen3-VL using runai_streamer (#42716) Signed-off-by: weizhoublue <weizhou.lan@daocloud.io> commit 0867497368f390212a3f9684e2e05f698f8d1149 Author: Artem Perevedentsev <aperevedents@nvidia.com> Date: Sun May 17 00:55:12 2026 +0300 [CI/Build] Bump flashinfer to v0.6.11.post2 (#41711) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com> Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com> commit 36e74c9ea4feb5ade38ffa1ea96f24dd73316e02 Author: Zhewen Li <zhewenli@meta.com> Date: Sat May 16 13:34:15 2026 -0700 [KV Connector] Support disk offloading in MooncakeStoreConnector (#42689) Signed-off-by: Zhewen Li <zhewenli@inferact.ai> Co-authored-by: Zhewen Li <zhewenli@inferact.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> commit 787bc0d0313840c16e403dfa2d135781d41d3614 Author: Taneem Ibrahim <taneem.ibrahim@gmail.com> Date: Sat May 16 14:58:16 2026 -0400 Add unit tests for pooler activation functions (#42824) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> commit d1586e1a1242754d2f6ac51f4f16680f7d4b129b Author: weizhoublue <45163302+weizhoublue@users.noreply.github.com> Date: Sun May 17 01:02:54 2026 +0800 Fix: Propagate 
pinned model revisions into Ultravox secondary weight loading (#42830) commit 8a56da3845270837424ef4b7ee83ca97a7883025 Author: Jiangyun Zhu <riverclouds.zhu@qq.com> Date: Sat May 16 22:04:12 2026 +0800 [Experimental] Breakable CUDA graph (#42304) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> commit 4db300e95fd29f5b1a4a7c34f4fbe91b7e9abb24 Author: Andreas Karatzas <akaratza@amd.com> Date: Sat May 16 04:35:05 2026 -0500 [ROCm][CI] Removed problematic command override mechanism (#42807) Signed-off-by: Andreas Karatzas <akaratza@amd.com> commit 657b42b5922d21fef00529144ef5bb5633ad04b1 Author: Zhewen Li <zhewenli@meta.com> Date: Sat May 16 00:26:25 2026 -0700 [Docker][KVConnector] Build mooncake-transfer-engine from source (#42114) Signed-off-by: Zhewen Li <zhewenli@inferact.ai> Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Zhewen Li <zhewenli@inferact.ai> Co-authored-by: khluu <khluu000@gmail.com> commit 32b7177909d1c9928bcedd81de7de5a1fa21d2b3 Author: Jee Jee Li <pandaleefree@gmail.com> Date: Sat May 16 11:22:35 2026 +0800 [LoRA][Bugfix] Dedup LoRA wrapping for modules referenced from multiple attribute paths (MoE gate) (#42757) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> commit 39c67d714ef091df1533181bdc3df82dc9ac3e07 Author: DustHunter <dusthunter@126.com> Date: Sat May 16 09:29:27 2026 +0800 fix: add API key authorization to /v2 endpoints (#42594) Signed-off-by: DustHunter <dusthunter@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> commit 87a2adcb43513ead1434aff03a535d86f56f768b Author: Viktor Pus <viktorpus@tenstorrent.com> Date: Sat May 16 02:44:48 2026 +0200 [Misc] Add common random prefix option to structured-output serving benchmark (#41632) Signed-off-by: Viktor Pus <viktorpus@tenstorrent.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> commit 
852f567444cf8c206219edb7b2c42aec55fc41cf Author: Michael Goin <mgoin64@gmail.com> Date: Fri May 15 20:15:52 2026 -0400 [Bugfix] Respect explicit --kv-cache-dtype over checkpoint kv_cache_scheme (#42782) Signed-off-by: mgoin <mgoin64@gmail.com> commit b2a27b82d970efa0203c06be6dc0d94526edaab0 Author: Michael Goin <mgoin64@gmail.com> Date: Fri May 15 20:07:39 2026 -0400 [Kernel][UX] Add `--linear-backend` arg for linear kernel selection (#39538) Signed-off-by: mgoin <mgoin64@gmail.com> commit d0921bafeff9bbe7a7b4efef6371700e69224702 Author: Keyi Li <94494390+JasonKeyiL@users.noreply.github.com> Date: Fri May 15 16:20:33 2026 -0700 [Bugfix] Unwrap VLM wrappers for EPLB on Model Runner V2 (#42706) commit 1ccdf87507407cb02460ec2e7a3e1a4cac9b0a4a Author: rasdani <73563550+rasdani@users.noreply.github.com> Date: Fri May 15 15:20:53 2026 -0700 [Bugfix] Fix layerwise reload alias-buffer corruption (#42481) Signed-off-by: rasdani <73563550+rasdani@users.noreply.github.com> Co-authored-by: OpenAI Codex <codex@openai.com> Co-authored-by: Roger Wang <hey@rogerw.io> commit bd9dbe60601c986b50260f299fe279d057d7d89f Author: Rita Brugarolas <Rita.BrugarolasBrufau@amd.com> Date: Fri May 15 13:50:03 2026 -0700 [ROCm][Bugfix] Fix fused_mla_dual_rms_norm for AITER API rename _fused_qk_rmsnorm (#42606) Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com> commit de2d76f35239c58202e49469dc5524b6f6fc4ffb Author: Michael Goin <mgoin64@gmail.com> Date: Fri May 15 16:46:16 2026 -0400 [Build] Switch CUDA 12.9 wheel builds to PyTorch manylinux_2_28 base (#41668) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> commit 9a7a273dfe6a89bbe00639fe99b0d61095fbc40a Author: Sergei Skvortsov <yvorott@gmail.com> Date: Fri May 15 21:01:21 2026 +0100 Add HumanEval and GSM8K benchmarks to datasets (#42648) Signed-off-by: southfreebird <yvorott@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> commit b2c58ee9427f15563210e184c57a6e530f37e464 
Author: Lanze Liu <86434077+liulanze@users.noreply.github.com> Date: Fri May 15 12:34:59 2026 -0700 [FlashAttn] Fix supports_kv_cache_dtype() accepting unhandled fp8 kv-cache dtype variants (#42685) Signed-off-by: Lanze Liu <lanzetech@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> commit 4d67d3bde25f94b6199ce16c7ef239ae4412bb8f Author: frida-andersson <fanders…
1 parent b449c2b commit 46fe1bd

3,393 files changed

Lines changed: 476972 additions & 140713 deletions


.buildkite/ci_config.yaml

Lines changed: 3 additions & 2 deletions
@@ -8,8 +8,9 @@ run_all_patterns:
   - "CMakeLists.txt"
   - "requirements/common.txt"
   - "requirements/cuda.txt"
-  - "requirements/build.txt"
-  - "requirements/test.txt"
+  - "requirements/kv_connectors.txt"
+  - "requirements/build/cuda.txt"
+  - "requirements/test/cuda.txt"
   - "setup.py"
   - "csrc/"
   - "cmake/"

.buildkite/ci_config_intel.yaml

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+name: vllm_intel_ci
+job_dirs:
+  - ".buildkite/intel_jobs"
+run_all_patterns:
+  - "docker/Dockerfile"
+  - "CMakeLists.txt"
+  - "requirements/common.txt"
+  - "requirements/xpu.txt"
+  - "requirements/build/cuda.txt"
+  - "requirements/test/cuda.txt"
+  - "setup.py"
+  - "csrc/"
+  - "cmake/"
+run_all_exclude_patterns:
+  - "docker/Dockerfile."
+  - "csrc/cpu/"
+  - "csrc/rocm/"
+  - "cmake/hipify.py"
+  - "cmake/cpu_extension.cmake"
+registries: public.ecr.aws/q9t5s3a7
+repositories:
+  main: "vllm-ci-test-repo"
+  premerge: "vllm-ci-test-repo"
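
One plausible reading of how `run_all_patterns` and `run_all_exclude_patterns` interact is sketched below. This is a minimal illustration, assuming plain path-prefix matching with excludes taking precedence; the matching logic of the actual CI pipeline generator is not part of this diff and may differ. The function name `should_run_all` and the two constants are hypothetical.

```python
# Hypothetical sketch: decide whether a set of changed files should trigger
# the full Intel CI run. Prefix matching is an ASSUMPTION, not confirmed by
# the diff; the real pipeline generator may use globs or other rules.

RUN_ALL_PATTERNS = [
    "docker/Dockerfile",
    "CMakeLists.txt",
    "requirements/common.txt",
    "requirements/xpu.txt",
    "requirements/build/cuda.txt",
    "requirements/test/cuda.txt",
    "setup.py",
    "csrc/",
    "cmake/",
]

RUN_ALL_EXCLUDE_PATTERNS = [
    "docker/Dockerfile.",
    "csrc/cpu/",
    "csrc/rocm/",
    "cmake/hipify.py",
    "cmake/cpu_extension.cmake",
]


def should_run_all(changed_files):
    """Return True if any changed file matches a run-all pattern
    and is not covered by an exclude pattern."""
    for path in changed_files:
        # Excludes win: e.g. "docker/Dockerfile.rocm" is skipped even though
        # it also starts with the include prefix "docker/Dockerfile".
        if any(path.startswith(p) for p in RUN_ALL_EXCLUDE_PATTERNS):
            continue
        if any(path.startswith(p) for p in RUN_ALL_PATTERNS):
            return True
    return False
```

Under this reading, the exclude entry `"docker/Dockerfile."` keeps platform-specific Dockerfiles (those with a suffix) from triggering a full run, while the base `docker/Dockerfile` still does.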

.buildkite/hardware_tests/amd.yaml

Lines changed: 1 addition & 9 deletions
@@ -10,7 +10,7 @@ steps:
       docker build
         --build-arg max_jobs=16
         --build-arg REMOTE_VLLM=1
-        --build-arg ARG_PYTORCH_ROCM_ARCH='gfx942;gfx950'
+        --build-arg ARG_PYTORCH_ROCM_ARCH='gfx90a;gfx942;gfx950'
         --build-arg VLLM_BRANCH=$BUILDKITE_COMMIT
         --tag "rocm/vllm-ci:${BUILDKITE_COMMIT}"
         -f docker/Dockerfile.rocm
@@ -20,11 +20,3 @@ steps:
     - docker push "rocm/vllm-ci:${BUILDKITE_COMMIT}"
   env:
     DOCKER_BUILDKIT: "1"
-  retry:
-    automatic:
-    - exit_status: -1 # Agent was lost
-      limit: 1
-    - exit_status: -10 # Agent was lost
-      limit: 1
-    - exit_status: 1 # Machine occasionally fail
-      limit: 1
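
The second hunk removes the step's automatic retry rules. To illustrate what those rules expressed, here is a simplified model, assuming each rule allows up to `limit` retries for a matching exit status and that no retry policy is applied elsewhere. It is a sketch of the configuration's intent, not Buildkite's implementation; `should_retry` and `REMOVED_RETRY_RULES` are hypothetical names.

```python
# Simplified, hypothetical model of the removed `retry.automatic` rules.
# It is NOT Buildkite's implementation; it only mirrors the YAML above.
from collections import Counter

REMOVED_RETRY_RULES = [
    {"exit_status": -1, "limit": 1},   # "Agent was lost" in the original comment
    {"exit_status": -10, "limit": 1},  # "Agent was lost" in the original comment
    {"exit_status": 1, "limit": 1},    # "Machine occasionally fail" in the original
]


def should_retry(exit_status, retries_used, rules):
    """Return True if a rule matches exit_status and its limit is not exhausted.

    retries_used maps an exit status to the number of retries already
    spent on that status for the current job.
    """
    for rule in rules:
        if rule["exit_status"] == exit_status:
            return retries_used[exit_status] < rule["limit"]
    return False


used = Counter()
first_attempt = should_retry(1, used, REMOVED_RETRY_RULES)   # rule applies
used[1] += 1
second_attempt = should_retry(1, used, REMOVED_RETRY_RULES)  # limit reached
after_removal = should_retry(1, Counter(), [])               # no rules left
```

In this model, an empty rule list (the state after this commit) means a failing build step is never retried automatically.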

.buildkite/hardware_tests/cpu.yaml

Lines changed: 55 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@ depends_on: []
33
steps:
44
- label: CPU-Kernel Tests
55
depends_on: []
6-
soft_fail: true
76
device: intel_cpu
87
no_plugin: true
98
source_file_dependencies:
@@ -13,17 +12,35 @@ steps:
   - vllm/_custom_ops.py
   - tests/kernels/attention/test_cpu_attn.py
   - tests/kernels/moe/test_cpu_fused_moe.py
+  - tests/kernels/moe/test_cpu_quant_fused_moe.py
   - tests/kernels/test_onednn.py
+  - tests/kernels/test_awq_int4_to_int8.py
+  - tests/kernels/quantization/test_cpu_fp8_scaled_mm.py
   commands:
   - |
-    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 20m "
+    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 30m "
       pytest -x -v -s tests/kernels/attention/test_cpu_attn.py
       pytest -x -v -s tests/kernels/moe/test_cpu_fused_moe.py
-      pytest -x -v -s tests/kernels/test_onednn.py"
+      pytest -x -v -s tests/kernels/moe/test_cpu_quant_fused_moe.py
+      pytest -x -v -s tests/kernels/test_onednn.py
+      pytest -x -v -s tests/kernels/test_awq_int4_to_int8.py
+      pytest -x -v -s tests/kernels/quantization/test_cpu_fp8_scaled_mm.py"
+
+- label: CPU-Compatibility Tests
+  depends_on: []
+  device: intel_cpu
+  no_plugin: true
+  source_file_dependencies:
+  - cmake/cpu_extension.cmake
+  - setup.py
+  - vllm/platforms/cpu.py
+  commands:
+  - |
+    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 20m "
+      bash .buildkite/scripts/hardware_ci/run-cpu-compatibility-test.sh"

 - label: CPU-Language Generation and Pooling Model Tests
   depends_on: []
-  soft_fail: true
   device: intel_cpu
   no_plugin: true
   source_file_dependencies:
@@ -33,36 +50,49 @@ steps:
   - tests/models/language/pooling/
   commands:
   - |
-    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 30m "
+    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 40m "
       pytest -x -v -s tests/models/language/generation -m cpu_model
       pytest -x -v -s tests/models/language/pooling -m cpu_model"

-- label: CPU-Quantization Model Tests
+- label: CPU-ModelRunnerV2 Tests
   depends_on: []
+  device: intel_cpu
+  no_plugin: true
   soft_fail: true
+  source_file_dependencies:
+  - vllm/v1/worker/cpu/
+  - vllm/v1/worker/gpu/
+  commands:
+  - |
+    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 30m "
+      uv pip install git+https://github.com/triton-lang/triton-cpu.git@270e696d
+      VLLM_USE_V2_MODEL_RUNNER=1 pytest -x -v -s tests/models/language/generation/test_granite.py -m cpu_model"
+
+- label: CPU-Quantization Model Tests
+  depends_on: []
   device: intel_cpu
   no_plugin: true
   source_file_dependencies:
   - csrc/cpu/
   - vllm/model_executor/layers/quantization/cpu_wna16.py
-  - vllm/model_executor/layers/quantization/gptq_marlin.py
+  - vllm/model_executor/layers/quantization/auto_gptq.py
   - vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py
   - vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py
   - vllm/model_executor/layers/quantization/kernels/mixed_precision/cpu.py
+  - vllm/model_executor/layers/fused_moe/experts/cpu_moe.py
   - tests/quantization/test_compressed_tensors.py
   - tests/quantization/test_cpu_wna16.py
   commands:
   - |
-    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 20m "
+    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 30m "
       pytest -x -v -s tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_logprobs
       pytest -x -v -s tests/quantization/test_cpu_wna16.py"

-- label: CPU-Distributed Tests
+- label: CPU-Distributed Tests (PP+TP)
   depends_on: []
-  soft_fail: true
   device: intel_cpu
   no_plugin: true
-  source_file_dependencies:
+  source_file_dependencies: &cpu_distributed_deps
   - csrc/cpu/shm.cpp
   - vllm/v1/worker/cpu_worker.py
   - vllm/v1/worker/gpu_worker.py
@@ -71,14 +101,24 @@ steps:
   - vllm/platforms/cpu.py
   - vllm/distributed/parallel_state.py
   - vllm/distributed/device_communicators/cpu_communicator.py
+  - .buildkite/scripts/hardware_ci/run-cpu-distributed-smoke-test.sh
   commands:
   - |
     bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 10m "
-      bash .buildkite/scripts/hardware_ci/run-cpu-distributed-smoke-test.sh"
+      bash .buildkite/scripts/hardware_ci/run-cpu-distributed-smoke-test.sh tp_pp"
+
+- label: CPU-Distributed Tests (DP+TP)
+  depends_on: []
+  device: intel_cpu
+  no_plugin: true
+  source_file_dependencies: *cpu_distributed_deps
+  commands:
+  - |
+    bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 10m "
+      bash .buildkite/scripts/hardware_ci/run-cpu-distributed-smoke-test.sh dp_tp"

 - label: CPU-Multi-Modal Model Tests %N
   depends_on: []
-  soft_fail: true
   device: intel_cpu
   no_plugin: true
   source_file_dependencies:
@@ -89,11 +129,11 @@ steps:
   - |
     bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 45m "
       pytest -x -v -s tests/models/multimodal/generation --ignore=tests/models/multimodal/generation/test_pixtral.py -m cpu_model --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --shard-id=$$BUILDKITE_PARALLEL_JOB"
-  parallelism: 2
+  parallelism: 3

 - label: "Arm CPU Test"
   depends_on: []
-  soft_fail: true
+  soft_fail: false
   device: arm_cpu
   no_plugin: true
   commands:

.buildkite/hardware_tests/intel.yaml

Lines changed: 0 additions & 7 deletions
@@ -8,10 +8,3 @@ steps:
   commands:
   - bash .buildkite/scripts/hardware_ci/run-hpu-test.sh

-- label: "Intel GPU Test"
-  depends_on: []
-  soft_fail: true
-  device: intel_gpu
-  no_plugin: true
-  commands:
-  - bash .buildkite/scripts/hardware_ci/run-xpu-test.sh

.buildkite/image_build/image_build.sh

Lines changed: 3 additions & 2 deletions
@@ -92,8 +92,8 @@ check_and_skip_if_image_exists() {
 }

 ecr_login() {
-    aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"
-    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 936637512419.dkr.ecr.us-east-1.amazonaws.com
+    aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY" || true
+    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 936637512419.dkr.ecr.us-east-1.amazonaws.com || true
 }

 prepare_cache_tags() {
@@ -192,6 +192,7 @@ export BUILDKITE_COMMIT
 export PARENT_COMMIT
 export IMAGE_TAG
 export IMAGE_TAG_LATEST
+export COMMIT="${COMMIT:-${BUILDKITE_COMMIT}}"
 export CACHE_FROM
 export CACHE_FROM_BASE_BRANCH
 export CACHE_FROM_MAIN
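
The added `export COMMIT="${COMMIT:-${BUILDKITE_COMMIT}}"` line relies on the shell's `:-` default expansion: `COMMIT` keeps its value when it is already set and non-empty, and otherwise takes the value of `BUILDKITE_COMMIT`. A small sketch of the three cases follows. The commit IDs are made-up placeholders, and the `*_commit` result variables are hypothetical names used only for this illustration.

```shell
# Placeholder values for illustration only; not taken from the pipeline.
BUILDKITE_COMMIT="1111111"

# Case 1: COMMIT is unset, so the fallback applies.
unset COMMIT
COMMIT="${COMMIT:-${BUILDKITE_COMMIT}}"
unset_commit="$COMMIT"

# Case 2: COMMIT is already set, so its value is kept.
COMMIT="2222222"
COMMIT="${COMMIT:-${BUILDKITE_COMMIT}}"
preset_commit="$COMMIT"

# Case 3: COMMIT is set but empty; ":-" treats that like unset.
COMMIT=""
COMMIT="${COMMIT:-${BUILDKITE_COMMIT}}"
empty_commit="$COMMIT"

echo "unset:  ${unset_commit}"
echo "preset: ${preset_commit}"
echo "empty:  ${empty_commit}"
```

Because the expansion never reads an unset variable without a fallback, it also stays safe in scripts that enable `set -u`.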

.buildkite/image_build/image_build_cpu.sh

Lines changed: 2 additions & 4 deletions
@@ -11,7 +11,7 @@ REPO=$2
 BUILDKITE_COMMIT=$3

 # authenticate with AWS ECR
-aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"
+aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY" || true

 # skip build if image already exists
 if [[ -z $(docker manifest inspect "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-cpu) ]]; then
@@ -25,9 +25,7 @@ fi
 docker build --file docker/Dockerfile.cpu \
   --build-arg max_jobs=16 \
   --build-arg buildkite_commit="$BUILDKITE_COMMIT" \
-  --build-arg VLLM_CPU_AVX512BF16=true \
-  --build-arg VLLM_CPU_AVX512VNNI=true \
-  --build-arg VLLM_CPU_AMXBF16=true \
+  --build-arg VLLM_CPU_X86=true \
   --tag "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-cpu \
   --target vllm-test \
   --progress plain .
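
The `|| true` appended to the ECR login pipeline in this file (and in the sibling image build scripts) makes a failed login non-fatal. A minimal sketch of that effect is below, assuming the script runs in bash strict mode (`set -euo pipefail`), which this diff does not show for these files. `run_strict` is a hypothetical helper, and `false | cat` merely stands in for a login pipeline whose first stage fails.

```shell
# Run a snippet in a fresh bash with strict mode enabled and return its exit status.
# (Hypothetical helper for this illustration; not part of the repository.)
run_strict() {
  bash -c "set -euo pipefail; $1" >/dev/null 2>&1
}

# Without "|| true": the failing pipeline aborts the script before the echo runs.
strict_status=0
run_strict 'false | cat; echo unreachable' || strict_status=$?

# With "|| true": the failure is swallowed and the script carries on.
tolerant_status=0
run_strict 'false | cat || true; echo reached' || tolerant_status=$?

echo "without '|| true': exit status ${strict_status}"
echo "with '|| true':    exit status ${tolerant_status}"
```

The trade-off is that a genuine credential problem no longer stops the build at the login step; it would surface later, for example when an authenticated pull or push is attempted.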

.buildkite/image_build/image_build_cpu_arm64.sh

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ REPO=$2
 BUILDKITE_COMMIT=$3

 # authenticate with AWS ECR
-aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"
+aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY" || true

 # skip build if image already exists
 if [[ -z $(docker manifest inspect "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-arm64-cpu) ]]; then

.buildkite/image_build/image_build_hpu.sh

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ REPO=$2
 BUILDKITE_COMMIT=$3

 # authenticate with AWS ECR
-aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"
+aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY" || true

 # skip build if image already exists
 if [[ -z $(docker manifest inspect "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-hpu) ]]; then
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+#!/bin/bash
+set -euo pipefail
+
+# Build a vLLM test image with PyTorch nightly installed.
+# Called by the pipeline generator's "vLLM Against PyTorch Nightly" group.
+
+if [[ $# -lt 5 ]]; then
+    echo "Usage: $0 <registry> <repo> <commit> <branch> <image_tag>"
+    exit 1
+fi
+
+REGISTRY=$1
+REPO=$2
+BUILDKITE_COMMIT=$3
+BRANCH=$4
+IMAGE_TAG=$5
+
+# --- Arguments ---
+echo "--- :mag: Arguments"
+echo "REGISTRY: ${REGISTRY}"
+echo "REPO: ${REPO}"
+echo "BUILDKITE_COMMIT: ${BUILDKITE_COMMIT}"
+echo "BRANCH: ${BRANCH}"
+echo "IMAGE_TAG: ${IMAGE_TAG}"
+
+# --- ECR login ---
+echo "--- :key: ECR login"
+aws ecr-public get-login-password --region us-east-1 \
+    | docker login --username AWS --password-stdin "$REGISTRY"
+aws ecr get-login-password --region us-east-1 \
+    | docker login --username AWS --password-stdin 936637512419.dkr.ecr.us-east-1.amazonaws.com
+
+# --- Set up buildx ---
+echo "--- :docker: Setting up buildx"
+docker buildx create --name vllm-builder --driver docker-container --use || true
+docker buildx inspect --bootstrap
+docker buildx ls
+
+# --- Skip if image already exists ---
+echo "--- :mag: Checking if image already exists"
+if docker manifest inspect "$IMAGE_TAG" >/dev/null 2>&1; then
+    echo "Image found: $IMAGE_TAG — skipping build"
+    exit 0
+fi
+echo "Image not found, proceeding with build..."
+
+# --- CUDA 13.0 for nightly builds ---
+# Nightly CI uses CUDA 13.0 while regular CI stays on CUDA 12.9
+NIGHTLY_CUDA_VERSION="13.0.2"
+NIGHTLY_BUILD_BASE_IMAGE="nvidia/cuda:${NIGHTLY_CUDA_VERSION}-devel-ubuntu22.04"
+NIGHTLY_FINAL_BASE_IMAGE="nvidia/cuda:${NIGHTLY_CUDA_VERSION}-base-ubuntu22.04"
+
+echo "--- :docker: Building torch nightly image (CUDA ${NIGHTLY_CUDA_VERSION})"
+docker buildx build --file docker/Dockerfile \
+    --build-arg max_jobs=16 \
+    --build-arg buildkite_commit="$BUILDKITE_COMMIT" \
+    --build-arg USE_SCCACHE=1 \
+    --build-arg PYTORCH_NIGHTLY=1 \
+    --build-arg CUDA_VERSION="${NIGHTLY_CUDA_VERSION}" \
+    --build-arg BUILD_BASE_IMAGE="${NIGHTLY_BUILD_BASE_IMAGE}" \
+    --build-arg FINAL_BASE_IMAGE="${NIGHTLY_FINAL_BASE_IMAGE}" \
+    --build-arg torch_cuda_arch_list="8.0 8.9 9.0 10.0 12.0" \
+    --tag "$IMAGE_TAG" \
+    --push \
+    --target test \
+    --progress plain .
+
+echo "--- :white_check_mark: Torch nightly image build complete: $IMAGE_TAG"
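
The "skip if image already exists" step in this script places the registry lookup directly in an `if` condition, so a non-zero status from the lookup selects the build path instead of aborting under `set -e`. A rough sketch of that control flow follows, with the registry call replaced by a stub. `manifest_exists`, `decide_build`, `KNOWN_TAG`, and the example tags are hypothetical and stand in for `docker manifest inspect "$IMAGE_TAG"`.

```shell
# Stub standing in for `docker manifest inspect <tag>` (hypothetical):
# it succeeds only for the single tag stored in KNOWN_TAG.
KNOWN_TAG="registry.example/repo:abc123"
manifest_exists() {
  [ "$1" = "$KNOWN_TAG" ]
}

# Print "skip" when the image is already published, "build" otherwise.
# A command used as an `if` condition does not trigger `set -e` when it fails.
decide_build() {
  if manifest_exists "$1" >/dev/null 2>&1; then
    echo "skip"
  else
    echo "build"
  fi
}

existing_decision=$(decide_build "registry.example/repo:abc123")
missing_decision=$(decide_build "registry.example/repo:def456")

echo "existing tag: ${existing_decision}"
echo "missing tag:  ${missing_decision}"
```

In the real script the "skip" branch exits with status 0, so an image that was already pushed turns the step into a no-op rather than a failure.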
