[Bugfix] catch AssertionError in MistralTokenizer as ValueError #16344
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Perhaps this is better solved by updating …
IMO, assertion errors should not be treated like user errors.
I agree with you, @DarkLight1337. However, I have yet to see an external PR on …
To avoid mistakenly catching assertion errors raised inside vLLM itself, can we add a try-except block that wraps the Mistral tokenizer and converts assertion errors to value errors?
Yes, this is a good idea. I'll update the PR to go in that direction.
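For readers following the discussion, here is a minimal sketch of that direction. It assumes mistral-common's `MistralTokenizer` and its `encode_chat_completion` API; the helper name and call site are illustrative, not the actual vLLM code:

```python
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer


def encode_chat(tokenizer: MistralTokenizer,
                request: ChatCompletionRequest) -> list[int]:
    """Hypothetical wrapper: surface tokenizer assertions as user errors."""
    try:
        return tokenizer.encode_chat_completion(request).tokens
    except AssertionError as e:
        # mistral-common validates inputs with `assert` statements, so a
        # malformed request raises AssertionError. Re-raise it as ValueError
        # so the server treats it as a bad request rather than an internal
        # failure that could take down the engine.
        raise ValueError(str(e)) from e
```

The key point is converting the error type at the tokenizer boundary only, so genuine assertion failures elsewhere in vLLM still propagate as internal errors.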
This should work for now. Please fix pre-commit.
@DarkLight1337 I made the change you proposed and added a note explaining why we are doing the error-type conversion.
[Feature] Estimate max-model-len use available KV cache memory (vllm-project#16168) Signed-off-by: rongfu.leng <[email protected]> * [Core] Upgrade to xgrammar 0.1.18, add cache size limit (vllm-project#16283) Signed-off-by: Russell Bryant <[email protected]> * [CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (vllm-project#16221) Signed-off-by: mgoin <[email protected]> * [TPU] Update PyTorch/XLA (vllm-project#16288) Signed-off-by: Chengji Yao <[email protected]> * [BugFix] Fix fusion test and add them to CI (vllm-project#16287) Signed-off-by: luka <[email protected]> * [Misc] Fix test_sharded_state_loader.py(vllm-project#16004) (vllm-project#16005) Signed-off-by: lvfei.lv <[email protected]> * [Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (vllm-project#16273) Signed-off-by: DarkLight1337 <[email protected]> * Update label-tpu mergify and remove removal bot (vllm-project#16298) * [BugFix] logger is not callable (vllm-project#16312) Signed-off-by: yihong0618 <[email protected]> * [BugFix] llama4 qknorm should be not shared across head (vllm-project#16311) Signed-off-by: Lu Fang <[email protected]> * update neuron config (vllm-project#16289) Signed-off-by: Ajay Vohra <[email protected]> * [BugFix] fix some typos found by typos. (vllm-project#16314) Signed-off-by: yihong0618 <[email protected]> * [Model] Add `SupportsMultiModal.get_language_model` interface (vllm-project#16007) Signed-off-by: NickLucche <[email protected]> * [Bugfix][Frontend] respect provided default guided decoding backend (vllm-project#15476) Signed-off-by: Guillaume Calmettes <[email protected]> * Revert "Update label-tpu mergify and remove removal bot" (vllm-project#16350) * [Bugfix] Fix profiling.py (vllm-project#16202) Signed-off-by: zh Wang <[email protected]> * [Bugfix] catch AssertionError in MistralTokenizer as ValueError (vllm-project#16344) Signed-off-by: Guillaume Calmettes <[email protected]> * [CI]Fix hpu docker and numpy version for CI (vllm-project#16355) Signed-off-by: Chendi Xue <[email protected]> * Fix `benchmark_throughput.py --backend=hf` (vllm-project#16352) Signed-off-by: mgoin <[email protected]> * [Build/CI] Add tracing deps to vllm container image (vllm-project#15224) Signed-off-by: Russell Bryant <[email protected]> * [Hardware] add platform-specific request validation api (vllm-project#16291) Signed-off-by: Joe Runde <[email protected]> * [Misc] refactor Structured Outputs example (vllm-project#16322) Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]> * [TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (vllm-project#16275) Signed-off-by: Chengji Yao <[email protected]> * Add GLM-4-0414 support (vllm-project#16338) Signed-off-by: lvfei.lv <[email protected]> Signed-off-by: zRzRzRzRzRzRzR <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: yihong0618 <[email protected]> Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Ajay Vohra <[email protected]> Signed-off-by: NickLucche <[email protected]> Signed-off-by: Guillaume Calmettes <[email protected]> Co-authored-by: Accelerator1996 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: yihong <[email protected]> Co-authored-by: Lucia Fang <[email protected]> Co-authored-by: ajayvohra2005 <[email protected]> Co-authored-by: Nicolò Lucchesi <[email protected]> Co-authored-by: Guillaume Calmettes <[email protected]> * [Bugfix]: do not 
shutdown server if `skip_special_use=False` for MistralTokenizer (vllm-project#14094) Signed-off-by: Guillaume Calmettes <[email protected]> * [Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (vllm-project#16325) Signed-off-by: Aaron Ang <[email protected]> * [TPU] Fix dummy loading OOM (vllm-project#16372) Signed-off-by: Chengji Yao <[email protected]> * [bugfix] Avoid the time consumption caused by creating dummy videos. (vllm-project#16371) * [CI][Bugfix] Pin triton version for CPU (vllm-project#16384) Signed-off-by: Roger Wang <[email protected]> * [misc] use tqdm.auto where appropriate (vllm-project#16290) Signed-off-by: Benjamin Kitor <[email protected]> * [Bugfix][TPU] Fix TPU validate_request (vllm-project#16369) Signed-off-by: Michael Goin <[email protected]> * fix sonnet dataset sample when prefix len is very small (vllm-project#16379) Signed-off-by: Chenyaaang <[email protected]> * [Model] use AutoWeightsLoader for deepseek_v2, internlm2 (vllm-project#16383) Signed-off-by: Aaron Ang <[email protected]> * [Misc] Update transformers version limits of multi-modal tests (vllm-project#16381) Signed-off-by: DarkLight1337 <[email protected]> * [Bugfix] Fix validation error for text-only Mllama 3.2 (vllm-project#16377) Signed-off-by: DarkLight1337 <[email protected]> * [Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (vllm-project#16038) Signed-off-by: mgoin <[email protected]> * [doc] add download model tips (vllm-project#16389) Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]> * Update Numba to 0.61.2 (vllm-project#16376) Signed-off-by: cyy <[email protected]> * [Model] Remove image mm limit for LLaMa4 (vllm-project#16365) Signed-off-by: Ye (Charlotte) Qi <[email protected]> * [doc] update the wrong link (vllm-project#16401) Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]> * [CI] Add auto update workflow for Dockerfile graph (vllm-project#11879) Signed-off-by: wineandchord <[email protected]> * Fix the torch version parsing logic (vllm-project#15857) * [VLM] Remove `BaseProcessingInfo.get_mm_max_tokens_per_item` (vllm-project#16408) Signed-off-by: DarkLight1337 <[email protected]> * [TPU][V1] Use `language_model` interface for getting text backbone in MM (vllm-project#16410) Signed-off-by: NickLucche <[email protected]> * Improve configs - `ParallelConfig` (vllm-project#16332) Signed-off-by: Harry Mellor <[email protected]> * [V1] Set structured output backend to `auto` by default (vllm-project#15724) Signed-off-by: Russell Bryant <[email protected]> * [V1][Spec Decode] Eagle Model loading (vllm-project#16035) Signed-off-by: LiuXiaoxuanPKU <[email protected]> * [Bugfix] Fix bug when dataset is json (vllm-project#15899) Signed-off-by: Chenyaaang <[email protected]> * [Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (vllm-project#15423) Signed-off-by: Chih-Chieh-Yang <[email protected]> Co-authored-by: Yu Chin Fabian Lim <[email protected]> * [V1] Zero-copy tensor/ndarray serialization/transmission (vllm-project#13790) Signed-off-by: Nick Hill <[email protected]> * [VLM] Avoid unnecessary dummy multimodal data during processing (vllm-project#16416) Signed-off-by: DarkLight1337 <[email protected]> * [Bugfix] Fix output token length check logic (vllm-project#16419) Signed-off-by: look <[email protected]> * [TPU][V1] Disable per-request seed/Generator (vllm-project#16172) Signed-off-by: NickLucche <[email protected]> * Fix 
range_ratio Bug in RandomDataset (vllm-project#16126) Signed-off-by: jadewang21 <[email protected]> * check input length of sonnet samples (vllm-project#16423) Signed-off-by: alexey-belyakov <[email protected]> * update benchmark_serving_structured_output to include auto backend (vllm-project#16438) Signed-off-by: Chenyaaang <[email protected]> * [Llama4] Enable attention temperature tuning by default for long context (>32k) (vllm-project#16439) Signed-off-by: Ye (Charlotte) Qi <[email protected]> Co-authored-by: Ye (Charlotte) Qi <[email protected]> * Update supported_hardware.md for TPU INT8 (vllm-project#16437) * [Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (vllm-project#16424) Signed-off-by: Isotr0py <[email protected]> * [CPU][Bugfix] Fix CPU docker issues (vllm-project#16454) Signed-off-by: jiang.li <[email protected]> * [Bugfix] Don't set an upper bound on repetition penalty (vllm-project#16403) Signed-off-by: Alex-Brooks <[email protected]> Co-authored-by: Nick Hill <[email protected]> * Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (vllm-project#16453) * [Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (vllm-project#15990) Signed-off-by: Jee Jee Li <[email protected]> * Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (vllm-project#16447) Signed-off-by: mgoin <[email protected]> * [Misc] Raise error for V1 not supporting Long LoRA. (vllm-project#16415) Signed-off-by: Jee Jee Li <[email protected]> * [Misc] update api_client example (vllm-project#16459) Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]> * Don't install triton on `ppc64le` platform (vllm-project#16470) Signed-off-by: Harry Mellor <[email protected]> * [Kernel] support merge_attn_states CUDA kernel, 3x speedup (vllm-project#16173) Signed-off-by: DefTruth <[email protected]> * [Bugfix] Fix bugs of running Quark quantized models (vllm-project#16236) Signed-off-by: chaow <[email protected]> * [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (vllm-project#12779) Signed-off-by: Tomasz Zielinski <[email protected]> * Fix erroneous "model doesn't support compile" warning (vllm-project#16486) Signed-off-by: rzou <[email protected]> * [TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (vllm-project#16483) Signed-off-by: NickLucche <[email protected]> * [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (vllm-project#16366) Signed-off-by: mgoin <[email protected]> * [Doc] Document InternVL3 support (vllm-project#16495) Signed-off-by: Isotr0py <[email protected]> * [Bugfix] handle alignment of encoder_seq_lens in mllama.py (vllm-project#14784) Signed-off-by: Travis Johnson <[email protected]> * Improve configs - `LoadConfig` (vllm-project#16422) Signed-off-by: Harry Mellor <[email protected]> * [Frontend] Added chat templates for LLaMa4 pythonic tool calling (vllm-project#16463) Signed-off-by: Ye (Charlotte) Qi <[email protected]> Co-authored-by: Kai Wu <[email protected]> * [Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (vllm-project#16488) * Update openai_compatible_server.md (vllm-project#16507) Signed-off-by: Christian Sears <[email protected]> * [Bugfix] clean up duplicated code (vllm-project#16485) Signed-off-by: Gogs <[email protected]> Co-authored-by: Gogs <[email protected]> * Bugfix for PixtralHF models without spatial_merge_size (vllm-project#16513) Signed-off-by: mgoin <[email 
protected]> * [Doc] Fix link to vLLM blog (vllm-project#16519) Signed-off-by: Yuan Tang <[email protected]> * [CI][Bugfix] Add mistral_tool_use to Ci (vllm-project#16517) Signed-off-by: mgoin <[email protected]> * [BugFix] Handle non-contiguous tensors properly when serializing (vllm-project#16492) Signed-off-by: Nick Hill <[email protected]> * [Doc] Update Llama4 Model Names in Supported Models (vllm-project#16509) Signed-off-by: Ye (Charlotte) Qi <[email protected]> * Optimized topk for topk=1 (Llama-4) (vllm-project#16512) Signed-off-by: mgoin <[email protected]> * [Feature][V1] Add xgrammar to support minLength, maxLength with test (vllm-project#16516) Signed-off-by: Leon Seidel <[email protected]> * [Frontend] support matryoshka representation / support embedding API dimensions (vllm-project#16331) * fix: spelling (vllm-project#16466) Signed-off-by: Tianer Zhou <[email protected]> * [Misc] Update chat utils tests (vllm-project#16520) Signed-off-by: DarkLight1337 <[email protected]> * [Misc] Openai transcription client example use same Whisper model (vllm-project#16487) Signed-off-by: NickLucche <[email protected]> * [V1] Enable multi-input by default (vllm-project#15799) Signed-off-by: DarkLight1337 <[email protected]> * [MISC] Make GroupCoordinator compatible with out-of-tree devices (vllm-project#16464) Signed-off-by: [email protected] <[email protected]> * [Misc] Delete redundant code (vllm-project#16530) Signed-off-by: Jee Jee Li <[email protected]> Co-authored-by: Isotr0py <[email protected]> * Fix syntaxWarning: invalid escape sequence '\s' (vllm-project#16532) Signed-off-by: Jie Fu <[email protected]> * [Perf] Optimize Preparing Inputs for GPU Model Runner (vllm-project#16484) Signed-off-by: snowcharm <[email protected]> Co-authored-by: Nick Hill <[email protected]> * [Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (vllm-project#16529) Signed-off-by: Ryan McConville <[email protected]> * [V1][Spec Decode] KV cache slots for eagle heads (vllm-project#16370) Signed-off-by: LiuXiaoxuanPKU <[email protected]> * Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (vllm-project#16537) Signed-off-by: mgoin <[email protected]> * [Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (vllm-project#16556) * [Core][V0] Enable regex support with xgrammar (vllm-project#13228) Signed-off-by: Russell Bryant <[email protected]> * capture only SP * batch_size <= max_batch_size case to cover small max_batch_size --------- Signed-off-by: Brayden Zhong <[email protected]> Signed-off-by: Thien Tran <[email protected]> Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: chun37 <[email protected]> Signed-off-by: Roger Wang <[email protected]> Signed-off-by: wangli <[email protected]> Signed-off-by: chaunceyjiang <[email protected]> Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Chris Thi <[email protected]> Signed-off-by: lukas.bluebaum <[email protected]> Signed-off-by: Eric <[email protected]> Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Harry Mellor <[email protected]> Signed-off-by: Kay Yan <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Liangfu Chen <[email protected]> Signed-off-by: Matt, Matthias <[email protected]> Signed-off-by: jiang1.li <[email protected]> Signed-off-by: rongfu.leng <[email protected]> Signed-off-by: Nishidha Panpaliya <[email protected]> Signed-off-by: youkaichao <[email protected]> Signed-off-by: mgoin <[email 
protected]> Signed-off-by: Hyesoo Yang <[email protected]> Signed-off-by: Chengji Yao <[email protected]> Signed-off-by: NickLucche <[email protected]> Signed-off-by: Aleksandr Malyshev <[email protected]> Signed-off-by: root <[email protected]> Signed-off-by: yihong0618 <[email protected]> Signed-off-by: Ziji Shi <[email protected]> Signed-off-by: StevenShi-23 <[email protected]> Signed-off-by: wwl2755 <[email protected]> Signed-off-by: reidliu41 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Bill Nell <[email protected]> Signed-off-by: Alexei V. Ivanov <[email protected]> Signed-off-by: Xiongfei Wei <[email protected]> Signed-off-by: Robert Shaw <[email protected]> Signed-off-by: Jonghyun Choe <[email protected]> Signed-off-by: zhenwei <[email protected]> Signed-off-by: Isotr0py <[email protected]> Signed-off-by: ilmarkov <[email protected]> Signed-off-by: Gregory Shtrasberg <[email protected]> Signed-off-by: kevin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Michael Goin <[email protected]> Signed-off-by: Tristan Leclercq <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]> Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Ben Jackson <[email protected]> Signed-off-by: Varun Sundar Rabindranath <[email protected]> Signed-off-by: paolovic <[email protected]> Signed-off-by: shen-shanshan <[email protected]> Signed-off-by: YamPengLi <[email protected]> Signed-off-by: WangErXiao <[email protected]> Signed-off-by: Aston Zhang <[email protected]> Signed-off-by: drisspg <[email protected]> Signed-off-by: Jon Swenson <[email protected]> Signed-off-by: Keyun Tong <[email protected]> Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Xiaodong Wang <[email protected]> Signed-off-by: Yang Chen <[email protected]> Signed-off-by: Ye (Charlotte) Qi <[email protected]> Signed-off-by: Yong Hoon Shin <[email protected]> Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Lucia Fang <[email protected]> Signed-off-by: Benjamin Chislett <[email protected]> Signed-off-by: Leon Seidel <[email protected]> Signed-off-by: mgoin <[email protected]> Signed-off-by: Miles Williams <[email protected]> Signed-off-by: Siyuan Liu <[email protected]> Signed-off-by: Kebe <[email protected]> Signed-off-by: simon-mo <[email protected]> Signed-off-by: Alex-Brooks <[email protected]> Signed-off-by: Tianyuan Wu <[email protected]> Signed-off-by: imkero <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: Yue <[email protected]> Signed-off-by: tjtanaa <[email protected]> Signed-off-by: kliuae <[email protected]> Signed-off-by: luka <[email protected]> Signed-off-by: lvfei.lv <[email protected]> Signed-off-by: Ajay Vohra <[email protected]> Signed-off-by: Guillaume Calmettes <[email protected]> Signed-off-by: zh Wang <[email protected]> Signed-off-by: Chendi Xue <[email protected]> Signed-off-by: Joe Runde <[email protected]> Signed-off-by: zRzRzRzRzRzRzR <[email protected]> Signed-off-by: Aaron Ang <[email protected]> Signed-off-by: Benjamin Kitor <[email protected]> Signed-off-by: Chenyaaang <[email protected]> Signed-off-by: cyy <[email protected]> Signed-off-by: wineandchord <[email protected]> Signed-off-by: LiuXiaoxuanPKU <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]> Signed-off-by: look <[email protected]> Signed-off-by: jadewang21 <[email protected]> Signed-off-by: 
alexey-belyakov <[email protected]> Signed-off-by: jiang.li <[email protected]> Signed-off-by: DefTruth <[email protected]> Signed-off-by: chaow <[email protected]> Signed-off-by: Tomasz Zielinski <[email protected]> Signed-off-by: rzou <[email protected]> Signed-off-by: Travis Johnson <[email protected]> Signed-off-by: Christian Sears <[email protected]> Signed-off-by: Gogs <[email protected]> Signed-off-by: Yuan Tang <[email protected]> Signed-off-by: Tianer Zhou <[email protected]> Signed-off-by: [email protected] <[email protected]> Signed-off-by: Jie Fu <[email protected]> Signed-off-by: snowcharm <[email protected]> Signed-off-by: Ryan McConville <[email protected]> Co-authored-by: Brayden Zhong <[email protected]> Co-authored-by: Thien Tran <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]> Co-authored-by: chun <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Li Wang <[email protected]> Co-authored-by: Chauncey <[email protected]> Co-authored-by: Jee Jee Li <[email protected]> Co-authored-by: Chris Thi <[email protected]> Co-authored-by: LukasBluebaum <[email protected]> Co-authored-by: Eric Tang <[email protected]> Co-authored-by: Russell Bryant <[email protected]> Co-authored-by: Harry Mellor <[email protected]> Co-authored-by: Kay Yan <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Co-authored-by: Matthias Matt <[email protected]> Co-authored-by: Liangfu Chen <[email protected]> Co-authored-by: mgoin <[email protected]> Co-authored-by: Li, Jiang <[email protected]> Co-authored-by: rongfu.leng <[email protected]> Co-authored-by: Nishidha <[email protected]> Co-authored-by: youkaichao <[email protected]> Co-authored-by: Hyesoo Yang <[email protected]> Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal> Co-authored-by: Chengji Yao <[email protected]> Co-authored-by: Nicolò Lucchesi <[email protected]> Co-authored-by: Aleksandr Malyshev <[email protected]> Co-authored-by: Aleksandr Malyshev <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: yihong <[email protected]> Co-authored-by: Ziji Shi (Steven) <[email protected]> Co-authored-by: wwl2755 <[email protected]> Co-authored-by: Reid <[email protected]> Co-authored-by: reidliu41 <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: bnellnm <[email protected]> Co-authored-by: yarongmu-google <[email protected]> Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]> Co-authored-by: iefgnoix <[email protected]> Co-authored-by: Robert Shaw <[email protected]> Co-authored-by: Robert Shaw <[email protected]> Co-authored-by: Huy Do <[email protected]> Co-authored-by: Jonghyun Choe <[email protected]> Co-authored-by: liuzhenwei <[email protected]> Co-authored-by: Isotr0py <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Ilya Markov <[email protected]> Co-authored-by: ilmarkov <[email protected]> Co-authored-by: Gregory Shtrasberg <[email protected]> Co-authored-by: Kevin H. 
Luu <[email protected]> Co-authored-by: Nick Hill <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: mgoin <[email protected]> Co-authored-by: Tristan Leclercq <[email protected]> Co-authored-by: Jinzhen Lin <[email protected]> Co-authored-by: Lucia Fang <[email protected]> Co-authored-by: Ben Jackson <[email protected]> Co-authored-by: Paul Schweigert <[email protected]> Co-authored-by: rongfu.leng <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: paolovic <[email protected]> Co-authored-by: paolovic <[email protected]> Co-authored-by: Martin Hoyer <[email protected]> Co-authored-by: Shanshan Shen <[email protected]> Co-authored-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Robin <[email protected]> Co-authored-by: Lu Fang <[email protected]> Co-authored-by: Lu Fang <[email protected]> Co-authored-by: Benjamin Chislett <[email protected]> Co-authored-by: leon-seidel <[email protected]> Co-authored-by: Driss Guessous <[email protected]> Co-authored-by: Miles Williams <[email protected]> Co-authored-by: Satyajith Chilappagari <[email protected]> Co-authored-by: Jennifer Zhao <[email protected]> Co-authored-by: zxfan-cpu <[email protected]> Co-authored-by: Yong Hoon Shin <[email protected]> Co-authored-by: Siyuan Liu <[email protected]> Co-authored-by: Kebe <[email protected]> Co-authored-by: Simon Mo <[email protected]> Co-authored-by: Alex Brooks <[email protected]> Co-authored-by: TY-AMD <[email protected]> Co-authored-by: wang.yuqi <[email protected]> Co-authored-by: Kero Liang <[email protected]> Co-authored-by: Lucas Wilkinson <[email protected]> Co-authored-by: yueshen2016 <[email protected]> Co-authored-by: TJian <[email protected]> Co-authored-by: Hongxia Yang <[email protected]> Co-authored-by: kliuae <[email protected]> Co-authored-by: Luka Govedič <[email protected]> Co-authored-by: Accelerator1996 <[email protected]> Co-authored-by: ajayvohra2005 <[email protected]> Co-authored-by: Guillaume Calmettes <[email protected]> Co-authored-by: zh Wang <[email protected]> Co-authored-by: Chendi.Xue <[email protected]> Co-authored-by: Joe Runde <[email protected]> Co-authored-by: Yuxuan Zhang <[email protected]> Co-authored-by: Aaron Ang <[email protected]> Co-authored-by: Jintao <[email protected]> Co-authored-by: Benjamin Kitor <[email protected]> Co-authored-by: Chenyaaang <[email protected]> Co-authored-by: cyyever <[email protected]> Co-authored-by: Ye (Charlotte) Qi <[email protected]> Co-authored-by: wineandchord <[email protected]> Co-authored-by: Nicolò Lucchesi <[email protected]> Co-authored-by: Lily Liu <[email protected]> Co-authored-by: Chih-Chieh Yang <[email protected]> Co-authored-by: Yu Chin Fabian Lim <[email protected]> Co-authored-by: look <[email protected]> Co-authored-by: WWW <[email protected]> Co-authored-by: Alexey Belyakov <[email protected]> Co-authored-by: DefTruth <[email protected]> Co-authored-by: chaow-amd <[email protected]> Co-authored-by: Tomasz Zielinski <[email protected]> Co-authored-by: Richard Zou <[email protected]> Co-authored-by: Travis Johnson <[email protected]> Co-authored-by: Kai Wu <[email protected]> Co-authored-by: Christian Sears <[email protected]> Co-authored-by: Gogs <[email protected]> Co-authored-by: Yuan Tang <[email protected]> Co-authored-by: Tianer Zhou <[email protected]> Co-authored-by: Huazhong Ji <[email protected]> Co-authored-by: 
Jie Fu (傅杰) <[email protected]> Co-authored-by: SnowCharm <[email protected]> Co-authored-by: Ryan McConville <[email protected]>
…-project#16344) Signed-off-by: Guillaume Calmettes <[email protected]> Signed-off-by: Mu Huai <[email protected]>
When the `MistralTokenizer` is used, input preprocessing is delegated to `mistral-common`, which uses `assert` statements to stop the process if the received input does not conform to what is expected. Currently, `AssertionError`s are not caught when inputs are preprocessed in vLLM, so sending an incorrectly formatted message array to a model relying on the `MistralTokenizer` surfaces to the client as a `500 Internal Server Error`.

This PR converts `AssertionError`s raised by the `MistralTokenizer` into `ValueError`s, so they can be properly caught and reported as invalid input rather than as a server failure.
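Below is a minimal sketch of the conversion described above: a decorator that wraps the calls delegated to `mistral-common` and re-raises `AssertionError` as `ValueError`. The names (`assertion_to_value_error`, `tokenize_chat`) and the validation rule are illustrative stand-ins, not the actual vLLM implementation.

```python
from functools import wraps


def assertion_to_value_error(func):
    """Re-raise any AssertionError from the wrapped call as ValueError."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except AssertionError as exc:
            # Preserve the message and chain the exception so the failure is
            # reported as bad user input, not as an internal server error.
            raise ValueError(str(exc)) from exc

    return wrapper


@assertion_to_value_error
def tokenize_chat(messages):
    # Stand-in for the preprocessing vLLM delegates to mistral-common,
    # which validates its inputs with bare `assert` statements.
    assert messages and messages[0]["role"] == "user", (
        "first message must come from the user")
    return [1, 2, 3]  # dummy token ids


try:
    tokenize_chat([{"role": "assistant", "content": "hi"}])
except ValueError as err:
    print(f"caught as ValueError: {err}")
```

With the error converted this way, the frontend's existing `ValueError` handling can return a client-side validation error instead of failing the request with a 500.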