Your current environment
environment:
- TORCHDYNAMO_DISABLE=1
- GPU_MEMORY_UTILIZATION=0.94
- NCCL_P2P_DISABLE=1
🐛 Describe the bug
I found that another user already reported back in May that `bad_words` is ineffective when `n > 1`.
#18767 (comment)
Running the model Qwen3 235B A22B Instruct 2507, I send a request to vLLM's /v1/chat/completions endpoint:
{ "n": 3, "bad_words": [ "Mmm" ], "seed": 1588207426, "model": "Big-Q", "return_token_ids": true, "top_k": 20, "messages": [ { "role": "system", "content": "<Redacted Message History>" } ], "temperature": 0.7, "presence_penalty": 0, "frequency_penalty": 0, "repetition_penalty": 1, "add_prefix_space": true, "add_special_tokens": false }
However, the generated response still contains "Mmm". Initially I suspected the tokenizer was reaching the word "Mmm" through a different token sequence, but I verified with `"return_token_ids": true` in my request: the tokens returned were "44", "3821", "11".
Then I made a call to /tokenize with "Mmm", which also returns "44", "3821", "11", ruling out the suspected tokenizer issue.
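For reference, a minimal sketch of that verification step, assuming a local vLLM OpenAI-compatible server (the base URL is a placeholder; model name taken from my request above):

```python
import requests

BASE_URL = "http://localhost:8000"  # assumption: local vLLM OpenAI-compatible server

# Ask the server how it tokenizes the banned word, to rule out a mismatch
# between the bad_words entry and the token IDs seen in the generation.
resp = requests.post(
    f"{BASE_URL}/tokenize",
    json={"model": "Big-Q", "prompt": "Mmm"},
)
resp.raise_for_status()
print(resp.json()["tokens"])  # here: [44, 3821, 11]
```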
Since I had run into a different but similar problem with a high `n` count on an old version of vLLM before, my instinct was to try `n = 1`, and `bad_words` worked right away:
"Mmm" disappeared and was replaced with "Mm", which I then banned without issue as well, and so on through further variations, so everything worked as expected with `n = 1`.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.