Just looking into this, but wanted to report it in case it is a known issue or someone has more information.
While running guidellm against a locally running vllm serve, I am seeing a very large number of these log messages in the vLLM output:
WARNING 06-27 14:57:36 [protocol.py:58] The following fields were present in the request but ignored: {'max_completion_tokens'}
Running a request manually against the endpoint works fine, with no errors in the vLLM logs:
curl -k -s -H "Content-Type: application/json" http://localhost:8000/v1/chat/completions -d '{"model":"llama3.1-8b-instruct","messages":[{"role":"user","content":"What is an AI tensorized weight?"}],"max_completion_tokens":35}' | jq .
{
  "id": "chatcmpl-e00ce83f-121f-482a-b43c-5c4494ad29ae",
  "object": "chat.completion",
  "created": 1751037186,
  "model": "llama3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "A tensorized weight, also known as a tensor weight or a weight tensor, is a type of weight used in artificial neural networks (ANNs) and deep learning models.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "total_tokens": 78,
    "completion_tokens": 35,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
This leads me to believe that the request being formed by guidellm must be placing max_completion_tokens somewhere other than as a top-level property of the request struct.
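One way to check this hypothesis, assuming the request body guidellm sends can be captured (for example through a logging proxy), is to list every location where the key appears. The snippet below is only a rough sketch: find_key_paths is a helper made up for illustration, and hypothetical_body is an invented payload showing what a nested placement could look like, not something observed from guidellm.

```python
import json
from typing import Any, Iterator


def find_key_paths(obj: Any, key: str, path: str = "$") -> Iterator[str]:
    """Yield a JSONPath-like location for every occurrence of `key` in `obj`."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            child = f"{path}.{k}"
            if k == key:
                yield child
            # Keep descending so nested occurrences are reported too.
            yield from find_key_paths(v, key, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_key_paths(item, key, f"{path}[{i}]")


# Body of the manual curl request from this report.
manual_body = json.loads(
    '{"model":"llama3.1-8b-instruct",'
    '"messages":[{"role":"user","content":"What is an AI tensorized weight?"}],'
    '"max_completion_tokens":35}'
)

# Hypothetical payload (NOT captured from guidellm) with the key nested.
hypothetical_body = {
    "model": "llama3.1-8b-instruct",
    "messages": [],
    "extra_body": {"max_completion_tokens": 35},
}

manual_paths = list(find_key_paths(manual_body, "max_completion_tokens"))
nested_paths = list(find_key_paths(hypothetical_body, "max_completion_tokens"))

print(manual_paths)  # ['$.max_completion_tokens']
print(nested_paths)  # ['$.extra_body.max_completion_tokens']
```

If a captured guidellm body reports anything other than $.max_completion_tokens, that would support the nesting theory. If it does report the top-level path, the difference is probably elsewhere in the request.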