Hello,
I should point out that I was unable to determine whether the problem comes from DJL directly or from vLLM, since I don't have a quick way to host vLLM on a standalone instance, so this issue may belong with vLLM instead. I'm waiting for the repo admins to advise on what would be best.
Description
Having hosted a Qwen3 32B FP8 model with LMI v15, I've noticed differences between tool calls streamed with named function calling and those streamed with automatic function calling.
In the `delta.tool_calls` section, as soon as the function call is named (i.e., `tool_choice` forces a specific function), the `id` field is missing from the chunk that introduces the function to call.
Expected Behavior
The `id` should be present in every streamed tool call, whether `tool_choice` is automatic or named.
Error Message
None from boto3 or the AWS side, but the missing `id` breaks integrations with frameworks that strictly follow the OpenAI API schema, such as LangChain/LangGraph.
The traces below show the difference between the two calls:
Forced function calling:
{"id":"chatcmpl-3334a45d38ec4d39812a826bf240c0f4","object":"chat.completion.chunk","created":1750863219,"model":"lmi","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-3334a45d38ec4d39812a826bf240c0f4","object":"chat.completion.chunk","created":1750863219,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"my_function_to_call","arguments":"{\\n"}}]},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-3334a45d38ec4d39812a826bf240c0f4","object":"chat.completion.chunk","created":1750863219,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"my_function_to_call","arguments":" \\""}}]},"logprobs":null,"finish_reason":null}]}Automatic function calling :
{"id":"chatcmpl-5f21231f32ba484c8e9c6b0cf7467e80","object":"chat.completion.chunk","created":1750863437,"model":"lmi","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-5f21231f32ba484c8e9c6b0cf7467e80","object":"chat.completion.chunk","created":1750863437,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"id":"chatcmpl-tool-570b82d7ca9e40f894258ec87f7e5a9f","type":"function","index":0,"function":{"name":"my_function_to_call"}}]},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-5f21231f32ba484c8e9c6b0cf7467e80","object":"chat.completion.chunk","created":1750863437,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\\"query\\": \\""}}]},"logprobs":null,"finish_reason":null}]}How to Reproduce?
Fill in your own query, endpoint name, inference component name, and tool definitions.
import boto3

smr_client = boto3.client('sagemaker-runtime')

# Forced (named) function calling: tool_choice pins a specific function
payload_forced = {'EndpointName': 'MyEndpoint', 'Body': '{"stream": true, "do_sample": false, "chat_template_kwargs": {"enable_thinking": false}, "tools": [...], "tool_choice": {"type": "function", "function": {"name": "my_function_name"}}, "messages": [{"content": "You are a helpful assistant.", "role": "system"}, {"content": "my query", "role": "user"}], "max_tokens": 8192}', 'ContentType': 'application/json', 'InferenceComponentName': 'MyIC'}

# Automatic function calling: no tool_choice, the model decides
payload_auto = {'EndpointName': 'MyEndpoint', 'Body': '{"stream": true, "do_sample": false, "chat_template_kwargs": {"enable_thinking": false}, "tools": [...], "messages": [{"content": "You are a helpful assistant.", "role": "system"}, {"content": "my query", "role": "user"}], "max_tokens": 8192}', 'ContentType': 'application/json', 'InferenceComponentName': 'MyIC'}
# Forced call: the first tool_calls delta lacks "id"
for chunk in smr_client.invoke_endpoint_with_response_stream(**payload_forced)['Body']:
    print(chunk)

# Automatic call: the first tool_calls delta carries "id" and "type"
for chunk in smr_client.invoke_endpoint_with_response_stream(**payload_auto)['Body']:
    print(chunk)
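For readability, here is a minimal sketch of how the raw event stream can be decoded into the JSON chunks quoted above. It assumes each PayloadPart carries newline-delimited JSON, optionally with an SSE-style "data:" prefix; `iter_json_chunks` is my own helper, not part of LMI or boto3:

import json

def iter_json_chunks(response):
    """Reassemble the SageMaker event stream and decode it into chunk dicts.

    Assumption: the container emits newline-delimited JSON, optionally
    "data:"-prefixed; adjust the framing if your container differs.
    """
    buffer = b""
    for event in response['Body']:
        # A JSON chunk may be split across several PayloadPart events
        buffer += event.get('PayloadPart', {}).get('Bytes', b"")
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip().removeprefix(b"data:").strip()
            if line and line != b"[DONE]":
                yield json.loads(line)

for chunk in iter_json_chunks(smr_client.invoke_endpoint_with_response_stream(**payload_forced)):
    print(chunk)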
What have you tried to solve it?
To ensure strict compliance with the OpenAI API format, I post-process the generated chunks: when a function name is first detected in a `tool_calls` delta, I fill in a random id.
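As an illustration, a minimal sketch of that post-processing, building on `iter_json_chunks` above; `ensure_tool_call_ids` and the id format it generates are my own naming, not part of LMI or vLLM:

import uuid

def ensure_tool_call_ids(chunks):
    """Pass chunks through unchanged, except that the first delta naming a
    function gets a random "id" (and "type") injected if the server omitted it."""
    patched_indexes = set()
    for chunk in chunks:
        for choice in chunk.get('choices', []):
            for tool_call in choice.get('delta', {}).get('tool_calls', []):
                starts_call = 'name' in tool_call.get('function', {})
                index = tool_call.get('index', 0)
                if starts_call and index not in patched_indexes:
                    # Hypothetical id format, mimicking what vLLM emits in auto mode
                    tool_call.setdefault('id', f"chatcmpl-tool-{uuid.uuid4().hex}")
                    tool_call.setdefault('type', 'function')
                    patched_indexes.add(index)
        yield chunk

for chunk in ensure_tool_call_ids(iter_json_chunks(
        smr_client.invoke_endpoint_with_response_stream(**payload_forced))):
    print(chunk)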