
Difference in output between automatic and named function calls #2851

@FlowlionAI

Hello,

I should note that I couldn't tell whether the problem comes from DJL itself or from vLLM, as I have no easy way to host vLLM on a standalone instance on short notice, so this issue may belong with vLLM instead.
I'm waiting for the repo admins to get back to me on where it fits best.

Description

Hosting a Qwen 3 32B FP8 model with LMI v15, I've noticed differences between function calls made with named function calling and those made with automatic function calling.

Specifically, I noticed that in the "delta" section's "tool_calls" list, as soon as the function call is "named" (i.e. the use of a specific function is forced via "tool_choice"), the "id" is missing from the chunk that emits the function to call.

Expected Behavior

The "id" should be present in every tool call, whether function calling is automatic or named.

Error Message

None is raised by boto3 or the AWS instances, but the missing "id" breaks integrations with frameworks that strictly follow the OpenAI API schema, such as LangChain/LangGraph.
The traces below show the difference between the two call modes:

Forced (named) function calling:

{"id":"chatcmpl-3334a45d38ec4d39812a826bf240c0f4","object":"chat.completion.chunk","created":1750863219,"model":"lmi","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-3334a45d38ec4d39812a826bf240c0f4","object":"chat.completion.chunk","created":1750863219,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"my_function_to_call","arguments":"{\\n"}}]},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-3334a45d38ec4d39812a826bf240c0f4","object":"chat.completion.chunk","created":1750863219,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"name":"my_function_to_call","arguments":" \\""}}]},"logprobs":null,"finish_reason":null}]}

Automatic function calling:

{"id":"chatcmpl-5f21231f32ba484c8e9c6b0cf7467e80","object":"chat.completion.chunk","created":1750863437,"model":"lmi","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-5f21231f32ba484c8e9c6b0cf7467e80","object":"chat.completion.chunk","created":1750863437,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"id":"chatcmpl-tool-570b82d7ca9e40f894258ec87f7e5a9f","type":"function","index":0,"function":{"name":"my_function_to_call"}}]},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-5f21231f32ba484c8e9c6b0cf7467e80","object":"chat.completion.chunk","created":1750863437,"model":"lmi","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\\"query\\": \\""}}]},"logprobs":null,"finish_reason":null}]}

How to Reproduce?

Fill in your own function definitions, query, endpoint, and inference component.
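
For reference, here is a hypothetical "tools" entry consistent with the traces above (the automatic trace streams a "query" argument; this schema is an assumption, substitute your own definition):

# Hypothetical tool definition matching the traces; replace with your own.
tools = [{
    "type": "function",
    "function": {
        "name": "my_function_to_call",
        "description": "Example tool taking a single query string.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]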

import boto3

smr_client = boto3.client('sagemaker-runtime')

# Forced (named) function calling: tool_choice pins the model to a specific function.
payload_forced = {'EndpointName': 'MyEndpoint', 'Body': '{"stream": true, "do_sample": false, "chat_template_kwargs": {"enable_thinking": false}, "tools": [...], "tool_choice": {"type": "function", "function": {"name": "my_function_name"}}, "messages": [{"content": "You are a helpful assistant.", "role": "system"}, {"content": "my query", "role": "user"}], "max_tokens": 8192}', 'ContentType': 'application/json', 'InferenceComponentName': 'MyIC'}

# Automatic function calling: no tool_choice, the model decides which function to call.
payload_auto = {'EndpointName': 'MyEndpoint', 'Body': '{"stream": true, "do_sample": false, "chat_template_kwargs": {"enable_thinking": false}, "tools": [...], "messages": [{"content": "You are a helpful assistant.", "role": "system"}, {"content": "my query", "role": "user"}], "max_tokens": 8192}', 'ContentType': 'application/json', 'InferenceComponentName': 'MyIC'}

# Stream both responses and compare the tool_calls deltas.
for chunk in smr_client.invoke_endpoint_with_response_stream(**payload_forced)['Body']:
    print(chunk)

for chunk in smr_client.invoke_endpoint_with_response_stream(**payload_auto)['Body']:
    print(chunk)
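
To inspect the deltas rather than raw bytes, here is a minimal parsing sketch. It assumes LMI's OpenAI-compatible SSE framing, i.e. each payload part carries complete "data: {...}" lines; a robust client would buffer across parts, since a JSON object can be split between them:

import json

def print_tool_call_deltas(stream):
    # Sketch: walk the boto3 event stream and print each tool_calls delta.
    for event in stream:
        part = event.get('PayloadPart')
        if not part:
            continue
        for line in part['Bytes'].decode('utf-8').splitlines():
            if not line.startswith('data:'):
                continue
            data = line[len('data:'):].strip()
            if data == '[DONE]':
                return
            delta = json.loads(data)['choices'][0]['delta']
            for tc in delta.get('tool_calls', []):
                # With forced tool_choice, "id" prints as None here.
                print(tc.get('id'), tc.get('type'), tc.get('function'))

print_tool_call_deltas(smr_client.invoke_endpoint_with_response_stream(**payload_forced)['Body'])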

What have you tried to solve it?

To keep strict compliance with the OpenAI API format, I post-process the streamed chunks: when the first chunk naming a function is detected, I fill in a random "id".
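
A minimal sketch of that workaround, assuming the chunk has already been parsed into a dict (the helper name and id format are mine, not DJL's or vLLM's):

import uuid

def patch_tool_call_ids(chunk):
    # Hypothetical helper: inject a random id (and the "type" field)
    # into any tool_call delta that names a function but lacks an id,
    # mirroring what automatic function calling already emits.
    for choice in chunk.get('choices', []):
        for tc in choice.get('delta', {}).get('tool_calls', []):
            if tc.get('function', {}).get('name') and 'id' not in tc:
                tc['id'] = f"chatcmpl-tool-{uuid.uuid4().hex}"
                tc.setdefault('type', 'function')
    return chunk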
