.NET Bug: Function calling does not work properly with Llama series models when hosted on Azure. #10221
Closed
Description
Describe the bug
I deployed three different models to Azure AI (Machine Learning Studio) serverless endpoints:
- Mistral-Nemo
- Llama-3.3-70B-Instruct
- Llama-3.2-90B-Vision-Instruct
I tested them with the SemanticKernel package 1.33.0 plus Microsoft.SemanticKernel.Connectors.AzureAIInference 1.33.0-beta. The Mistral-Nemo model works perfectly, but the Llama series models do not.
To Reproduce
Steps to reproduce the behavior:
- Deploy an open-source model (e.g. Llama-3.3-70B) to a serverless endpoint.
- Make a GetChatMessageContentsAsync call with tools.
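A minimal repro sketch of the steps above. The endpoint URL, API key, and the `GetUtcNow` sample function are placeholders, not from the original report; the SK calls are the standard 1.33.0 APIs referenced in the stack trace:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Placeholders: substitute your own serverless endpoint and key.
var builder = Kernel.CreateBuilder();
builder.AddAzureAIInferenceChatCompletion(
    modelId: "Llama-3.3-70B-Instruct",
    apiKey: "<api-key>",
    endpoint: new Uri("https://<endpoint>.inference.ai.azure.com"));

var kernel = builder.Build();

// A trivial sample tool so the request carries a tool definition.
kernel.Plugins.AddFromFunctions("Time",
[
    KernelFunctionFactory.CreateFromMethod(
        () => DateTime.UtcNow.ToString("o"),
        "GetUtcNow",
        "Gets the current UTC time.")
]);

var settings = new PromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("What time is it (UTC)?");

// Mistral-Nemo invokes the tool here; the Llama models do not
// (and Llama-3.2-90B throws the 400 shown under "Additional context").
var result = await chat.GetChatMessageContentsAsync(history, settings, kernel);
Console.WriteLine(result[0].Content);
```

Swapping `GetChatMessageContentsAsync` for `GetStreamingChatMessageContentsAsync` reproduces the same behavior.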
Expected behavior
Models like Llama 3.3 and Llama 3.2 should support tool calling and are expected to function as they do when hosted in Ollama.
Platform
- OS: Mac
- IDE: Rider
- Language: C#
- Source: SK 1.33.0, Microsoft.SemanticKernel.Connectors.AzureAIInference 1.33.0-beta
Additional context
With Llama-3.3-70B, when making a chat-completion call with tools, it responds with an incorrect result and does not invoke any tools, regardless of whether GetChatMessageContentsAsync or GetStreamingChatMessageContentsAsync is used.
With Llama-3.2-90B, when making a chat-completion call with tools, it throws an exception right away (error message below):
Azure.RequestFailedException: {"object":"error","message":"\"auto\" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set","type":"BadRequestError","param":null,"code":400}
Status: 400 (Bad Request)
ErrorCode: Bad Request
Content:
{"error":{"code":"Bad Request","message":"{\"object\":\"error\",\"message\":\"\\\"auto\\\" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set\",\"type\":\"BadRequestError\",\"param\":null,\"code\":400}","status":400}}
Headers:
x-ms-rai-invoked: REDACTED
x-envoy-upstream-service-time: REDACTED
X-Request-ID: REDACTED
ms-azureml-model-error-reason: REDACTED
ms-azureml-model-error-statuscode: REDACTED
ms-azureml-model-time: REDACTED
azureml-destination-model-group: REDACTED
azureml-destination-region: REDACTED
azureml-destination-deployment: REDACTED
azureml-destination-endpoint: REDACTED
x-ms-client-request-id: 42bc2161-5a3f-4e88-8c9e-577390db941e
Request-Context: REDACTED
azureml-model-session: REDACTED
azureml-model-group: REDACTED
Date: Wed, 15 Jan 2025 05:46:51 GMT
Content-Length: 246
Content-Type: application/json
at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
at Azure.AI.Inference.ChatCompletionsClient.CompleteAsync(RequestContent content, String extraParams, RequestContext context)
at Azure.AI.Inference.ChatCompletionsClient.CompleteAsync(ChatCompletionsOptions chatCompletionsOptions, CancellationToken cancellationToken)
at Microsoft.Extensions.AI.AzureAIInferenceChatClient.CompleteAsync(IList`1 chatMessages, ChatOptions options, CancellationToken cancellationToken)
at Microsoft.Extensions.AI.FunctionInvokingChatClient.CompleteAsync(IList`1 chatMessages, ChatOptions options, CancellationToken cancellationToken)
at Microsoft.SemanticKernel.ChatCompletion.ChatClientChatCompletionService.GetChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
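The error body above matches vLLM's error format, which suggests the serverless endpoint is backed by a vLLM server launched without tool-calling support (this is an assumption; the Azure-managed server configuration is not controllable by the caller). For reference, a self-hosted vLLM server enables "auto" tool choice with the flags the error names, e.g.:

```shell
# Flags named in the 400 error; the model path and parser name are
# illustrative for a self-hosted deployment, not the Azure configuration.
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json
```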