
[AI] [Inference] beta.5 - audio input, developer role #33359

Open · wants to merge 9 commits into main

Conversation

glharper (Member) commented:

Packages impacted by this PR

@azure-rest/ai-inference

Issues associated with this PR

Describe the problem that is addressed by this PR

What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

Are there test cases added in this PR? (If not, why?)

Provide a list of related PRs (if any)

Command used to generate this PR: (Applicable only to SDK release request PRs)

Checklists

  • Added impacted package name to the issue description
  • Does this PR need any fixes in the SDK Generator? (If so, create an issue in the Autorest/typescript repository and link it here)
  • Added a changelog (if necessary)

@Copilot Copilot bot review requested due to automatic review settings March 12, 2025 15:22
@glharper glharper requested review from bterlson, a team, dargilco and jhakulin as code owners March 12, 2025 15:22
@Copilot Copilot AI (Contributor) left a comment:

Pull Request Overview

The PR introduces audio input support for chat completions along with a new "developer" chat role. Key changes include adding new TypeScript and JavaScript samples for handling audio data and audio URLs, updating API interfaces and model definitions to support developer messages and audio content, and marking several auto-generated files with update warnings.
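To make the new message shapes concrete, here is a hedged sketch of a chat request combining the "developer" role with an audio content part. The nested "input_audio" object and its "format" field are assumptions; the only confirmed detail is the field path messages[1].content[1].input_audio.data carrying a base64 string, which appears in a service error quoted later in this review.

```javascript
// Hypothetical sketch of a chat-completions request body with audio input.
// The exact content-part shape is an assumption based on the field path
// "messages[1].content[1].input_audio.data" cited in the review thread.
const base64Audio = Buffer.from("fake audio bytes").toString("base64");

const messages = [
  // New "developer" role introduced by this PR.
  { role: "developer", content: "You are a helpful assistant." },
  {
    role: "user",
    content: [
      { type: "text", text: "Please describe this audio clip." },
      // Audio is sent inline as a base64-encoded string, not a stream.
      { type: "input_audio", input_audio: { data: base64Audio, format: "mp3" } },
    ],
  },
];
```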

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.

Summary per file:

  • sdk/ai/ai-inference-rest/samples/v1-beta/typescript/src/audioDataChatCompletion.ts: Added sample demonstrating chat completions using audio data.
  • sdk/ai/ai-inference-rest/samples/v1-beta/typescript/src/audioUrlChatCompletion.ts: Added sample demonstrating chat completions using an audio URL.
  • sdk/ai/ai-inference-rest/samples/v1-beta/javascript/audioDataChatCompletion.js: Added equivalent JavaScript sample for chat completions using audio data.
  • sdk/ai/ai-inference-rest/samples/v1-beta/javascript/audioUrlChatCompletion.js: Added equivalent JavaScript sample for chat completions using an audio URL.
  • sdk/ai/ai-inference-rest/src/*.ts and review/ai-inference.api.md: Updated various SDK files and API definitions to support the new features.
  • sdk/ai/ai-inference-rest/CHANGELOG.md: Updated changelog with beta.6 changes and feature details.

@azure-sdk (Collaborator) commented:

API change check

APIView has identified API-level changes in this PR and created the following API reviews:

@azure-rest/ai-inference


const client = createModelClient();

const data = getAudioData(audioFilePath);
A reviewer (Member) commented:
Can we stream the file to the server instead? similar to

file: createReadStream(audioFilePath),

glharper (Member, author) replied:

Using createReadStream, I'm seeing "Invalid type for 'messages[1].content[1].input_audio.data': expected a base64-encoded data string, but got an object instead."

@trangevi is there planned support for sending read streams for audio data completions?

A reviewer (Member) replied:

Looking into the code, the method itself doesn't support streams; it just serializes the stream object as-is to the server, which is not what we want. @joheredi @timovv, is there a way to add streaming support for input nested deeper in the request body object?


/**
* Demonstrates how to get chat completions using audio data.
* NOTE: Audio data completions currently work only with GPT audio models.
@jhakulin (Member) commented on Mar 13, 2025:
Is the limitation in OpenAI audio models only?


/**
* Demonstrates how to get chat completions using an audio URL.
* NOTE: Audio URL completions currently work only with Phi multimodal models.
A reviewer (Member) commented:
Is there a document on Microsoft Learn (or perhaps the README) covering these limitations? It would be better to refer to a document than to maintain this information here.

NOTE: Audio URL completions currently work only with Phi multimodal models.

4 participants