[AI] [Inference] beta.5 - audio input, developer role #33359
base: main
Pull Request Overview
The PR introduces audio input support for chat completions along with a new "developer" chat role. Key changes include adding new TypeScript and JavaScript samples for handling audio data and audio URLs, updating API interfaces and model definitions to support developer messages and audio content, and marking several auto‐generated files with update warnings.
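As a rough illustration of where the new "developer" role fits, here is a minimal sketch that models chat messages as plain `{ role, content }` objects. The types below are hypothetical local types that mirror the JSON wire shape. They are not the SDK's exported types, and `buildMessages` is a made-up helper for illustration only.

```typescript
// Hypothetical local types mirroring the JSON shape of chat messages.
// These are NOT the SDK's exported types.
type ChatRole = "system" | "developer" | "user" | "assistant" | "tool";

interface SimpleChatMessage {
  role: ChatRole;
  content: string;
}

// Hypothetical helper: places model instructions in a "developer" message,
// followed by the user's question.
function buildMessages(instructions: string, question: string): SimpleChatMessage[] {
  return [
    { role: "developer", content: instructions },
    { role: "user", content: question },
  ];
}

const messages = buildMessages("Answer in one sentence.", "What is the capital of France?");
console.log(JSON.stringify({ messages }));
```

In this sketch the developer message sits in the same position a system message would. Whether a given model accepts the "developer" role is model-dependent.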
Reviewed Changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `sdk/ai/ai-inference-rest/samples/v1-beta/typescript/src/audioDataChatCompletion.ts` | Added sample to demonstrate chat completions using audio data. |
| `sdk/ai/ai-inference-rest/samples/v1-beta/typescript/src/audioUrlChatCompletion.ts` | Added sample to demonstrate chat completions using an audio URL. |
| `sdk/ai/ai-inference-rest/samples/v1-beta/javascript/audioDataChatCompletion.js` | Added equivalent JavaScript sample for chat completions using audio data. |
| `sdk/ai/ai-inference-rest/samples/v1-beta/javascript/audioUrlChatCompletion.js` | Added equivalent JavaScript sample for chat completions using an audio URL. |
| `sdk/ai/ai-inference-rest/src/*.ts` and `review/ai-inference.api.md` | Updated various SDK files and API definitions to support the new features. |
| `sdk/ai/ai-inference-rest/CHANGELOG.md` | Updated changelog with beta.6 changes and feature details. |
sdk/ai/ai-inference-rest/samples/v1-beta/typescript/src/audioDataChatCompletion.ts
API change check: APIView has identified API-level changes in this PR and created the following API reviews.
```typescript
const client = createModelClient();

const data = getAudioData(audioFilePath);
```
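The sample's `getAudioData` implementation is not shown in this hunk. A minimal sketch of what such a helper could look like, assuming it simply reads the whole file and base64-encodes it (the form the service expects for `input_audio.data`), is below. `encodeAudioBytes` is a hypothetical name, and the real sample may differ.

```typescript
import { readFileSync } from "node:fs";

// Converts raw audio bytes into the base64 string form expected by
// the input_audio.data field.
function encodeAudioBytes(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Hypothetical sketch of a getAudioData helper; the actual sample's helper
// may differ. Reads the entire file into memory, then encodes it.
// Throws (e.g. ENOENT) if the file cannot be read.
function getAudioData(audioFilePath: string): string {
  return encodeAudioBytes(readFileSync(audioFilePath));
}
```

Reading the file synchronously keeps the sample short. The whole file is held in memory either way, since the encoded data is sent as a single JSON string.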
Can we stream the file to the server instead? Similar to:

```typescript
file: createReadStream(audioFilePath),
```
Using `createReadStream`, I'm seeing the error "Invalid type for 'messages[1].content[1].input_audio.data': expected a base64-encoded data string, but got an object instead."
@trangevi, is there planned support for sending read streams for audio data completions?
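The error is consistent with a `ReadStream` being an object rather than a string, so it ends up in the JSON body as an object. One workaround, sketched below under that assumption, is to drain the stream and base64-encode the collected bytes. `streamToBase64` and `fileToBase64` are hypothetical helpers, not part of the SDK.

```typescript
import { createReadStream } from "node:fs";
import { Readable } from "node:stream";

// Drains a readable stream and returns its bytes as a base64 string.
// Note: the whole payload still ends up in memory, because input_audio.data
// is a JSON string field rather than a streamed upload.
async function streamToBase64(stream: Readable): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Binary streams yield Buffers; strings are converted defensively.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks).toString("base64");
}

// Hypothetical wrapper for the createReadStream suggestion above.
// Rejects (e.g. with ENOENT) if the file cannot be opened.
function fileToBase64(audioFilePath: string): Promise<string> {
  return streamToBase64(createReadStream(audioFilePath));
}
```

This avoids the type error but does not reduce memory use compared with reading the file directly. True streaming would need the service to accept a non-JSON upload for audio.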
…ataChatCompletion.ts Co-authored-by: Copilot <[email protected]>
```typescript
/**
 * Demonstrates how to get chat completions using audio data.
 * NOTE: Audio data completions currently work only with GPT audio models.
```
Is the limitation in OpenAI audio models only?
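For context on what the audio-data sample sends, here is a minimal sketch of a user message carrying base64 audio. The shape follows the error path quoted earlier (`messages[1].content[1].input_audio.data`). The local types, the `createAudioDataMessage` helper, and the `"wav" | "mp3"` format values are assumptions for illustration, not the SDK's definitions.

```typescript
// Assumed format values; check the service documentation for the supported list.
type AudioFormat = "wav" | "mp3";

interface TextContentItem {
  type: "text";
  text: string;
}

// Shape implied by the error path messages[n].content[m].input_audio.data.
interface AudioDataContentItem {
  type: "input_audio";
  input_audio: { data: string; format: AudioFormat };
}

interface AudioDataUserMessage {
  role: "user";
  content: Array<TextContentItem | AudioDataContentItem>;
}

// Hypothetical helper: a text prompt followed by base64-encoded audio.
function createAudioDataMessage(
  prompt: string,
  base64Audio: string,
  format: AudioFormat
): AudioDataUserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "input_audio", input_audio: { data: base64Audio, format } },
    ],
  };
}
```

Keeping the audio as the second content item matches the `content[1]` index in the reported error.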
```typescript
/**
 * Demonstrates how to get chat completions using an audio URL.
 * NOTE: Audio URL completions currently work only with Phi multimodal models.
```
Is there a document on Microsoft Learn (or perhaps a README) that covers these limitations? It would be better to refer to that document than to maintain the information here.
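By contrast with the audio-data sample, the audio-URL sample references remote audio instead of embedding it. A minimal sketch is below, assuming an `audio_url` content item with a nested `url` field. That field name, the local types, and the `createAudioUrlMessage` helper are assumptions, not confirmed in this thread. Restricting to http(s) URLs is a choice made only in this sketch.

```typescript
interface TextContentItem {
  type: "text";
  text: string;
}

// Assumed wire shape for an audio-URL content item.
interface AudioUrlContentItem {
  type: "audio_url";
  audio_url: { url: string };
}

interface AudioUrlUserMessage {
  role: "user";
  content: Array<TextContentItem | AudioUrlContentItem>;
}

// Hypothetical helper: builds a user message that points at remote audio.
function createAudioUrlMessage(prompt: string, audioUrl: string): AudioUrlUserMessage {
  const parsed = new URL(audioUrl); // throws a TypeError on malformed input
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    throw new Error(`Unsupported audio URL scheme: ${parsed.protocol}`);
  }
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "audio_url", audio_url: { url: parsed.href } },
    ],
  };
}
```

Validating the URL client-side fails fast on typos, but the service remains the authority on which URLs and models are supported.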
Packages impacted by this PR
@azure-rest/ai-inference
Issues associated with this PR
Describe the problem that is addressed by this PR
What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?
Are there test cases added in this PR? (If not, why?)
Provide a list of related PRs (if any)
Command used to generate this PR: (Applicable only to SDK release request PRs)
Checklists