
[AI] [Inference] beta.5 - audio input, developer role #33359

Open · wants to merge 9 commits into main

Conversation

glharper (Member) commented:

Packages impacted by this PR

@azure-rest/ai-inference

Issues associated with this PR

Describe the problem that is addressed by this PR

What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

Are there test cases added in this PR? (If not, why?)

Provide a list of related PRs (if any)

Command used to generate this PR: (Applicable only to SDK release request PRs)

Checklists

  • Added impacted package name to the issue description
  • Does this PR need any fixes in the SDK Generator? (If so, create an issue in the Autorest/typescript repository and link it here)
  • Added a changelog (if necessary)

@Copilot Copilot bot review requested due to automatic review settings March 12, 2025 15:22
@glharper glharper requested review from bterlson, a team, dargilco and jhakulin as code owners March 12, 2025 15:22
@Copilot Copilot AI (Contributor) left a comment:

Pull Request Overview

The PR introduces audio input support for chat completions along with a new "developer" chat role. Key changes include adding new TypeScript and JavaScript samples for handling audio data and audio URLs, updating API interfaces and model definitions to support developer messages and audio content, and marking several auto-generated files with update warnings.
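To make the new message shapes concrete, here is a hedged sketch of a chat request combining the "developer" role with an audio content part. The nested "input_audio" object and its "format" field are assumptions; the only confirmed detail is the field path messages[1].content[1].input_audio.data carrying a base64 string, which appears in a service error quoted later in this review.

```javascript
// Hypothetical sketch of a chat-completions request body with audio input.
// The exact content-part shape is an assumption based on the field path
// "messages[1].content[1].input_audio.data" cited in the review thread.
const base64Audio = Buffer.from("fake audio bytes").toString("base64");

const messages = [
  // New "developer" role introduced by this PR.
  { role: "developer", content: "You are a helpful assistant." },
  {
    role: "user",
    content: [
      { type: "text", text: "Please describe this audio clip." },
      // Audio is sent inline as a base64-encoded string, not a stream.
      { type: "input_audio", input_audio: { data: base64Audio, format: "mp3" } },
    ],
  },
];
```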

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.

Summary per file:

  • sdk/ai/ai-inference-rest/samples/v1-beta/typescript/src/audioDataChatCompletion.ts: Added sample demonstrating chat completions using audio data.
  • sdk/ai/ai-inference-rest/samples/v1-beta/typescript/src/audioUrlChatCompletion.ts: Added sample demonstrating chat completions using an audio URL.
  • sdk/ai/ai-inference-rest/samples/v1-beta/javascript/audioDataChatCompletion.js: Added equivalent JavaScript sample for chat completions using audio data.
  • sdk/ai/ai-inference-rest/samples/v1-beta/javascript/audioUrlChatCompletion.js: Added equivalent JavaScript sample for chat completions using an audio URL.
  • sdk/ai/ai-inference-rest/src/*.ts and review/ai-inference.api.md: Updated various SDK files and API definitions to support the new features.
  • sdk/ai/ai-inference-rest/CHANGELOG.md: Updated changelog with beta.6 changes and feature details.

@azure-sdk (Collaborator) commented:

API change check

APIView has identified API-level changes in this PR and created the following API reviews:

@azure-rest/ai-inference


const client = createModelClient();

const data = getAudioData(audioFilePath);
A reviewer (Member) commented:
Can we stream the file to the server instead? similar to

file: createReadStream(audioFilePath),

glharper (Member, author) replied:

Using createReadStream, I'm seeing "Invalid type for 'messages[1].content[1].input_audio.data': expected a base64-encoded data string, but got an object instead."

@trangevi is there planned support for sending read streams for audio data completions?

A reviewer (Member) replied:

Looking into the code, the method itself doesn't support streams; it just serializes the stream object as-is to the server, which is not what we want. @joheredi @timovv, is there a way to add streaming support for input nested deeper in the request body object?


/**
* Demonstrates how to get chat completions using audio data.
* NOTE: Audio data completions currently work only with GPT audio models.
@jhakulin (Member) commented on Mar 13, 2025:
Is the limitation in OpenAI audio models only?


/**
* Demonstrates how to get chat completions using an audio URL.
* NOTE: Audio URL completions currently work only with Phi multimodal models.
A reviewer (Member) commented:
Is there a document on Microsoft Learn (or perhaps the README) covering these limitations? It would be better to refer to a document than to maintain this information here.

NOTE: Audio URL completions currently work only with Phi multimodal models.

4 participants