refactor: update Live API configuration, transcription support, and input handling for model version gemini-3.1-flash-live-preview #1224

thorwebdev wants to merge 1 commit into

Conversation
Code Review
This pull request updates the Live API quickstart to use the google-genai SDK (v1.0.0+), involving model and API key updates, enabling audio transcriptions, and refactoring the audio reception logic. The review feedback highlights several necessary corrections for SDK compatibility, such as the proper usage of send_realtime_input and send_client_content. Additionally, improvements were suggested regarding the repository's style guide for model selection, the use of constants for sample rates, and the handling of incremental transcription updates.
```diff
-                turns=types.Content(parts=[types.Part(text=text or "")]),
-                turn_complete=True,
-            )
+            await self.session.send_realtime_input(text=text)
```
The send_realtime_input method is designed for streaming media chunks (blobs). To send text input in a multimodal live session, you should use send_client_content within a turn.
```diff
-            await self.session.send_realtime_input(text=text)
+            await self.session.send_client_content(
+                turns=[types.Content(role="user", parts=[types.Part(text=text)])],
+                turn_complete=True
+            )
```
You are wrong! https://ai.google.dev/gemini-api/docs/live-api/capabilities#send-text
@markmcd can we give gemini-code-assist access to the skills?
You are correct, send_realtime_input is indeed the appropriate method for sending text in the Live API when using the gemini-3.1-flash-live-preview model. My previous comment was based on an older interface pattern. Thank you for the correction.
ack :) great idea, it doesn't look like it's possible without copying it across but i'll check with the GCA folks.
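Since the thread settles on `send_realtime_input(text=...)` as the way to send text over a live session, the call shape can be checked offline by mocking the session. The `send_text` helper below is illustrative, not part of the quickstart, and the mock stands in for a real `google-genai` live session:

```python
import asyncio
from unittest.mock import AsyncMock

async def send_text(session, text: str) -> None:
    # Per the Live API docs linked above, text goes through
    # send_realtime_input rather than send_client_content.
    await session.send_realtime_input(text=text)

# Exercise the call shape against a mock session (no API key or network).
session = AsyncMock()
asyncio.run(send_text(session, "Hello, Gemini"))
session.send_realtime_input.assert_awaited_once_with(text="Hello, Gemini")
```

Mocking keeps this kind of interface dispute testable in CI without Live API credentials.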
```diff
         msg = await self.out_queue.get()
+        blob = types.Blob(data=msg["data"], mime_type=msg["mime_type"])
         if msg["mime_type"].startswith("audio/"):
-            await self.session.send_realtime_input(audio=msg)
+            await self.session.send_realtime_input(audio=blob)
         else:
-            await self.session.send_realtime_input(media=msg)
+            # Use video= (not the deprecated media=) for image/video frames.
+            await self.session.send_realtime_input(video=blob)
```
In the google-genai SDK (v1.0.0+), the send_realtime_input method expects a media_chunks argument containing a list of Blob objects. It does not support audio or video as direct keyword arguments.
```python
msg = await self.out_queue.get()
blob = types.Blob(data=msg["data"], mime_type=msg["mime_type"])
await self.session.send_realtime_input(media_chunks=[blob])
```
wrong again, media_chunks has been deprecated.
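Assuming the per-modality keywords used in the PR diff (`audio=` for PCM chunks, `video=` for frames) are the current interface, the routing logic can be sketched and exercised against a mock session. The `Blob` dataclass below is a stand-in for `google.genai.types.Blob`, and `forward` is an illustrative helper:

```python
import asyncio
from dataclasses import dataclass
from unittest.mock import AsyncMock

@dataclass
class Blob:
    # Stand-in for google.genai.types.Blob, for illustration only.
    data: bytes
    mime_type: str

async def forward(session, msg: dict) -> None:
    # Route each captured chunk to the matching keyword argument:
    # audio= for PCM audio, video= for image/video frames.
    blob = Blob(data=msg["data"], mime_type=msg["mime_type"])
    if msg["mime_type"].startswith("audio/"):
        await session.send_realtime_input(audio=blob)
    else:
        await session.send_realtime_input(video=blob)

session = AsyncMock()
asyncio.run(forward(session, {"data": b"\x00", "mime_type": "audio/pcm;rate=16000"}))
asyncio.run(forward(session, {"data": b"\x01", "mime_type": "image/jpeg"}))
kwargs = [c.kwargs for c in session.send_realtime_input.await_args_list]
assert list(kwargs[0]) == ["audio"] and list(kwargs[1]) == ["video"]
```

Inspecting `await_args_list` on the mock verifies that each MIME type lands on the intended keyword without touching the real API.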
```diff
 # --- Model Configuration ---
-MODEL = "models/gemini-2.5-flash-native-audio-preview-12-2025"
+MODEL = "gemini-3.1-flash-live-preview"
```
According to the repository style guide (line 72), model selection should use a Colab selector for better maintainability and to allow users to easily switch models in supported environments.
```diff
-MODEL = "gemini-3.1-flash-live-preview"
+MODEL = "gemini-3.1-flash-live-preview"  # @param ["gemini-3.1-flash-live-preview", "gemini-2.0-flash-live-001"] {"allow-input":true, "isTemplate": true}
```
References
- When selecting a model, use a colab selector for easier maintainability. (link)
```diff
 payload = {
     "data": data,
-    "mime_type": "audio/pcm"
+    "mime_type": "audio/pcm;rate=16000"
```
It is recommended to use the SEND_SAMPLE_RATE constant instead of a hardcoded value in the MIME type string. This ensures that if the sample rate is updated in the configuration, it is automatically reflected here.
```diff
-    "mime_type": "audio/pcm;rate=16000"
+    "mime_type": f"audio/pcm;rate={SEND_SAMPLE_RATE}"
```
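The constant-based MIME string suggested above is straightforward to sketch in isolation; `SEND_SAMPLE_RATE` is the config constant the review refers to, and `make_audio_payload` is an illustrative wrapper around the payload dict from the diff:

```python
SEND_SAMPLE_RATE = 16000  # assumed value of the quickstart's config constant

def make_audio_payload(data: bytes) -> dict:
    # Deriving the MIME type from the constant keeps the sample rate
    # defined in exactly one place.
    return {"data": data, "mime_type": f"audio/pcm;rate={SEND_SAMPLE_RATE}"}

payload = make_audio_payload(b"\x00\x01")
# payload["mime_type"] == "audio/pcm;rate=16000"
```

If `SEND_SAMPLE_RATE` later changes, the MIME type tracks it automatically, which is exactly the maintainability point the comment makes.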
```diff
+if server_content.input_transcription:
+    print(f"\nYou: {server_content.input_transcription.text}", end="")
+
+if server_content.output_transcription:
+    print(f"\nGemini: {server_content.output_transcription.text}", end="")
```
The Live API sends incremental transcription updates. Printing every update results in duplicate and fragmented text in the console. It is better to check the is_final attribute to only print the completed transcription.
```diff
-if server_content.input_transcription:
-    print(f"\nYou: {server_content.input_transcription.text}", end="")
-if server_content.output_transcription:
-    print(f"\nGemini: {server_content.output_transcription.text}", end="")
+if server_content.input_transcription and server_content.input_transcription.is_final:
+    print(f"\nYou: {server_content.input_transcription.text}")
+if server_content.output_transcription and server_content.output_transcription.is_final:
+    print(f"\nGemini: {server_content.output_transcription.text}")
```
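Attribute names for transcription finality can vary between SDK versions, so a hedged alternative to checking `is_final` is to buffer the incremental chunks and print once per turn. `TranscriptBuffer` below is a hypothetical helper, not part of the quickstart:

```python
class TranscriptBuffer:
    """Accumulate incremental transcription chunks; flush once per turn."""

    def __init__(self, label: str):
        self.label = label
        self._parts: list[str] = []

    def add(self, text: str) -> None:
        # Each Live API update carries only the newest fragment of text.
        self._parts.append(text)

    def flush(self) -> str:
        # Join the fragments into one line and reset for the next turn.
        line = f"{self.label}: {''.join(self._parts)}"
        self._parts.clear()
        return line

buf = TranscriptBuffer("You")
for chunk in ["Hel", "lo the", "re"]:  # simulated incremental updates
    buf.add(chunk)
line = buf.flush()
# line == "You: Hello there"
```

Flushing on `turn_complete` (or on a finality flag, where the SDK exposes one) avoids the duplicated, fragmented console output the comment describes.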
fixes #1108