fix(config): use flash-lite for utility model configs to preserve quota #25684
kazukinakai wants to merge 5 commits into google-gemini:main
Loop detection, LLM edit fixer, and next-speaker checker were hardcoded to gemini-3-flash-preview via gemini-3-flash-base. When the Flash quota is exhausted (e.g. 100% usage), these internal utility calls fail even when the user switches to Pro or Flash Lite as their main model, making the CLI unusable. These utilities perform simple reasoning tasks that do not require Flash's full capabilities. Switch them to a new gemini-3-flash-lite-base that targets gemini-3.1-flash-lite-preview, which has a separate quota bucket and is well-suited for lightweight inference tasks. web-search and web-fetch remain on Flash because they rely on googleSearch and urlContext tool support which requires Flash. Fixes: utility_loop_detector, utility_tool, and next-speaker failures when gemini-3-flash-preview quota is exhausted.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical usability issue where internal utility tasks were hardcoded to use the standard Flash model, causing failures when that specific quota was exhausted. By migrating these lightweight utility tasks to a Flash Lite base configuration, the system now effectively manages quota consumption and prevents unnecessary service interruptions for users.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Code Review
This pull request introduces a new model configuration, gemini-3-flash-lite-base, which utilizes the gemini-3.1-flash-lite-preview model. Additionally, it updates the loop-detection, llm-edit-fixer, and next-speaker-checker configurations to extend this new base instead of gemini-3-flash-base. I have no feedback to provide.
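The inheritance described in this review can be pictured with a minimal sketch. The alias names and model IDs come from this PR; the `ModelConfigAlias` shape and the `resolveModel` helper are hypothetical and only illustrate how an `extends` chain might resolve to a concrete model. The real config schema in the repository may differ.

```typescript
// Hypothetical shape for illustration; the real config schema may differ.
interface ModelConfigAlias {
  extends?: string; // name of a parent alias to inherit from
  model?: string; // concrete model ID, set here only on base aliases
}

const aliases: Record<string, ModelConfigAlias> = {
  'gemini-3-flash-base': { model: 'gemini-3-flash-preview' },
  // New base introduced by this PR, backed by a separate quota bucket.
  'gemini-3-flash-lite-base': { model: 'gemini-3.1-flash-lite-preview' },
  // Utility configs now extend the Flash Lite base instead of the Flash base.
  'loop-detection': { extends: 'gemini-3-flash-lite-base' },
  'llm-edit-fixer': { extends: 'gemini-3-flash-lite-base' },
  'next-speaker-checker': { extends: 'gemini-3-flash-lite-base' },
};

// Walk the `extends` chain until an alias with a concrete model is found.
function resolveModel(name: string, seen: Set<string> = new Set()): string {
  if (seen.has(name)) throw new Error(`Circular extends at "${name}"`);
  seen.add(name);
  const alias = aliases[name];
  if (!alias) throw new Error(`Unknown alias "${name}"`);
  if (alias.model) return alias.model;
  if (!alias.extends) throw new Error(`Alias "${name}" has no model`);
  return resolveModel(alias.extends, seen);
}
```

Under these assumptions, `resolveModel('loop-detection')` yields the Flash Lite model, so the utility call no longer draws on the Flash quota bucket.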
web-fetch-fallback extends gemini-3-flash-base but does not configure any tools (no urlContext), so it is just a plain model call. Flash Lite is equally capable and avoids consuming Flash quota.
gemini-3.1-flash-lite-preview officially supports both googleSearch (Grounding with Google Search) and urlContext tools per the Gemini API docs. Switching these configs to gemini-3-flash-lite-base reduces Flash quota consumption and keeps web tools functional when Flash is exhausted.
Auto (Gemini 3) mode used Pro → Flash with Flash as isLastResort. When Flash quota is exhausted, the CLI had nowhere to fall back to. Add Flash Lite (gemini-3.1-flash-lite-preview when Gemini 3.1 is enabled, gemini-2.5-flash-lite otherwise) as the new isLastResort, demoting Flash to an intermediate step. Users no longer need to manually switch models when Flash is exhausted — Auto mode will silently continue on Flash Lite.
…n preview chain
Flash and Flash Lite policies in the preview chain were using DEFAULT_ACTIONS (which prompts the user on quota exhaustion). This caused an unwanted dialog when Flash quota was hit during Auto (Gemini 3) mode. Use SILENT_ACTIONS for both Flash and Flash Lite so the fallback from Flash→Flash Lite happens automatically without user intervention, matching the behavior of FLASH_LITE_CHAIN.
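The fallback behavior described in the two commit messages above could be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `ModelPolicy`, `buildPreviewChain`, and `fallbackFrom` are hypothetical names, `'pro-model-placeholder'` stands in for the Pro model ID (not stated in this PR), and giving Pro the prompting behavior is likewise an assumption. Only the Flash and Flash Lite model IDs, the `isLastResort` flag, and the silent-versus-prompt distinction come from the text.

```typescript
// Hypothetical types; names echo the PR text but the real definitions may differ.
type ExhaustionAction = 'prompt' | 'silent';

interface ModelPolicy {
  model: string;
  onQuotaExhausted: ExhaustionAction;
  isLastResort?: boolean;
}

interface FallbackResult {
  model: string;
  promptUser: boolean;
}

// Pro -> Flash -> Flash Lite, with Flash Lite as the new last resort.
function buildPreviewChain(gemini31Enabled: boolean): ModelPolicy[] {
  const flashLite = gemini31Enabled
    ? 'gemini-3.1-flash-lite-preview'
    : 'gemini-2.5-flash-lite';
  return [
    // Placeholder ID and 'prompt' action are assumptions for the Pro step.
    { model: 'pro-model-placeholder', onQuotaExhausted: 'prompt' },
    { model: 'gemini-3-flash-preview', onQuotaExhausted: 'silent' },
    { model: flashLite, onQuotaExhausted: 'silent', isLastResort: true },
  ];
}

// Pick the next non-exhausted model after `current`, if any remains.
function fallbackFrom(
  chain: ModelPolicy[],
  current: string,
  exhausted: Set<string>,
): FallbackResult | undefined {
  const index = chain.findIndex((p) => p.model === current);
  const currentPolicy = chain[index];
  if (!currentPolicy || currentPolicy.isLastResort) return undefined;
  const next = chain.slice(index + 1).find((p) => !exhausted.has(p.model));
  if (!next) return undefined;
  return {
    model: next.model,
    promptUser: currentPolicy.onQuotaExhausted === 'prompt',
  };
}
```

In this sketch, exhausting Flash moves the session to Flash Lite with `promptUser` set to `false`, which corresponds to the silent fallback the commit describes. Once Flash Lite itself is exhausted there is nothing further to fall back to.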
Fixes #23397
Fixes #18059
Related to #24937 (capacity/429 tracking issue)
Problem
When gemini-3-flash-preview quota is exhausted (100%), the CLI becomes completely unusable even if the user explicitly switches to gemini-3.1-flash-lite-preview. The "Usage limit reached for gemini-3-flash-preview" error keeps firing regardless of the selected model.
Root cause: all six internal utility configs are hardcoded to gemini-3-flash-base → gemini-3-flash-preview, so they consume Flash quota independently of the user's model selection:
loop-detection (UTILITY_LOOP_DETECTOR)
llm-edit-fixer (UTILITY_EDIT_CORRECTOR)
next-speaker-checker (UTILITY_NEXT_SPEAKER)
web-fetch-fallback
web-search
web-fetch
Fix
Add gemini-3-flash-lite-base targeting gemini-3.1-flash-lite-preview and switch all six configs to use it.
Why Flash Lite is safe:
Other utility configs (edit-corrector, fast-ack-helper, summarizer-*, classifier) already use Flash Lite variants.
web-search and web-fetch: gemini-3.1-flash-lite-preview officially supports both googleSearch (Grounding) and urlContext tools per Gemini API docs.
Impact
Reproduction (from #23397)