fix(config): use flash-lite for utility model configs to preserve quota by kazukinakai · Pull Request #25684 · google-gemini/gemini-cli

kazukinakai · 2026-04-20T07:17:19Z

Fixes #23397
Fixes #18059
Related to #24937 (capacity/429 tracking issue)

Problem

When gemini-3-flash-preview quota is exhausted (100%), the CLI becomes completely unusable even if the user explicitly switches to gemini-3.1-flash-lite-preview. The "Usage limit reached for gemini-3-flash-preview" error keeps firing regardless of the selected model.

Root cause: all six internal utility configs are hardcoded to gemini-3-flash-base → gemini-3-flash-preview, so they consume Flash quota independently of the user's model selection:

| Config key | Role | Before | After |
| --- | --- | --- | --- |
| `loop-detection` | `UTILITY_LOOP_DETECTOR` | Flash | Flash Lite ✓ |
| `llm-edit-fixer` | `UTILITY_EDIT_CORRECTOR` | Flash | Flash Lite ✓ |
| `next-speaker-checker` | `UTILITY_NEXT_SPEAKER` | Flash | Flash Lite ✓ |
| `web-fetch-fallback` | fallback path (no tools) | Flash | Flash Lite ✓ |
| `web-search` | Grounding with Google Search | Flash | Flash Lite ✓ |
| `web-fetch` | URL context tool | Flash | Flash Lite ✓ |

Fix

Add gemini-3-flash-lite-base targeting gemini-3.1-flash-lite-preview and switch all six configs to use it.
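
To make the inheritance change concrete, here is a minimal sketch of how an alias table with an `extends` chain could resolve to a concrete model. The `AliasConfig` type, the `aliases` object, and `resolveModel` are hypothetical names for illustration only; the real config shape in gemini-cli may differ. Only the alias keys and model names come from this PR.

```typescript
// Hypothetical, simplified alias table. The real gemini-cli types may differ.
interface AliasConfig {
  extends?: string; // parent alias to inherit from
  model?: string; // concrete model name, if set at this level
}

const aliases: Record<string, AliasConfig> = {
  "gemini-3-flash-base": { model: "gemini-3-flash-preview" },
  // New base alias described in this PR.
  "gemini-3-flash-lite-base": { model: "gemini-3.1-flash-lite-preview" },
  // The six utility configs, now inheriting from the Flash Lite base.
  "loop-detection": { extends: "gemini-3-flash-lite-base" },
  "llm-edit-fixer": { extends: "gemini-3-flash-lite-base" },
  "next-speaker-checker": { extends: "gemini-3-flash-lite-base" },
  "web-fetch-fallback": { extends: "gemini-3-flash-lite-base" },
  "web-search": { extends: "gemini-3-flash-lite-base" },
  "web-fetch": { extends: "gemini-3-flash-lite-base" },
};

// Walk the `extends` chain until an alias names a concrete model.
function resolveModel(
  key: string,
  table: Record<string, AliasConfig> = aliases,
): string {
  const seen = new Set<string>();
  let current: string | undefined = key;
  while (current !== undefined) {
    if (seen.has(current)) {
      throw new Error(`Cyclic extends chain at "${current}"`);
    }
    seen.add(current);
    const cfg: AliasConfig | undefined = table[current];
    if (cfg === undefined) {
      throw new Error(`Unknown alias "${current}"`);
    }
    if (cfg.model !== undefined) {
      return cfg.model;
    }
    current = cfg.extends;
  }
  throw new Error(`Alias "${key}" does not resolve to a model`);
}

console.log(resolveModel("loop-detection")); // gemini-3.1-flash-lite-preview
```

Under this sketch, pointing a utility config at a different base is a one-line change per config, which is why the fix can be limited to the alias definitions.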

Why Flash Lite is safe:

Utility tasks (loop detection, edit fixing, next-speaker routing): lightweight reasoning — same pattern as edit-corrector, fast-ack-helper, summarizer-*, classifier which already use Flash Lite variants
web-search and web-fetch: gemini-3.1-flash-lite-preview officially supports both googleSearch (Grounding) and urlContext tools per Gemini API docs

Impact

Users with exhausted Flash but available Flash Lite quota can keep working
Reduces Flash consumption for all utility calls, preserving quota for the main model
No functional regression

Reproduction (from #23397)

# Set Flash Lite as main model explicitly:
gemini -m 'gemini-3.1-flash-lite-preview'

# Still gets:
Usage limit reached for gemini-3-flash-preview.
/model to switch models.
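
The failure mode in this reproduction can be illustrated with a toy model of per-model quota buckets. This is not the CLI's quota logic; `Quota`, `callModel`, and `runTurn` are hypothetical helpers, and the sketch assumes each user turn issues one main request plus one internal utility request.

```typescript
// Toy model of per-model quota buckets; not the CLI's real quota handling.
type Quota = Record<string, number>; // remaining requests per model

function callModel(quota: Quota, model: string): void {
  const remaining = quota[model] ?? 0;
  if (remaining <= 0) {
    throw new Error(`Usage limit reached for ${model}.`);
  }
  quota[model] = remaining - 1;
}

// One user turn: the main request plus one internal utility request.
function runTurn(quota: Quota, mainModel: string, utilityModel: string): string {
  try {
    callModel(quota, mainModel);
    callModel(quota, utilityModel);
    return "ok";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const FLASH = "gemini-3-flash-preview";
const FLASH_LITE = "gemini-3.1-flash-lite-preview";

// Flash is exhausted, Flash Lite still has capacity, and the user picked Flash Lite.
const beforeFix = runTurn({ [FLASH]: 0, [FLASH_LITE]: 10 }, FLASH_LITE, FLASH);
const afterFix = runTurn({ [FLASH]: 0, [FLASH_LITE]: 10 }, FLASH_LITE, FLASH_LITE);

console.log(beforeFix); // Usage limit reached for gemini-3-flash-preview.
console.log(afterFix); // ok
```

The main request succeeds in both cases. Only the utility request differs, which matches the report that the error names Flash even though Flash Lite is selected.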

Loop detection, LLM edit fixer, and next-speaker checker were hardcoded to gemini-3-flash-preview via gemini-3-flash-base. When the Flash quota is exhausted (e.g. 100% usage), these internal utility calls fail even when the user switches to Pro or Flash Lite as their main model, making the CLI unusable.

These utilities perform simple reasoning tasks that do not require Flash's full capabilities. Switch them to a new gemini-3-flash-lite-base that targets gemini-3.1-flash-lite-preview, which has a separate quota bucket and is well-suited for lightweight inference tasks.

web-search and web-fetch remain on Flash because they rely on googleSearch and urlContext tool support which requires Flash.

Fixes: utility_loop_detector, utility_tool, and next-speaker failures when gemini-3-flash-preview quota is exhausted.

gemini-code-assist · 2026-04-20T07:17:27Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical usability issue where internal utility tasks were hardcoded to use the standard Flash model, causing failures when that specific quota was exhausted. By migrating these lightweight utility tasks to a Flash Lite base configuration, the system now effectively manages quota consumption and prevents unnecessary service interruptions for users.

Highlights

New Base Configuration: Introduced 'gemini-3-flash-lite-base' to target the 'gemini-3.1-flash-lite-preview' model.
Utility Model Migration: Updated 'loop-detection', 'llm-edit-fixer', and 'next-speaker-checker' to use the new Flash Lite base configuration.
Quota Management: Ensures utility tasks continue to function even when the primary Flash quota is exhausted by leveraging available Flash Lite capacity.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

google-cla · 2026-04-20T07:17:30Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

gemini-code-assist

Code Review

This pull request introduces a new model configuration, gemini-3-flash-lite-base, which utilizes the gemini-3.1-flash-lite-preview model. Additionally, it updates the loop-detection, llm-edit-fixer, and next-speaker-checker configurations to extend this new base instead of gemini-3-flash-base. I have no feedback to provide.

gemini-3.1-flash-lite-preview officially supports both googleSearch (Grounding with Google Search) and urlContext tools per the Gemini API docs. Switching these configs to gemini-3-flash-lite-base reduces Flash quota consumption and keeps web tools functional when Flash is exhausted.

kazukinakai · 2026-04-20T07:44:38Z

Ran npm run preflight locally (Node 20, per .nvmrc): all checks passed (clean → npm ci → format → build → lint → typecheck → test).

Auto (Gemini 3) mode used Pro → Flash with Flash as isLastResort. When Flash quota is exhausted, the CLI had nowhere to fall back to. Add Flash Lite (gemini-3.1-flash-lite-preview when Gemini 3.1 is enabled, gemini-2.5-flash-lite otherwise) as the new isLastResort, demoting Flash to an intermediate step. Users no longer need to manually switch models when Flash is exhausted — Auto mode will silently continue on Flash Lite.
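
A minimal sketch of the fallback order this commit describes, assuming a chain is an ordered list of policies and the first model with remaining quota is chosen. `ModelPolicy`, `buildAutoChain`, and `pickModel` are hypothetical names. The Pro model name is a placeholder because it is not stated here, and treating Flash as gemini-3-flash-preview in this chain is also an assumption.

```typescript
// Hypothetical policy shape; only `isLastResort` is named in the commit message.
interface ModelPolicy {
  model: string;
  isLastResort?: boolean;
}

const PRO = "pro-model-placeholder"; // placeholder: the Pro model name is not given
const FLASH = "gemini-3-flash-preview"; // assumption: the Flash model used by Auto mode

// Pro -> Flash -> Flash Lite, with Flash Lite as the new last resort.
function buildAutoChain(gemini31Enabled: boolean): ModelPolicy[] {
  const flashLite = gemini31Enabled
    ? "gemini-3.1-flash-lite-preview"
    : "gemini-2.5-flash-lite";
  return [
    { model: PRO },
    { model: FLASH }, // demoted to an intermediate step
    { model: flashLite, isLastResort: true },
  ];
}

// Return the first model in the chain that is not exhausted, if any.
function pickModel(
  chain: ModelPolicy[],
  exhausted: ReadonlySet<string>,
): string | undefined {
  return chain.find((p) => !exhausted.has(p.model))?.model;
}

const exhaustedModels = new Set([PRO, FLASH]);
console.log(pickModel(buildAutoChain(true), exhaustedModels)); // gemini-3.1-flash-lite-preview
console.log(pickModel(buildAutoChain(false), exhaustedModels)); // gemini-2.5-flash-lite
```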

…n preview chain

Flash and Flash Lite policies in the preview chain were using DEFAULT_ACTIONS (which prompts the user on quota exhaustion). This caused an unwanted dialog when Flash quota was hit during Auto (Gemini 3) mode.

Use SILENT_ACTIONS for both Flash and Flash Lite so the fallback from Flash→Flash Lite happens automatically without user intervention, matching the behavior of FLASH_LITE_CHAIN.
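
The difference between the two action tables can be sketched as follows. The constant names match the commit message, but their contents here are assumptions, as are the `FailureKind` keys and the `promptsBeforeSuccess` helper. The sketch counts how many dialogs a user would see while the chain skips exhausted models.

```typescript
// Assumed shapes: the real DEFAULT_ACTIONS / SILENT_ACTIONS may use other keys.
type FallbackAction = "prompt" | "silent";
type FailureKind = "terminal" | "transient";
type ActionTable = Record<FailureKind, FallbackAction>;

const DEFAULT_ACTIONS: ActionTable = { terminal: "prompt", transient: "prompt" };
const SILENT_ACTIONS: ActionTable = { terminal: "silent", transient: "silent" };

interface ChainEntry {
  model: string;
  actions: ActionTable;
}

// Count the dialogs shown while walking past exhausted models
// (quota exhaustion is treated as a "terminal" failure in this sketch).
function promptsBeforeSuccess(
  chain: ChainEntry[],
  exhausted: ReadonlySet<string>,
): number {
  let prompts = 0;
  for (const entry of chain) {
    if (!exhausted.has(entry.model)) {
      return prompts;
    }
    if (entry.actions.terminal === "prompt") {
      prompts += 1;
    }
  }
  return prompts;
}

const flashExhausted = new Set(["gemini-3-flash-preview"]);

const chainBefore: ChainEntry[] = [
  { model: "gemini-3-flash-preview", actions: DEFAULT_ACTIONS },
  { model: "gemini-3.1-flash-lite-preview", actions: DEFAULT_ACTIONS },
];
const chainAfter: ChainEntry[] = [
  { model: "gemini-3-flash-preview", actions: SILENT_ACTIONS },
  { model: "gemini-3.1-flash-lite-preview", actions: SILENT_ACTIONS },
];

console.log(promptsBeforeSuccess(chainBefore, flashExhausted)); // 1
console.log(promptsBeforeSuccess(chainAfter, flashExhausted)); // 0
```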

kazukinakai requested a review from a team as a code owner April 20, 2026 07:17

gemini-cli bot added the status/need-issue label (Pull requests that need to have an associated issue.) Apr 20, 2026

gemini-code-assist bot reviewed Apr 20, 2026

kazukinakai added 2 commits April 20, 2026 16:28

fix(config): also switch web-fetch-fallback to flash-lite base

dcd96ed

web-fetch-fallback extends gemini-3-flash-base but does not configure any tools (no urlContext), so it is just a plain model call. Flash Lite is equally capable and avoids consuming Flash quota.

This was referenced Apr 20, 2026

[BUG] Gemini CLI not respecting the set model #23397 (Open)

Tracking: 429 / Capacity Issues #24937 (Open)

gemini-cli bot added the area/platform label (Issues related to Build infra, Release mgmt, Testing, Eval infra, Capacity, Quota mgmt) and removed the status/need-issue label (Pull requests that need to have an associated issue.) Apr 20, 2026

kazukinakai wants to merge 5 commits into google-gemini:main from kazukinakai:fix/utility-flash-lite-fallback

1 participant
