
Improve local inference settings and model downloader#8467

Open
jh-block wants to merge 4 commits into main from jhugo/local-inference-settings-improvements

Conversation

Collaborator

@jh-block commented Apr 10, 2026

The local inference settings page and model downloader had several usability issues: downloads would silently fail after laptop sleep, some model variants showed incorrect sizes (notably sharded models like BF16), quantization variants were poorly sorted, making options hard to compare, and search results gave no indication of which models were already downloaded or too large to run.

Changes:

  • Download resilience: Downloads now resume after network interruption via HTTP Range headers with retry and exponential backoff, instead of silently failing and cleaning up the partial file.
  • Correct model sizes: File filtering now uses the model stem derived from the repo name (instead of including all GGUF files), which excludes auxiliary files like mmproj-*.gguf. Sharded model files are aggregated to show the correct total size.
  • Quantization sort order: Variants are grouped by bit-level (highest quality first) with proper rank assignments so all N-bit quants appear together.
  • Download state & memory warnings: Search results show which variants are already downloaded (disabled button) and warn when a model may not fit in available memory.
  • UI cleanup: Removed the direct download section and the nested scroll area in search results.

- Resume downloads after sleep/network interruption using HTTP Range
  headers with exponential backoff retry (up to 10 retries)
- Fix incorrect sizes for sharded and multimodal models by filtering
  files based on model stem from repo name and aggregating shard sizes
- Sort quantization variants by bit-level (highest quality first) with
  size as tiebreaker within the same rank
- Show memory warnings for variants too large for the current system
- Show already-downloaded variants with a disabled 'Downloaded' button
- Remove direct download section and inner scroll area from search UI

Signed-off-by: jh-block <jhugo@block.xyz>
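The resume-with-backoff behavior in the first bullet can be sketched as follows. This is an illustrative sketch only: the constant and function names (`MAX_RETRIES`, `backoff_delay_secs`) are assumptions, not the PR's actual identifiers, and the real code issues the ranged HTTP request where the comment indicates.

```rust
// Illustrative sketch of the retry policy described above; names are
// assumptions, not the PR's actual identifiers.
const MAX_RETRIES: u32 = 10;

/// Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
fn backoff_delay_secs(retry: u32) -> u64 {
    (1u64 << retry.min(6)).min(60)
}

fn main() {
    // On interruption the downloader would re-request the file with
    // `Range: bytes=<bytes_already_on_disk>-` and wait
    // backoff_delay_secs(n) before attempt n + 1.
    for retry in 0..MAX_RETRIES {
        println!("attempt {}: wait {}s before retrying", retry, backoff_delay_secs(retry));
    }
}
```

With this policy a download survives a laptop sleep of a few minutes: the partial file is kept, and the next attempt picks up from the existing byte offset instead of deleting it.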

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d4d4dec05

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…on resume

- Shard-only quantizations (e.g. BF16 split across multiple files) are
  no longer surfaced as downloadable variants, since resolve_model_spec
  cannot resolve them yet. A TODO marks where to re-enable once multi-
  file download is implemented.

- When resuming a download with a Range header, if the server ignores it
  and returns 200 OK (full body) instead of 206 Partial Content, the
  partial file is now deleted and the download restarts from scratch,
  preventing silent corruption from appending a full copy.

Signed-off-by: jh-block <jhugo@block.xyz>
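The second point, handling a server that ignores the Range header, reduces to a small decision over the response status. A minimal sketch, assuming invented names (`ResumeAction`, `resume_action`) rather than the PR's actual code:

```rust
// Hedged sketch of the 206-vs-200 decision described above; the enum and
// function names are invented for illustration.
#[derive(Debug, PartialEq)]
enum ResumeAction {
    Append,     // 206 Partial Content: server honored the Range header
    Restart,    // 200 OK on a ranged request: delete the partial file, start over
    RetryLater, // anything else: back off and retry
}

fn resume_action(status: u16, range_requested: bool) -> ResumeAction {
    match status {
        206 => ResumeAction::Append,
        200 if range_requested => ResumeAction::Restart,
        200 => ResumeAction::Append, // fresh download: nothing on disk yet
        _ => ResumeAction::RetryLater,
    }
}

fn main() {
    assert_eq!(resume_action(206, true), ResumeAction::Append);
    assert_eq!(resume_action(200, true), ResumeAction::Restart);
}
```

The key case is `200` after a ranged request: appending a full body to an existing partial file would silently corrupt it, so the only safe option is to restart from byte zero.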

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d07f4328e1


downloadHfModel returns 202 when the server accepts the job, but the
background transfer may still be in progress or could fail. Remove the
premature addition to downloadedQuants so the UI stays accurate;
the quant will only show as downloaded when getRepoFiles reports it
on disk.

Signed-off-by: jh-block <jhugo@block.xyz>
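The principle behind this fix, deriving "downloaded" from files actually on disk rather than from the 202 Accepted that merely queues the job, can be sketched as a pure check. The function name and the case-insensitive comparison are illustrative assumptions, not the repo's actual code:

```rust
// Illustrative: a variant counts as downloaded only if its file is in the
// list reported from disk (e.g. by a getRepoFiles-style call), never
// because a download job was accepted with 202.
fn is_downloaded(variant_file: &str, files_on_disk: &[&str]) -> bool {
    files_on_disk.iter().any(|f| f.eq_ignore_ascii_case(variant_file))
}

fn main() {
    let on_disk = ["model-q4_k_m.gguf"];
    assert!(is_downloaded("model-Q4_K_M.gguf", &on_disk));
    assert!(!is_downloaded("model-Q8_0.gguf", &on_disk)); // still in flight
}
```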

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef76d77144


Comment on lines +219 to +221
fn is_model_file(filename: &str, model_stem_lower: &str) -> bool {
    let basename = filename.rsplit('/').next().unwrap_or(filename);
    basename.to_lowercase().starts_with(model_stem_lower)


P1 Badge Loosen model filename filtering beyond repo-stem prefix

is_model_file now requires each GGUF basename to start with the repo-derived stem, and both variant listing and spec resolution depend on that predicate. This makes discovery/download fail for any repo where filenames are valid quant files but use a different prefix convention than the repo slug (a common Hugging Face pattern), because group_into_variants and get_repo_gguf_files will filter out every candidate and report no matching quants.

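The failure mode the reviewer describes can be reproduced directly with the quoted predicate. The repo slug and filenames below are hypothetical examples of the mismatch, not taken from a specific repo:

```rust
// Reproduction of the reviewer's concern: a prefix match on the
// repo-derived stem rejects valid quant files named under a different
// convention. Example names are hypothetical.
fn is_model_file(filename: &str, model_stem_lower: &str) -> bool {
    let basename = filename.rsplit('/').next().unwrap_or(filename);
    basename.to_lowercase().starts_with(model_stem_lower)
}

fn main() {
    // Stem derived from a repo slug like "llama-3-8b-instruct-GGUF",
    // but the uploader prefixed files with the original model name
    // (a common Hugging Face pattern) — every quant gets filtered out.
    let stem = "llama-3-8b-instruct";
    assert!(!is_model_file("Meta-Llama-3-8B-Instruct.Q4_K_M.gguf", stem));
    // Files that do share the slug prefix still pass:
    assert!(is_model_file("repo/llama-3-8b-instruct-Q4_K_M.gguf", stem));
}
```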

Comment on lines +299 to +301
if !status.is_success() && status != reqwest::StatusCode::PARTIAL_CONTENT {
    if retries >= Self::MAX_RETRIES {
        anyhow::bail!("Failed to download: HTTP {}", status);


P2 Badge Avoid retrying permanent 4xx download failures

The retry block treats all non-success HTTP statuses as retryable, including permanent client errors like 401/403/404. In those cases the download will back off for multiple minutes before surfacing failure, which delays user feedback and ties up the download slot without any chance of recovery; retries should be limited to transient statuses (for example 408/429/5xx) and fail fast on persistent 4xx.

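One way to scope retries as the reviewer suggests is a small predicate over the status code. The exact set of transient statuses here (408, 429, 5xx) follows the comment's example and is a suggestion, not the repo's actual policy:

```rust
// Retry only transient failures, per the review comment above; fail fast
// on permanent 4xx. The status set is a suggested policy.
fn is_retryable_status(status: u16) -> bool {
    matches!(status, 408 | 429 | 500..=599)
}

fn main() {
    assert!(is_retryable_status(503));  // transient: back off and retry
    assert!(!is_retryable_status(404)); // permanent: surface immediately
    assert!(!is_retryable_status(401)); // auth errors won't fix themselves
}
```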
