
Improve local inference settings and model downloader#8467

Open
jh-block wants to merge 4 commits into main from jhugo/local-inference-settings-improvements

Conversation

Collaborator

@jh-block commented Apr 10, 2026

The local inference settings page and model downloader had several usability issues: downloads would silently fail after laptop sleep, some model variants showed incorrect sizes (notably sharded models like BF16), quantization variants were poorly sorted, making options hard to compare, and search results gave no indication of which models were already downloaded or too large to run.

Changes:

  • Download resilience: Downloads now resume after network interruption via HTTP Range headers with retry and exponential backoff, instead of silently failing and cleaning up the partial file.
  • Correct model sizes: File filtering now uses the model stem derived from the repo name (instead of including all GGUF files), which excludes auxiliary files like mmproj-*.gguf. Sharded model files are aggregated to show the correct total size.
  • Quantization sort order: Variants are grouped by bit-level (highest quality first) with proper rank assignments so all N-bit quants appear together.
  • Download state & memory warnings: Search results show which variants are already downloaded (disabled button) and warn when a model may not fit in available memory.
  • UI cleanup: Removed the direct download section and the nested scroll area in search results.

- Resume downloads after sleep/network interruption using HTTP Range
  headers with exponential backoff retry (up to 10 retries)
- Fix incorrect sizes for sharded and multimodal models by filtering
  files based on model stem from repo name and aggregating shard sizes
- Sort quantization variants by bit-level (highest quality first) with
  size as tiebreaker within the same rank
- Show memory warnings for variants too large for the current system
- Show already-downloaded variants with a disabled 'Downloaded' button
- Remove direct download section and inner scroll area from search UI

Signed-off-by: jh-block <jhugo@block.xyz>
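The resume-with-backoff behavior in the first bullet can be sketched as follows. This is an illustrative sketch only: the constant and function names (`MAX_RETRIES`, `backoff_delay_secs`) are assumptions, not the PR's actual identifiers, and the real code issues the ranged HTTP request where the comment indicates.

```rust
// Illustrative sketch of the retry policy described above; names are
// assumptions, not the PR's actual identifiers.
const MAX_RETRIES: u32 = 10;

/// Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
fn backoff_delay_secs(retry: u32) -> u64 {
    (1u64 << retry.min(6)).min(60)
}

fn main() {
    // On interruption the downloader would re-request the file with
    // `Range: bytes=<bytes_already_on_disk>-` and wait
    // backoff_delay_secs(n) before attempt n + 1.
    for retry in 0..MAX_RETRIES {
        println!("attempt {}: wait {}s before retrying", retry, backoff_delay_secs(retry));
    }
}
```

With this policy a download survives a laptop sleep of a few minutes: the partial file is kept, and the next attempt picks up from the existing byte offset instead of deleting it.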

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d4d4dec05

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…on resume

- Shard-only quantizations (e.g. BF16 split across multiple files) are
  no longer surfaced as downloadable variants, since resolve_model_spec
  cannot resolve them yet. A TODO marks where to re-enable once multi-
  file download is implemented.

- When resuming a download with a Range header, if the server ignores it
  and returns 200 OK (full body) instead of 206 Partial Content, the
  partial file is now deleted and the download restarts from scratch,
  preventing silent corruption from appending a full copy.

Signed-off-by: jh-block <jhugo@block.xyz>
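The second point, handling a server that ignores the Range header, reduces to a small decision over the response status. A minimal sketch, assuming invented names (`ResumeAction`, `resume_action`) rather than the PR's actual code:

```rust
// Hedged sketch of the 206-vs-200 decision described above; the enum and
// function names are invented for illustration.
#[derive(Debug, PartialEq)]
enum ResumeAction {
    Append,     // 206 Partial Content: server honored the Range header
    Restart,    // 200 OK on a ranged request: delete the partial file, start over
    RetryLater, // anything else: back off and retry
}

fn resume_action(status: u16, range_requested: bool) -> ResumeAction {
    match status {
        206 => ResumeAction::Append,
        200 if range_requested => ResumeAction::Restart,
        200 => ResumeAction::Append, // fresh download: nothing on disk yet
        _ => ResumeAction::RetryLater,
    }
}

fn main() {
    assert_eq!(resume_action(206, true), ResumeAction::Append);
    assert_eq!(resume_action(200, true), ResumeAction::Restart);
}
```

The key case is `200` after a ranged request: appending a full body to an existing partial file would silently corrupt it, so the only safe option is to restart from byte zero.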

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d07f4328e1


downloadHfModel returns 202 when the server accepts the job, but the
background transfer may still be in progress or could fail. Remove the
premature addition to downloadedQuants so the UI stays accurate;
the quant will only show as downloaded when getRepoFiles reports it
on disk.

Signed-off-by: jh-block <jhugo@block.xyz>
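The principle behind this fix, deriving "downloaded" from files actually on disk rather than from the 202 Accepted that merely queues the job, can be sketched as a pure check. The function name and the case-insensitive comparison are illustrative assumptions, not the repo's actual code:

```rust
// Illustrative: a variant counts as downloaded only if its file is in the
// list reported from disk (e.g. by a getRepoFiles-style call), never
// because a download job was accepted with 202.
fn is_downloaded(variant_file: &str, files_on_disk: &[&str]) -> bool {
    files_on_disk.iter().any(|f| f.eq_ignore_ascii_case(variant_file))
}

fn main() {
    let on_disk = ["model-q4_k_m.gguf"];
    assert!(is_downloaded("model-Q4_K_M.gguf", &on_disk));
    assert!(!is_downloaded("model-Q8_0.gguf", &on_disk)); // still in flight
}
```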

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef76d77144


Comment on lines +219 to +221
fn is_model_file(filename: &str, model_stem_lower: &str) -> bool {
    let basename = filename.rsplit('/').next().unwrap_or(filename);
    basename.to_lowercase().starts_with(model_stem_lower)


P1 Badge Loosen model filename filtering beyond repo-stem prefix

is_model_file now requires each GGUF basename to start with the repo-derived stem, and both variant listing and spec resolution depend on that predicate. This makes discovery/download fail for any repo where filenames are valid quant files but use a different prefix convention than the repo slug (a common Hugging Face pattern), because group_into_variants and get_repo_gguf_files will filter out every candidate and report no matching quants.

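The failure mode the reviewer describes can be reproduced directly with the quoted predicate. The repo slug and filenames below are hypothetical examples of the mismatch, not taken from a specific repo:

```rust
// Reproduction of the reviewer's concern: a prefix match on the
// repo-derived stem rejects valid quant files named under a different
// convention. Example names are hypothetical.
fn is_model_file(filename: &str, model_stem_lower: &str) -> bool {
    let basename = filename.rsplit('/').next().unwrap_or(filename);
    basename.to_lowercase().starts_with(model_stem_lower)
}

fn main() {
    // Stem derived from a repo slug like "llama-3-8b-instruct-GGUF",
    // but the uploader prefixed files with the original model name
    // (a common Hugging Face pattern) — every quant gets filtered out.
    let stem = "llama-3-8b-instruct";
    assert!(!is_model_file("Meta-Llama-3-8B-Instruct.Q4_K_M.gguf", stem));
    // Files that do share the slug prefix still pass:
    assert!(is_model_file("repo/llama-3-8b-instruct-Q4_K_M.gguf", stem));
}
```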

Comment on lines +299 to +301
if !status.is_success() && status != reqwest::StatusCode::PARTIAL_CONTENT {
    if retries >= Self::MAX_RETRIES {
        anyhow::bail!("Failed to download: HTTP {}", status);


P2 Badge Avoid retrying permanent 4xx download failures

The retry block treats all non-success HTTP statuses as retryable, including permanent client errors like 401/403/404. In those cases the download will back off for multiple minutes before surfacing failure, which delays user feedback and ties up the download slot without any chance of recovery; retries should be limited to transient statuses (for example 408/429/5xx) and fail fast on persistent 4xx.

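One way to scope retries as the reviewer suggests is a small predicate over the status code. The exact set of transient statuses here (408, 429, 5xx) follows the comment's example and is a suggestion, not the repo's actual policy:

```rust
// Retry only transient failures, per the review comment above; fail fast
// on permanent 4xx. The status set is a suggested policy.
fn is_retryable_status(status: u16) -> bool {
    matches!(status, 408 | 429 | 500..=599)
}

fn main() {
    assert!(is_retryable_status(503));  // transient: back off and retry
    assert!(!is_retryable_status(404)); // permanent: surface immediately
    assert!(!is_retryable_status(401)); // auth errors won't fix themselves
}
```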
