Improve local inference settings and model downloader #8467
Conversation
- Resume downloads after sleep/network interruption using HTTP Range headers with exponential backoff retry (up to 10 retries)
- Fix incorrect sizes for sharded and multimodal models by filtering files based on the model stem from the repo name and aggregating shard sizes
- Sort quantization variants by bit-level (highest quality first), with size as a tiebreaker within the same rank
- Show memory warnings for variants too large for the current system
- Show already-downloaded variants with a disabled 'Downloaded' button
- Remove the direct download section and inner scroll area from the search UI

Signed-off-by: jh-block <jhugo@block.xyz>
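The bit-level ordering described in the commit message can be sketched as follows. This is a hypothetical illustration (the helper names are not from the PR): rank variants by bit-level, highest first, and break ties within a rank by file size, smallest first.

```rust
// Hypothetical sketch; helper names are illustrative, not the PR's code.
fn bit_level(quant: &str) -> u32 {
    // Pull the first run of digits out of labels like "Q4_K_M" or "BF16".
    quant
        .chars()
        .skip_while(|c| !c.is_ascii_digit())
        .take_while(|c| c.is_ascii_digit())
        .collect::<String>()
        .parse()
        .unwrap_or(0)
}

fn sort_variants(variants: &mut [(String, u64)]) {
    variants.sort_by(|a, b| {
        bit_level(&b.0)
            .cmp(&bit_level(&a.0)) // higher bit-level (quality) first
            .then(a.1.cmp(&b.1)) // smaller size wins within the same rank
    });
}
```

Sorting by quality first keeps the list stable for users comparing options; size only matters when two variants share a bit-level.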
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8d4d4dec05
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…on resume:

- Shard-only quantizations (e.g. BF16 split across multiple files) are no longer surfaced as downloadable variants, since resolve_model_spec cannot resolve them yet. A TODO marks where to re-enable this once multi-file download is implemented.
- When resuming a download with a Range header, if the server ignores it and returns 200 OK (full body) instead of 206 Partial Content, the partial file is now deleted and the download restarts from scratch, preventing silent corruption from appending a full copy.

Signed-off-by: jh-block <jhugo@block.xyz>
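The 200-vs-206 resume handling above can be sketched as a small decision function. This is a hedged illustration; the enum and function names are not the PR's actual types.

```rust
// Illustrative sketch of the resume decision; names are not from the PR.
#[derive(Debug, PartialEq)]
enum ResumeAction {
    Append,    // 206 Partial Content: server honored the Range header
    Restart,   // 200 OK on a ranged request: delete the partial file first
    Fail(u16), // anything else: surface the error
}

fn resume_action(status: u16, sent_range_header: bool) -> ResumeAction {
    match (status, sent_range_header) {
        (206, true) => ResumeAction::Append,
        // Server ignored the Range header and is sending the full body;
        // appending it would corrupt the file, so restart from scratch.
        (200, true) => ResumeAction::Restart,
        (200, false) => ResumeAction::Append, // fresh download, empty file
        (other, _) => ResumeAction::Fail(other),
    }
}
```

Separating this decision from the transfer loop makes the corruption case easy to test without a network.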
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d07f4328e1
ui/desktop/src/components/settings/localInference/HuggingFaceModelSearch.tsx
downloadHfModel returns 202 when the server accepts the job, but the background transfer may still be in progress or could fail. Remove the premature addition to downloadedQuants so the UI stays accurate; the quant will only show as downloaded when getRepoFiles reports it on disk.

Signed-off-by: jh-block <jhugo@block.xyz>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef76d77144
fn is_model_file(filename: &str, model_stem_lower: &str) -> bool {
    let basename = filename.rsplit('/').next().unwrap_or(filename);
    basename.to_lowercase().starts_with(model_stem_lower)
}
Loosen model filename filtering beyond repo-stem prefix
is_model_file now requires each GGUF basename to start with the repo-derived stem, and both variant listing and spec resolution depend on that predicate. This makes discovery/download fail for any repo where filenames are valid quant files but use a different prefix convention than the repo slug (a common Hugging Face pattern), because group_into_variants and get_repo_gguf_files will filter out every candidate and report no matching quants.
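A looser predicate along the lines this review suggests might drop the repo-stem prefix requirement entirely and keep every GGUF candidate except known auxiliary files. This is an illustrative sketch, not the repo's fix; the mmproj exclusion is an assumption for this example.

```rust
// Illustrative sketch of a looser filter; not the actual implementation.
fn is_model_file(filename: &str) -> bool {
    let basename = filename.rsplit('/').next().unwrap_or(filename);
    let lower = basename.to_lowercase();
    // Accept any .gguf file regardless of prefix convention, excluding
    // known auxiliary files such as multimodal projector weights.
    lower.ends_with(".gguf") && !lower.starts_with("mmproj")
}
```

This keeps repos whose quant files use a different prefix than the repo slug, at the cost of needing an explicit denylist for non-model GGUF files.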
if !status.is_success() && status != reqwest::StatusCode::PARTIAL_CONTENT {
    if retries >= Self::MAX_RETRIES {
        anyhow::bail!("Failed to download: HTTP {}", status);
Avoid retrying permanent 4xx download failures
The retry block treats all non-success HTTP statuses as retryable, including permanent client errors like 401/403/404. In those cases the download will back off for multiple minutes before surfacing failure, which delays user feedback and ties up the download slot without any chance of recovery; retries should be limited to transient statuses (for example 408/429/5xx) and fail fast on persistent 4xx.
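The transient/permanent split the review recommends could be captured in a small classifier like this. The status classes follow the comment above; the helper name is illustrative.

```rust
// Sketch of retryable-status classification; helper name is illustrative.
fn is_retryable(status: u16) -> bool {
    match status {
        408 | 429 => true, // request timeout / rate limited: back off and retry
        500..=599 => true, // server-side errors are often transient
        _ => false,        // 401/403/404 and other client errors: fail fast
    }
}
```

Gating the retry loop on such a predicate lets permanent 4xx failures surface immediately instead of burning minutes of backoff.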
The local inference settings page and model downloader had several usability issues: downloads would silently fail after laptop sleep; some model variants showed incorrect sizes (notably sharded models like BF16); quantization variants were poorly sorted, making it hard to compare options; and search results gave no indication of which models were already downloaded or too large to run.
Changes:
mmproj-*.gguf. Sharded model files are aggregated to show the correct total size.