fix: resolve shared-repo GGUF variants orphaned by refs/main advance #2311
Open
ianbmacdonald wants to merge 1 commit into
CI is green except for Build Lemonade macOS .dmg (with Tauri App), which fails in its |
8babca1 to
7c480fe
When two models share one Hugging Face repo with different quants, pulling or updating one advances the repo's single refs/main pointer to a snapshot that contains only that variant. The sibling variant's file stays in the previous snapshot, so after the next models-cache build (e.g. a lemond restart) the llamacpp GGUF resolver, which searches only the refs/main snapshot, reports the sibling as not downloaded even though its file is still cached (lemonade-sdk#2300).

Factor the variant-matching cases into a lambda and, when the active refs/main snapshot does not contain the requested variant, retry the search across every snapshot in the repo cache before declaring it missing. The active snapshot is still searched first, so the CHECKPOINT:VARIANT contract is preserved (a different quant is never substituted while the exact one exists), and blobs are content-addressed so reading an older snapshot is safe. The auxiliary-checkpoint resolver already had this all-snapshots fallback; this brings the main GGUF path in line.

Adds test_034 (server_endpoints.py): pulls two quants sharing a repo, advances refs/main to a snapshot holding only one, forces a cache rebuild, and asserts the orphaned variant still resolves.

Closes lemonade-sdk#2300

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: GLM-5.2 <noreply@zhipuai.cn>
Co-Authored-By: GPT-5.5 <noreply@openai.com>
7c480fe to
6e02c13
bitgamma
approved these changes
Jun 20, 2026
Problem
refs/main in a Hugging Face repo cache is a single sticky pointer (advanced only on a successful pull). When two models share one HF repo with different quants, pulling or updating one variant advances refs/main to a snapshot that contains only that variant; the sibling variant's file stays behind in the previous snapshot.
The llamacpp main-GGUF resolver (ModelManager::resolve_model_path) collected candidate GGUFs from the refs/main snapshot only, with a whole-cache fallback that fired solely when that snapshot had zero GGUFs. So after the next models-cache build (e.g. a lemond restart), the sibling variant, present but not under refs/main, was reported as not downloaded even though its file is still cached. Fixes #2300.
Fix
Factor the variant-matching cases into a resolve_gguf_variant lambda. Try the active refs/main snapshot first; only if that misses, broaden the search to every snapshot in the repo cache before declaring the variant missing.
The active snapshot is still searched first, which preserves the CHECKPOINT:VARIANT contract: a different quant is never substituted while the exact one exists under refs/main.
The C++ diff is mostly reindentation of the existing cases into the lambda; each case's matching behaviour is unchanged.
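The lookup order described above (active refs/main snapshot first, every other snapshot only on a miss) can be sketched in Python. This is an illustrative model, not the C++ in ModelManager::resolve_model_path: the function names, the demo repo name, the revision names, and the simple substring match are all assumptions made for the example; the real resolver has several matching cases that are not reproduced here.

```python
from pathlib import Path
from typing import Optional
import tempfile


def find_variant(snapshot: Path, variant: str) -> Optional[Path]:
    """Return a GGUF under `snapshot` whose file name contains `variant`.

    Assumption: a case-insensitive substring match stands in for the
    resolver's real variant-matching cases.
    """
    needle = variant.lower()
    for gguf in sorted(snapshot.rglob("*.gguf")):
        if needle in gguf.name.lower():
            return gguf
    return None


def resolve_gguf(repo_cache: Path, variant: str) -> Optional[Path]:
    """Search the refs/main snapshot first, then every other snapshot."""
    snapshots = repo_cache / "snapshots"
    active = snapshots / (repo_cache / "refs" / "main").read_text().strip()

    # 1. The active snapshot wins whenever it holds the requested variant.
    if active.is_dir():
        hit = find_variant(active, variant)
        if hit is not None:
            return hit

    # 2. Only on a miss, broaden the search to the remaining snapshots.
    for snap in sorted(p for p in snapshots.iterdir() if p.is_dir() and p != active):
        hit = find_variant(snap, variant)
        if hit is not None:
            return hit
    return None


def make_demo_cache(root: Path) -> Path:
    """Build a hypothetical repo cache: two snapshots, one quant in each."""
    repo = root / "models--example-org--example-gguf"  # hypothetical repo name
    for rev, name in (("aaa111", "model-Q4_K_M.gguf"), ("bbb222", "model-Q8_0.gguf")):
        snap = repo / "snapshots" / rev
        snap.mkdir(parents=True)
        (snap / name).write_bytes(b"GGUF")
    (repo / "refs").mkdir()
    (repo / "refs" / "main").write_text("bbb222")  # refs/main points at the Q8_0 snapshot
    return repo


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = make_demo_cache(Path(tmp))
        print(resolve_gguf(repo, "Q8_0"))    # found in the active snapshot
        print(resolve_gguf(repo, "Q4_K_M"))  # found via the all-snapshots fallback
```

In this model, a search limited to the refs/main snapshot would return nothing for Q4_K_M, which is the orphaned-sibling behaviour described under Problem.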
Test
test_034 (server_endpoints.py): pulls two quants sharing a repo, moves one into a fresh snapshot and advances refs/main, forces a models-cache rebuild (via an unrelated throwaway pull; re-pulling a shared-repo model would query HF and repair refs/main, masking the bug), then asserts both variants stay resolvable. Fails on main, passes with this change.
Validation
Exercised on real hardware:
.deb carrying this branch; not a release)
7.0.0-22-generic · glibc 2.43 · amd64
[1002:744c], gfx1100, RDNA3, 24 GB)
test_034 and test_030–test_033 exercise GGUF cache/path resolution and the pull-variants API; no model is loaded, so no inference backend or ROCm runtime is invoked.
test_034 fails on an unpatched build and passes with this change; the existing test_030–test_033 pull-variant tests remain green.
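The cache manipulation that test_034 is described as performing (move one variant into a fresh snapshot, then advance refs/main) might look roughly like the following. The helper name, repo name, and revision names are hypothetical, and the actual code in server_endpoints.py is not shown in this PR text. The sketch uses plain files; in a real Hugging Face cache the snapshot entries are normally symlinks into a blobs directory.

```python
import shutil
import tempfile
from pathlib import Path


def advance_main_with_only(repo_cache: Path, filename: str, new_rev: str) -> Path:
    """Move `filename` into a fresh snapshot and point refs/main at it.

    Afterwards the active snapshot holds only `filename`; any sibling
    variant stays behind in the previous snapshot.
    """
    ref = repo_cache / "refs" / "main"
    old_snap = repo_cache / "snapshots" / ref.read_text().strip()
    new_snap = repo_cache / "snapshots" / new_rev
    new_snap.mkdir(parents=True)
    shutil.move(str(old_snap / filename), str(new_snap / filename))
    ref.write_text(new_rev)
    return new_snap


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical starting state: both quants live in one snapshot.
        repo = Path(tmp) / "models--example-org--example-gguf"
        first = repo / "snapshots" / "aaa111"
        first.mkdir(parents=True)
        for name in ("model-Q4_K_M.gguf", "model-Q8_0.gguf"):
            (first / name).write_bytes(b"GGUF")
        (repo / "refs").mkdir()
        (repo / "refs" / "main").write_text("aaa111")

        new_snap = advance_main_with_only(repo, "model-Q8_0.gguf", "bbb222")
        print("active snapshot:", sorted(p.name for p in new_snap.iterdir()))
        print("previous snapshot:", sorted(p.name for p in first.iterdir()))
```

After this step, a lookup restricted to the refs/main snapshot no longer sees the Q4_K_M file, which is the condition the test needs before it forces the cache rebuild.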