
fix: resolve shared-repo GGUF variants orphaned by refs/main advance #2311

Open
ianbmacdonald wants to merge 1 commit into lemonade-sdk:main from ianbmacdonald:fix/shared-repo-variant-resolution

ianbmacdonald commented Jun 19, 2026

Collaborator

Problem

refs/main in a Hugging Face repo cache is a single sticky pointer (advanced only on a successful pull). When two models share one HF repo with different quants, pulling or updating one variant advances refs/main to a snapshot that contains only that variant — the sibling variant's file stays behind in the previous snapshot.
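
The orphaning described above can be reproduced on a toy cache layout. This is a minimal sketch, not the real cache: the repo folder name, commit hashes, and file names are made up for illustration, and placeholder files stand in for the symlinks into `blobs/` that a real Hugging Face cache uses.

```python
import tempfile
from pathlib import Path


def add_snapshot(repo, commit, filenames):
    """Create snapshots/<commit>/ holding the given (placeholder) GGUF files."""
    snap = repo / "snapshots" / commit
    snap.mkdir(parents=True, exist_ok=True)
    for name in filenames:
        (snap / name).write_bytes(b"GGUF")  # stand-in for a symlink to a blob
    return snap


def set_refs_main(repo, commit):
    """Point the repo's single refs/main pointer at one commit."""
    (repo / "refs").mkdir(parents=True, exist_ok=True)
    (repo / "refs" / "main").write_text(commit)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical repo folder; the name is illustrative only.
    repo = Path(tmp) / "models--example-org--example-GGUF"

    # Both quants were pulled while refs/main pointed at commit aaa111.
    add_snapshot(repo, "aaa111", ["model-Q4_K_M.gguf", "model-Q8_0.gguf"])
    set_refs_main(repo, "aaa111")

    # Updating only the Q4 variant creates a new snapshot and advances refs/main.
    add_snapshot(repo, "bbb222", ["model-Q4_K_M.gguf"])
    set_refs_main(repo, "bbb222")

    active_commit = (repo / "refs" / "main").read_text().strip()
    under_refs_main = {p.name for p in (repo / "snapshots" / active_commit).glob("*.gguf")}
    anywhere_in_cache = {p.name for p in (repo / "snapshots").glob("*/*.gguf")}

print(sorted(under_refs_main))    # ['model-Q4_K_M.gguf']
print(sorted(anywhere_in_cache))  # ['model-Q4_K_M.gguf', 'model-Q8_0.gguf']
```

The Q8 file is still on disk, but nothing under the refs/main snapshot points at it.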

The llamacpp main-GGUF resolver (ModelManager::resolve_model_path) collected candidate GGUFs from the refs/main snapshot only, with a whole-cache fallback that fired solely when that snapshot had zero GGUFs. So after the next models-cache build (e.g. a lemond restart), the sibling variant — present, but not under refs/main — was reported as not downloaded even though its file is still cached. Fixes #2300.
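
The gating that caused the miss can be modelled in a few lines. This is a rough Python model of the behaviour described above, not the C++ in ModelManager::resolve_model_path: the function names are hypothetical, the cache is a plain dict, and variant matching is assumed to be a case-insensitive substring test on the file name.

```python
def legacy_candidates(snapshots, refs_main):
    """Candidates come from the refs/main snapshot; the whole-cache
    fallback fires only when that snapshot has zero GGUFs."""
    active = [f for f in snapshots.get(refs_main, []) if f.endswith(".gguf")]
    if active:
        return active
    return [f for files in snapshots.values() for f in files if f.endswith(".gguf")]


def legacy_resolve(snapshots, refs_main, variant):
    """Return the first candidate matching the variant, or None."""
    for name in legacy_candidates(snapshots, refs_main):
        if variant.lower() in name.lower():  # assumed matching rule
            return name
    return None


# Illustrative cache: refs/main (bbb222) holds only the Q4 variant.
snapshots = {
    "aaa111": ["model-Q4_K_M.gguf", "model-Q8_0.gguf"],
    "bbb222": ["model-Q4_K_M.gguf"],
}
q4 = legacy_resolve(snapshots, "bbb222", "Q4_K_M")
q8 = legacy_resolve(snapshots, "bbb222", "Q8_0")

# The fallback only helps when the active snapshot is completely empty.
empty_active = {"aaa111": ["model-Q8_0.gguf"], "ccc333": []}
q8_when_empty = legacy_resolve(empty_active, "ccc333", "Q8_0")

print(q4)             # model-Q4_K_M.gguf
print(q8)             # None
print(q8_when_empty)  # model-Q8_0.gguf
```

Because the active snapshot contains one GGUF, the fallback never runs and the Q8 sibling is reported missing.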

Fix

Factor the variant-matching cases into a resolve_gguf_variant lambda. Try the active refs/main snapshot first; only if that misses, broaden the search to every snapshot in the repo cache before declaring the variant missing.

  • Active snapshot first preserves the CHECKPOINT:VARIANT contract — a different quant is never substituted while the exact one exists under refs/main.
  • Blobs are content-addressed and shared across snapshots, so reading an older snapshot's copy is safe.
  • The whole-cache walk is lazy (only runs on a miss), so the hot path is unchanged.
  • The auxiliary-checkpoint resolver already had this all-snapshots fallback; this brings the main GGUF path in line.

The C++ diff is mostly reindentation of the existing cases into the lambda — each case's matching behaviour is unchanged.
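
For readers who want the shape of the lookup without reading the C++, the two-stage search can be sketched in Python. This is an illustration under assumptions, not a port of the patch: `resolve_gguf_variant` here is a Python stand-in that only borrows the lambda's name, `find_variant` is a hypothetical helper, a single case-insensitive substring match replaces the real variant-matching cases, and the cache layout and file names are made up.

```python
import tempfile
from pathlib import Path


def find_variant(snapshot_dir, variant):
    """Return the first GGUF in one snapshot whose name contains the variant."""
    matches = sorted(
        p for p in snapshot_dir.glob("*.gguf") if variant.lower() in p.name.lower()
    )
    return matches[0] if matches else None


def resolve_gguf_variant(repo, variant):
    """Try the refs/main snapshot first; widen to all snapshots only on a miss."""
    commit = (repo / "refs" / "main").read_text().strip()
    active = repo / "snapshots" / commit
    hit = find_variant(active, variant) if active.is_dir() else None
    if hit is not None:
        return hit  # hot path: exact variant exists under refs/main
    for snap in sorted((repo / "snapshots").iterdir()):
        if snap == active or not snap.is_dir():
            continue
        hit = find_variant(snap, variant)
        if hit is not None:
            return hit  # orphaned variant found in an older snapshot
    return None


def rel(path, repo):
    """Render a result relative to the repo folder for printing."""
    return None if path is None else path.relative_to(repo).as_posix()


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "models--example-org--example-GGUF"  # illustrative name
    for commit, names in {
        "aaa111": ["model-Q4_K_M.gguf", "model-Q8_0.gguf"],
        "bbb222": ["model-Q4_K_M.gguf"],
    }.items():
        snap = repo / "snapshots" / commit
        snap.mkdir(parents=True)
        for name in names:
            (snap / name).write_bytes(b"GGUF")
    (repo / "refs").mkdir()
    (repo / "refs" / "main").write_text("bbb222")

    q4 = rel(resolve_gguf_variant(repo, "Q4_K_M"), repo)
    q8 = rel(resolve_gguf_variant(repo, "Q8_0"), repo)
    missing = rel(resolve_gguf_variant(repo, "Q2_K"), repo)

print(q4)       # snapshots/bbb222/model-Q4_K_M.gguf
print(q8)       # snapshots/aaa111/model-Q8_0.gguf
print(missing)  # None
```

Q4 resolves from the active snapshot even though an older copy exists, Q8 is recovered from the older snapshot, and a variant that was never cached still comes back as missing.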

Test

test_034 (server_endpoints.py): pulls two quants sharing a repo, moves one into a fresh snapshot and advances refs/main, forces a models-cache rebuild (via an unrelated throwaway pull — re-pulling a shared-repo model would query HF and repair refs/main, masking the bug), then asserts both variants stay resolvable. Fails on main, passes with this change.
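
The cache-manipulation step of the test (move one variant into a fresh snapshot, then advance refs/main) might look roughly like the helper below. This is a sketch only: `orphan_sibling` is a hypothetical name, the layout and file names are invented, and the server calls that test_034 makes (the pulls and the rebuild trigger) are deliberately left out rather than guessed at.

```python
import shutil
import tempfile
from pathlib import Path


def orphan_sibling(repo, moved_filename, new_commit):
    """Move one variant into a fresh snapshot and point refs/main at it,
    leaving every other file behind in the previous snapshot."""
    refs_main = repo / "refs" / "main"
    old_commit = refs_main.read_text().strip()
    new_snapshot = repo / "snapshots" / new_commit
    new_snapshot.mkdir(parents=True)
    shutil.move(
        str(repo / "snapshots" / old_commit / moved_filename),
        str(new_snapshot / moved_filename),
    )
    refs_main.write_text(new_commit)
    return old_commit


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "models--example-org--example-GGUF"  # illustrative name
    first = repo / "snapshots" / "aaa111"
    first.mkdir(parents=True)
    for name in ("model-Q4_K_M.gguf", "model-Q8_0.gguf"):
        (first / name).write_bytes(b"GGUF")
    (repo / "refs").mkdir()
    (repo / "refs" / "main").write_text("aaa111")

    previous = orphan_sibling(repo, "model-Q4_K_M.gguf", "bbb222")

    now_active = (repo / "refs" / "main").read_text().strip()
    active_files = sorted(p.name for p in (repo / "snapshots" / now_active).glob("*.gguf"))
    left_behind = sorted(p.name for p in (repo / "snapshots" / previous).glob("*.gguf"))

print(now_active)    # bbb222
print(active_files)  # ['model-Q4_K_M.gguf']
print(left_behind)   # ['model-Q8_0.gguf']
```

After this step the Q8 variant exists only outside the refs/main snapshot, which is the state the assertions in the test need to exercise.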

Validation

Exercised on real hardware:

  • lemonade: 10.8.0 (imac-built dev .deb carrying this branch; not a release)
  • OS / kernel: Ubuntu 26.04 LTS · 7.0.0-22-generic · glibc 2.43 · amd64
  • GPU: AMD Radeon RX 7900 XT (Navi 31 [1002:744c], gfx1100, RDNA3, 24 GB)
  • Backend: none. test_034 and test_030 through test_033 exercise GGUF cache/path resolution and the pull-variants API; no model is loaded, so no inference backend or ROCm runtime is invoked.

test_034 fails on an unpatched build and passes with this change; the existing test_030 through test_033 pull-variant tests remain green.

github-actions bot added the bug label Jun 19, 2026
ianbmacdonald commented

Collaborator Author

CI is green except for Build Lemonade macOS .dmg (with Tauri App), which fails in its whisper Metal inference tests (unsigned path) step (test_001_transcription_basic). That's unrelated to this change (a GGUF cache-path resolver fix — no whisper/Metal/inference touched), and the same job is currently failing on main, so it looks like a pre-existing macOS-Metal issue rather than anything introduced here. The macOS C++ build (Build Embeddable Lemonade (macOS)) and all other build/test jobs pass.

ianbmacdonald force-pushed the fix/shared-repo-variant-resolution branch from 8babca1 to 7c480fe, June 19, 2026 23:31
When two models share one Hugging Face repo with different quants, pulling or
updating one advances the repo's single refs/main pointer to a snapshot that
contains only that variant. The sibling variant's file stays in the previous
snapshot, so after the next models-cache build (e.g. a lemond restart) the
llamacpp GGUF resolver — which searches only the refs/main snapshot — reports the
sibling as not downloaded even though its file is still cached (lemonade-sdk#2300).

Factor the variant-matching cases into a lambda and, when the active refs/main
snapshot does not contain the requested variant, retry the search across every
snapshot in the repo cache before declaring it missing. The active snapshot is
still searched first, so the CHECKPOINT:VARIANT contract is preserved (a
different quant is never substituted while the exact one exists), and blobs are
content-addressed so reading an older snapshot is safe. The auxiliary-checkpoint
resolver already had this all-snapshots fallback; this brings the main GGUF path
in line.

Adds test_034 (server_endpoints.py): pulls two quants sharing a repo, advances
refs/main to a snapshot holding only one, forces a cache rebuild, and asserts the
orphaned variant still resolves.

Closes lemonade-sdk#2300

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: GLM-5.2 <noreply@zhipuai.cn>
Co-Authored-By: GPT-5.5 <noreply@openai.com>
ianbmacdonald force-pushed the fix/shared-repo-variant-resolution branch from 7c480fe to 6e02c13, June 20, 2026 02:10
Development

Successfully merging this pull request may close these issues.

Updating one model variant breaks other variants sharing the same HF repo