QVAC-18793: split LlamaFinetuner out of LlamaModel #1996
Open
jpgaribotti wants to merge 10 commits into
Conversation
Move the LoRA finetune pipeline and pause/resume checkpoint state into
a dedicated `LlamaFinetuner` class that holds a reference to the owning
`LlamaModel`. `LlamaModel` exposes the finetuner via a `finetuner()`
accessor and delegates the in-`process()` finetune dispatch to it.
`FinetuneTerminalResult` and the `ProgressCallback` alias move with
the implementation; `FinetuneConfigOverrides` stays on `LlamaModel`
since it is also consumed by `reload()` / `tuneConfigMap()` on the
inference path.
Pure move: all method bodies, locking, and the `STANDALONE_TEST_BUILD`
guards are preserved. The split isolates finetune-only state from the
inference path, making lifetime/locking easier to reason about and
opening up follow-up simplifications (collapsible clear/pause helpers,
dropping the `friend` once cache-flush is exposed on `LlamaModel`).
Also drops the now-deprecated `llama_adapter_lora_free` deleter on the resume path: adapters are tied to the model lifetime in current llama.cpp, and the surrounding `reload(FinetuneConfigOverrides{})` calls already destroy and rebuild the model on both the happy and error paths.
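As a rough before/after sketch of that change (the type alias names here are invented; `llama_adapter_lora` and the deprecated `llama_adapter_lora_free` are from llama.cpp):

```cpp
// Illustrative before/after; the addon's actual member names differ.
#include "llama.h"
#include <memory>

// Before: the resume path held an owning adapter pointer whose deleter
// called the now-deprecated free function, triggering
// -Wdeprecated-declarations on every compile.
using OwnedAdapter =
    std::unique_ptr<llama_adapter_lora, decltype(&llama_adapter_lora_free)>;

// After: in current llama.cpp, adapters are freed together with the model,
// and the surrounding reload(FinetuneConfigOverrides{}) destroys and rebuilds
// the model on both the happy and error paths, so a non-owning handle suffices.
using AdapterHandle = llama_adapter_lora*;
```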
Contributor
Could you also update the docs? A few sections still describe the old ownership/API shape. This is docs-only, but the current text will be misleading for anyone debugging pause/resume or following the backend flow diagrams after the split.
Tier-based Approval Status
Bump @qvac/llm-llamacpp from 0.20.0 to 0.21.0 and add the matching CHANGELOG.md entry for PR tetherto#1996 (LlamaFinetuner refactor). The 0.21.0 release is a pure internal C++ refactor of the addon: the LoRA finetuning pipeline now lives in its own LlamaFinetuner class instead of inside LlamaModel. No JS API change, no behaviour change. Also drops the deprecated llama_adapter_lora_free deleter on the resume path, eliminating the three -Wdeprecated-declarations warnings called out in 0.20.0.
Bring the implementation notes, UML sequence diagrams, and component tables in finetuning.md back in sync with the LlamaFinetuner refactor:
gianni-cor
approved these changes
May 14, 2026
dev-nid
approved these changes
May 14, 2026
Contributor
/review
🎯 What problem does this PR solve?
- `LlamaModel` had grown to own both the inference path and the entire LoRA finetune pipeline (pause/resume checkpoint state, training loop, dataset prep, optimizer/scheduler), which made lifetime and locking hard to reason about.
- The pause/resume checkpoint state (`checkpointStateMutex_`, `currentCheckpointState_`, `pausedCheckpointState_`) and ~20 private helpers cluttered `LlamaModel.hpp` even though they are only relevant on the finetune path.
- `llama_adapter_lora_free` (deprecated in current llama.cpp: "adapters are now freed together with the associated model") was building warning noise into compiles.
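Roughly, the pre-refactor header mixed both concerns in one class. A simplified, partly hypothetical sketch (only the three checkpoint members and `stateMtx_` are names taken from this PR):

```cpp
// Illustrative "before" shape of LlamaModel.hpp; CheckpointState and the
// helper summary are placeholders, not the real declarations.
#include <mutex>

struct CheckpointState { /* hypothetical stand-in for checkpoint data */ };

class LlamaModel {
private:
    // inference-path state
    std::mutex stateMtx_;

    // finetune-only state living in the same header
    std::mutex checkpointStateMutex_;
    CheckpointState currentCheckpointState_;
    CheckpointState pausedCheckpointState_;

    // plus ~20 private finetune helpers (dataset prep, training loop,
    // optimizer/scheduler setup, pause/resume, adapter save, ...)
};
```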
📝 How does it solve it?
- New `LlamaFinetuner` class owns the finetune pipeline and pause/resume state. It holds a reference back to its owning `LlamaModel` (composition; lifetime guaranteed by member ordering: the finetuner is declared last, so it is destroyed first).
- `LlamaModel` exposes the finetuner via `finetuner()` accessors; `process()` delegates the finetune dispatch to `finetuner_.finetune(...)`.
- `friend class LlamaFinetuner` lets the new class access the bits of `LlamaModel` it still needs (`stateMtx_`, `state_`, `metadata_`, `reload()`, `getContext()`, `getModel()`); this is intended to be removed once those access points get small accessor helpers.
- `FinetuneTerminalResult` and the `ProgressCallback` alias move with the implementation. `FinetuneConfigOverrides` stays on `LlamaModel` because `reload()` / `tuneConfigMap()` on the inference path consume it.
- Pure move otherwise: all method bodies, locking, and the `STANDALONE_TEST_BUILD` guards are preserved.
- Drops the deprecated `llama_adapter_lora_free` deleter on the resume path. Adapters are tied to model lifetime in current llama.cpp, and the surrounding `reload(FinetuneConfigOverrides{})` calls on both the happy and error paths destroy and rebuild the model.
- `AddonJs.hpp` updated to call through `llamaModel->finetuner().{isFinetuneRunning,requestPause,waitUntilFinetuningPauseComplete}()` and to use `LlamaFinetuner::ProgressCallback` (sketched below).
- `CMakeLists.txt` compiles `LlamaFinetuner.cpp` into both the addon and `cli_tool` targets.
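A hedged sketch of the updated call sites; the wrapper functions here are invented for illustration, and only the `finetuner()` accessor, the three method names, and `LlamaFinetuner::ProgressCallback` come from this PR:

```cpp
// Illustrative only; assumes a LlamaFinetuner exposing the three methods
// named above. The real AddonJs.hpp glue code is more involved.
void pauseIfRunning(LlamaModel* llamaModel) {
    // Previously these calls went through LlamaModel directly; now every
    // finetune-related call routes through the finetuner accessor.
    LlamaFinetuner& ft = llamaModel->finetuner();
    if (ft.isFinetuneRunning()) {
        ft.requestPause();
        ft.waitUntilFinetuningPauseComplete();
    }
}

// Progress reporting is now typed against the alias that moved with the class.
void onProgress(LlamaFinetuner::ProgressCallback cb);
```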
🧪 How was it tested?
- The existing integration tests under `packages/llm-llamacpp/test/integration/` exercise the moved code paths (train + pause/resume + adapter save). No test signatures changed.
- The standalone test build still compiles under `STANDALONE_TEST_BUILD`; `LlamaFinetuner.cpp` is fully guarded by `#ifndef STANDALONE_TEST_BUILD`, so the test build sees only the inline ctor/dtor (which is all `LlamaModel.cpp` references in that mode).
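The guard described above, sketched (the header name is assumed for illustration):

```cpp
// LlamaFinetuner.cpp (sketch). In the standalone test build the whole
// translation unit compiles to nothing, so that build only links against the
// inline ctor/dtor from the header, which is all LlamaModel.cpp uses there.
#ifndef STANDALONE_TEST_BUILD

#include "LlamaFinetuner.hpp"   // header name assumed for illustration

// ... full finetune pipeline: dataset prep, training loop,
//     optimizer/scheduler, pause/resume checkpointing, adapter save ...

#endif // STANDALONE_TEST_BUILD
```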