QVAC-18793: split LlamaFinetuner out of LlamaModel #1996

Open

jpgaribotti wants to merge 10 commits into tetherto:main from jpgaribotti:llm-refactor-finetune

Conversation

@jpgaribotti Contributor

🎯 What problem does this PR solve?

  • LlamaModel had grown to own both the inference path and the entire LoRA finetune pipeline (pause/resume checkpoint state, training loop, dataset prep, optimizer/scheduler), which made lifetime and locking hard to reason about.
  • Finetune-only members (checkpointStateMutex_, currentCheckpointState_, pausedCheckpointState_) and ~20 private helpers cluttered LlamaModel.hpp even though they're only relevant on the finetune path.
  • Deprecation warnings from llama_adapter_lora_free (deprecated in current llama.cpp: "adapters are now freed together with the associated model") were adding noise to every build.

📝 How does it solve it?

  • Introduces a dedicated LlamaFinetuner class that owns the finetune pipeline and pause/resume state. The class holds a reference back to its owning LlamaModel (composition; lifetime is guaranteed by member ordering: the finetuner is declared last, so it is destroyed first). See the sketch after this list.
  • LlamaModel exposes the finetuner via finetuner() accessors, and process() delegates the finetune dispatch to finetuner_.finetune(...). A friend class LlamaFinetuner declaration lets the new class access the parts of LlamaModel it still needs (stateMtx_, state_, metadata_, reload(), getContext(), getModel()); the friendship is intended to be removed once those access points get small accessor helpers.
  • FinetuneTerminalResult and the ProgressCallback alias move with the implementation. FinetuneConfigOverrides stays on LlamaModel because reload() / tuneConfigMap() on the inference path consume it.
  • Pure move otherwise: method bodies, locking, and the STANDALONE_TEST_BUILD guards are preserved.
  • Drops the now-deprecated llama_adapter_lora_free deleter on the resume path. Adapters are tied to model lifetime in current llama.cpp, and the surrounding reload(FinetuneConfigOverrides{}) calls on both the happy and error paths destroy + rebuild the model.
  • AddonJs.hpp updated to call through llamaModel->finetuner().{isFinetuneRunning,requestPause,waitUntilFinetuningPauseComplete}() and to use LlamaFinetuner::ProgressCallback.
  • CMakeLists.txt compiles LlamaFinetuner.cpp into both the addon and cli_tool targets.
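A minimal sketch of the resulting ownership shape (stubbed bodies; any name not quoted in the bullets above is illustrative):

```cpp
#include <mutex>

class LlamaModel;  // forward declaration

// Finetune-only state and API move here.
class LlamaFinetuner {
public:
    explicit LlamaFinetuner(LlamaModel& model) : model_(model) {}

    // Stub: the real signature moves over with the implementation.
    void finetune(/* dataset, overrides, ProgressCallback, ... */) {}

    bool isFinetuneRunning() const { return false; }  // stub
    void requestPause() {}                            // stub
    void waitUntilFinetuningPauseComplete() {}        // stub

private:
    LlamaModel& model_;                // back-reference to the owner
    std::mutex checkpointStateMutex_;  // moved from LlamaModel
    // currentCheckpointState_ / pausedCheckpointState_ move here too
};

class LlamaModel {
    friend class LlamaFinetuner;  // temporary, until accessor helpers land
public:
    LlamaFinetuner&       finetuner()       { return finetuner_; }
    const LlamaFinetuner& finetuner() const { return finetuner_; }

private:
    // inference-path members (stateMtx_, state_, metadata_, ...) stay here
    LlamaFinetuner finetuner_{*this};  // declared last => destroyed first
};
```

Call sites such as AddonJs.hpp then go through the accessor, e.g. llamaModel->finetuner().requestPause().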

🧪 How was it tested?

  • Existing finetune integration tests in packages/llm-llamacpp/test/integration/ exercise the moved code paths (train + pause/resume + adapter save). No test signatures changed.
  • C++ unit tests build with STANDALONE_TEST_BUILD; LlamaFinetuner.cpp is fully guarded by #ifndef STANDALONE_TEST_BUILD, so the test build sees only the inline ctor/dtor (which is all LlamaModel.cpp references in that mode). The guard shape is sketched below.
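
A sketch of that guard, assuming the ctor/dtor are inline in the header (illustrative, not the actual file):

```cpp
// LlamaFinetuner.cpp: every moved method body sits inside the guard, so a
// STANDALONE_TEST_BUILD compile of LlamaModel.cpp needs nothing from this
// translation unit beyond the inline ctor/dtor already in the header.
#ifndef STANDALONE_TEST_BUILD

#include "LlamaFinetuner.hpp"

void LlamaFinetuner::finetune(/* ... */) {
    // full training loop, dataset prep, optimizer/scheduler, pause/resume
}

// ... remaining moved method bodies ...

#endif  // STANDALONE_TEST_BUILD
```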

Move the LoRA finetune pipeline and pause/resume checkpoint state into
a dedicated `LlamaFinetuner` class that holds a reference to the owning
`LlamaModel`. `LlamaModel` exposes the finetuner via a `finetuner()`
accessor and delegates the in-`process()` finetune dispatch to it.
`FinetuneTerminalResult` and the `ProgressCallback` alias move with
the implementation; `FinetuneConfigOverrides` stays on `LlamaModel`
since it is also consumed by `reload()` / `tuneConfigMap()` on the
inference path.

Pure move: all method bodies, locking, and the `STANDALONE_TEST_BUILD`
guards are preserved. The split isolates finetune-only state from the
inference path, making lifetime/locking easier to reason about and
opening up follow-up simplifications (collapsible clear/pause helpers,
dropping the `friend` once cache-flush is exposed on `LlamaModel`).

Also drops the now-deprecated llama_adapter_lora_free deleter on the resume path — adapters are tied to the model lifetime in current llama.cpp, and the surrounding reload(FinetuneConfigOverrides{}) calls already destroy + rebuild the model on both happy and error paths.
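
For the deleter drop, the change is roughly the following (the unique_ptr shape on the old resume path is an assumption; llama_adapter_lora_init and the deprecated llama_adapter_lora_free are the real llama.cpp calls):

```cpp
#include <memory>    // for the commented-out "before" shape
#include "llama.h"   // llama_adapter_lora_init / llama_adapter_lora_free

void loadAdapter(llama_model* model, const char* loraPath) {
    // Before (assumed shape): a custom deleter calling the deprecated API,
    // which is what produced the -Wdeprecated-declarations warnings.
    //
    //   std::unique_ptr<llama_adapter_lora, decltype(&llama_adapter_lora_free)>
    //       adapter{llama_adapter_lora_init(model, loraPath),
    //               &llama_adapter_lora_free};

    // After: a plain pointer, no deleter. The adapter is freed together with
    // the model, and the surrounding reload(FinetuneConfigOverrides{}) calls
    // destroy + rebuild the model on both the happy and error paths.
    llama_adapter_lora* adapter = llama_adapter_lora_init(model, loraPath);
    (void)adapter;  // real code would hand this to llama_set_adapter_lora
}
```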
@gianni-cor Contributor

Could you also update packages/llm-llamacpp/docs/finetuning.md as part of this refactor?

A few sections still describe the old ownership/API shape: LlamaModel::finetune(), LlamaModel::requestPause(), checkpoint state stored in LlamaModel, and requestPause() directly calling llama_opt_request_stop(ctx). With this PR, LlamaModel::process() delegates to finetuner_.finetune(...), pause/resume state lives on LlamaFinetuner, and requestPause() only sets pauseRequested; the optimizer stop happens later from the finetuning helper callback path.

This is docs-only, but the current text will be misleading for anyone debugging pause/resume or following the backend flow diagrams after the split.
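
A hypothetical sketch of the flow the updated docs should describe (onTrainStep and the flag's exact form are illustrative; only requestPause()/pauseRequested come from the PR):

```cpp
#include <atomic>

class LlamaFinetuner {
public:
    // requestPause() only sets the flag; it no longer touches the optimizer.
    void requestPause() { pauseRequested_.store(true); }

private:
    // Hypothetical helper invoked from the finetuning callback path during
    // training; this is where the optimizer stop actually happens now.
    bool onTrainStep(/* progress info */) {
        if (pauseRequested_.load()) {
            // checkpoint state, then signal the optimizer to stop
            return false;  // illustrative "stop" result for the loop
        }
        return true;
    }

    std::atomic<bool> pauseRequested_{false};
};
```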

@github-actions Bot commented May 13, 2026

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

jpgaribotti and others added 5 commits May 13, 2026 19:22
Bump @qvac/llm-llamacpp from 0.20.0 to 0.21.0 and add the matching
CHANGELOG.md entry for PR tetherto#1996 (LlamaFinetuner refactor).

The 0.21.0 release is a pure internal C++ refactor of the addon: the
LoRA finetuning pipeline now lives in its own LlamaFinetuner class
instead of inside LlamaModel. No JS API change, no behaviour change.
Also drops the deprecated llama_adapter_lora_free deleter on the
resume path, eliminating the three -Wdeprecated-declarations warnings
called out in 0.20.0.
Bring the implementation notes, UML sequence diagrams, and component
tables in finetuning.md back in sync with the LlamaFinetuner refactor:
@gianni-cor Contributor

/review
