Commit d0bbb23

lalalune and claude committed
e2e-loop report: correct the DFlash-acceptance caveat to the real measured spread
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 parents bf0f5b7 + 3679c26 commit d0bbb23

16 files changed

Lines changed: 262 additions & 250 deletions

.gitmodules

Lines changed: 12 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -1,14 +1,16 @@
11
[submodule "packages/inference/llama.cpp"]
2+
# The single canonical llama.cpp checkout for the whole repo. This is the
3+
# elizaOS/llama.cpp fork (@ v1.0.0-eliza, commit 08032d57): the unified
4+
# fork with the milady kernels (Q4_POLAR / QJL1_256 / TBQ4_0 / TBQ3_0
5+
# GGML types + Metal/Vulkan/CUDA kernels) and DFlash spec-decode. The
6+
# host build (build-llama-cpp-dflash.mjs) + AOSP cross-compile
7+
# (aosp/compile-libllama.mjs) default to this submodule; bun's postinstall
8+
# (scripts/ensure-llama-cpp-submodule.mjs) initializes it. The fork is
9+
# itself a llama.cpp fork, so it carries convert_hf_to_gguf.py /
10+
# llama-quantize / llama-cli too — the training pipeline's plain Q4_K_M
11+
# GGUF path uses the fork's tooling (there is no separate "stock upstream"
12+
# submodule). build/ is gitignored by llama.cpp's own .gitignore so only
13+
# the gitlink (commit SHA) is tracked.
214
path = packages/inference/llama.cpp
315
url = https://github.com/elizaOS/llama.cpp.git
416
branch = eliza/main
5-
[submodule "packages/training/vendor/llama.cpp"]
6-
# Stock upstream llama.cpp pinned to a release tag (b6650). Used by the
7-
# training pipeline's plain GGUF Q4_K_M path (convert_hf_to_gguf.py +
8-
# llama-quantize + llama-cli). The Milady fork — Q4_POLAR/QJL1_256/TBQ
9-
# GGML types — is the *other* submodule (packages/inference/llama.cpp).
10-
# scripts/vendor_llama_cpp.sh inits + builds this; build/ is gitignored
11-
# by llama.cpp's own .gitignore so only the gitlink (commit SHA) is tracked.
12-
path = packages/training/vendor/llama.cpp
13-
url = https://github.com/ggml-org/llama.cpp.git
14-
shallow = true
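
One way to confirm that a checkout actually sits on the pin named in the new `.gitmodules` comment is to compare the submodule's HEAD against the commit prefix. The sketch below is illustrative only: `matches_pin` and `submodule_head` are hypothetical helper names, not functions from the repo's scripts, and the only external command assumed is `git rev-parse HEAD`.

```python
import subprocess

# Pin and path copied from the .gitmodules comment (v1.0.0-eliza).
PINNED_PREFIX = "08032d57"
SUBMODULE_PATH = "packages/inference/llama.cpp"


def matches_pin(head_sha: str, pin_prefix: str = PINNED_PREFIX) -> bool:
    """Return True when a full or abbreviated SHA starts with the pinned prefix."""
    return head_sha.strip().lower().startswith(pin_prefix.lower())


def submodule_head(path: str = SUBMODULE_PATH) -> str:
    """Ask git which commit the submodule checkout currently points at."""
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Run from the repository root, `matches_pin(submodule_head())` should be true once the submodule has been initialized at the tracked gitlink.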

docs/porting/build-matrix.md

Lines changed: 14 additions & 9 deletions
@@ -3,10 +3,11 @@
33
> Per-cell status of every (platform, ABI, GPU-backend) combination
44
> that ships a Milady on-device runtime artifact. The unified fork
55
> ([`elizaOS/llama.cpp`](https://github.com/elizaOS/llama.cpp) @
6-
> `v0.3.0-milady`) is the authoritative source; per-cell artifacts
7-
> live under `~/.eliza/local-inference/bin/<target>/` for host
8-
> targets and under `apps/app/android/app/src/main/assets/agent/<abi>/`
9-
> for AOSP. See
6+
> `v1.0.0-eliza`, commit `08032d57`) is the authoritative source and
7+
> ships in-tree as the git submodule at `packages/inference/llama.cpp`;
8+
> per-cell artifacts live under `~/.eliza/local-inference/bin/<target>/`
9+
> for host targets and under
10+
> `apps/app/android/app/src/main/assets/agent/<abi>/` for AOSP. See
1011
> [`docs/porting/unified-fork-strategy.md`](./unified-fork-strategy.md)
1112
> for the per-technique branching scheme that produces these
1213
> artifacts,
@@ -35,11 +36,15 @@
3536

3637
The verification commands assume:
3738

38-
- `MILADY_LLAMA_CPP_REMOTE=https://github.com/elizaOS/llama.cpp` and
39-
`MILADY_LLAMA_CPP_REF=v0.3.0-milady` (the W3-B fused-CPU release).
40-
- `~/.cache/milady-llama-cpp/<commit>` is the canonical checkout
41-
cache used by `compile-libllama.mjs` (AOSP) and
42-
`build-llama-cpp-dflash.mjs` (host).
39+
- The fork checkout is the in-repo submodule `packages/inference/llama.cpp`
40+
(`elizaOS/llama.cpp @ v1.0.0-eliza`, commit `08032d57`) — `bun install`
41+
inits it via `scripts/ensure-llama-cpp-submodule.mjs`. Both build scripts
42+
(`compile-libllama.mjs` AOSP, `build-llama-cpp-dflash.mjs` host) default to
43+
it; `ELIZA_DFLASH_LLAMA_CPP_REMOTE` / `_REF` (or `--cache-dir` / `--src-dir`)
44+
force a standalone clone at `~/.cache/eliza-dflash/eliza-llama-cpp` instead.
45+
- Older example invocations below using a `~/.cache/...llama-cpp-v0.1.0`
46+
directory name are illustrative of a standalone-clone layout; the current
47+
default is the submodule path above.
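
The checkout selection described in these bullets can be sketched as a small resolver. This is a Python illustration of the decision only, not the logic in `build-llama-cpp-dflash.mjs` or `compile-libllama.mjs`: the function name `resolve_checkout`, the expansion of `_REF` to `ELIZA_DFLASH_LLAMA_CPP_REF`, and the precedence of an explicit directory over the environment variables are all assumptions.

```python
import os
from pathlib import Path

SUBMODULE = Path("packages/inference/llama.cpp")
STANDALONE = Path.home() / ".cache" / "eliza-dflash" / "eliza-llama-cpp"
# Assumed full names for the "ELIZA_DFLASH_LLAMA_CPP_REMOTE / _REF" pair.
OVERRIDE_VARS = ("ELIZA_DFLASH_LLAMA_CPP_REMOTE", "ELIZA_DFLASH_LLAMA_CPP_REF")


def resolve_checkout(env=None, cli_dir=None) -> Path:
    """Pick a llama.cpp checkout: explicit dir, then env override, then submodule."""
    env = os.environ if env is None else env
    if cli_dir:
        # Stands in for --cache-dir / --src-dir (assumed to win over env vars).
        return Path(cli_dir)
    if any(env.get(name) for name in OVERRIDE_VARS):
        # Any override forces the standalone clone location.
        return STANDALONE
    # Default: the in-repo submodule.
    return SUBMODULE
```

With no overrides set, `resolve_checkout(env={})` returns the submodule path; setting either variable switches to the standalone clone directory.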
4348

4449
Symbols listed under "Expected exported symbols" are the Milady-side
4550
additions on top of stock llama.cpp; the upstream `llama_*` /

docs/porting/dflash-drafter-strategy.md

Lines changed: 4 additions & 2 deletions
@@ -107,8 +107,10 @@ against the new checkpoint and re-stamped (training/AGENTS.md §2).
107107

108108
## DFlash vs. the Alternatives
109109

110-
The fork (`~/.cache/eliza-dflash/milady-llama-cpp`) exposes several
111-
speculative paths via `--spec-type`: `draft` (vanilla draft model),
110+
The fork (the in-repo submodule `packages/inference/llama.cpp`, or the
111+
standalone clone at `~/.cache/eliza-dflash/eliza-llama-cpp` when the build
112+
scripts' override forces one) exposes several speculative paths via
113+
`--spec-type`: `draft` (vanilla draft model),
112114
`dflash` (the spiritbuun-branded draft path — *functionally identical to
113115
`draft`*, it just preserves the AOSP CLI spelling; see
114116
`common/speculative.cpp`), `eagle3`, and a family of `ngram_*` paths

docs/porting/unified-fork-strategy.md

Lines changed: 5 additions & 2 deletions
@@ -28,9 +28,12 @@ retired. Stock desktop still runs `node-llama-cpp@3.18.1` — that's the remaini
2828
non-unified consumer; see §F for the migration plan. (`v1.0.0-eliza` is the same
2929
tree as the prior `v0.4.0-milady` / `v0.2.0-milady`-lineage tags, re-tagged on
3030
the elizaOS rename. A full rebase onto a recent upstream llama.cpp remains a
31-
follow-up — the conflict-prone surfaces are the quant-slot enums in
31+
deferred follow-up — **not** a blocker for any shipping feature; the b8198 base
32+
already carries `grammar_lazy` / `json_schema` / `response_format` /
33+
`prefill_assistant`. The conflict-prone surfaces are the quant-slot enums in
3234
`ggml-common.h` / `ggml.h` and upstream's incompatible redefinition of the
33-
`Q1_0` block layout.)
35+
`Q1_0` block layout — see [`upstream-rebase-plan.md`](./upstream-rebase-plan.md)
36+
for the full cost, conflict surface, trigger conditions, and sequencing.)
3437

3538
**Original problem (resolved for the AOSP+host paths, kept for context):**
3639
Milady previously built against three different llama.cpp trees and a

docs/porting/upstream-rebase-plan.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
1+
# Upstream rebase plan — `elizaOS/llama.cpp`
2+
3+
> The single page of record for "when do we rebase the fork onto a recent
4+
> upstream `ggml-org/llama.cpp`, and what does that cost." Pairs with
5+
> [`unified-fork-strategy.md`](./unified-fork-strategy.md) (which fixes the
6+
> repo / branching scheme) and [`on-device-quantization-porting-plan.md`](./on-device-quantization-porting-plan.md)
7+
> (per-technique deliverables). Read this before opening a rebase PR.
8+
9+
## TL;DR
10+
11+
- **Structured output is NOT blocked.** The fork at
12+
`elizaOS/llama.cpp @ v1.0.0-eliza` (commit `08032d57`, upstream base
13+
`b8198`, ~March 2026) **already carries** `grammar_lazy`, `json_schema`,
14+
`response_format`, and `prefill_assistant` in the split `tools/server/`
15+
files (`server-task.cpp` / `server-common.cpp` / `server-context.cpp` /
16+
`server-http.cpp`). The Eliza-1 structured-output path runs on the
17+
current pin — no rebase is required for it. Anything in older docs/comments
18+
saying "the fork must be rebased to get the structured-output features" or
19+
"the fork is based on old b8198 lacking grammar_lazy" is stale; it
20+
predates the `b8198`-based fork.
21+
- **A rebase onto current upstream IS still a real, deferred effort** — a
22+
multi-engineer job with mandatory GPU + Metal hardware verification.
23+
It is *not* on the critical path for any shipping Eliza-1 feature. It is
24+
worth doing only when (a) there is a concrete upstream feature we want
25+
(e.g. a newer quant kernel, a server fix), and (b) GPU/Metal runners are
26+
available to re-verify the TurboQuant Q1_0 path on upstream's new block
27+
layout.
28+
- The `v1.0.0-eliza` tag = the kernel-complete `v0.4.0-milady`/`v0.2.0-milady`
29+
lineage tree, re-tagged for the org rename. A real newer rebase produces a
30+
new `v1.x` tag.
31+
32+
## Why the rebase is hard: the `Q1_0` block collision
33+
34+
The fork composes TurboQuant onto a base where:
35+
36+
- the fork's `block_q1_0` uses `QK1_0 = 32` — the TurboQuant CUDA and Metal
37+
kernels (mmq / mmvq / vecdotq / the fused-attn path, plus the milady-kernels
38+
`.metal` shaders) are written against the 32-element block;
39+
- the fork's `block_q1_0_g128` (the 128-grouped variant) is approximately
40+
what upstream later shipped as its *new* `Q1_0`.
41+
42+
Upstream `b9106`+ redefined `block_q1_0` with `QK1_0 = 128`. So a rebase is
43+
not a clean replay — it is a re-port:
44+
45+
1. **Re-port TurboQuant's Q1_0 path onto upstream's 128-block design** and
46+
re-verify it on real GPU hardware (CUDA and Metal). The TurboQuant
47+
`mmq`/`mmvq`/`vecdotq` kernels and the Metal shaders all assume the
48+
32-element layout; moving to 128 changes tiling, packing, and the
49+
dequant inner loop. Bit-exact parity vs the reference + a model-backed
50+
graph smoke is the acceptance bar — and that requires `nvcc` + an NVIDIA
51+
card and an Apple-Silicon Metal box, neither of which CI has on free
52+
runners (see `unified-fork-strategy.md` §G).
53+
2. **Adapt to upstream's `ggml-metal` / `ggml-cuda` restructure.** Upstream
54+
has since split `ggml-metal.m` into `ggml-metal*.cpp` and reorganized the
55+
`ggml-cuda/` tree; the milady kernels live under
56+
`ggml/src/ggml-metal/milady-kernels/` and `ggml/src/ggml-cuda/{qjl,polarquant,turboquant,turbo-tcq}.cu(h)` and the fused-attn `.cu`, all of which have to be re-slotted into the
57+
new layout and re-wired into the dispatcher.
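
To see why moving from a 32-element to a 128-element block is a re-port and not a rename, it helps to work out the storage arithmetic. The sketch below assumes a 1-bit block is one fp16 scale (2 bytes) followed by one packed bit per element; that layout is an assumption for illustration, and the real `block_q1_0` definitions in the fork's and upstream's `ggml-common.h` are authoritative. `q1_block_bytes` and `tensor_bytes` are hypothetical names.

```python
def q1_block_bytes(qk: int, scale_bytes: int = 2) -> int:
    """Bytes per block: one scale plus one packed bit per element (assumed layout)."""
    if qk <= 0 or qk % 8:
        raise ValueError("block size must be a positive multiple of 8")
    return scale_bytes + qk // 8


def tensor_bytes(n_elements: int, qk: int) -> int:
    """Total storage for a tensor whose element count divides evenly into blocks."""
    if n_elements % qk:
        raise ValueError("element count must be a multiple of the block size")
    return (n_elements // qk) * q1_block_bytes(qk)


for qk in (32, 128):
    bits_per_weight = 8 * q1_block_bytes(qk) / qk
    print(f"QK={qk}: {q1_block_bytes(qk)} bytes/block, {bits_per_weight} bits/weight")
```

Under this assumption the same 4096-element row occupies 768 bytes at `QK1_0 = 32` and 576 bytes at `QK1_0 = 128`, so block strides, scale granularity, and any kernel tiling keyed to the block size all change together.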
58+
59+
## Conflict surface (files that will fight you on rebase)
60+
61+
- `ggml/src/ggml-common.h`, `ggml/include/ggml.h` — the milady quant-slot
62+
enums (`TBQ3_0=43`, `TBQ4_0=44`, `QJL1_256=46`, `Q4_POLAR=47`) **and** the
63+
`block_q1_0` / `block_q1_0_g128` definitions vs upstream's redefined
64+
`Q1_0`. This is the central collision.
65+
- `ggml/src/ggml-quants.c`, `ggml/src/ggml-quants.h` — quantize/dequantize
66+
rows for every milady type + the Q1_0 reference path.
67+
- `ggml/src/ggml-cuda/{mmq,convert,vecdotq,mmvq,fattn*}.cu(h)` plus the
68+
milady CUDA kernels (`qjl.cu`, `polarquant.cu`, `turboquant.cu`,
69+
`turbo-tcq.cu`, the fused-attn `.cu`).
70+
- `ggml/src/ggml-metal/ggml-metal*.cpp` + `ggml/src/ggml-metal/ggml-metal.metal`
71+
and the `ggml-metal/milady-kernels/*.metal` shaders + dispatcher entries.
72+
- `gguf-py/gguf/constants.py` — the GGUF Python type table (`TBQ3_0`,
73+
`TBQ4_0`, `QJL1_256`, `Q4_POLAR`) the converter and the `gguf_milady_apply.py`
74+
shim grep for.
75+
- `include/llama.h` — re-exported types + `llama_context_params` (the
76+
`flash_attn` bool → `flash_attn_type` enum drift bites the AOSP shim).
77+
- `tools/quantize/quantize.cpp`, `src/llama-quant.cpp`,
78+
`src/llama-model-loader.cpp` — recognizing the new ftype names + loading
79+
the milady block layouts.
80+
- `tools/server/server-{task,common,context,http}.cpp` — the structured-output
81+
surface already ported once; an upstream rebase replays it against
82+
whatever upstream's server refactor looks like at that point. (Not a
83+
blocker — just more diff to reconcile.)
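
The enum collision at the top of this list can be checked mechanically before starting a rebase. The sketch below uses the fork slot numbers quoted above; the upstream table is invented purely for illustration (the names `EXAMPLE_TYPE_A` and `EXAMPLE_TYPE_B` are hypothetical), and `find_slot_conflicts` is not an existing repo helper.

```python
# Fork slot numbers as quoted in the conflict-surface list.
FORK_SLOTS = {"TBQ3_0": 43, "TBQ4_0": 44, "QJL1_256": 46, "Q4_POLAR": 47}


def find_slot_conflicts(fork: dict, upstream: dict) -> list:
    """Return (slot, fork_name, upstream_name) where both sides claim one slot."""
    upstream_by_slot = {slot: name for name, slot in upstream.items()}
    conflicts = []
    for name, slot in sorted(fork.items(), key=lambda item: item[1]):
        other = upstream_by_slot.get(slot)
        if other is not None and other != name:
            conflicts.append((slot, name, other))
    return conflicts


# Hypothetical upstream enum values, NOT taken from any real llama.cpp release.
hypothetical_upstream = {"EXAMPLE_TYPE_A": 43, "EXAMPLE_TYPE_B": 45}
print(find_slot_conflicts(FORK_SLOTS, hypothetical_upstream))
```

In practice the upstream table would be parsed from the target tag's `ggml.h`; any non-empty result means the fork's types have to be renumbered or upstream's slots worked around.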
84+
85+
## When to do it
86+
87+
Trigger a rebase only when **both** are true:
88+
89+
1. There is a concrete upstream change we want pulled in (a quant kernel, a
90+
server fix, an MXFP4/NVFP4-class addition — see `unified-fork-strategy.md`
91+
§E item 1, the only "free on rebase" win), AND
92+
2. GPU + Metal verification capacity exists (a `cuda-l4` / `rocm-gfx1100` /
93+
`apple-m3-pro` runner, or a developer with the hardware) to re-verify the
94+
TurboQuant Q1_0 path on the new 128-block layout before merge.
95+
96+
Until then the `b8198`-based pin is the right answer: it carries every
97+
milady kernel, DFlash spec-decode, and the structured-output server surface,
98+
and it is hardware-verified at the levels recorded in
99+
`packages/inference/README.md`.
100+
101+
## Sequencing (when it happens)
102+
103+
1. New branch off `milady/main`; rebase onto the target upstream tag. Take
104+
the conflicts in the order of the surface list above (`ggml-common.h` /
105+
`ggml.h` first — resolving the `Q1_0` collision unblocks the rest).
106+
2. Re-port TurboQuant Q1_0 (CPU first, then CUDA, then Metal) onto upstream's
107+
128-block layout. CPU parity (scalar + AVX2 + NEON) is the gate before
108+
touching GPU.
109+
3. Re-slot the milady CUDA + Metal kernels into upstream's restructured
110+
`ggml-cuda/` and `ggml-metal/` trees; re-wire the dispatcher.
111+
4. Re-reconcile the structured-output server patch (or confirm upstream now
112+
carries it natively and drop our copy).
113+
5. Run the full CI matrix from `unified-fork-strategy.md` §G **plus** the
114+
`kernel-verify-gpu` job. No green-GPU run, no merge.
115+
6. Tag `v1.x` (the new kernel-complete rebased tree); bump
116+
`LLAMA_CPP_TAG`/`LLAMA_CPP_COMMIT`/`REF` in `build-llama-cpp-dflash.mjs`
117+
and `compile-libllama.mjs`, the `min_llama_cpp_tag` in the training
118+
manifest emitter, and `packages/inference/AGENTS.md` / this doc /
119+
`unified-fork-strategy.md`.
120+
121+
## See also
122+
123+
- [`unified-fork-strategy.md`](./unified-fork-strategy.md) §A (current
124+
state), §G (CI strategy), §H (migration order).
125+
- [`on-device-quantization-porting-plan.md`](./on-device-quantization-porting-plan.md)
126+
— per-technique × per-platform status.
127+
- [`packages/inference/AGENTS.md`](../../packages/inference/AGENTS.md) — the
128+
inference contract; the fork-source paragraph points here.
129+
- [`packages/inference/README.md`](../../packages/inference/README.md)
130+
(the hardware-verification matrix that gates any kernel claim).

docs/training/optimization-pipeline.md

Lines changed: 8 additions & 5 deletions
@@ -106,7 +106,6 @@ consumers know the V-cache config falls back to the framework default.
106106

107107
```bash
108108
HF_TOKEN=hf_xxx \
109-
LLAMA_CPP_DIR=$HOME/src/milady-llama.cpp \
110109
uv run python scripts/optimize_for_milady.py \
111110
--base-model elizaos/eliza-1-lite-0_6b \
112111
--output-dir checkpoints/eliza-1-lite \
@@ -120,10 +119,14 @@ Production runs need:
120119

121120
- A GPU with CUDA for the TurboQuant calibration pass and (optionally)
122121
the QJL CUDA kernel build under `scripts/quantization/qjl/csrc/`.
123-
- A local checkout of `elizaOS/llama.cpp` at tag `v1.0.0-eliza` (the in-repo submodule `packages/inference/llama.cpp` already provides this)
124-
(commit `08032d57e15574f2a7ca19fc3f29510c8673d590`) at
125-
`$LLAMA_CPP_DIR`. The fork is the only place `convert_hf_to_gguf.py`
126-
understands `--outtype q4_polar`.
122+
- A checkout of `elizaOS/llama.cpp` at tag `v1.0.0-eliza` (commit
123+
`08032d57e15574f2a7ca19fc3f29510c8673d590`). The `packages/inference/llama.cpp`
124+
submodule already provides this (`bun install` inits it via
125+
`scripts/ensure-llama-cpp-submodule.mjs`), or a standalone clone at
126+
`~/.cache/eliza-dflash/eliza-llama-cpp` when the build scripts' override
127+
forces one; set `$LLAMA_CPP_DIR` to point at a different checkout. The
128+
fork is the only place `convert_hf_to_gguf.py` understands
129+
`--outtype q4_polar`.
127130
- An `HF_TOKEN` (or `HUGGINGFACE_HUB_TOKEN`) with write access to the
128131
`elizaos` HF org.
129132

packages/app-core/scripts/aosp/compile-libllama.mjs

Lines changed: 8 additions & 6 deletions
@@ -34,14 +34,16 @@
3434
//
3535
// llama.cpp pin (matches plugins/plugin-aosp-local-inference/src/aosp-llama-adapter.ts):
3636
// fork: https://github.com/elizaOS/llama.cpp
37-
// tag: v0.4.0-milady (milady/integration HEAD)
37+
// tag: v1.0.0-eliza (the kernel-complete v0.4.0-milady tree,
38+
// re-tagged on the elizaOS org rename)
3839
// commit: 08032d57e15574f2a7ca19fc3f29510c8673d590
3940
//
40-
// v0.4.0-milady adds W4-B CUDA QJL + PolarQuant Q4 + TBQ3_TCQ kernels
41-
// on top of v0.3.0-milady. The CUDA paths only matter for the
42-
// linux-x64-cuda host target (the AOSP arm64 path stays CPU-only),
43-
// but the pin is shared so both AOSP and host build paths land on
44-
// identical kernel sources.
41+
// This tree adds the W4-B CUDA QJL + PolarQuant Q4 + TBQ3_TCQ kernels
42+
// on top of the earlier milady-lineage tags. The CUDA paths only matter
43+
// for the linux-x64-cuda host target (the AOSP arm64 path stays
44+
// CPU-only), but the pin is shared so both AOSP and host build paths
45+
// land on identical kernel sources. A rebase onto a newer upstream is a
46+
// deferred effort — see docs/porting/upstream-rebase-plan.md.
4547
//
4648
// v0.2.0-milady (subset of this pin) added DFlash speculative decoding
4749
// CLI surface (--spec-type dflash, --draft-min-prob alias, n_drafted_total

packages/inference/AGENTS.md

Lines changed: 6 additions & 3 deletions
@@ -25,9 +25,12 @@ upstream b8198. Both build paths consume it: `build-llama-cpp-dflash.mjs`
2525
the submodule checkout. `ELIZA_DFLASH_LLAMA_CPP_REMOTE` / `_REF` (or `--cache-dir`
2626
/ `--src-dir`) still force a standalone clone for fork bisects. (`v1.0.0-eliza` is
2727
the same tree as the prior `v0.4.0-milady` tag, re-tagged on the elizaOS rename. A
28-
full rebase onto a recent upstream llama.cpp remains a follow-up — the
29-
conflict-prone files are the quant-slot enums in `ggml-common.h` / `ggml.h` and the
30-
`Q1_0` block layout, which upstream redefined incompatibly with the fork's.)
28+
full rebase onto a recent upstream llama.cpp remains a **deferred** follow-up — not
29+
a blocker for structured output (the b8198 base already has `grammar_lazy` /
30+
`json_schema` / `response_format` / `prefill_assistant`); the conflict-prone files
31+
are the quant-slot enums in `ggml-common.h` / `ggml.h` and the `Q1_0` block layout,
32+
which upstream redefined incompatibly with the fork's. Full cost / conflict surface
33+
/ trigger conditions: [`docs/porting/upstream-rebase-plan.md`](../../docs/porting/upstream-rebase-plan.md).)
3134

3235
---
3336

packages/inference/reports/porting/2026-05-11/e2e-loop-benchmark.md

Lines changed: 5 additions & 3 deletions
@@ -222,9 +222,11 @@ targets).
222222
the generated text is off-topic (e.g. LaTeX) — the loop still exercises the
223223
full decode + DFlash + TTS path correctly; quality is a v2 (fine-tune)
224224
concern.
225-
- The DFlash drafter is a real GGUF but ≈ a copy of the target, so acceptance
226-
≈ 1.0; this is the right *shape* but not a meaningful acceptance number until
227-
a trained drafter ships.
225+
- The DFlash drafter is a real GGUF but ≈ a copy of the target. In the e2e
226+
bench (short n_predict, in-server DFlash loop) acceptance lands ~0.89–1.0; in
227+
the standalone `llama-speculative-simple` eval (`-n 48`, `--draft-min/max
228+
2/6`) it lands 0.87 (0.6B) / 0.55 (1.7B) — high-variance numbers off a
229+
near-copy drafter, the right *shape* but not a trained-drafter figure.
228230
- The ASR GGUF is stand-in quality → round-trip WER ≈ 1.0. Recorded honestly.
229231
- Server peak RSS exceeds the manifest budget on both tiers because the fused
230232
process keeps every voice region resident — this is a real publish blocker
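
For readers comparing the acceptance figures in the DFlash caveat above, the sketch below shows the usual definition: accepted draft tokens divided by drafted tokens. The function names are hypothetical, the counter semantics are an assumption about how the benches report drafts, and the counts in the usage note are made up for illustration, not measurements from this report.

```python
from typing import Iterable, Tuple


def acceptance_rate(n_accepted: int, n_drafted: int) -> float:
    """Fraction of drafted tokens that the target model accepted."""
    if n_drafted <= 0:
        raise ValueError("n_drafted must be positive")
    if not 0 <= n_accepted <= n_drafted:
        raise ValueError("n_accepted must lie in [0, n_drafted]")
    return n_accepted / n_drafted


def pooled_acceptance(runs: Iterable[Tuple[int, int]]) -> float:
    """Pool (accepted, drafted) pairs so short runs do not dominate the figure."""
    runs = list(runs)
    accepted = sum(a for a, _ in runs)
    drafted = sum(d for _, d in runs)
    return acceptance_rate(accepted, drafted)
```

For example, `acceptance_rate(42, 48)` gives 0.875. With short generations the denominator is small, so a handful of rejected tokens moves the ratio a lot, which is one reason short-run figures spread widely.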

packages/training/pyproject.toml

Lines changed: 13 additions & 13 deletions
@@ -67,20 +67,20 @@ train = [
6767
# from ~2k to ~8k on the same 16 GB budget.
6868
"liger-kernel>=0.5.0",
6969
# GGUF Q4_K_M quantization (scripts/quantization/gguf-q4_k_m_apply.py).
70-
# The wrapper prefers a vendored stock llama.cpp checkout under
71-
# packages/training/vendor/llama.cpp (run scripts/vendor_llama_cpp.sh),
72-
# but the llama-cpp-python wheel also ships a usable `gguf` python module
73-
# so the HF→f16 GGUF convert step works without the vendored checkout.
74-
# The Q4_K_M *quantize* step still needs the `llama-quantize` binary the
75-
# vendor script builds. NOTE: the custom GGML types Q4_POLAR/QJL1_256/
76-
# TurboQuant are NOT in this wheel — those need the elizaOS/llama.cpp
77-
# fork via $LLAMA_CPP_DIR.
70+
# The wrapper uses the in-repo llama.cpp fork submodule at
71+
# packages/inference/llama.cpp (its convert_hf_to_gguf.py + a one-shot
72+
# CPU cmake build of llama-quantize/llama-cli — see the script's
73+
# _VENDOR_HINT), but the llama-cpp-python wheel also ships a usable
74+
# `gguf` python module so the HF→f16 GGUF convert step works even
75+
# without that build. The Q4_K_M *quantize* step still needs a real
76+
# `llama-quantize` binary. NOTE: the custom GGML types Q4_POLAR/QJL1_256/
77+
# TurboQuant are only in the fork — same submodule, or $LLAMA_CPP_DIR.
7878
"llama-cpp-python>=0.3.0",
79-
# convert_hf_to_gguf.py (vendored stock llama.cpp) imports `gguf` AND
80-
# `mistral_common` at module load — both must be installed or the
81-
# HF→GGUF convert step dies on import. The vendor script also installs
82-
# these from requirements/requirements-convert_hf_to_gguf.txt, but pin
83-
# them here so `uv run --extra train` works standalone.
79+
# convert_hf_to_gguf.py imports `gguf` AND `mistral_common` at module
80+
# load — both must be installed or the HF→GGUF convert step dies on
81+
# import. The fork's requirements/requirements-convert_hf_to_gguf.txt
82+
# also installs these, but pin them here so `uv run --extra train`
83+
# works standalone.
8484
"gguf>=0.10.0",
8585
"mistral_common>=1.8.3",
8686
# pytest is consumed by the pre-flight gate (scripts/preflight.sh) and
