Commit 97a7687

Author: lalalune
chore: reconcile cross-references after CR-1..CR-5 consolidation merges (eliza1 script renames, v1.0.0-eliza tag, eliza-llama-cpp cache path, drop deleted robot-voice doc)
1 parent c25b1fb commit 97a7687

32 files changed

Lines changed: 56 additions & 91 deletions

.github/workflows/lifeops-bench-multi-tier.yml

Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ jobs:
 id: dflash-cache
 uses: actions/cache@v4
 with:
-path: ~/.cache/eliza-dflash/milady-llama-cpp/build/bin/llama-server
+path: ~/.cache/eliza-dflash/eliza-llama-cpp/build/bin/llama-server
 key: dflash-llama-cpp-${{ runner.os }}-v1

 - name: Build dflash fork (best-effort)

.github/workflows/local-inference-bench.yml

Lines changed: 2 additions & 2 deletions
@@ -379,7 +379,7 @@ jobs:

 - name: Cross-build windows-x64-cpu
 env:
-# Override the dflash fork to elizaOS/llama.cpp v0.1.0-milady so
+# Override the dflash fork to elizaOS/llama.cpp v1.0.0-eliza so
 # CI exercises the symbols downstream consumers actually expect.
 # Operators that want to test against the spiritbuun upstream can
 # leave this unset.
@@ -388,7 +388,7 @@ jobs:
 set -euo pipefail
 node packages/app-core/scripts/build-llama-cpp-dflash.mjs \
 --target windows-x64-cpu \
---ref v0.1.0-milady \
+--ref v1.0.0-eliza \
 --cache-dir "$RUNNER_TEMP/llama-cpp-cross-cache" \
 --out-dir "$GITHUB_WORKSPACE/build-output/windows-x64-cpu"

ELIZA_1_RELEASE_ASSET_STATUS.md

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ component.
 build clean and self-test on x86_64 here.
 - The training/manifest/publish machinery: the quant recipes
 (`packages/training/scripts/quantization/`), the converter wrapper
-(`gguf_milady_apply.py`, `--release-state base-v1`), the DFlash distiller
+(`gguf_eliza1_apply.py`, `--release-state base-v1`), the DFlash distiller
 (`distill_dflash_drafter.py`, `--synthetic-smoke` runs offline), the bundle
 stagers (`packages/training/scripts/manifest/stage_*.py`), the manifest builder
 (`eliza1_manifest.py`), the platform-plan generator (`eliza1_platform_plan.py`),

ELIZA_1_VOICE_SWARM.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@
 to ASR / openWakeWord, DFlash↔TTS rollback coupling, barge-in cancellation,
 voice on/off lazy regional loading from one bundle.
 - **Release pipeline.** Quant recipes, the converter wrapper
-(`gguf_milady_apply.py`, `--release-state base-v1`), the DFlash distiller, the
+(`gguf_eliza1_apply.py`, `--release-state base-v1`), the DFlash distiller, the
 bundle stagers, the manifest builder, the platform-plan generator, the publish
 orchestrator (gates on `releaseState ∈ {base-v1, upload-candidate, final}` +
 the `final.*` flags + `finetuned=false` + the `sourceModels` map), the §7

RELEASE_V1.md

Lines changed: 5 additions & 5 deletions
@@ -59,7 +59,7 @@ structured-output path, which is already in the fork.)
 ```bash
 # CPU host is fine for the converter; the build needs the target backend
 # (Metal / CUDA / Vulkan / ...) — see packages/inference/AGENTS.md §8.
-export LLAMA_CPP_DIR=$PWD/packages/inference/llama.cpp # used by gguf_milady_apply.py / distill_dflash_drafter.py (both also fall back to the in-repo submodule)
+export LLAMA_CPP_DIR=$PWD/packages/inference/llama.cpp # used by gguf_eliza1_apply.py / distill_dflash_drafter.py (both also fall back to the in-repo submodule)
 node packages/app-core/scripts/build-llama-cpp-dflash.mjs # kernel patches + build (per supported backend)
 make -C packages/inference/verify reference-test # CPU host: must be clean
 ```
@@ -97,7 +97,7 @@ uv run python packages/inference/llama.cpp/convert_hf_to_gguf.py <hf-checkpoint-
 --outtype q4_k_m --outfile out/eliza-1-9b/text/eliza-1-9b-64k.gguf

 # Or, with the Milady type wrapper + provenance recording (CPU-safe, idempotent):
-uv run python packages/training/scripts/quantization/gguf_milady_apply.py \
+uv run python packages/training/scripts/quantization/gguf_eliza1_apply.py \
 --checkpoint <hf-checkpoint-dir-with-polarquant-codes> \
 --output out/eliza-1-9b/text/eliza-1-9b-64k.gguf \
 --llama-cpp-dir packages/inference/llama.cpp \
@@ -114,7 +114,7 @@ the vision mmproj on 9B+ (`vision/mmproj-<tier>.gguf`), and the embedding on
 1.7B+ (`embedding/...gguf`). TTS/ASR/VAD are already GGUF/ONNX — just stage
 the right quant (`omnivoice-base-<quant>.gguf` etc.).

-**Needs a GPU?** No — `convert_hf_to_gguf.py` and `gguf_milady_apply.py` are
+**Needs a GPU?** No — `convert_hf_to_gguf.py` and `gguf_eliza1_apply.py` are
 CPU-only. They DO need the safetensors/checkpoint on disk and the fork
 checkout.

@@ -124,7 +124,7 @@ checkout.

 The five Milady quant recipes live in
 `packages/training/scripts/quantization/`. PolarQuant produces the int8
-weight codes that `gguf_milady_apply.py` packs as `Q4_POLAR` blocks;
+weight codes that `gguf_eliza1_apply.py` packs as `Q4_POLAR` blocks;
 TurboQuant + QJL are runtime KV-cache compressors — they emit the
 `quantization/*.json` sidecars the fork's runtime quantizer consumes (with
 the complete §3 `kernel_manifest` block: `kernel_target`,
@@ -329,7 +329,7 @@ and every platform-dispatch report is green for the exact shipped bytes, and

 | Step | Host |
 |---|---|
-| Fork converter (`convert_hf_to_gguf.py`), `gguf_milady_apply.py`, sidecar generation, bundle staging, checksums, platform-plan regen, manifest build, `distill_dflash_drafter.py --synthetic-smoke`, `--stamp-only` | CPU host (the fork is in-tree at `packages/inference/llama.cpp`; this environment can run these once the source weights are present) |
+| Fork converter (`convert_hf_to_gguf.py`), `gguf_eliza1_apply.py`, sidecar generation, bundle staging, checksums, platform-plan regen, manifest build, `distill_dflash_drafter.py --synthetic-smoke`, `--stamp-only` | CPU host (the fork is in-tree at `packages/inference/llama.cpp`; this environment can run these once the source weights are present) |
 | Fork build with kernel patches, `metal_verify` / `vulkan_verify` / `cuda_verify` / `rocm_verify`, platform-dispatch smokes | the target backend's hardware (Metal Mac, CUDA NVIDIA, Vulkan Linux/Android, ROCm AMD; GH200-class aarch64+CUDA for the `27b-1m` tier) |
 | PolarQuant code generation, TurboQuant skip-layer calibration, DFlash distillation, text perplexity / RTF / WER / VAD / dflash / e2e / 30-turn evals | a GPU big enough for the tier (consumer GPU for 0.6B/1.7B; ≥24 GB for 9B; ≥48 GB / multi-GPU for 27B) |

docs/audits/lifeops-2026-05-11/eliza-1-status.md

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ helper:

 1. Calls `read_eliza_one_bundle(bundle_path)` and aborts on any manifest schema violation.
 2. Sets `MILADY_BENCH_PRE_RELEASE=1` when `bundle_is_pre_release(manifest)` is true. Aggregator picks this up and stamps the banner.
-3. Spawns the dflash llama-server at `~/.cache/eliza-dflash/milady-llama-cpp/build/bin/llama-server` against `manifest.weights_path` (passing `--model-draft` when `drafters_path` is set).
+3. Spawns the dflash llama-server at `~/.cache/eliza-dflash/eliza-llama-cpp/build/bin/llama-server` against `manifest.weights_path` (passing `--model-draft` when `drafters_path` is set).
 4. Publishes `PARALLAX_OPENCODE_BASE_URL=http://127.0.0.1:18781/v1` so the OpenAI-compatible adapter finds the running server.

 When the dflash binary is missing the harness exits with a hard error rather

docs/porting/upstream-rebase-plan.md

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ not a clean replay — it is a re-port:
 - `ggml/src/ggml-metal/ggml-metal*.cpp` + `ggml/src/ggml-metal/ggml-metal.metal`
 and the `ggml-metal/milady-kernels/*.metal` shaders + dispatcher entries.
 - `gguf-py/gguf/constants.py` — the GGUF Python type table (`TBQ3_0`,
-`TBQ4_0`, `QJL1_256`, `Q4_POLAR`) the converter and the `gguf_milady_apply.py`
+`TBQ4_0`, `QJL1_256`, `Q4_POLAR`) the converter and the `gguf_eliza1_apply.py`
 shim grep for.
 - `include/llama.h` — re-exported types + `llama_context_params` (the
 `flash_attn` bool → `flash_attn_type` enum drift bites the AOSP shim).

docs/training/optimization-pipeline.md

Lines changed: 9 additions & 9 deletions
@@ -24,7 +24,7 @@ spec-decode CLI surface:

 Each technique has a research-grade Python apply script under
 `packages/training/scripts/quantization/`. The orchestrator at
-`packages/training/scripts/optimize_for_milady.py` is the single entry
+`packages/training/scripts/optimize_for_eliza1.py` is the single entry
 point that runs them in dependency order, drives the GGUF conversion
 with the fork's `convert_hf_to_gguf.py`, and emits a runtime manifest
 that the on-device downloader consumes.
@@ -49,15 +49,15 @@ Source: `packages/training/scripts/quantization/README.md` and

 ```
 packages/training/scripts/
-optimize_for_milady.py ← master orchestrator (this doc)
-emit_milady_catalog.py ← catalog.ts diff generator
+optimize_for_eliza1.py ← master orchestrator (this doc)
+emit_eliza1_catalog.py ← catalog.ts diff generator
 push_model_to_hf.py ← HF publisher (extended with --milady-manifest)
 quantization/
 polarquant_apply.py ← PolarQuant 4-bit weights
 qjl_apply.py ← QJL 1-bit K cache
 turboquant_apply.py ← TurboQuant V cache (PyTorch reference)
 fused_turboquant_apply.py ← TurboQuant V cache (Triton kernel; needs GPU)
-gguf_milady_apply.py ← GGUF emit shim for Milady GGML types
+gguf_eliza1_apply.py ← GGUF emit shim for Milady GGML types
 ```

 ### Apply order
@@ -88,7 +88,7 @@ HuggingFace: elizaos/eliza-1-<tier>

 ```bash
 cd packages/training
-uv run python scripts/optimize_for_milady.py \
+uv run python scripts/optimize_for_eliza1.py \
 --base-model elizaos/eliza-1-lite-0_6b \
 --output-dir checkpoints/eliza-1-lite \
 --apply polarquant qjl turboquant \
@@ -106,7 +106,7 @@ consumers know the V-cache config falls back to the framework default.

 ```bash
 HF_TOKEN=hf_xxx \
-uv run python scripts/optimize_for_milady.py \
+uv run python scripts/optimize_for_eliza1.py \
 --base-model elizaos/eliza-1-lite-0_6b \
 --output-dir checkpoints/eliza-1-lite \
 --apply polarquant qjl turboquant \
@@ -135,7 +135,7 @@ Production runs need:
 After publish, run:

 ```bash
-uv run python scripts/emit_milady_catalog.py \
+uv run python scripts/emit_eliza1_catalog.py \
 --manifest checkpoints/eliza-1-lite/gguf/milady_manifest.json \
 --catalog ../app-core/src/services/local-inference/catalog.ts \
 --output reports/training/catalog-eliza-1-lite.diff
@@ -185,7 +185,7 @@ inference call.
 ## Verified outputs (dry-run on Eliza-1 lite)

 ```
-$ uv run python scripts/optimize_for_milady.py \
+$ uv run python scripts/optimize_for_eliza1.py \
 --base-model elizaos/eliza-1-lite-0_6b \
 --output-dir /tmp/eliza-1-lite-test \
 --apply polarquant qjl turboquant \
@@ -252,7 +252,7 @@ TurboQuant calibration produces a real `skip_layers` profile.

 ## Out of scope for this pipeline

-- Catalog purge (W5-Catalog) — emit_milady_catalog.py only emits the
+- Catalog purge (W5-Catalog) — emit_eliza1_catalog.py only emits the
 append diff; it does not delete other catalog entries.
 - HF org provisioning (W5-HF-Org) — this script assumes the
 `elizaos` org exists and the operator's `HF_TOKEN` has write

packages/app-core/scripts/kernel-patches/cpu-polar-kernels.mjs

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 // PolarQuant pre-Hadamard-transposed (`_preht`) CPU dot wiring for the
-// v0.4.0-milady fork.
+// elizaOS/llama.cpp fork (v1.0.0-eliza).
 //
 // What this module does (applied after `git reset --hard` on the cached
 // fork checkout, every build):

packages/app-core/scripts/kernel-patches/cpu-simd-kernels.mjs

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-// CPU SIMD kernel staging for the v0.4.0-milady fork (Wave A1 wiring).
+// CPU SIMD kernel staging for the elizaOS/llama.cpp fork (v1.0.0-eliza) (Wave A1 wiring).
 //
 // What this module does:
 //
