RELEASE_V1.md: 5 additions & 5 deletions
@@ -59,7 +59,7 @@ structured-output path, which is already in the fork.)
 ```bash
 # CPU host is fine for the converter; the build needs the target backend
 # (Metal / CUDA / Vulkan / ...) — see packages/inference/AGENTS.md §8.
-export LLAMA_CPP_DIR=$PWD/packages/inference/llama.cpp # used by gguf_milady_apply.py / distill_dflash_drafter.py (both also fall back to the in-repo submodule)
+export LLAMA_CPP_DIR=$PWD/packages/inference/llama.cpp # used by gguf_eliza1_apply.py / distill_dflash_drafter.py (both also fall back to the in-repo submodule)
@@ -114,7 +114,7 @@ the vision mmproj on 9B+ (`vision/mmproj-<tier>.gguf`), and the embedding on
 1.7B+ (`embedding/...gguf`). TTS/ASR/VAD are already GGUF/ONNX — just stage
 the right quant (`omnivoice-base-<quant>.gguf` etc.).
 
-**Needs a GPU?** No — `convert_hf_to_gguf.py` and `gguf_milady_apply.py` are
+**Needs a GPU?** No — `convert_hf_to_gguf.py` and `gguf_eliza1_apply.py` are
 CPU-only. They DO need the safetensors/checkpoint on disk and the fork
 checkout.
 
@@ -124,7 +124,7 @@ checkout.
 
 The five Milady quant recipes live in
 `packages/training/scripts/quantization/`. PolarQuant produces the int8
-weight codes that `gguf_milady_apply.py` packs as `Q4_POLAR` blocks;
+weight codes that `gguf_eliza1_apply.py` packs as `Q4_POLAR` blocks;
 TurboQuant + QJL are runtime KV-cache compressors — they emit the
 `quantization/*.json` sidecars the fork's runtime quantizer consumes (with
 the complete §3 `kernel_manifest` block: `kernel_target`,
@@ -329,7 +329,7 @@ and every platform-dispatch report is green for the exact shipped bytes, and
 
 | Step | Host |
 |---|---|
-| Fork converter (`convert_hf_to_gguf.py`), `gguf_milady_apply.py`, sidecar generation, bundle staging, checksums, platform-plan regen, manifest build, `distill_dflash_drafter.py --synthetic-smoke`, `--stamp-only`| CPU host (the fork is in-tree at `packages/inference/llama.cpp`; this environment can run these once the source weights are present) |
+| Fork converter (`convert_hf_to_gguf.py`), `gguf_eliza1_apply.py`, sidecar generation, bundle staging, checksums, platform-plan regen, manifest build, `distill_dflash_drafter.py --synthetic-smoke`, `--stamp-only`| CPU host (the fork is in-tree at `packages/inference/llama.cpp`; this environment can run these once the source weights are present) |
 | Fork build with kernel patches, `metal_verify` / `vulkan_verify` / `cuda_verify` / `rocm_verify`, platform-dispatch smokes | the target backend's hardware (Metal Mac, CUDA NVIDIA, Vulkan Linux/Android, ROCm AMD; GH200-class aarch64+CUDA for the `27b-1m` tier) |
 | PolarQuant code generation, TurboQuant skip-layer calibration, DFlash distillation, text perplexity / RTF / WER / VAD / dflash / e2e / 30-turn evals | a GPU big enough for the tier (consumer GPU for 0.6B/1.7B; ≥24 GB for 9B; ≥48 GB / multi-GPU for 27B) |
docs/audits/lifeops-2026-05-11/eliza-1-status.md: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ helper:
 
 1. Calls `read_eliza_one_bundle(bundle_path)` and aborts on any manifest schema violation.
 2. Sets `MILADY_BENCH_PRE_RELEASE=1` when `bundle_is_pre_release(manifest)` is true. Aggregator picks this up and stamps the banner.
-3. Spawns the dflash llama-server at `~/.cache/eliza-dflash/milady-llama-cpp/build/bin/llama-server` against `manifest.weights_path` (passing `--model-draft` when `drafters_path` is set).
+3. Spawns the dflash llama-server at `~/.cache/eliza-dflash/eliza-llama-cpp/build/bin/llama-server` against `manifest.weights_path` (passing `--model-draft` when `drafters_path` is set).
 4. Publishes `PARALLAX_OPENCODE_BASE_URL=http://127.0.0.1:18781/v1` so the OpenAI-compatible adapter finds the running server.
 
 When the dflash binary is missing the harness exits with a hard error rather
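Review note on the `eliza-1-status.md` hunk: the renamed binary path feeds steps 2 to 4 of the helper, so a small sketch of that launch logic may help when checking the rename end to end. This is not the repository's helper. `build_launch_plan`, the dict-shaped manifest, and the `pre_release` argument (standing in for the result of `bundle_is_pre_release(manifest)`) are hypothetical, and the `--host`/`--port` values are only inferred from the published base URL.

```python
from pathlib import Path

# Binary location after the rename in this commit (taken from step 3 of the doc).
DEFAULT_SERVER_BIN = (
    Path.home() / ".cache/eliza-dflash/eliza-llama-cpp/build/bin/llama-server"
)
# Base URL from step 4; host and port below are inferred from it (assumption).
BASE_URL = "http://127.0.0.1:18781/v1"


def build_launch_plan(manifest, pre_release, server_bin=DEFAULT_SERVER_BIN):
    """Return (argv, env_updates) for the dflash llama-server.

    Hypothetical helper: `manifest` is assumed to be a dict with a
    `weights_path` key and an optional `drafters_path` key.
    """
    server_bin = Path(server_bin)
    if not server_bin.is_file():
        # The doc says a missing dflash binary is a hard error.
        raise FileNotFoundError(f"dflash llama-server not found at {server_bin}")

    argv = [
        str(server_bin),
        "--model", str(manifest["weights_path"]),
        "--host", "127.0.0.1",
        "--port", "18781",
    ]
    drafters = manifest.get("drafters_path")
    if drafters:
        # Only pass the drafter when the bundle ships one.
        argv += ["--model-draft", str(drafters)]

    env_updates = {"PARALLAX_OPENCODE_BASE_URL": BASE_URL}
    if pre_release:
        env_updates["MILADY_BENCH_PRE_RELEASE"] = "1"
    return argv, env_updates
```

A caller would merge `env_updates` into `os.environ` and hand `argv` to `subprocess.Popen`. Returning the plan instead of spawning directly keeps the path rename testable without the binary or the weights present.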