elizaOS
diff --git a/.github/workflows/lifeops-bench-multi-tier.yml b/.github/workflows/lifeops-bench-multi-tier.yml
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/local-inference-bench.yml b/.github/workflows/local-inference-bench.yml
Lines changed: 2 additions & 2 deletions
diff --git a/ELIZA_1_RELEASE_ASSET_STATUS.md b/ELIZA_1_RELEASE_ASSET_STATUS.md
Lines changed: 1 addition & 1 deletion
diff --git a/ELIZA_1_VOICE_SWARM.md b/ELIZA_1_VOICE_SWARM.md
Lines changed: 1 addition & 1 deletion
diff --git a/RELEASE_V1.md b/RELEASE_V1.md
Lines changed: 5 additions & 5 deletions
diff --git a/docs/audits/lifeops-2026-05-11/eliza-1-status.md b/docs/audits/lifeops-2026-05-11/eliza-1-status.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/porting/upstream-rebase-plan.md b/docs/porting/upstream-rebase-plan.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/training/optimization-pipeline.md b/docs/training/optimization-pipeline.md
Lines changed: 9 additions & 9 deletions
diff --git a/packages/app-core/scripts/kernel-patches/cpu-polar-kernels.mjs b/packages/app-core/scripts/kernel-patches/cpu-polar-kernels.mjs
Lines changed: 1 addition & 1 deletion
diff --git a/packages/app-core/scripts/kernel-patches/cpu-simd-kernels.mjs b/packages/app-core/scripts/kernel-patches/cpu-simd-kernels.mjs
Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ jobs:
         id: dflash-cache
         uses: actions/cache@v4
         with:
-          path: ~/.cache/eliza-dflash/milady-llama-cpp/build/bin/llama-server
+          path: ~/.cache/eliza-dflash/eliza-llama-cpp/build/bin/llama-server
           key: dflash-llama-cpp-${{ runner.os }}-v1
 
       - name: Build dflash fork (best-effort)
 
@@ -379,7 +379,7 @@ jobs:
 
       - name: Cross-build windows-x64-cpu
         env:
-          # Override the dflash fork to elizaOS/llama.cpp v0.1.0-milady so
+          # Override the dflash fork to elizaOS/llama.cpp v1.0.0-eliza so
           # CI exercises the symbols downstream consumers actually expect.
           # Operators that want to test against the spiritbuun upstream can
           # leave this unset.
@@ -388,7 +388,7 @@ jobs:
           set -euo pipefail
           node packages/app-core/scripts/build-llama-cpp-dflash.mjs \
             --target windows-x64-cpu \
-            --ref v0.1.0-milady \
+            --ref v1.0.0-eliza \
             --cache-dir "$RUNNER_TEMP/llama-cpp-cross-cache" \
             --out-dir "$GITHUB_WORKSPACE/build-output/windows-x64-cpu"
 
 
@@ -97,7 +97,7 @@ component.
   build clean and self-test on x86_64 here.
 - The training/manifest/publish machinery: the quant recipes
   (`packages/training/scripts/quantization/`), the converter wrapper
-  (`gguf_milady_apply.py`, `--release-state base-v1`), the DFlash distiller
+  (`gguf_eliza1_apply.py`, `--release-state base-v1`), the DFlash distiller
   (`distill_dflash_drafter.py`, `--synthetic-smoke` runs offline), the bundle
   stagers (`packages/training/scripts/manifest/stage_*.py`), the manifest builder
   (`eliza1_manifest.py`), the platform-plan generator (`eliza1_platform_plan.py`),
 
@@ -42,7 +42,7 @@
   to ASR / openWakeWord, DFlash↔TTS rollback coupling, barge-in cancellation,
   voice on/off lazy regional loading from one bundle.
 - **Release pipeline.** Quant recipes, the converter wrapper
-  (`gguf_milady_apply.py`, `--release-state base-v1`), the DFlash distiller, the
+  (`gguf_eliza1_apply.py`, `--release-state base-v1`), the DFlash distiller, the
   bundle stagers, the manifest builder, the platform-plan generator, the publish
   orchestrator (gates on `releaseState ∈ {base-v1, upload-candidate, final}` +
   the `final.*` flags + `finetuned=false` + the `sourceModels` map), the §7
 
@@ -59,7 +59,7 @@ structured-output path, which is already in the fork.)
 ```bash
 # CPU host is fine for the converter; the build needs the target backend
 # (Metal / CUDA / Vulkan / ...) — see packages/inference/AGENTS.md §8.
-export LLAMA_CPP_DIR=$PWD/packages/inference/llama.cpp   # used by gguf_milady_apply.py / distill_dflash_drafter.py (both also fall back to the in-repo submodule)
+export LLAMA_CPP_DIR=$PWD/packages/inference/llama.cpp   # used by gguf_eliza1_apply.py / distill_dflash_drafter.py (both also fall back to the in-repo submodule)
 node packages/app-core/scripts/build-llama-cpp-dflash.mjs          # kernel patches + build (per supported backend)
 make -C packages/inference/verify reference-test                   # CPU host: must be clean
 ```
@@ -97,7 +97,7 @@ uv run python packages/inference/llama.cpp/convert_hf_to_gguf.py <hf-checkpoint-
   --outtype q4_k_m --outfile out/eliza-1-9b/text/eliza-1-9b-64k.gguf
 
 # Or, with the Milady type wrapper + provenance recording (CPU-safe, idempotent):
-uv run python packages/training/scripts/quantization/gguf_milady_apply.py \
+uv run python packages/training/scripts/quantization/gguf_eliza1_apply.py \
   --checkpoint <hf-checkpoint-dir-with-polarquant-codes> \
   --output     out/eliza-1-9b/text/eliza-1-9b-64k.gguf \
   --llama-cpp-dir packages/inference/llama.cpp \
@@ -114,7 +114,7 @@ the vision mmproj on 9B+ (`vision/mmproj-<tier>.gguf`), and the embedding on
 1.7B+ (`embedding/...gguf`). TTS/ASR/VAD are already GGUF/ONNX — just stage
 the right quant (`omnivoice-base-<quant>.gguf` etc.).
 
-**Needs a GPU?** No — `convert_hf_to_gguf.py` and `gguf_milady_apply.py` are
+**Needs a GPU?** No — `convert_hf_to_gguf.py` and `gguf_eliza1_apply.py` are
 CPU-only. They DO need the safetensors/checkpoint on disk and the fork
 checkout.
 
@@ -124,7 +124,7 @@ checkout.
 
 The five Milady quant recipes live in
 `packages/training/scripts/quantization/`. PolarQuant produces the int8
-weight codes that `gguf_milady_apply.py` packs as `Q4_POLAR` blocks;
+weight codes that `gguf_eliza1_apply.py` packs as `Q4_POLAR` blocks;
 TurboQuant + QJL are runtime KV-cache compressors — they emit the
 `quantization/*.json` sidecars the fork's runtime quantizer consumes (with
 the complete §3 `kernel_manifest` block: `kernel_target`,
@@ -329,7 +329,7 @@ and every platform-dispatch report is green for the exact shipped bytes, and
 
 | Step | Host |
 |---|---|
-| Fork converter (`convert_hf_to_gguf.py`), `gguf_milady_apply.py`, sidecar generation, bundle staging, checksums, platform-plan regen, manifest build, `distill_dflash_drafter.py --synthetic-smoke`, `--stamp-only` | CPU host (the fork is in-tree at `packages/inference/llama.cpp`; this environment can run these once the source weights are present) |
+| Fork converter (`convert_hf_to_gguf.py`), `gguf_eliza1_apply.py`, sidecar generation, bundle staging, checksums, platform-plan regen, manifest build, `distill_dflash_drafter.py --synthetic-smoke`, `--stamp-only` | CPU host (the fork is in-tree at `packages/inference/llama.cpp`; this environment can run these once the source weights are present) |
 | Fork build with kernel patches, `metal_verify` / `vulkan_verify` / `cuda_verify` / `rocm_verify`, platform-dispatch smokes | the target backend's hardware (Metal Mac, CUDA NVIDIA, Vulkan Linux/Android, ROCm AMD; GH200-class aarch64+CUDA for the `27b-1m` tier) |
 | PolarQuant code generation, TurboQuant skip-layer calibration, DFlash distillation, text perplexity / RTF / WER / VAD / dflash / e2e / 30-turn evals | a GPU big enough for the tier (consumer GPU for 0.6B/1.7B; ≥24 GB for 9B; ≥48 GB / multi-GPU for 27B) |
 
 
@@ -91,7 +91,7 @@ helper:
 
 1. Calls `read_eliza_one_bundle(bundle_path)` and aborts on any manifest schema violation.
 2. Sets `MILADY_BENCH_PRE_RELEASE=1` when `bundle_is_pre_release(manifest)` is true. Aggregator picks this up and stamps the banner.
-3. Spawns the dflash llama-server at `~/.cache/eliza-dflash/milady-llama-cpp/build/bin/llama-server` against `manifest.weights_path` (passing `--model-draft` when `drafters_path` is set).
+3. Spawns the dflash llama-server at `~/.cache/eliza-dflash/eliza-llama-cpp/build/bin/llama-server` against `manifest.weights_path` (passing `--model-draft` when `drafters_path` is set).
 4. Publishes `PARALLAX_OPENCODE_BASE_URL=http://127.0.0.1:18781/v1` so the OpenAI-compatible adapter finds the running server.
 
 When the dflash binary is missing the harness exits with a hard error rather
 
@@ -70,7 +70,7 @@ not a clean replay — it is a re-port:
 - `ggml/src/ggml-metal/ggml-metal*.cpp` + `ggml/src/ggml-metal/ggml-metal.metal`
   and the `ggml-metal/milady-kernels/*.metal` shaders + dispatcher entries.
 - `gguf-py/gguf/constants.py` — the GGUF Python type table (`TBQ3_0`,
-  `TBQ4_0`, `QJL1_256`, `Q4_POLAR`) the converter and the `gguf_milady_apply.py`
+  `TBQ4_0`, `QJL1_256`, `Q4_POLAR`) the converter and the `gguf_eliza1_apply.py`
   shim grep for.
 - `include/llama.h` — re-exported types + `llama_context_params` (the
   `flash_attn` bool → `flash_attn_type` enum drift bites the AOSP shim).
 
@@ -24,7 +24,7 @@ spec-decode CLI surface:
 
 Each technique has a research-grade Python apply script under
 `packages/training/scripts/quantization/`. The orchestrator at
-`packages/training/scripts/optimize_for_milady.py` is the single entry
+`packages/training/scripts/optimize_for_eliza1.py` is the single entry
 point that runs them in dependency order, drives the GGUF conversion
 with the fork's `convert_hf_to_gguf.py`, and emits a runtime manifest
 that the on-device downloader consumes.
@@ -49,15 +49,15 @@ Source: `packages/training/scripts/quantization/README.md` and
 
 ```
 packages/training/scripts/
-  optimize_for_milady.py            ← master orchestrator (this doc)
-  emit_milady_catalog.py            ← catalog.ts diff generator
+  optimize_for_eliza1.py            ← master orchestrator (this doc)
+  emit_eliza1_catalog.py            ← catalog.ts diff generator
   push_model_to_hf.py               ← HF publisher (extended with --milady-manifest)
   quantization/
     polarquant_apply.py             ← PolarQuant 4-bit weights
     qjl_apply.py                    ← QJL 1-bit K cache
     turboquant_apply.py             ← TurboQuant V cache (PyTorch reference)
     fused_turboquant_apply.py       ← TurboQuant V cache (Triton kernel; needs GPU)
-    gguf_milady_apply.py            ← GGUF emit shim for Milady GGML types
+    gguf_eliza1_apply.py            ← GGUF emit shim for Milady GGML types
 ```
 
 ### Apply order
@@ -88,7 +88,7 @@ HuggingFace: elizaos/eliza-1-<tier>
 
 ```bash
 cd packages/training
-uv run python scripts/optimize_for_milady.py \
+uv run python scripts/optimize_for_eliza1.py \
     --base-model elizaos/eliza-1-lite-0_6b \
     --output-dir checkpoints/eliza-1-lite \
     --apply polarquant qjl turboquant \
@@ -106,7 +106,7 @@ consumers know the V-cache config falls back to the framework default.
 
 ```bash
 HF_TOKEN=hf_xxx \
-uv run python scripts/optimize_for_milady.py \
+uv run python scripts/optimize_for_eliza1.py \
     --base-model elizaos/eliza-1-lite-0_6b \
     --output-dir checkpoints/eliza-1-lite \
     --apply polarquant qjl turboquant \
@@ -135,7 +135,7 @@ Production runs need:
 After publish, run:
 
 ```bash
-uv run python scripts/emit_milady_catalog.py \
+uv run python scripts/emit_eliza1_catalog.py \
     --manifest checkpoints/eliza-1-lite/gguf/milady_manifest.json \
     --catalog ../app-core/src/services/local-inference/catalog.ts \
     --output reports/training/catalog-eliza-1-lite.diff
@@ -185,7 +185,7 @@ inference call.
 ## Verified outputs (dry-run on Eliza-1 lite)
 
 ```
-$ uv run python scripts/optimize_for_milady.py \
+$ uv run python scripts/optimize_for_eliza1.py \
       --base-model elizaos/eliza-1-lite-0_6b \
       --output-dir /tmp/eliza-1-lite-test \
       --apply polarquant qjl turboquant \
@@ -252,7 +252,7 @@ TurboQuant calibration produces a real `skip_layers` profile.
 
 ## Out of scope for this pipeline
 
-- Catalog purge (W5-Catalog) — emit_milady_catalog.py only emits the
+- Catalog purge (W5-Catalog) — emit_eliza1_catalog.py only emits the
   append diff; it does not delete other catalog entries.
 - HF org provisioning (W5-HF-Org) — this script assumes the
   `elizaos` org exists and the operator's `HF_TOKEN` has write
 
@@ -1,5 +1,5 @@
 // PolarQuant pre-Hadamard-transposed (`_preht`) CPU dot wiring for the
-// v0.4.0-milady fork.
+// elizaOS/llama.cpp fork (v1.0.0-eliza).
 //
 // What this module does (applied after `git reset --hard` on the cached
 // fork checkout, every build):
 
@@ -1,4 +1,4 @@
-// CPU SIMD kernel staging for the v0.4.0-milady fork (Wave A1 wiring).
+// CPU SIMD kernel staging for the elizaOS/llama.cpp fork (v1.0.0-eliza) (Wave A1 wiring).
 //
 // What this module does:
 //