Commit 0e87eb5

Merge branch 'develop' of https://github.com/elizaOS/eliza into develop
# Conflicts:
#   packages/app-core/src/services/local-inference/voice/pipeline-impls.test.ts
2 parents 6030ee0 + ce4f13a commit 0e87eb5

58 files changed

Lines changed: 5347 additions & 505 deletions

ELIZA_1_RELEASE_ASSET_STATUS.md

Lines changed: 32 additions & 0 deletions
@@ -161,6 +161,38 @@ route yet.
 `elizaos/eliza-1-assets`, but no publishable per-tier
 `elizaos/eliza-1-*` release repos with final evidence.
 
+## Publish Pipeline / Downloader State (2026-05-11, this checkout)
+
+- `packages/training/scripts/publish_all_eliza1.sh` now prints the per-tier
+  publish summary and propagates the orchestrator's structured exit code on
+  the first failing tier (so callers can tell `EXIT_RELEASE_EVIDENCE_FAIL`
+  = `16` from `EXIT_BUNDLE_LAYOUT_FAIL` = `10`, etc.). The
+  abort-on-first-failure behavior from §6 is unchanged.
+- Dry-run was executed against a hand-built `releaseState=upload-candidate`
+  stand-in bundle for the `0_6b` tier (`final.weights=false`): the
+  orchestrator rejects it at stage 2 (`exit 16`, `EXIT_RELEASE_EVIDENCE_FAIL`)
+  — exactly as the contract requires. **No tier would publish; all are
+  blocked by non-final release evidence.** This checkout's state dir contains
+  no staged Eliza-1 bundle; producing one requires the asset/source staging
+  scripts (`stage_eliza1_bundle_assets.py`, `stage_eliza1_source_weights.py`,
+  `stage_local_eliza1_bundle.py`) which need HF network access and real
+  text/DFlash weights.
+- No `HF_TOKEN` / `HUGGINGFACE_TOKEN` / `HUGGINGFACE_HUB_TOKEN` is present
+  in this environment and `huggingface-cli` is not installed. **No upload
+  was performed.** `defaultEligible` and `publishEligible` stay `false` for
+  every tier.
+- §7 device-side downloader contract hardened (see
+  `packages/app-core/src/services/local-inference/downloader.ts`): the
+  manifest is read first, then RAM budget and verified-backend availability
+  are checked against the device **before any weight byte is fetched**
+  (abort → structured `BundleIncompatibleError` and a `failed` download event);
+  schema version is enforced by `parseManifestOrThrow`; per-file sha256 +
+  resume already existed; a new injectable `verifyOnDevice` hook (load →
+  1-token text → 1-phrase voice → barge-in cancel) gates readiness and
+  default-slot fill, recorded via `InstalledModel.bundleVerifiedAt`. Tests
+  added in `downloader.test.ts`. Wiring the hook from the engine in
+  `service.ts` is the remaining gap.
+
 ## Next Release Actions
 
 1. Train/fine-tune the Eliza-1 text checkpoints for each tier.
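
The status note in this hunk says callers can now tell `EXIT_RELEASE_EVIDENCE_FAIL` (`16`) from `EXIT_BUNDLE_LAYOUT_FAIL` (`10`). A minimal sketch of how a caller might interpret that exit status follows. The helper name `describePublishExit` and the `UNKNOWN_FAILURE` label are hypothetical, and only the two codes named in the note are mapped because the full table is not shown here.

```javascript
// Exit codes named in the status note; the orchestrator's full table is not
// shown in this commit view, so every other non-zero code is "unknown" here.
const KNOWN_EXIT_CODES = new Map([
  [10, "EXIT_BUNDLE_LAYOUT_FAIL"],
  [16, "EXIT_RELEASE_EVIDENCE_FAIL"],
]);

// Hypothetical helper (not part of the repository): turn the exit status of
// publish_all_eliza1.sh into a structured result a caller can branch on.
function describePublishExit(code) {
  if (code === 0) {
    return { ok: true, code, reason: "success" };
  }
  const reason = KNOWN_EXIT_CODES.get(code) ?? "UNKNOWN_FAILURE";
  return { ok: false, code, reason };
}
```

A caller could pass in the `status` field returned by Node's `child_process.spawnSync` after running the script. Because the script aborts on the first failing tier, a single code describes the whole run.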

ELIZA_1_TESTING_TODO.md

Lines changed: 22 additions & 0 deletions
@@ -235,3 +235,25 @@ are complete enough for runtime-layout smoke: every tier has required local
 `checksums/SHA256SUMS` has been revalidated. They are not recordable release
 artifacts because `evidence/release.json` is intentionally
 `releaseState=local-standin` and `publishEligible=false`.
+
+Note (this checkout / Linux x64, 2026-05-11): no staged Eliza-1 bundle exists
+in this checkout's state dir and no HF write token is present, so no upload
+was attempted. A publish dry-run against a hand-built
+`releaseState=upload-candidate` stand-in bundle exits `16`
+(`EXIT_RELEASE_EVIDENCE_FAIL`) at stage 2 — the orchestrator correctly
+refuses it. The publish-pipeline machinery is covered by
+`pytest packages/training/scripts/{test_hf_publish.py,publish/test_orchestrator.py,manifest/test_eliza1_*.py,manifest/test_stage_local_eliza1_bundle.py}`
+(97 passed, 1 skipped).
+
+### Device-side downloader contract (§7)
+
+The §7 device-side download contract is exercised by
+`bun test packages/app-core/src/services/local-inference/downloader.test.ts`:
+manifest-first read, schema-version rejection (via `parseManifestOrThrow`),
+RAM-budget abort before any weight byte, no-overlapping-verified-backend
+abort before any weight byte, per-file sha256 + resume, and the
+`verifyOnDevice` hook gating readiness / default-slot fill. Remaining:
+the engine has not yet wired the real `verifyOnDevice` smoke (load →
+1-token text → 1-phrase voice → barge-in cancel) into `service.ts`, and the
+recommendation engine does not yet call `canSetAsDefault` against the
+device's available backends.
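
The ordering described in this hunk (manifest checks first, weight bytes second, on-device verification last) can be sketched as below. This is an illustration under stated assumptions, not the code in `downloader.ts`: the manifest fields `ramBudgetMb` and `verifiedBackends`, the device fields `ramMb` and `backends`, the `reason` strings, and the `installBundle` and `preflight` functions are all hypothetical, and the error class is a stand-in that only shares its name with the real `BundleIncompatibleError`.

```javascript
// Stand-in error type; the real BundleIncompatibleError's fields are not
// shown in this commit view.
class BundleIncompatibleError extends Error {
  constructor(reason) {
    super(`bundle incompatible: ${reason}`);
    this.name = "BundleIncompatibleError";
    this.reason = reason;
  }
}

// Assumed shapes: manifest = { ramBudgetMb, verifiedBackends: string[] },
// device = { ramMb, backends: string[] }.
function preflight(manifest, device) {
  if (device.ramMb < manifest.ramBudgetMb) {
    throw new BundleIncompatibleError("ram-budget");
  }
  const overlap = manifest.verifiedBackends.some((b) =>
    device.backends.includes(b),
  );
  if (!overlap) {
    throw new BundleIncompatibleError("no-verified-backend");
  }
}

// fetchWeights and verifyOnDevice are injected so the order can be tested.
async function installBundle({ manifest, device, fetchWeights, verifyOnDevice }) {
  preflight(manifest, device); // rejects before any weight byte is fetched
  await fetchWeights(manifest);
  const verified = await verifyOnDevice(manifest);
  return {
    ready: verified,
    bundleVerifiedAt: verified ? new Date().toISOString() : null,
  };
}
```

Injecting `verifyOnDevice` keeps the gate testable without a real engine, which matches the remaining gap noted above: the hook exists, but the engine-side smoke test is not wired in yet.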

packages/app-core/scripts/build-llama-cpp-dflash.mjs

Lines changed: 101 additions & 0 deletions
@@ -55,7 +55,12 @@ import {
   QJL_GGML_BASE_LINK_FILES,
 } from "./kernel-patches/cpu-simd-kernels.mjs";
 import { patchCpuThreadParallelism as patchCpuThreadParallelismImpl } from "./kernel-patches/cpu-thread-parallelism.mjs";
+import {
+  CUDA_KERNEL_CMAKE_FLAGS,
+  patchCudaKernels as patchCudaKernelsImpl,
+} from "./kernel-patches/cuda-kernels.mjs";
 import { patchMetalKernels as patchMetalKernelsImpl } from "./kernel-patches/metal-kernels.mjs";
+import { patchServerOmnivoiceRoute as patchServerOmnivoiceRouteImpl } from "./kernel-patches/server-omnivoice-route.mjs";
 import { patchServerStructuredOutput as patchServerStructuredOutputImpl } from "./kernel-patches/server-structured-output.mjs";
 import { patchVulkanKernels as patchVulkanKernelsImpl } from "./kernel-patches/vulkan-kernels.mjs";
 import {
@@ -609,6 +614,70 @@ target_include_directories(ggml-base PRIVATE ggml-cpu ggml-cpu/qjl ggml-cpu/qjl/
   );
 }
 
+// Patch `ggml/src/ggml-cuda/CMakeLists.txt` so the staged fused-attn TU
+// (fused-attn-qjl-tbq.cu, copied in by patchCudaKernels) compiles its body
+// when `-DGGML_CUDA_FUSED_ATTN_QJL=ON` is passed. The fork's ggml-cuda
+// CMakeLists already carries `if (GGML_CUDA_QJL) add_compile_definitions(...)`
+// style blocks for the W4-B kernels; this adds the matching one for the fused
+// kernel right after them. Idempotent via a sentinel; hard-throws if the
+// anchor is missing (fork drift — AGENTS.md §3, fail closed rather than ship a
+// kernel-missing artifact). CUDA targets only.
+function patchGgmlCudaForFusedAttn(cacheDir, { dryRun = false } = {}) {
+  const cmakeListsPath = path.join(
+    cacheDir,
+    "ggml",
+    "src",
+    "ggml-cuda",
+    "CMakeLists.txt",
+  );
+  if (!fs.existsSync(cmakeListsPath)) {
+    throw new Error(
+      `[dflash-build] patchGgmlCudaForFusedAttn: ${cmakeListsPath} missing — ` +
+        `the elizaOS/llama.cpp fork's ggml-cuda layout has changed.`,
+    );
+  }
+  const original = fs.readFileSync(cmakeListsPath, "utf8");
+  const sentinel = "# MILADY-CUDA-FUSED-ATTN-QJL";
+  if (original.includes(sentinel)) return;
+  // Anchor on the W4-B TBQ3_TCQ compile-definition block. The fork carries a
+  // run of `if (GGML_CUDA_<KERNEL>) ... add_compile_definitions(...) ... endif()`
+  // for QJL / POLARQUANT / TBQ3_TCQ; we append the fused-attn one after the
+  // last of them.
+  const anchorRe =
+    /if\s*\(\s*GGML_CUDA_TBQ3_TCQ\s*\)[\s\S]*?endif\s*\(\s*\)/;
+  if (!anchorRe.test(original)) {
+    throw new Error(
+      `[dflash-build] patchGgmlCudaForFusedAttn: could not find the ` +
+        `GGML_CUDA_TBQ3_TCQ if/endif block in ${cmakeListsPath}; the fork's ` +
+        `ggml-cuda CMakeLists has drifted. Fix the anchor before shipping a ` +
+        `CUDA build (fused_attn kernel would silently compile to an empty TU).`,
+    );
+  }
+  const block = `
+
+${sentinel}
+# Fused QJL-K + TBQ-V attention (packages/inference/cuda/fused-attn-qjl-tbq.cu,
+# staged in by patchCudaKernels). Body is #ifdef GGML_CUDA_FUSED_ATTN_QJL; this
+# flips that define on when -DGGML_CUDA_FUSED_ATTN_QJL=ON is passed. Same shape
+# as the GGML_CUDA_QJL / POLARQUANT / TBQ3_TCQ blocks above. Optional kernel
+# (packages/inference/AGENTS.md §3) — off by default.
+if (GGML_CUDA_FUSED_ATTN_QJL)
+    add_compile_definitions(GGML_CUDA_FUSED_ATTN_QJL)
+    message(STATUS "ggml-cuda: GGML_CUDA_FUSED_ATTN_QJL enabled (fused QJL-K + TBQ-V attention)")
+endif()`;
+  const patched = original.replace(anchorRe, (m) => `${m}${block}`);
+  if (dryRun) {
+    console.log(
+      `[dflash-build] (dry-run) would patch ${cmakeListsPath} with GGML_CUDA_FUSED_ATTN_QJL block`,
+    );
+    return;
+  }
+  fs.writeFileSync(cmakeListsPath, patched, "utf8");
+  console.log(
+    "[dflash-build] patched ggml/src/ggml-cuda/CMakeLists.txt: add_compile_definitions(GGML_CUDA_FUSED_ATTN_QJL)",
+  );
+}
+
 // The fork's `ggml-vulkan.cpp` includes <vulkan/vulkan.hpp> (Vulkan-Headers)
 // and <spirv/unified1/spirv.hpp> (SPIRV-Headers). The Android NDK ships only
 // the C-level vulkan.h and no SPIRV headers, so a cross-compile against the
@@ -798,6 +867,16 @@ function cmakeFlagsForTarget(target, ctx) {
   } else if (backend === "cuda") {
     flags[flags.indexOf("-DGGML_CUDA=OFF")] = "-DGGML_CUDA=ON";
     flags.push("-DGGML_CUDA_FA=ON", "-DGGML_CUDA_FA_ALL_QUANTS=ON");
+    // Fused QJL-K + TBQ-V attention CUDA kernel (packages/inference/cuda/
+    // fused-attn-qjl-tbq.cu, staged into ggml-cuda/ by patchCudaKernels).
+    // The kernel body is `#ifdef GGML_CUDA_FUSED_ATTN_QJL`; this flag plus
+    // the `add_compile_definitions(GGML_CUDA_FUSED_ATTN_QJL)` line
+    // patchGgmlCudaForFusedAttn() injects into ggml-cuda/CMakeLists.txt are
+    // what turn the staged TU from an empty object into the live kernel.
+    // Same shape as the GGML_CUDA_QJL / GGML_CUDA_POLARQUANT /
+    // GGML_CUDA_TBQ3_TCQ flags the W4-B fork already carries. Optional
+    // (AGENTS.md §3) — fused_attn sits on top of the five required kernels.
+    flags.push(...CUDA_KERNEL_CMAKE_FLAGS);
     // Multi-arch fat-binary pin (see cudaArchListFlag). Without this the
     // build host's GPU (or sm_52 default on a GPU-less host) decides the
     // emitted PTX/SASS — wrong for a redistributable artifact, and the
@@ -1304,6 +1383,17 @@ function applyForkPatches(cacheDir, backend, target, { dryRun = false } = {}) {
   if (backend === "vulkan") {
     patchVulkanKernelsImpl(cacheDir, { dryRun, target });
   }
+  if (backend === "cuda") {
+    // Stage packages/inference/cuda/fused-attn-qjl-tbq.cu into ggml-cuda/
+    // (the fork GLOBs *.cu) and flip the matching add_compile_definitions in
+    // ggml-cuda/CMakeLists.txt so -DGGML_CUDA_FUSED_ATTN_QJL=ON (pushed in the
+    // cuda branch of buildCmakeFlags) actually compiles the kernel body. Both
+    // halves together — the staged TU is inert without the define, the define
+    // is meaningless without the TU. AUTHORED, hardware-verify pending (no
+    // NVIDIA host here); a no-flag/empty-TU build stays byte-for-byte normal.
+    patchCudaKernelsImpl(cacheDir, { dryRun });
+    patchGgmlCudaForFusedAttn(cacheDir, { dryRun });
+  }
   // llama-server structured-output + DFlash verifier-stream patch (Eliza-1
   // voice swarm, W4): assert grammar_lazy / json_schema / response_format /
   // continue_final_message are present in the fork's server.cpp (upstream
@@ -1334,6 +1424,17 @@ function applyForkPatches(cacheDir, backend, target, { dryRun = false } = {}) {
       patchServerStructuredOutputImpl(cacheDir, { dryRun });
     }
   }
+  // Fused omnivoice TTS: mount `POST /v1/audio/speech` onto the same
+  // `llama-server` that serves `/completion` + `/v1/chat/completions` + the
+  // DFlash speculative loop (packages/inference/AGENTS.md §4 — one process,
+  // not two over IPC; remaining-work-ledger P0 #3 merged-route item). The
+  // route handler is guarded by `#ifdef MILADY_FUSE_OMNIVOICE` so non-fused
+  // builds are byte-for-byte unchanged; the cmake-graft separately links
+  // `omnivoice-core` into `llama-server` and sets that define for fused
+  // targets. Idempotent via the route patch's own sentinel.
+  if (isFusedTarget(target) && (!target || !target.startsWith("ios-"))) {
+    patchServerOmnivoiceRouteImpl(cacheDir, { dryRun });
+  }
   // ggml.c (in ggml-base) calls quantize_qjl1_256 /
   // dequantize_row_qjl1_256 / quantize_row_qjl1_256_ref, which live in
   // ggml-cpu/qjl/. Any build where ggml-base is its own shared object
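
The sentinel-plus-anchor pattern used by `patchGgmlCudaForFusedAttn` can be shown on plain strings, without touching the filesystem. The sketch below is an illustration only: `appendAfterAnchor` and its option names are hypothetical and do not appear in the build script.

```javascript
// Pure-string sketch of an idempotent, fail-closed text patch.
// Assumption: anchorRe is a non-global RegExp, so .test() keeps no lastIndex
// state between calls.
function appendAfterAnchor(text, { sentinel, anchorRe, block }) {
  // Already patched: return the input unchanged so re-runs are no-ops.
  if (text.includes(sentinel)) return text;
  // Fail closed: a missing anchor means the target file has drifted.
  if (!anchorRe.test(text)) {
    throw new Error("anchor not found; refusing to patch");
  }
  // A replacer function avoids `$&`-style expansion inside `block`.
  return text.replace(anchorRe, (match) => `${match}\n${sentinel}\n${block}`);
}
```

Running the helper twice on the same input gives the same output as running it once. Keeping the transformation free of file I/O also makes a dry-run mode straightforward: compute the patched text, then decide separately whether to write it.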

packages/app-core/scripts/kernel-patches/cuda-kernels.mjs

Lines changed: 9 additions & 8 deletions
@@ -7,14 +7,15 @@
 // picked up unconditionally; the file body is gated by GGML_CUDA_FUSED_ATTN_QJL
 // so a no-flag build still emits an empty object.
 //
-// The matching cmake flag (-DGGML_CUDA_FUSED_ATTN_QJL=ON) and the
-// add_compile_definitions(GGML_CUDA_FUSED_ATTN_QJL) line that goes next to the
-// existing GGML_CUDA_QJL / GGML_CUDA_POLARQUANT / GGML_CUDA_TBQ3_TCQ block in
-// ggml-cuda/CMakeLists.txt are NOT applied here — that change lives in
-// build-llama-cpp-dflash.mjs (owned by the build-script agent). Until both land
-// the fused CUDA kernel is staged-but-inert: the symbol is absent from a
-// production build, which is the correct state — fused_attn is an optimization
-// on top of the five required kernels (AGENTS.md §3), not a required kernel.
+// The matching cmake flag (-DGGML_CUDA_FUSED_ATTN_QJL=ON, exported as
+// CUDA_KERNEL_CMAKE_FLAGS) and the add_compile_definitions(GGML_CUDA_FUSED_ATTN_QJL)
+// CMakeLists patch (patchGgmlCudaForFusedAttn) both live in
+// build-llama-cpp-dflash.mjs; its applyForkPatches() calls patchCudaKernels +
+// patchGgmlCudaForFusedAttn for CUDA targets and its cuda branch pushes
+// CUDA_KERNEL_CMAKE_FLAGS. A build without the flag (or anyone running this
+// staging step alone) gets a staged-but-inert TU — the symbol compiles to an
+// empty object, which is the correct state: fused_attn is an optimization on
+// top of the five required kernels (AGENTS.md §3), not a required kernel.
 //
 // Hard-throws on any error (missing source, missing fork dir, fs failure) — per
 // AGENTS.md §3 the build must exit non-zero rather than silently produce a
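
The header comment above describes how the CUDA branch of the build script turns the kernel on by pushing `CUDA_KERNEL_CMAKE_FLAGS`. A minimal sketch of that flag assembly follows. The array contents are an assumption inferred from the comment (the real export may carry more entries), and `setCmakeDefine` and `cudaFlags` are hypothetical helpers, not functions from the repository.

```javascript
// Assumed contents, inferred from the header comment; not confirmed by the diff.
const CUDA_KERNEL_CMAKE_FLAGS = ["-DGGML_CUDA_FUSED_ATTN_QJL=ON"];

// Hypothetical helper: set -D<name>=<value>, replacing an existing entry in
// place or appending one if the define is absent. Returns a new array.
function setCmakeDefine(flags, name, value) {
  const prefix = `-D${name}=`;
  const entry = `${prefix}${value}`;
  const idx = flags.findIndex((f) => f.startsWith(prefix));
  const next = [...flags];
  if (idx === -1) {
    next.push(entry);
  } else {
    next[idx] = entry;
  }
  return next;
}

// Hypothetical composition of the CUDA branch: enable CUDA, then append the
// optional fused-attention kernel flag(s).
function cudaFlags(baseFlags) {
  return [
    ...setCmakeDefine(baseFlags, "GGML_CUDA", "ON"),
    ...CUDA_KERNEL_CMAKE_FLAGS,
  ];
}
```

The helper handles the missing-flag case explicitly. A bare `flags[flags.indexOf(x)] = y` assignment sets a `-1` property on the array instead of an element when `x` is absent, so the define would be dropped without any error.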
