Commit b120dc6

Merge remote-tracking branch 'upstream/main'
2 parents 87e9dd7 + 5323277

16 files changed

Lines changed: 1138 additions & 157 deletions

CHANGELOG.md

Lines changed: 77 additions & 0 deletions
@@ -2,6 +2,83 @@
 
 ## [Unreleased]
 
+## [2.5.3] - 2026-05-28
+
+### Features
+
+- `qmd get` now accepts a `:from:count` suffix on a path or docid (e.g.
+  `qmd get "#abc123:120:40"` reads 40 lines starting at line 120). Explicit
+  `--from`/`-l` flags still override the suffix. The MCP `get` tool accepts the
+  same suffix.
+- `qmd get` and `qmd multi-get` are now **line-numbered by default** and print
+  the document's `#docid` and `qmd://` path in the output header. Disable line
+  numbers with `--no-line-numbers`. The MCP `get`/`multi_get` tools default
+  `lineNumbers` to `true` to match.
+- `qmd multi-get` now includes the `#docid` in every output format
+  (`--md`, `--json`, `--csv`, `--xml`, `--files`, and the default CLI view),
+  consistent with `qmd search`.
+- `qmd get` and `qmd multi-get` accept `--full-path`, which replaces the
+  `qmd://` path + `#docid` with the document's on-disk filesystem path (handy for
+  piping into `Read`/`Edit`/an editor). Falls back to the canonical `qmd://` +
+  docid header when the file no longer exists on disk.
+- `qmd search` / `qmd query` now show a clearer hit identifier: the default CLI
+  view (and the new `**file:**` line in `--md` output) always prints the full
+  `qmd://collection/path` URI so you can pipe it straight back into `qmd get`.
+- `qmd search` / `qmd query` accept `--full-path` with the same semantics as
+  `qmd get`: the result label becomes the file's on-disk path — `./`-prefixed
+  relative path when the file lives in a subfolder of `$PWD`, absolute realpath
+  otherwise — and the per-result `#docid` is dropped because the path is the
+  identifier. The leading `./` is intentional so the output is unambiguously a
+  filesystem path. Applies to all output formats.
+- `qmd get` and `qmd multi-get` now also use the `./`-prefixed convention when
+  `--full-path` renders a path under `$PWD`, matching `search`/`query`.
+- New `--format <kind>` flag selects the output format (`cli` | `json` | `csv` |
+  `md` | `xml` | `files`) for `search`, `query`, and `multi-get`. The legacy
+  boolean aliases (`--json`/`--csv`/`--md`/`--xml`/`--files`) still work but are
+  no longer in `--help`; prefer `--format`.
+
+### Fixes
+
+- Launcher: source-mode runner selection now prefers Node + tsx over Bun when
+  both `package-lock.json` and `bun.lock` are present in the package root,
+  mirroring the dist-mode "npm priority" rule. Fixes pnpm-global installs that
+  copy the entire working tree (including `.git` and `bun.lock`) into the
+  install dir and previously routed through Bun, causing ABI mismatches with
+  the Node-built `better-sqlite3` / `sqlite-vec` native modules.
+- Darwin Metal: llama-using commands (`query`, `vsearch`, `embed`) no longer
+  dump a multi-kB GGML/Metal backtrace at process exit even when output
+  succeeded. The libggml-metal static `ggml_metal_device` destructor asserts
+  `[rsets->data count] == 0` during `__cxa_finalize_ranges`, but the
+  buffer-free path never calls the symmetric `ggml_metal_device_rsets_rm`
+  to remove released rsets from the device collection (upstream
+  ggml-org/llama.cpp#22593, one-line fix open as PR #22595). The assertion
+  only fires when `process.exit()` skips Node's `beforeExit` hook, which is
+  what node-llama-cpp uses to auto-dispose Metal contexts. Primary fix:
+  `finishSuccessfulCliCommand` now sets `process.exitCode = 0` and returns
+  instead of calling `process.exit(0)`, so `beforeExit` fires and the native
+  binding cleans up before libc's static destructor runs. Defense-in-depth:
+  the launcher (`bin/qmd`) and the npm test driver (`scripts/test-all.mjs`
+  + the `test:bun` / `test:unit` package.json scripts) also set
+  `GGML_METAL_NO_RESIDENCY=1` on darwin before spawning node/bun, covering
+  error paths and tests that still terminate via `process.exit()`. The env
+  var must be set before node/bun start — libggml-metal reads it via libc
+  `getenv` at module-load time, and Bun does not propagate `process.env`
+  mutations to libc `setenv` — so it lives in the launcher rather than in
+  test-preload. Residency sets give no measurable speedup for QMD's
+  short-lived CLI workflow (benchmarked on M3 Pro). Opt back in with
+  `QMD_METAL_KEEP_RESIDENCY=1` for long-lived qmd processes (e.g. the MCP
+  daemon may benefit on hot reload) or to triage the upstream fix.
+  `qmd doctor` reports the mitigation state. Minimal reproduction:
+  `scripts/repro-metal-rsets-crash.mjs`.
+
+### Docs
+
+- qmd skill: emphasize reading line ranges with `get`'s built-in
+  `:from:count` suffix / `--from`/`-l` flags instead of piping through
+  `sed`/`head`/`tail`; cite the docid and line numbers now present in retrieval
+  output; and author structured `intent:`/`lex:`/`vec:`/`hyde:` queries yourself
+  rather than relying on built-in query expansion.
+
 ## [2.5.2] - 2026-05-22
 
 ### Fixes
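
A minimal sketch of two `qmd get` behaviors from the 2.5.3 changelog, written in plain JavaScript with no imports: splitting a trailing `:from:count` suffix off a target (with explicit flags taking precedence), and building the `./`-prefixed `--full-path` label. The names `parseGetTarget` and `fullPathLabel`, and the `from`/`lines` flag keys, are hypothetical. The real CLI may differ, for example in whether a lone `:from` suffix is accepted or how realpath resolution is done.

```javascript
// Hypothetical helpers illustrating two CLI behaviors from the changelog.
// They are sketches, not qmd's actual implementation.

// Split a trailing ":from:count" suffix off a `qmd get` target such as
// "#abc123:120:40" or "qmd://notes/todo.md:10:5". Only the full two-number
// form is recognized here (an assumption). Explicit flag values (named
// `from` and `lines` here, also an assumption) win over the suffix.
function parseGetTarget(spec, flags = {}) {
  const match = /^(.*):(\d+):(\d+)$/.exec(spec);
  const target = match ? match[1] : spec;
  const fromSuffix = match ? Number(match[2]) : undefined;
  const countSuffix = match ? Number(match[3]) : undefined;
  return {
    target,
    from: flags.from ?? fromSuffix,
    lines: flags.lines ?? countSuffix,
  };
}

// Render a --full-path label: a "./"-prefixed relative path when the file
// sits under cwd, otherwise the absolute path unchanged. Assumes both inputs
// are already absolute, symlink-resolved POSIX paths.
function fullPathLabel(absPath, cwd) {
  // The trailing "/" stops "/home/u/proj2/x.md" from matching "/home/u/proj".
  const base = cwd.endsWith("/") ? cwd : cwd + "/";
  return absPath.startsWith(base) ? "./" + absPath.slice(base.length) : absPath;
}
```

With this sketch, `parseGetTarget("#abc123:120:40", { from: 5 })` keeps the 40-line count from the suffix but starts at line 5, following the rule that explicit flags override the suffix.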

bin/qmd

Lines changed: 61 additions & 10 deletions
@@ -30,6 +30,27 @@ if (process.argv[2] === "mcp") {
   process.env.GGML_BACKEND_SILENT = process.env.GGML_BACKEND_SILENT || "1";
 }
 
+// libggml-metal on macOS uses "residency sets" to keep allocated model memory
+// resident across inference requests (180-second keep_alive timer). The
+// process-static device destructor that runs during libc exit() asserts the
+// residency set is empty (ggml-org/llama.cpp#22593); the keep_alive hasn't
+// expired by exit, so the assertion fails and ggml_abort dumps a multi-kB
+// stack trace to stderr even when the user-visible results were already
+// emitted correctly. No JS-side dispose can prevent it because the static
+// destructor runs in __cxa_finalize_ranges, after every JS-reachable cleanup.
+//
+// For QMD's short-lived CLI workflow, residency sets provide no observable
+// performance benefit (subsequent requests don't reuse the warm mapping —
+// measured: identical wall time with and without on M3 Pro), so disable them
+// by default on darwin. The env var must be set BEFORE the native llama.cpp
+// binding loads, which is why it lives here in the launcher rather than in
+// the JS entry point. Opt back in with QMD_METAL_KEEP_RESIDENCY=1 if you
+// run long-lived qmd processes (the MCP daemon may benefit on hot reload)
+// or are triaging an upstream Metal teardown fix.
+if (process.platform === "darwin" && process.env.QMD_METAL_KEEP_RESIDENCY !== "1") {
+  process.env.GGML_METAL_NO_RESIDENCY = process.env.GGML_METAL_NO_RESIDENCY || "1";
+}
+
 function hasBun() {
   try {
     const res = spawnSync("bun", ["--version"], { stdio: "ignore", shell: process.platform === "win32" });
@@ -43,22 +64,52 @@ function hasBun() {
 // dist/ is often ignored and can be stale after git reset or branch switches.
 // Prefer source mode only for checkouts so ./bin/qmd reflects the checked-out
 // source without changing packaged/runtime behavior.
+//
+// Critical: source-mode detection must NOT trigger when a package manager
+// installed us. `pnpm install -g .` (and `npm install -g .`) copy the entire
+// working tree — including .git/, bun.lock, package-lock.json, src/, and even
+// node_modules/ — into <prefix>/node_modules/@tobilu/qmd/, so .git and a
+// lockfile being present is not a reliable "this is a working tree" signal.
+// What IS reliable: a package-manager install always lands the package
+// directory inside a `node_modules/` segment; a bare working-tree checkout
+// (with `bun link` or a direct path invocation) does not. Gate source mode
+// on that. Allow QMD_SOURCE_MODE=1 / =0 as an explicit override for the
+// rare case where the heuristic disagrees with the user.
+const sourceOverride = process.env.QMD_SOURCE_MODE;
+const looksInstalled = pkgDir.split("/").includes("node_modules");
+const sourceAllowed = sourceOverride === "1"
+  || (sourceOverride !== "0" && !looksInstalled);
+
 let useSourceMode = false;
 let sourceRunner = null;
 let sourceArgs = [];
 
-if (existsSync(resolve(pkgDir, ".git")) && existsSync(tsEntry)) {
-  if (existsSync(resolve(pkgDir, "bun.lock")) || existsSync(resolve(pkgDir, "bun.lockb"))) {
-    if (hasBun()) {
-      useSourceMode = true;
-      sourceRunner = "bun";
-      sourceArgs = [tsEntry, ...process.argv.slice(2)];
-    }
-  }
-  if (!useSourceMode && existsSync(resolve(pkgDir, "node_modules/tsx/dist/cli.mjs"))) {
+if (sourceAllowed && existsSync(resolve(pkgDir, ".git")) && existsSync(tsEntry)) {
+  // Lockfile-driven runner selection — mirror the dist-mode logic below so
+  // source mode picks the same runtime the user's deps were installed for.
+  // package-lock.json wins over bun.lock when both are present: pnpm/npm
+  // installs ship the Node-ABI native modules (better-sqlite3, sqlite-vec),
+  // and running Bun against them produces ABI mismatches. This also fixes
+  // pnpm-global installs, which copy the whole working tree — including .git
+  // and bun.lock — into the install dir and used to route through Bun even
+  // when the user installed via npm/pnpm.
+  const hasNpmLock = existsSync(resolve(pkgDir, "package-lock.json"));
+  const hasBunLock = existsSync(resolve(pkgDir, "bun.lock")) || existsSync(resolve(pkgDir, "bun.lockb"));
+  const tsxEntry = resolve(pkgDir, "node_modules/tsx/dist/cli.mjs");
+  const tsxAvailable = existsSync(tsxEntry);
+
+  if (hasNpmLock && tsxAvailable) {
+    useSourceMode = true;
+    sourceRunner = "node";
+    sourceArgs = [tsxEntry, tsEntry, ...process.argv.slice(2)];
+  } else if (hasBunLock && hasBun()) {
+    useSourceMode = true;
+    sourceRunner = "bun";
+    sourceArgs = [tsEntry, ...process.argv.slice(2)];
+  } else if (tsxAvailable) {
     useSourceMode = true;
     sourceRunner = "node";
-    sourceArgs = [resolve(pkgDir, "node_modules/tsx/dist/cli.mjs"), tsEntry, ...process.argv.slice(2)];
+    sourceArgs = [tsxEntry, tsEntry, ...process.argv.slice(2)];
   }
 }
 
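
The launcher's new source-mode decision is spread across several `existsSync()` probes and a `hasBun()` call. The sketch below restates it as one pure function, so the priority order can be read and unit-tested in one place: explicit override, then the `node_modules` gate, then `package-lock.json` over `bun.lock`. `chooseSourceRunner` and its boolean inputs are hypothetical stand-ins for the probes in `bin/qmd`. One difference: the launcher only calls `hasBun()` when that branch is reached, whereas here the answer is passed in up front. Like the launcher, it splits `pkgDir` on `/`, so it assumes POSIX-style paths.

```javascript
// Hypothetical pure-function restatement of bin/qmd's source-mode choice.
// Returns "node", "bun", or null (null means: fall through to dist mode).
function chooseSourceRunner({
  pkgDir,        // package root directory (POSIX path)
  override,      // value of QMD_SOURCE_MODE: "1", "0", or undefined
  hasGitDir,     // existsSync(<pkgDir>/.git)
  hasTsEntry,    // existsSync(tsEntry)
  hasNpmLock,    // existsSync(<pkgDir>/package-lock.json)
  hasBunLock,    // bun.lock or bun.lockb present
  tsxAvailable,  // existsSync(<pkgDir>/node_modules/tsx/dist/cli.mjs)
  bunAvailable,  // result of hasBun()
}) {
  // Package-manager installs always live under a node_modules/ segment.
  const looksInstalled = pkgDir.split("/").includes("node_modules");
  const sourceAllowed = override === "1" || (override !== "0" && !looksInstalled);
  if (!sourceAllowed || !hasGitDir || !hasTsEntry) return null;

  // package-lock.json wins: npm/pnpm built the Node-ABI native modules.
  if (hasNpmLock && tsxAvailable) return "node";
  if (hasBunLock && bunAvailable) return "bun";
  if (tsxAvailable) return "node";
  return null;
}
```

With both lockfiles present in a working-tree checkout, it returns `"node"`. The same inputs under `/usr/local/lib/node_modules/@tobilu/qmd` return `null` (dist mode) unless `QMD_SOURCE_MODE=1` is set.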

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@tobilu/qmd",
-  "version": "2.5.2",
+  "version": "2.5.3",
   "description": "Query Markup Documents - On-device hybrid search for markdown files with BM25, vector search, and LLM reranking",
   "type": "module",
   "main": "dist/index.js",

scripts/repro-metal-rsets-crash.mjs

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+#!/usr/bin/env node
+/**
+ * Minimal reproduction of llama.cpp issue ggml-org/llama.cpp#22593:
+ *
+ *   ggml-metal-device.m:612: GGML_ASSERT([rsets->data count] == 0) failed
+ *
+ * Root cause (per the upstream issue and proposed fix PR #22595):
+ * `ggml_metal_buffer_rset_free` releases the per-buffer residency set object
+ * but does NOT call the symmetric `ggml_metal_device_rsets_rm`. So the
+ * device's `rsets->data` array accumulates dangling references. When the
+ * process exits and libc fires the process-static `ggml_metal_device`
+ * destructor in `__cxa_finalize_ranges`, the destructor asserts the
+ * array is empty — and it isn't.
+ *
+ * Observed downstream behavior:
+ *   - With EXPLICIT `dispose()` of every JS handle in order, the assertion
+ *     does NOT fire. node-llama-cpp's dispose path tears the Metal buffers
+ *     down before the static dtor runs, so the device's rsets array is
+ *     empty by exit time. (Tested locally — clean exit.)
+ *   - With NO dispose (the typical real-world case: synchronous `exit()`,
+ *     `--watch` mode, `process.exit()` after results are written, or any
+ *     code path where GC + finalizers race with libc exit), the rset
+ *     references linger until the static dtor fires, and the assertion
+ *     trips.
+ *
+ * What this script does:
+ *   1. Load node-llama-cpp + a small GGUF model on the Metal backend.
+ *      This allocates at least one Metal buffer → calls rsets_add internally.
+ *   2. Run an inference (creating an embedding context populates buffers
+ *      that the dispose path would normally clean up).
+ *   3. Skip explicit dispose. Just let the process exit.
+ *
+ * Expected behavior on macOS 15+ with Apple Silicon, current llama.cpp
+ * (bundled in node-llama-cpp 3.18.1, llama.cpp tag b8390):
+ *   - Without GGML_METAL_NO_RESIDENCY:
+ *     Script writes "ok" and main() returns, then ggml_abort fires the
+ *     assertion, prints a multi-kB backtrace, and the process exits with
+ *     SIGABRT (exit code 134).
+ *   - With GGML_METAL_NO_RESIDENCY=1:
+ *     Clean exit code 0. Residency-set code path is skipped entirely.
+ *   - With --dispose flag (manual cleanup):
+ *     Clean exit code 0 even without the env var, as long as JS dispose()
+ *     runs successfully before libc exit.
+ *
+ * Usage:
+ *   # Reproduce the crash (no dispose, no env var)
+ *   node scripts/repro-metal-rsets-crash.mjs
+ *
+ *   # Verify the documented workaround
+ *   GGML_METAL_NO_RESIDENCY=1 node scripts/repro-metal-rsets-crash.mjs
+ *
+ *   # Verify that explicit dispose also avoids the crash
+ *   node scripts/repro-metal-rsets-crash.mjs --dispose
+ *
+ * Refs:
+ *   https://github.com/ggml-org/llama.cpp/issues/22593 (root-cause analysis)
+ *   https://github.com/ggml-org/llama.cpp/pull/22595   (one-line fix, open)
+ *   https://github.com/tobi/qmd/issues/368             (downstream report)
+ *   https://github.com/tobi/qmd/issues/674             (downstream, current)
+ *   https://github.com/tobi/qmd/pull/600               (downstream workaround PR)
+ */
+
+import { existsSync } from "node:fs";
+import { homedir } from "node:os";
+import { resolve } from "node:path";
+
+const DEFAULT_MODEL = resolve(
+  homedir(),
+  ".cache/qmd/models/hf_ggml-org_embeddinggemma-300M-Q8_0.gguf",
+);
+
+const args = process.argv.slice(2);
+const wantsDispose = args.includes("--dispose");
+const modelPath = args.find((a) => !a.startsWith("--")) ?? DEFAULT_MODEL;
+
+if (!existsSync(modelPath)) {
+  console.error(`Model not found: ${modelPath}`);
+  console.error("Pass a path to any local GGUF as argv[1], or run `qmd embed` once to populate the default cache path.");
+  process.exit(2);
+}
+
+console.error(
+  `[repro] GGML_METAL_NO_RESIDENCY=${process.env.GGML_METAL_NO_RESIDENCY ?? "(unset)"}`,
+);
+console.error(`[repro] dispose=${wantsDispose}`);
+console.error(`[repro] loading: ${modelPath}`);
+
+const { getLlama } = await import("node-llama-cpp");
+
+const llama = await getLlama();
+const model = await llama.loadModel({ modelPath });
+const context = await model.createEmbeddingContext();
+
+console.error(`[repro] backend: ${llama.gpu}`);
+
+// Run actual inference so the buffer-allocation path is hit.
+await context.getEmbeddingFor("repro text");
+
+if (wantsDispose) {
+  console.error("[repro] explicit dispose…");
+  await context.dispose();
+  await model.dispose();
+  await llama.dispose();
+}
+
+console.error("[repro] main() returning via process.exit(0)");
+console.log("ok");
+
+// CRITICAL: use process.exit(), not `return`. node-llama-cpp registers a
+// `process.once('beforeExit', …)` hook that auto-disposes WeakRef'd Llama
+// instances when the event loop empties naturally. `process.exit()` skips
+// `beforeExit`, so the rsets stay populated until libc's `exit()` fires the
+// static dtor — which is when the upstream assertion bug trips.
+//
+// CLI tools (qmd query, qmd vsearch, qmd embed, etc.) all call process.exit()
+// after writing results, which is why every real downstream report crashes
+// even though the minimal "let main return" version does not.
+process.exit(0);
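
The `CRITICAL` comment at the end of the repro script and the `finishSuccessfulCliCommand` change in the changelog both rely on one Node lifecycle rule: `'beforeExit'` is emitted when the event loop drains on its own, but not after an explicit `process.exit()`. The minimal sketch below checks that rule in isolation by running each exit style in a fresh `node -e` child process. `beforeExitFires` is a hypothetical helper; it demonstrates only the hook ordering, with no llama.cpp or Metal code involved. It uses a dynamic `import()` inside an async function so it runs under both ESM and CommonJS.

```javascript
// Hypothetical helper: run a tiny Node program that registers a 'beforeExit'
// listener, prints "ok", then finishes with `finishCode`. Reports the child's
// exit status and whether the listener ran.
async function beforeExitFires(finishCode) {
  const { spawnSync } = await import("node:child_process");
  const script = `
    process.once("beforeExit", () => console.log("beforeExit fired"));
    console.log("ok");
    ${finishCode}
  `;
  const result = spawnSync(process.execPath, ["-e", script], { encoding: "utf8" });
  return {
    status: result.status,
    fired: (result.stdout ?? "").includes("beforeExit fired"),
  };
}
```

Calling it with `"process.exitCode = 0;"` should report `fired: true`, and calling it with `"process.exit(0);"` should report `fired: false`. Both exit with status 0. The second case is the window in which node-llama-cpp's auto-dispose never runs.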

scripts/test-all.mjs

Lines changed: 12 additions & 1 deletion
@@ -5,14 +5,25 @@ import { fileURLToPath } from "node:url";
 
 const root = fileURLToPath(new URL("..", import.meta.url));
 
+// Mirror bin/qmd's darwin Metal residency mitigation for test subprocesses.
+// libggml-metal asserts on a non-empty residency set during its static
+// destructor (ggml-org/llama.cpp#22593, fix open as #22595) and dumps a
+// multi-kB backtrace at process exit even when tests pass. The env var must
+// be set BEFORE the subprocess starts because libggml-metal reads it via
+// libc getenv at module-load time. Opt out with QMD_METAL_KEEP_RESIDENCY=1.
+const darwinMetalEnv =
+  process.platform === "darwin" && process.env.QMD_METAL_KEEP_RESIDENCY !== "1"
+    ? { GGML_METAL_NO_RESIDENCY: "1" }
+    : {};
+
 function run(label, command, args, options = {}) {
   console.log(`==> ${label}`);
   const { env: extraEnv, ...spawnOptions } = options;
   const result = spawnSync(command, args, {
     cwd: root,
     stdio: "inherit",
     shell: process.platform === "win32",
-    env: { ...process.env, ...(extraEnv ?? {}) },
+    env: { ...process.env, ...darwinMetalEnv, ...(extraEnv ?? {}) },
     ...spawnOptions,
   });
   if (result.status !== 0) {
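
`run()` in `scripts/test-all.mjs` layers three env sources with object spread, and later spreads win. The minimal pure-function sketch below reproduces that layering so the precedence is easy to check. The name `buildTestEnv` is hypothetical, and the platform and parent env are passed in rather than read from `process`.

```javascript
// Hypothetical restatement of the env layering in scripts/test-all.mjs:
// parent env first, then the darwin Metal mitigation, then per-call overrides.
function buildTestEnv(platform, parentEnv, extraEnv) {
  const darwinMetalEnv =
    platform === "darwin" && parentEnv.QMD_METAL_KEEP_RESIDENCY !== "1"
      ? { GGML_METAL_NO_RESIDENCY: "1" }
      : {};
  // Later spreads overwrite earlier keys, so extraEnv has the final say.
  return { ...parentEnv, ...darwinMetalEnv, ...(extraEnv ?? {}) };
}
```

The sketch makes one difference from `bin/qmd` visible. The launcher keeps a pre-set `GGML_METAL_NO_RESIDENCY` via `||`, whereas this layering overwrites a parent value with `"1"`. For test subprocesses, only a per-call `env` override (or `QMD_METAL_KEEP_RESIDENCY=1`) changes the outcome.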
