gdibble
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 77 additions & 0 deletions
diff --git a/bin/qmd b/bin/qmd
Lines changed: 61 additions & 10 deletions
diff --git a/package.json b/package.json
Lines changed: 1 addition & 1 deletion
diff --git a/scripts/repro-metal-rsets-crash.mjs b/scripts/repro-metal-rsets-crash.mjs
Lines changed: 118 additions & 0 deletions
diff --git a/scripts/test-all.mjs b/scripts/test-all.mjs
Lines changed: 12 additions & 1 deletion
@@ -2,6 +2,83 @@
 
 ## [Unreleased]
 
+## [2.5.3] - 2026-05-28
+
+### Features
+
+- `qmd get` now accepts a `:from:count` suffix on a path or docid (e.g.
+  `qmd get "#abc123:120:40"` reads 40 lines starting at line 120). Explicit
+  `--from`/`-l` flags still override the suffix. The MCP `get` tool accepts the
+  same suffix.
+- `qmd get` and `qmd multi-get` are now **line-numbered by default** and print
+  the document's `#docid` and `qmd://` path in the output header. Disable line
+  numbers with `--no-line-numbers`. The MCP `get`/`multi_get` tools default
+  `lineNumbers` to `true` to match.
+- `qmd multi-get` now includes the `#docid` in every output format
+  (`--md`, `--json`, `--csv`, `--xml`, `--files`, and the default CLI view),
+  consistent with `qmd search`.
+- `qmd get` and `qmd multi-get` accept `--full-path`, which replaces the
+  `qmd://` path + `#docid` with the document's on-disk filesystem path (handy for
+  piping into `Read`/`Edit`/an editor). Falls back to the canonical `qmd://` +
+  docid header when the file no longer exists on disk.
+- `qmd search` / `qmd query` now show a clearer hit identifier: the default CLI
+  view (and the new `**file:**` line in `--md` output) always prints the full
+  `qmd://collection/path` URI so you can pipe it straight back into `qmd get`.
+- `qmd search` / `qmd query` accept `--full-path` with the same semantics as
+  `qmd get`: the result label becomes the file's on-disk path — `./`-prefixed
+  relative path when the file lives in a subfolder of `$PWD`, absolute realpath
+  otherwise — and the per-result `#docid` is dropped because the path is the
+  identifier. The leading `./` is intentional so the output is unambiguously a
+  filesystem path. Applies to all output formats.
+- `qmd get` and `qmd multi-get` now also use the `./`-prefixed convention when
+  `--full-path` renders a path under `$PWD`, matching `search`/`query`.
+- New `--format <kind>` flag selects the output format (`cli` | `json` | `csv` |
+  `md` | `xml` | `files`) for `search`, `query`, and `multi-get`. The legacy
+  boolean aliases (`--json`/`--csv`/`--md`/`--xml`/`--files`) still work but are
+  no longer in `--help`; prefer `--format`.
+
+### Fixes
+
+- Launcher: source-mode runner selection now prefers Node + tsx over Bun when
+  both `package-lock.json` and `bun.lock` are present in the package root,
+  mirroring the dist-mode "npm priority" rule. Fixes pnpm-global installs that
+  copy the entire working tree (including `.git` and `bun.lock`) into the
+  install dir and previously routed through Bun, causing ABI mismatches with
+  the Node-built `better-sqlite3` / `sqlite-vec` native modules.
+- Darwin Metal: llama-using commands (`query`, `vsearch`, `embed`) no longer
+  dump a multi-kB GGML/Metal backtrace at process exit even when output
+  succeeded. The libggml-metal static `ggml_metal_device` destructor asserts
+  `[rsets->data count] == 0` during `__cxa_finalize_ranges`, but the
+  buffer-free path never calls the symmetric `ggml_metal_device_rsets_rm`
+  to remove released rsets from the device collection (upstream
+  ggml-org/llama.cpp#22593, one-line fix open as PR #22595). The assertion
+  only fires when `process.exit()` skips Node's `beforeExit` hook, which is
+  what node-llama-cpp uses to auto-dispose Metal contexts. Primary fix:
+  `finishSuccessfulCliCommand` now sets `process.exitCode = 0` and returns
+  instead of calling `process.exit(0)`, so `beforeExit` fires and the native
+  binding cleans up before libc's static destructor runs. Defense-in-depth:
+  the launcher (`bin/qmd`) and the npm test driver (`scripts/test-all.mjs`
+  + the `test:bun` / `test:unit` package.json scripts) also set
+  `GGML_METAL_NO_RESIDENCY=1` on darwin before spawning node/bun, covering
+  error paths and tests that still terminate via `process.exit()`. The env
+  var must be set before node/bun start — libggml-metal reads it via libc
+  `getenv` at module-load time, and Bun does not propagate `process.env`
+  mutations to libc `setenv` — so it lives in the launcher rather than in
+  test-preload. Residency sets give no measurable speedup for QMD's
+  short-lived CLI workflow (benchmarked on M3 Pro). Opt back in with
+  `QMD_METAL_KEEP_RESIDENCY=1` for long-lived qmd processes (e.g. the MCP
+  daemon may benefit on hot reload) or to triage the upstream fix.
+  `qmd doctor` reports the mitigation state. Minimal reproduction:
+  `scripts/repro-metal-rsets-crash.mjs`.
+
+### Docs
+
+- qmd skill: emphasize reading line ranges with `get`'s built-in
+  `:from:count` suffix / `--from`/`-l` flags instead of piping through
+  `sed`/`head`/`tail`; cite the docid and line numbers now present in retrieval
+  output; and author structured `intent:`/`lex:`/`vec:`/`hyde:` queries yourself
+  rather than relying on built-in query expansion.
+
 ## [2.5.2] - 2026-05-22
 
 ### Fixes
 
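The `:from:count` suffix described in the 2.5.3 changelog entry can be illustrated with a small sketch. This is not qmd's actual parser: `parseLineSuffix`, `resolveRange`, and `sliceLines` are hypothetical names, and the `flags.from` / `flags.count` fields are assumed stand-ins for the explicit `--from`/`-l` flags, which the changelog says take precedence over the suffix.

```javascript
// Hypothetical helpers; names and the shape of `flags` are assumptions.

// Split "ref:from:count" into its parts. Only a trailing pair of
// numeric fields is treated as a suffix, so "qmd://..." URIs without
// a suffix pass through unchanged.
function parseLineSuffix(target) {
  const match = /^(.*?):(\d+):(\d+)$/.exec(target);
  if (!match) return { ref: target, from: undefined, count: undefined };
  return { ref: match[1], from: Number(match[2]), count: Number(match[3]) };
}

// Explicit flags win over the suffix, mirroring the changelog wording.
function resolveRange(target, flags = {}) {
  const parsed = parseLineSuffix(target);
  return {
    ref: parsed.ref,
    from: flags.from ?? parsed.from,
    count: flags.count ?? parsed.count,
  };
}

// Return `count` lines starting at 1-based line `from`.
function sliceLines(text, from, count) {
  const lines = text.split("\n");
  return lines.slice(from - 1, from - 1 + count);
}
```

Under these assumptions, `resolveRange("#abc123:120:40")` yields the docid `#abc123` with `from` 120 and `count` 40, matching the changelog example.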
@@ -30,6 +30,27 @@ if (process.argv[2] === "mcp") {
   process.env.GGML_BACKEND_SILENT = process.env.GGML_BACKEND_SILENT || "1";
 }
 
+// libggml-metal on macOS uses "residency sets" to keep allocated model memory
+// resident across inference requests (180-second keep_alive timer). The
+// process-static device destructor that runs during libc exit() asserts the
+// residency set is empty (ggml-org/llama.cpp#22593); the keep_alive hasn't
+// expired by exit, so the assertion fails and ggml_abort dumps a multi-kB
+// stack trace to stderr even when the user-visible results were already
+// emitted correctly. JS-side dispose helps only if it runs before exit; after
+// process.exit() the static destructor runs in __cxa_finalize_ranges, too late.
+//
+// For QMD's short-lived CLI workflow, residency sets provide no observable
+// performance benefit (subsequent requests don't reuse the warm mapping —
+// measured: identical wall time with and without on M3 Pro), so disable them
+// by default on darwin. The env var must be set BEFORE the native llama.cpp
+// binding loads, which is why it lives here in the launcher rather than in
+// the JS entry point. Opt back in with QMD_METAL_KEEP_RESIDENCY=1 if you
+// run long-lived qmd processes (the MCP daemon may benefit on hot reload)
+// or are triaging an upstream Metal teardown fix.
+if (process.platform === "darwin" && process.env.QMD_METAL_KEEP_RESIDENCY !== "1") {
+  process.env.GGML_METAL_NO_RESIDENCY = process.env.GGML_METAL_NO_RESIDENCY || "1";
+}
+
 function hasBun() {
   try {
     const res = spawnSync("bun", ["--version"], { stdio: "ignore", shell: process.platform === "win32" });
@@ -43,22 +64,52 @@ function hasBun() {
 // dist/ is often ignored and can be stale after git reset or branch switches.
 // Prefer source mode only for checkouts so ./bin/qmd reflects the checked-out
 // source without changing packaged/runtime behavior.
+//
+// Critical: source-mode detection must NOT trigger when a package manager
+// installed us. `pnpm install -g .` (and `npm install -g .`) copy the entire
+// working tree — including .git/, bun.lock, package-lock.json, src/, and even
+// node_modules/ — into <prefix>/node_modules/@tobilu/qmd/, so .git and a
+// lockfile being present is not a reliable "this is a working tree" signal.
+// What IS reliable: a package-manager install always lands the package
+// directory inside a `node_modules/` segment; a bare working-tree checkout
+// (with `bun link` or a direct path invocation) does not. Gate source mode
+// on that. Allow QMD_SOURCE_MODE=1 / =0 as an explicit override for the
+// rare case where the heuristic disagrees with the user.
+const sourceOverride = process.env.QMD_SOURCE_MODE;
+const looksInstalled = pkgDir.split(/[\\/]/).includes("node_modules");
+const sourceAllowed = sourceOverride === "1"
+  || (sourceOverride !== "0" && !looksInstalled);
+
 let useSourceMode = false;
 let sourceRunner = null;
 let sourceArgs = [];
 
-if (existsSync(resolve(pkgDir, ".git")) && existsSync(tsEntry)) {
-  if (existsSync(resolve(pkgDir, "bun.lock")) || existsSync(resolve(pkgDir, "bun.lockb"))) {
-    if (hasBun()) {
-      useSourceMode = true;
-      sourceRunner = "bun";
-      sourceArgs = [tsEntry, ...process.argv.slice(2)];
-    }
-  }
-  if (!useSourceMode && existsSync(resolve(pkgDir, "node_modules/tsx/dist/cli.mjs"))) {
+if (sourceAllowed && existsSync(resolve(pkgDir, ".git")) && existsSync(tsEntry)) {
+  // Lockfile-driven runner selection — mirror the dist-mode logic below so
+  // source mode picks the same runtime the user's deps were installed for.
+  // package-lock.json wins over bun.lock when both are present: pnpm/npm
+  // installs ship the Node-ABI native modules (better-sqlite3, sqlite-vec),
+  // and running Bun against them produces ABI mismatches. This also fixes
+  // pnpm-global installs, which copy the whole working tree — including .git
+  // and bun.lock — into the install dir and used to route through Bun even
+  // when the user installed via npm/pnpm.
+  const hasNpmLock = existsSync(resolve(pkgDir, "package-lock.json"));
+  const hasBunLock = existsSync(resolve(pkgDir, "bun.lock")) || existsSync(resolve(pkgDir, "bun.lockb"));
+  const tsxEntry = resolve(pkgDir, "node_modules/tsx/dist/cli.mjs");
+  const tsxAvailable = existsSync(tsxEntry);
+
+  if (hasNpmLock && tsxAvailable) {
+    useSourceMode = true;
+    sourceRunner = "node";
+    sourceArgs = [tsxEntry, tsEntry, ...process.argv.slice(2)];
+  } else if (hasBunLock && hasBun()) {
+    useSourceMode = true;
+    sourceRunner = "bun";
+    sourceArgs = [tsEntry, ...process.argv.slice(2)];
+  } else if (tsxAvailable) {
     useSourceMode = true;
     sourceRunner = "node";
-    sourceArgs = [resolve(pkgDir, "node_modules/tsx/dist/cli.mjs"), tsEntry, ...process.argv.slice(2)];
+    sourceArgs = [tsxEntry, tsEntry, ...process.argv.slice(2)];
   }
 }
 
 
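The source-mode gating and lockfile priority in the `bin/qmd` hunk can be read as a decision table. The sketch below restates it as a pure function so the precedence is easy to check; `pickSourceRunner` and its option names are hypothetical, the filesystem and `bun --version` probes are replaced by booleans, and the path check here splits on both `/` and `\`.

```javascript
// Hypothetical restatement of the launcher's runner selection.
// All inputs are plain values so the logic can be exercised without
// touching the filesystem.
function looksInstalled(pkgDir) {
  return pkgDir.split(/[\\/]/).includes("node_modules");
}

function pickSourceRunner(opts) {
  const {
    pkgDir,
    sourceOverride, // value of QMD_SOURCE_MODE, if any
    hasGit,
    hasTsEntry,
    hasNpmLock,
    hasBunLock,
    bunAvailable,
    tsxAvailable,
  } = opts;

  // "1" forces source mode on, "0" forces it off; otherwise a
  // node_modules path segment means a package manager installed us.
  const sourceAllowed =
    sourceOverride === "1" ||
    (sourceOverride !== "0" && !looksInstalled(pkgDir));
  if (!sourceAllowed || !hasGit || !hasTsEntry) return null;

  // package-lock.json wins over bun.lock when both are present.
  if (hasNpmLock && tsxAvailable) return "node";
  if (hasBunLock && bunAvailable) return "bun";
  if (tsxAvailable) return "node";
  return null; // fall through to dist mode
}
```

A checkout carrying both lockfiles resolves to `"node"`, while the same tree copied under a `node_modules/` directory resolves to `null` unless `QMD_SOURCE_MODE=1` is set.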
@@ -1,6 +1,6 @@
 {
   "name": "@tobilu/qmd",
-  "version": "2.5.2",
+  "version": "2.5.3",
   "description": "Query Markup Documents - On-device hybrid search for markdown files with BM25, vector search, and LLM reranking",
   "type": "module",
   "main": "dist/index.js",
 
@@ -0,0 +1,118 @@
+#!/usr/bin/env node
+/**
+ * Minimal reproduction of llama.cpp issue ggml-org/llama.cpp#22593:
+ *
+ *   ggml-metal-device.m:612: GGML_ASSERT([rsets->data count] == 0) failed
+ *
+ * Root cause (per the upstream issue and proposed fix PR #22595):
+ *   `ggml_metal_buffer_rset_free` releases the per-buffer residency set object
+ *   but does NOT call the symmetric `ggml_metal_device_rsets_rm`. So the
+ *   device's `rsets->data` array accumulates dangling references. When the
+ *   process exits and libc fires the process-static `ggml_metal_device`
+ *   destructor in `__cxa_finalize_ranges`, the destructor asserts the
+ *   array is empty — and it isn't.
+ *
+ * Observed downstream behavior:
+ *   - With EXPLICIT `dispose()` of every JS handle in order, the assertion
+ *     does NOT fire. node-llama-cpp's dispose path tears the Metal buffers
+ *     down before the static dtor runs, so the device's rsets array is
+ *     empty by exit time. (Tested locally — clean exit.)
+ *   - With NO dispose (the typical real-world case: synchronous `exit()`,
+ *     `--watch` mode, `process.exit()` after results are written, or any
+ *     code path where GC + finalizers race with libc exit), the rset
+ *     references linger until the static dtor fires, and the assertion
+ *     trips.
+ *
+ * What this script does:
+ *   1. Load node-llama-cpp + a small GGUF model on the Metal backend.
+ *      This allocates at least one Metal buffer → calls rsets_add internally.
+ *   2. Run an inference (creating an embedding context populates buffers
+ *      that the dispose path would normally clean up).
+ *   3. Skip explicit dispose. Just let the process exit.
+ *
+ * Expected behavior on macOS 15+ with Apple Silicon, current llama.cpp
+ * (bundled in node-llama-cpp 3.18.1, llama.cpp tag b8390):
+ *   - Without GGML_METAL_NO_RESIDENCY:
+ *       Script writes "ok" and main() returns, then ggml_abort fires the
+ *       assertion, prints a multi-kB backtrace, and the process exits with
+ *       SIGABRT (exit code 134).
+ *   - With GGML_METAL_NO_RESIDENCY=1:
+ *       Clean exit code 0. Residency-set code path is skipped entirely.
+ *   - With --dispose flag (manual cleanup):
+ *       Clean exit code 0 even without the env var, as long as JS dispose()
+ *       runs successfully before libc exit.
+ *
+ * Usage:
+ *   # Reproduce the crash (no dispose, no env var)
+ *   node scripts/repro-metal-rsets-crash.mjs
+ *
+ *   # Verify the documented workaround
+ *   GGML_METAL_NO_RESIDENCY=1 node scripts/repro-metal-rsets-crash.mjs
+ *
+ *   # Verify that explicit dispose also avoids the crash
+ *   node scripts/repro-metal-rsets-crash.mjs --dispose
+ *
+ * Refs:
+ *   https://github.com/ggml-org/llama.cpp/issues/22593  (root-cause analysis)
+ *   https://github.com/ggml-org/llama.cpp/pull/22595    (one-line fix, open)
+ *   https://github.com/tobi/qmd/issues/368              (downstream report)
+ *   https://github.com/tobi/qmd/issues/674              (downstream, current)
+ *   https://github.com/tobi/qmd/pull/600                (downstream workaround PR)
+ */
+
+import { existsSync } from "node:fs";
+import { homedir } from "node:os";
+import { resolve } from "node:path";
+
+const DEFAULT_MODEL = resolve(
+  homedir(),
+  ".cache/qmd/models/hf_ggml-org_embeddinggemma-300M-Q8_0.gguf",
+);
+
+const args = process.argv.slice(2);
+const wantsDispose = args.includes("--dispose");
+const modelPath = args.find((a) => !a.startsWith("--")) ?? DEFAULT_MODEL;
+
+if (!existsSync(modelPath)) {
+  console.error(`Model not found: ${modelPath}`);
+  console.error("Pass a path to any local GGUF as the first argument, or run `qmd embed` once to populate the default cache path.");
+  process.exit(2);
+}
+
+console.error(
+  `[repro] GGML_METAL_NO_RESIDENCY=${process.env.GGML_METAL_NO_RESIDENCY ?? "(unset)"}`,
+);
+console.error(`[repro] dispose=${wantsDispose}`);
+console.error(`[repro] loading: ${modelPath}`);
+
+const { getLlama } = await import("node-llama-cpp");
+
+const llama = await getLlama();
+const model = await llama.loadModel({ modelPath });
+const context = await model.createEmbeddingContext();
+
+console.error(`[repro] backend: ${llama.gpu}`);
+
+// Run actual inference so the buffer-allocation path is hit.
+await context.getEmbeddingFor("repro text");
+
+if (wantsDispose) {
+  console.error("[repro] explicit dispose…");
+  await context.dispose();
+  await model.dispose();
+  await llama.dispose();
+}
+
+console.error("[repro] main() returning via process.exit(0)");
+console.log("ok");
+
+// CRITICAL: use process.exit(), not `return`. node-llama-cpp registers a
+// `process.once('beforeExit', …)` hook that auto-disposes WeakRef'd Llama
+// instances when the event loop empties naturally. `process.exit()` skips
+// `beforeExit`, so the rsets stay populated until libc's `exit()` fires the
+// static dtor — which is when the upstream assertion bug trips.
+//
+// CLI tools (qmd query, qmd vsearch, qmd embed, etc.) all call process.exit()
+// after writing results, which is why every real downstream report crashes
+// even though the minimal "let main return" version does not.
+process.exit(0);
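The changelog names `finishSuccessfulCliCommand` as the primary fix, but its body is not part of this diff. A minimal sketch of the idea, under that caveat, follows: set `process.exitCode` and return so Node can emit `beforeExit`, instead of calling `process.exit(0)`. Both function names below are hypothetical, and the `proc` parameter exists only so the behaviour can be checked against a stub.

```javascript
// Sketch only: the real finishSuccessfulCliCommand is not shown in
// this diff, so these bodies are assumptions.

// Success path: record the exit code but do not terminate. Once the
// event loop drains, Node emits 'beforeExit', so listeners such as
// node-llama-cpp's auto-dispose hook still get to run.
function finishSuccessfulCliCommandSketch(proc = process) {
  proc.exitCode = 0;
}

// Error path: a hard stop. process.exit() does not emit 'beforeExit',
// which is the case the launcher's GGML_METAL_NO_RESIDENCY default is
// meant to cover.
function failCliCommandSketch(message, proc = process) {
  console.error(message);
  proc.exit(1);
}
```

The trade-off is that the success path now depends on the event loop actually draining; a lingering timer or open handle would keep the process alive, which `process.exit(0)` used to mask.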
@@ -5,14 +5,25 @@ import { fileURLToPath } from "node:url";
 
 const root = fileURLToPath(new URL("..", import.meta.url));
 
+// Mirror bin/qmd's darwin Metal residency mitigation for test subprocesses.
+// libggml-metal asserts on a non-empty residency set during its static
+// destructor (ggml-org/llama.cpp#22593, fix open as #22595) and dumps a
+// multi-kB backtrace at process exit even when tests pass. The env var must
+// be set BEFORE the subprocess starts because libggml-metal reads it via
+// libc getenv at module-load time. Opt out with QMD_METAL_KEEP_RESIDENCY=1.
+const darwinMetalEnv =
+  process.platform === "darwin" && process.env.QMD_METAL_KEEP_RESIDENCY !== "1"
+    ? { GGML_METAL_NO_RESIDENCY: process.env.GGML_METAL_NO_RESIDENCY || "1" }
+    : {};
+
 function run(label, command, args, options = {}) {
   console.log(`==> ${label}`);
   const { env: extraEnv, ...spawnOptions } = options;
   const result = spawnSync(command, args, {
     cwd: root,
     stdio: "inherit",
     shell: process.platform === "win32",
-    env: { ...process.env, ...(extraEnv ?? {}) },
+    env: { ...process.env, ...darwinMetalEnv, ...(extraEnv ?? {}) },
     ...spawnOptions,
   });
   if (result.status !== 0) {
`6`	`6`	`"main": "dist/index.js",`