Releases: tetherto/qvac
QVAC TTS GGML Addon v0.2.2
@qvac/tts-ggml 0.2.2
Fixed
- Android: revert the tts-cpp 2026-06-05 bump (introduced in 0.2.1) that crashed the addon at dlopen during bootstrap, failing every Android e2e run. tts-cpp 2026-06-05 (upstream qvac-ext-lib-whisper.cpp@128dae42, the QVAC-19254 sched + cpu_backend refactor) added direct ggml_backend_is_cpu / ggml_get_type_traits_cpu calls in the statically-linked tts-cpp library. On Android the shared ggml-speech port builds the CPU backend as runtime-dlopen'd per-microarch MODULE .so variants (GGML_CPU_ALL_VARIANTS=ON + GGML_BACKEND_DL=ON; no static CPU archive), so those symbols are left UND in libqvac__tts-ggml.*.so and unresolvable when Bare loads the addon → ADDON_NOT_FOUND / dlopen failed → SIGABRT ~1s into bootstrap. iOS and desktop statically link the CPU backend and were unaffected. Pin tts-cpp back to 2026-06-03#1 (the last-known-good revision shipped in 0.2.0) so the Android addon loads cleanly again.
Reverted
- Reverts the 0.2.1 Supertonic GPU enablement (QVAC-19255, #2473) in full: the tts-cpp pin, the SupertonicModel.cpp / index.js useGPU / nGpuLayers gate removals, the flipped C++/integration tests, and the docs. With tts-cpp back at 2026-06-03#1, Supertonic is CPU-only again (it is heavily CPU-optimised). The Supertonic GPU work will re-land once the Android CPU-backend linkage is fixed upstream (QVAC-19254 follow-up).
QVAC Vla Addon v0.3.2
- Pinned to the Fabric revision used by the M-RoPE/iM-RoPE sliding-context work.
Pull Requests
- #2438 - feat[notask]: add M-RoPE sliding context support
QVAC LLM Addon v0.24.0
This release adds sliding-context support for M-RoPE/iM-RoPE models such as Qwen3.5 and Qwen-VL style decoders. Long-running multimodal sessions can now slide under context pressure while preserving image recall, cache save/load behavior, and quantized KV-cache operation.
Features
M-RoPE/iM-RoPE sliding context
llm-llamacpp now tracks multimodal context usage as both logical decoder positions and physical KV-cache cells. This lets Qwen3.5-style prompts slide at the right time even when image chunks occupy a different number of cache cells than position slots.
Context sliding now supports bounded full-wipe and tail-preserving fallback behavior while respecting the configured discard budget. Native KV memory-operation failures surface as ContextSlideFailed, making them distinguishable from ordinary context overflow.
Shifted multimodal cache metadata now persists both logical positions and KV-cache usage, so sessions that slide after image turns can be saved and loaded without losing track of protected prefixes or current cache occupancy.
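The dual bookkeeping described above can be pictured with a small sketch. Everything here is hypothetical: the function and field names are invented for illustration, the real tracking lives in the addon's native code, and checking both counters against the same window size is an assumption rather than something the notes state.

```javascript
// Hypothetical sketch; none of these names come from llm-llamacpp.
function createUsageTracker(nCtx) {
  // positions: logical decoder positions; cells: physical KV-cache cells.
  return { nCtx, positions: 0, cells: 0 };
}

// Record a chunk. A text token typically costs one position and one cell,
// while an image chunk may cost a different number of each.
function appendChunk(usage, chunk) {
  usage.positions += chunk.positions;
  usage.cells += chunk.cells;
}

// Assumption: a slide is needed when either counter would exceed the window.
function needsSlide(usage, incoming) {
  return (
    usage.positions + incoming.positions > usage.nCtx ||
    usage.cells + incoming.cells > usage.nCtx
  );
}
```

Tracking only one of the two counters would miss the case where an image chunk fills the cache faster than it advances the position counter, which is the situation the release notes call out.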
Quantized KV-cache sliding coverage
The local qvac-fabric overlay now points at the Fabric branch with M-RoPE/iM-RoPE K-shift support and quantized KV-cache shift handling. Integration coverage exercises Qwen3.5 text sliding, tool-compaction pressure, multimodal image recall after sliding save/load, quantized K-cache sliding, and Llama RoPE baseline sliding.
New APIs
ContextSlideFailed
ContextSlideFailed is a new addon error code used when Fabric/native KV memory operations reject a sliding range. Callers can now tell this apart from context overflow, where there is simply not enough room to append the requested tokens.
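A caller-side check might look like the sketch below. It assumes the addon error carries its code as a string on `err.code`; that shape is not confirmed by these notes, and `describeContextError` is a hypothetical helper name.

```javascript
// Assumption: the error code is exposed as a string on `err.code`.
const CONTEXT_SLIDE_FAILED = "ContextSlideFailed";

// Hypothetical helper: separates slide failures from every other error.
function describeContextError(err) {
  if (err && err.code === CONTEXT_SLIDE_FAILED) {
    // Native KV memory operations rejected the sliding range.
    return "slide-failed";
  }
  // Ordinary context overflow and unrelated errors land here.
  return "other";
}
```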
Pull Requests
- #2438 - feat[notask]: add M-RoPE sliding context support
QVAC Embed Addon v0.19.1
Changed
- Pinned the Fabric revision used by the M-RoPE/iM-RoPE sliding-context work.
Pull Requests
- #2438 - feat[notask]: add M-RoPE sliding context support
QVAC GGML Image Classification Lib v0.3.1
Changed
- Pinned to the Fabric revision used by the M-RoPE/iM-RoPE sliding-context work.
Pull Requests
- #2438 - feat[notask]: add M-RoPE sliding context support
QVAC Embed Addon v0.19.0
Changed
- feat[bc]: RuntimeStats.context_size now reports the active runtime llama context size. Use the new RuntimeStats.trained_context_size field for the model's trained context size.
- The embed runtime now defaults ctx_size to the model's trained context size and caps oversized ctx_size requests to that value before creating the llama context. The cap also applies on streamed loads (single-GGUF and sharded) by parsing GGUF metadata from the first streamed chunk before the weights engine consumes it, mirroring the ModelMetaData pattern used by llm-llamacpp.
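The default-and-cap rule can be written down in a few lines. This is a minimal sketch, not the addon's implementation: `resolveCtxSize` is a hypothetical name, and treating a missing or zero request as "use the default" is an assumption.

```javascript
// Hypothetical helper; the real logic runs in the addon's native code.
function resolveCtxSize(requested, trained) {
  // Assumption: undefined, null, or 0 means no explicit request,
  // so the trained context size is used as the default.
  if (!requested) return trained;
  // Oversized requests are capped to the trained context size.
  return Math.min(requested, trained);
}
```

Under this rule the value reported as RuntimeStats.context_size can never exceed RuntimeStats.trained_context_size.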
Fixed
- Context overflow validation now compares tokenized inputs against the active runtime context size (llama_n_ctx), which is itself capped to the trained context.
- BertModel::setWeightsForFile now tracks fulfilled GGUF shards in a per-instance std::atomic<int> instead of a function-local static int, so multiple concurrent BertInterface instances no longer share (and miscount) shard-fulfillment state.
- BertModel now resolves sharded model basenames to absolute paths relative to the model directory before metadata inspection and disk-shards loading, so the trained-context cap and llama_model_load_from_splits work correctly when the working directory differs from the model directory.
- readTrainedContextSize now logs an ERROR-level diagnostic when GGUF metadata cannot be read on either streamed or non-streaming loads (previously failed silently and reverted to llama.cpp's default ctx_size).
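The shard path fix amounts to anchoring bare shard names at the model directory instead of the process working directory. The sketch below illustrates the idea in JavaScript; the actual change is in the C++ BertModel, and `resolveShardPaths` is a hypothetical name.

```javascript
const path = require("node:path");

// Hypothetical helper: resolve shard basenames against the model directory
// so the result does not depend on the current working directory.
function resolveShardPaths(modelDir, shardNames) {
  return shardNames.map((name) =>
    // Names that are already absolute are left untouched.
    path.isAbsolute(name) ? name : path.resolve(modelDir, name)
  );
}
```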
QVAC Stable Diffusion Addon v0.11.2
This release restores caller control over where the diffusion text-conditioning path runs on macOS. It removes an Apple-specific override that forced the CLIP/text encoder path onto CPU.
Bug Fixes
Honor clip_on_cpu on macOS
macOS builds no longer force keep_clip_on_cpu to true during SdModel::load(). The addon now forwards config_.keepClipOnCpu on all platforms, so callers can keep the text-conditioning path on the configured backend unless they explicitly opt into CPU placement with clip_on_cpu.
QVAC SDK v0.12.2
📦 NPM: https://www.npmjs.com/package/@qvac/sdk/v/0.12.2
This patch release unblocks React Native and BareKit apps that bundle @qvac/sdk or @qvac/bare-sdk. Metro and Bare static analysis no longer reject the config loader, and clients can import the model registry through a dedicated subpath without pulling the full SDK graph into the bundle.
New APIs
@qvac/sdk/models and @qvac/bare-sdk/models subpaths
React Native apps that only need model constant names previously had to import from the package root, which dragged server-side modules into Metro. v0.12.2 adds a ./models export on both @qvac/sdk and @qvac/bare-sdk so you can depend on the registry alone.
import { LLAMA_3_2_1B_INST_Q4_0 } from "@qvac/sdk/models";
// or on Bare-only clients:
import { LLAMA_3_2_1B_INST_Q4_0 } from "@qvac/bare-sdk/models";
Bug Fixes
Bare config loader works under Metro static analysis
BareKit and Expo consumers could fail at bundle time with errors such as Invalid call: import(filePath) when the SDK resolved qvac.config.js. The Bare config loader used dynamic import() with a runtime path, which Metro and Bare reject because the target is not a string literal.
v0.12.2 loads .js and .json config files with require(filePath) instead, which satisfies static analysis while keeping the same resolution order (QVAC_CONFIG_PATH, then project-root qvac.config.js / qvac.config.json, then defaults). Supported extensions are centralized in SUPPORTED_CONFIG_FILE_EXTS so discovery and validation stay aligned. TypeScript config files (.ts) are explicitly rejected on the Bare path with a clear error — use .js or .json in RN/Bare projects.
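The resolution order described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's loader: `resolveConfigPath` and the injected `exists` callback are hypothetical, the contents of SUPPORTED_CONFIG_FILE_EXTS are inferred from the text, and falling through to the project root when QVAC_CONFIG_PATH points at a missing file is a guess.

```javascript
const path = require("node:path");

// Named in the release notes; the exact contents here are an assumption.
const SUPPORTED_CONFIG_FILE_EXTS = [".js", ".json"];

// Hypothetical helper. `exists` is injected so the sketch can be
// exercised without touching the file system.
function resolveConfigPath({ env, projectRoot, exists }) {
  const candidates = [];
  // 1. An explicit QVAC_CONFIG_PATH is tried first.
  if (env.QVAC_CONFIG_PATH) candidates.push(env.QVAC_CONFIG_PATH);
  // 2. Then qvac.config.js / qvac.config.json in the project root.
  for (const ext of SUPPORTED_CONFIG_FILE_EXTS) {
    candidates.push(path.join(projectRoot, `qvac.config${ext}`));
  }
  for (const candidate of candidates) {
    const ext = path.extname(candidate);
    if (ext === ".ts") {
      throw new Error(`TypeScript config is not supported on Bare: ${candidate}`);
    }
    if (SUPPORTED_CONFIG_FILE_EXTS.includes(ext) && exists(candidate)) {
      return candidate;
    }
  }
  // 3. Nothing found: the caller falls back to defaults.
  return null;
}
```

A loader built on this would pass the returned path to require(filePath), which is the call the release notes say replaced the dynamic import().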
QVAC diagnostics Lib v0.1.2
This patch aligns @qvac/diagnostics with the monorepo’s simplified package layout and streamlines runtime and OS detection when building diagnostic reports.
Features
Monorepo layout alignment
The package now lives under the standard packages/diagnostics tree from the monorepo path simplification. Published entry points are unchanged; release and CI follow the same patterns as other QVAC add-on libraries.
Other
Simpler runtime and environment detection
Environment collection uses which-runtime for platform, architecture, and runtime version, and resolves os through package imports so Bare and Node get the right implementation without probing bare-process or multiple fallback require paths at load time.
const w = require('which-runtime')
const os = (w.isNode || w.isBare) ? require('os') : null
Hardware probing (CPU model, core count, memory) still uses os when available.
Pull Requests
QVAC SDK v0.12.1
📦 NPM: https://www.npmjs.com/package/@qvac/sdk/v/0.12.1
This is a patch release on top of v0.12.0. It surfaces two new error classes so callers can distinguish a crashed bare worker from an in-flight call cancelled by SDK shutdown, and it fixes a Qwen 3.5/3.6 tool-call regression where capitalised booleans were silently dropping the entire tool call.
New APIs
Distinguish bare worker crashes from shutdown cancellations
Calls made through a bare worker (e.g. sdk.embed, sdk.complete) previously rejected with a generic RPC error if the worker process died mid-request or if sdk.close() was called while the request was in flight. Both cases looked identical to callers, so retry/UX logic had to guess.
v0.12.1 introduces two structured RPC errors that propagate from the worker bridge:
- WorkerCrashedError: the bare worker died unexpectedly. Exposes exitCode and exitSignal so you can tell a SIGKILL from a clean non-zero exit and decide whether to respawn.
- WorkerShutdownError: the SDK is shutting down (sdk.close() was called) while this request was still in flight. Safe to swallow on intentional teardown; surfaces an actionable label for callers who want to log it.
import { WorkerCrashedError, WorkerShutdownError } from "@qvac/sdk";
try {
  await sdk.embed({ modelId, text: "hi" });
} catch (err) {
  if (err instanceof WorkerCrashedError) {
    // err.exitCode, err.exitSignal: worker died, decide whether to respawn.
  } else if (err instanceof WorkerShutdownError) {
    // SDK is shutting down; this call was cancelled by close().
  }
}
Existing catch (err) blocks that don't narrow by class continue to work unchanged; the new classes both extend the same RPC error base.
Bug Fixes
Qwen 3.5/3.6 tool calls with capitalised booleans no longer drop silently
Qwen 3.5/3.6 (the default tool-calling family) intermittently emits Python-style True / False for boolean parameters instead of the JSON-strict true / false. The qwen35 parser only accepted the exact lowercase literals, so coercion threw, the parser returned an empty toolCalls array, and the raw <tool_call>…</tool_call> markup leaked into the assistant's final text answer — there was no PARSE_ERROR, the tool call just vanished.
v0.12.1 lowercases the value before comparing in the boolean coercion path, so True, False, TRUE, and FALSE all coerce correctly. Genuinely invalid values (maybe, 0, null) still throw PARSE_ERROR — the relaxation is intentionally scoped to casing. Other tool-call dialects are unaffected.
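The relaxed coercion can be sketched as below. This is not the qwen35 parser's code: `coerceBoolean` is a hypothetical name, accepting an already-parsed boolean is an assumption, and attaching the PARSE_ERROR label as `err.code` is illustrative only.

```javascript
// Hypothetical helper showing case-insensitive boolean coercion.
function coerceBoolean(raw) {
  // Assumption: values that are already booleans pass through.
  if (typeof raw === "boolean") return raw;
  if (typeof raw === "string") {
    // Lowercase before comparing, so True / TRUE / true are equivalent.
    const lowered = raw.toLowerCase();
    if (lowered === "true") return true;
    if (lowered === "false") return false;
  }
  // Anything else (for example "maybe", 0, or null) is still rejected.
  const err = new Error(`Invalid boolean value: ${String(raw)}`);
  err.code = "PARSE_ERROR"; // label from the notes; the property is an assumption
  throw err;
}
```

The relaxation touches casing only: there is no trimming and no numeric or truthy conversion, which keeps the set of accepted inputs narrow.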