Releases: tetherto/qvac

QVAC TTS GGML Addon v0.2.2

10 Jun 14:23
e8b61d9

@qvac/tts-ggml 0.2.2

Fixed

  • Android: revert the tts-cpp 2026-06-05 bump introduced in 0.2.1, which crashed the addon at dlopen during bootstrap and failed every Android e2e run. tts-cpp 2026-06-05 (upstream qvac-ext-lib-whisper.cpp@128dae42, the QVAC-19254 sched + cpu_backend refactor) added direct ggml_backend_is_cpu / ggml_get_type_traits_cpu calls to the statically linked tts-cpp library. On Android, the shared ggml-speech port builds the CPU backend only as per-microarch MODULE .so variants that are dlopen'd at runtime (GGML_CPU_ALL_VARIANTS=ON + GGML_BACKEND_DL=ON; there is no static CPU archive). Those symbols were therefore left undefined (UND) in libqvac__tts-ggml.*.so and could not be resolved when Bare loaded the addon, giving ADDON_NOT_FOUND / dlopen failed → SIGABRT about 1 s into bootstrap. iOS and desktop builds link the CPU backend statically and were unaffected. tts-cpp is pinned back to 2026-06-03#1 (the last-known-good revision, shipped in 0.2.0), so the Android addon loads cleanly again.

Reverted

  • Reverts the 0.2.1 Supertonic GPU enablement (QVAC-19255, #2473) in full: the tts-cpp pin, the SupertonicModel.cpp / index.js useGPU / nGpuLayers gate removals, the flipped C++/integration tests, and the docs. With tts-cpp back at 2026-06-03#1, Supertonic is CPU-only again (it is heavily CPU-optimised). The Supertonic GPU work will re-land once the Android CPU-backend linkage is fixed upstream (QVAC-19254 follow-up).

QVAC Vla Addon v0.3.2

09 Jun 13:49
293a871

Changed

  • Pinned to the Fabric revision used by the M-RoPE/iM-RoPE sliding-context work.

Pull Requests

  • #2438 - feat[notask]: add M-RoPE sliding context support

QVAC LLM Addon v0.24.0

09 Jun 13:50
293a871

This release adds sliding-context support for M-RoPE/iM-RoPE models such as Qwen3.5 and Qwen-VL style decoders. Long-running multimodal sessions can now slide under context pressure while preserving image recall, cache save/load behavior, and quantized KV-cache operation.

Features

M-RoPE/iM-RoPE sliding context

llm-llamacpp now tracks multimodal context usage as both logical decoder positions and physical KV-cache cells. This lets Qwen3.5-style prompts slide at the right time even when image chunks occupy a different number of cache cells than position slots.
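A minimal sketch of this two-unit accounting, assuming a JavaScript-side tracker. The ContextUsage class, its fields, and the { positions, cells } chunk shape are hypothetical and not part of the llm-llamacpp API. It shows why counting decoder positions alone is not enough: an image chunk can consume many more KV-cache cells than position slots.

```javascript
// Hypothetical sketch: ContextUsage and the { positions, cells } chunk shape
// are illustrative, not llm-llamacpp's actual API.
class ContextUsage {
  constructor(maxPositions, maxCells) {
    this.maxPositions = maxPositions; // logical decoder position budget
    this.maxCells = maxCells;         // physical KV-cache cell budget
    this.positions = 0;
    this.cells = 0;
  }

  // Record an appended chunk. Text tokens usually cost one position and one
  // cell each; an M-RoPE image chunk can cost a different amount of each.
  add({ positions, cells }) {
    this.positions += positions;
    this.cells += cells;
  }

  // A slide is needed if appending the chunk would exceed EITHER budget.
  needsSlide({ positions, cells }) {
    return (
      this.positions + positions > this.maxPositions ||
      this.cells + cells > this.maxCells
    );
  }
}

const usage = new ContextUsage(4096, 4096);
usage.add({ positions: 100, cells: 100 }); // text turn
usage.add({ positions: 32, cells: 1024 }); // image chunk (illustrative numbers)
// Positions would only reach 172, but cells would reach 4124 > 4096.
console.log(usage.needsSlide({ positions: 40, cells: 3000 })); // true
```

A tracker that only counted positions would report plenty of room here and append past the end of the KV cache.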

Context sliding now supports bounded full-wipe and tail-preserving fallback behavior while respecting the configured discard budget. Native KV memory-operation failures surface as ContextSlideFailed, making them distinguishable from ordinary context overflow.
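One way a slide range could be chosen under a discard budget is sketched below. This is a hypothetical illustration, not the addon's algorithm: planSlide, its parameters, and the reading of discardBudget as the preferred number of cells to drop per slide are all assumptions. Cells after the protected prefix (keep) are discarded oldest-first so the recent tail survives; when that would remove everything after the prefix, the plan becomes a full wipe of the slidable region, and when even that cannot make room, it is reported as plain overflow.

```javascript
// Hypothetical sketch; names and the meaning of discardBudget are assumptions.
// All quantities are KV-cache cell counts.
function planSlide({ used, incoming, capacity, keep, discardBudget }) {
  const overflow = used + incoming - capacity;
  if (overflow <= 0) return { kind: "none" };

  // Even discarding every cell after the protected prefix cannot make room:
  // this is ordinary context overflow, not a slide.
  if (keep + incoming > capacity) return { kind: "overflow" };

  const slidable = used - keep;
  // Drop at least `overflow` cells, prefer `discardBudget`, never more than exist.
  const amount = Math.min(Math.max(overflow, discardBudget), slidable);
  return {
    kind: amount === slidable ? "full-wipe" : "tail-preserving",
    start: keep,        // first discarded cell, just after the protected prefix
    end: keep + amount, // exclusive; later cells shift left by `amount`
  };
}

// kind "tail-preserving", start 100, end 612
console.log(planSlide({ used: 4000, incoming: 200, capacity: 4096, keep: 100, discardBudget: 512 }));
```

Because the overflow branch runs first, any slide that is returned always frees at least enough cells for the incoming chunk.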

Shifted multimodal cache metadata now persists both logical positions and KV-cache usage, so sessions that slide after image turns can be saved and loaded without losing track of protected prefixes or current cache occupancy.

Quantized KV-cache sliding coverage

The local qvac-fabric overlay now points at the Fabric branch with M-RoPE/iM-RoPE K-shift support and quantized KV-cache shift handling. Integration coverage exercises Qwen3.5 text sliding, tool-compaction pressure, multimodal image recall after sliding save/load, quantized K-cache sliding, and Llama RoPE baseline sliding.

New APIs

ContextSlideFailed

ContextSlideFailed is a new addon error code used when Fabric/native KV memory operations reject a sliding range. Callers can now tell this apart from context overflow, where there is simply not enough room to append the requested tokens.
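A hedged sketch of how a caller might act on this distinction. It assumes errors from the addon carry a string code property equal to "ContextSlideFailed"; that property name, the runWithSlideRecovery helper, and the reset-and-retry-once policy are hypothetical, chosen only to show a rejected slide being handled differently from plain overflow.

```javascript
// Hypothetical sketch: assumes addon errors expose a string `code` property.
// runWithSlideRecovery and the retry policy are illustrative only.
async function runWithSlideRecovery(run, resetSession) {
  try {
    return await run();
  } catch (err) {
    if (err && err.code === "ContextSlideFailed") {
      // Native KV memory ops rejected the slide range; the cached state may be
      // unreliable, so start from a clean session and retry once.
      await resetSession();
      return await run();
    }
    // Everything else, including ordinary context overflow, propagates.
    throw err;
  }
}
```

Retrying is only reasonable for the slide failure; an overflow will fail again with the same input, so it is left to the caller.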

Pull Requests

  • #2438 - feat[notask]: add M-RoPE sliding context support

QVAC Embed Addon v0.19.1

09 Jun 13:51
293a871

Changed

  • Pinned to the Fabric revision used by the M-RoPE/iM-RoPE sliding-context work.

Pull Requests

  • #2438 - feat[notask]: add M-RoPE sliding context support

QVAC GGML Image Classification Lib v0.3.1

09 Jun 13:50
293a871

Changed

  • Pinned to the Fabric revision used by the M-RoPE/iM-RoPE sliding-context work.

Pull Requests

  • #2438 - feat[notask]: add M-RoPE sliding context support

QVAC Embed Addon v0.19.0

05 Jun 14:38
63e993b

Changed

  • feat[bc] (breaking change): RuntimeStats.context_size now reports the active runtime llama context size. Use the new RuntimeStats.trained_context_size field for the model's trained context size.
  • The embed runtime now defaults ctx_size to the model's trained context size and caps oversized ctx_size requests to that value before creating the llama context. The cap also applies on streamed loads (single-GGUF and sharded) by parsing GGUF metadata from the first streamed chunk before the weights engine consumes it, mirroring the ModelMetaData pattern used by llm-llamacpp.
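The ctx_size policy above reduces to a default plus a cap. The sketch below assumes the trained context size has already been read from GGUF metadata (on streamed loads, from the first chunk); resolveCtxSize is a hypothetical name, not part of the embed addon's API.

```javascript
// Hypothetical sketch of the ctx_size policy; not the embed addon's API.
// trainedCtx is assumed to come from the model's GGUF metadata.
function resolveCtxSize(requested, trainedCtx) {
  // No explicit request: default to the model's trained context size.
  if (requested === undefined || requested === null) return trainedCtx;
  // Oversized requests are capped to the trained context size.
  return Math.min(requested, trainedCtx);
}

console.log(resolveCtxSize(undefined, 512)); // 512
console.log(resolveCtxSize(8192, 512));      // 512
console.log(resolveCtxSize(256, 512));       // 256
```

Capping before the llama context is created means the context size later reported by llama_n_ctx never exceeds what the model was trained on.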

Fixed

  • Context overflow validation now compares tokenized inputs against the active runtime context size (llama_n_ctx), which is itself capped to the trained context.
  • BertModel::setWeightsForFile now tracks fulfilled GGUF shards in a per-instance std::atomic<int> instead of a function-local static int, so multiple concurrent BertInterface instances no longer share (and miscount) shard-fulfillment state.
  • BertModel now resolves sharded model basenames to absolute paths relative to the model directory before metadata inspection and disk-shards loading, so the trained-context cap and llama_model_load_from_splits work correctly when the working directory differs from the model directory.
  • readTrainedContextSize now logs an ERROR-level diagnostic when GGUF metadata cannot be read on either streamed or non-streamed loads. Previously this failed silently and fell back to llama.cpp's default ctx_size.
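The two shard-related fixes can be illustrated in JavaScript with a hypothetical ShardTracker class; it is not part of the addon, where the actual fix lives in the C++ BertModel. Shard basenames are resolved against the model directory instead of the process working directory, and the fulfilled-shard count lives on each instance so two loaders never share it. Because JavaScript runs this code on a single thread, a plain number stands in for the std::atomic<int> the C++ code needs.

```javascript
const path = require("node:path");

// Hypothetical sketch; ShardTracker is illustrative, not the addon's API.
class ShardTracker {
  constructor(modelDir, shardBasenames) {
    // Resolve against the model directory. With an absolute modelDir the
    // result does not depend on process.cwd().
    this.shardPaths = shardBasenames.map((name) => path.resolve(modelDir, name));
    // Per-instance counter. A module-level variable would be shared by every
    // tracker, which mirrors the old function-local static int bug.
    this.fulfilled = 0;
  }

  // Returns true once every shard for this instance has been provided.
  markFulfilled() {
    this.fulfilled += 1;
    return this.fulfilled === this.shardPaths.length;
  }
}
```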

QVAC Stable Diffusion Addon v0.11.2

05 Jun 19:08
751bade

This release restores caller control over where the diffusion text-conditioning path runs on macOS. It removes an Apple-specific override that forced the CLIP/text encoder path onto CPU.

Bug Fixes

Honor clip_on_cpu on macOS

macOS builds no longer force keep_clip_on_cpu to true during SdModel::load(). The addon now forwards config_.keepClipOnCpu on all platforms, so callers can keep the text-conditioning path on the configured backend unless they explicitly opt into CPU placement with clip_on_cpu.

QVAC SDK v0.12.2

04 Jun 12:49
2a45919

📦 NPM: https://www.npmjs.com/package/@qvac/sdk/v/0.12.2

This patch release unblocks React Native and BareKit apps that bundle @qvac/sdk or @qvac/bare-sdk. Metro and Bare static analysis no longer reject the config loader, and clients can import the model registry through a dedicated subpath without pulling the full SDK graph into the bundle.

New APIs

@qvac/sdk/models and @qvac/bare-sdk/models subpaths

React Native apps that only need model constant names previously had to import from the package root, which dragged server-side modules into Metro. v0.12.2 adds a ./models export on both @qvac/sdk and @qvac/bare-sdk so you can depend on the registry alone.

import { LLAMA_3_2_1B_INST_Q4_0 } from "@qvac/sdk/models";
// or on Bare-only clients:
import { LLAMA_3_2_1B_INST_Q4_0 } from "@qvac/bare-sdk/models";

Bug Fixes

Bare config loader works under Metro static analysis

BareKit and Expo consumers could fail at bundle time with errors such as Invalid call: import(filePath) when the SDK resolved qvac.config.js. The Bare config loader used dynamic import() with a runtime path, which Metro and Bare reject because the target is not a string literal.

v0.12.2 loads .js and .json config files with require(filePath) instead, which satisfies static analysis while keeping the same resolution order (QVAC_CONFIG_PATH, then project-root qvac.config.js / qvac.config.json, then defaults). Supported extensions are centralized in SUPPORTED_CONFIG_FILE_EXTS so discovery and validation stay aligned. TypeScript config files (.ts) are explicitly rejected on the Bare path with a clear error — use .js or .json in RN/Bare projects.
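The resolution order described above can be sketched as follows. loadQvacConfig and its parameters are hypothetical; SUPPORTED_CONFIG_FILE_EXTS is named in the release notes, but its contents and ordering here are assumptions. The notes do not say what happens when QVAC_CONFIG_PATH points at a missing file, so this sketch simply falls through to the project-root files.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed contents; .js is checked before .json, matching the stated order.
const SUPPORTED_CONFIG_FILE_EXTS = [".js", ".json"];

// Hypothetical sketch of the Bare config resolution order; not the SDK's code.
function loadQvacConfig(projectRoot, env = process.env, defaults = {}) {
  const candidates = [];
  // 1. Explicit override via QVAC_CONFIG_PATH.
  if (env.QVAC_CONFIG_PATH) {
    candidates.push(path.resolve(projectRoot, env.QVAC_CONFIG_PATH));
  }
  // 2. Project-root qvac.config.js, then qvac.config.json.
  for (const ext of SUPPORTED_CONFIG_FILE_EXTS) {
    candidates.push(path.join(projectRoot, `qvac.config${ext}`));
  }
  for (const file of candidates) {
    // Reject anything else (e.g. .ts) with a clear error instead of skipping it.
    if (!SUPPORTED_CONFIG_FILE_EXTS.includes(path.extname(file))) {
      throw new Error(`Unsupported config file "${file}": use .js or .json in RN/Bare projects`);
    }
    if (fs.existsSync(file)) {
      // require() with a runtime path, mirroring the switch away from import(filePath).
      return require(file);
    }
  }
  // 3. Fall back to defaults.
  return defaults;
}
```

Keeping the extension list in one constant means discovery (which files to look for) and validation (which files to reject) cannot drift apart.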

QVAC diagnostics Lib v0.1.2

05 Jun 07:32
7084527

This patch aligns @qvac/diagnostics with the monorepo’s simplified package layout and streamlines runtime and OS detection when building diagnostic reports.

Features

Monorepo layout alignment

The package now lives under the standard packages/diagnostics tree from the monorepo path simplification. Published entry points are unchanged; release and CI follow the same patterns as other QVAC add-on libraries.

Other

Simpler runtime and environment detection

Environment collection uses which-runtime for platform, architecture, and runtime version, and resolves os through package imports so Bare and Node get the right implementation without probing bare-process or multiple fallback require paths at load time.

const w = require('which-runtime')
const os = (w.isNode || w.isBare) ? require('os') : null

Hardware probing (CPU model, core count, memory) still uses os when available.

Pull Requests

  • #1860 - QVAC-16441 feat: simplify package folders, files and paths in the monorepo
  • #2157 - simplify

QVAC SDK v0.12.1

03 Jun 11:10
c131d38

📦 NPM: https://www.npmjs.com/package/@qvac/sdk/v/0.12.1

This is a patch release on top of v0.12.0. It surfaces two new error classes so callers can distinguish a crashed bare worker from an in-flight call cancelled by SDK shutdown, and it fixes a Qwen 3.5/3.6 tool-call regression where capitalised booleans were silently dropping the entire tool call.

New APIs

Distinguish bare worker crashes from shutdown cancellations

Calls made through a bare worker (e.g. sdk.embed, sdk.complete) previously rejected with a generic RPC error if the worker process died mid-request or if sdk.close() was called while the request was in flight. Both cases looked identical to callers, so retry/UX logic had to guess.

v0.12.1 introduces two structured RPC errors that propagate from the worker bridge:

  • WorkerCrashedError: the bare worker died unexpectedly. Exposes exitCode and exitSignal so you can tell a SIGKILL from a clean non-zero exit and decide whether to respawn.
  • WorkerShutdownError: the SDK is shutting down (sdk.close() was called) while this request was still in flight. Safe to swallow on intentional teardown; surfaces an actionable label for callers who want to log it.

import { WorkerCrashedError, WorkerShutdownError } from "@qvac/sdk";

try {
  await sdk.embed({ modelId, text: "hi" });
} catch (err) {
  if (err instanceof WorkerCrashedError) {
    // err.exitCode, err.exitSignal — worker died, decide whether to respawn.
  } else if (err instanceof WorkerShutdownError) {
    // SDK is shutting down; this call was cancelled by close().
  }
}

Existing catch (err) blocks that don't narrow by class continue to work unchanged — the new classes both extend the same RPC error base.

Bug Fixes

Qwen 3.5/3.6 tool calls with capitalised booleans no longer drop silently

Qwen 3.5/3.6 (the default tool-calling family) intermittently emits Python-style True / False for boolean parameters instead of the JSON-strict true / false. The qwen35 parser accepted only the exact lowercase literals, so coercion threw, the parser returned an empty toolCalls array, and the raw <tool_call>…</tool_call> markup leaked into the assistant's final text answer. No PARSE_ERROR was raised; the tool call simply vanished.

v0.12.1 lowercases the value before comparing in the boolean coercion path, so True, False, TRUE, and FALSE all coerce correctly. Genuinely invalid values (maybe, 0, null) still throw PARSE_ERROR — the relaxation is intentionally scoped to casing. Other tool-call dialects are unaffected.
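A minimal sketch of case-insensitive boolean coercion with the same scope: casing is relaxed, everything else still fails. coerceBoolean and the way the error is built are hypothetical; only the PARSE_ERROR code comes from these notes.

```javascript
// Hypothetical sketch; not the SDK's qwen35 parser code.
function coerceBoolean(raw) {
  if (typeof raw === "boolean") return raw; // already JSON-strict
  // Lowercase first so True / FALSE / true all compare equal.
  const lowered = String(raw).toLowerCase();
  if (lowered === "true") return true;
  if (lowered === "false") return false;
  // Anything else (maybe, 0, null, ...) is still rejected.
  const err = new Error(`Cannot coerce ${JSON.stringify(raw)} to boolean`);
  err.code = "PARSE_ERROR";
  throw err;
}

console.log(coerceBoolean("True"), coerceBoolean("FALSE")); // true false
```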