Commit c11577d
QVAC-17939: release(llamacpp-llm): v0.17.1 — back-port per-request grammar / json_schema (#1788)
* feat[api]: per-request GBNF grammar in llm-llamacpp generationParams (QVAC-17939)
Adds a new optional `grammar` field on the addon's per-request
`generationParams`. When set, the sampler is re-initialized with the
provided GBNF for the duration of that single `runJob` call and then
restored to the prior (load-time) state, matching the existing
save/restore pattern used for `temp` / `top_p` / `seed` / etc.
This is the per-request equivalent of the load-time `--grammar` config
key. It is the addon-level building block needed to support per-request
structured output in the SDK without requiring callers to reload the
model just to switch (or disable) a grammar.
Changes:
- `GenerationParams` (C++): new `std::optional<std::string> grammar`,
included in `hasOverrides()`.
- `AddonJs::runJob` `parseText`: read the optional `grammar` string
from per-request `generationParams`.
- `TextLlmContext::applyGenerationParams`: copy the override into
`params_.sampling.grammar` and re-init the sampler. The existing
`common_params_sampling savedSampling` snapshot already covers the
grammar field, so the restore lambda automatically reverts.
- `index.d.ts`: expose `grammar?: string` on `GenerationParams`.
- `test/integration/grammar.test.js`: covers (1) a request grammar
constraining output, (2) a follow-up request without `grammar`
reverting to unconstrained generation, and (3) a per-request
grammar overriding a load-time `grammar`. Wired into both ios
`lightB` and android `groupA` mobile groups.
`MtmdLlmContext` keeps the no-op default `applyGenerationParams` —
multimodal + grammar is out of scope for this change.
Tested locally: `bare-make build && bare-make install` succeeds,
`tsc -p tsconfig.dts.json` and `standard` lint are clean.
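The save/restore pattern described above can be sketched in TypeScript. This is an illustration only, not the addon's C++ code: `SamplingParams`, `GenerationOverrides`, and `applyOverrides` are hypothetical names, and the sampler re-initialization step is omitted.

```ts
// Hypothetical stand-in for the sampling fields the addon overrides per request.
interface SamplingParams {
  temp: number
  topP: number
  grammar: string
}

interface GenerationOverrides {
  temp?: number
  topP?: number
  grammar?: string
}

// Applies the overrides in place and returns a function that restores the
// snapshot taken beforehand. A missing or empty grammar keeps the load-time one.
function applyOverrides (live: SamplingParams, overrides: GenerationOverrides): () => void {
  const saved: SamplingParams = { ...live } // the snapshot covers grammar too
  if (overrides.temp !== undefined) live.temp = overrides.temp
  if (overrides.topP !== undefined) live.topP = overrides.topP
  if (overrides.grammar !== undefined && overrides.grammar !== '') {
    live.grammar = overrides.grammar
  }
  return () => { Object.assign(live, saved) }
}

// Usage: constrain one request to "yes" or "no", then revert.
const loadTime: SamplingParams = { temp: 0.8, topP: 0.95, grammar: '' }
const restore = applyOverrides(loadTime, { grammar: 'root ::= "yes" | "no"' })
const duringRequest = loadTime.grammar
restore()
const afterRequest = loadTime.grammar
```

Because the snapshot is taken over the whole block, adding a new overridable field needs no change to the restore path.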
* feat[api]: per-request json_schema in llm-llamacpp generationParams (QVAC-17939)
Adds an ergonomic `json_schema` field on per-request `generationParams`
alongside the GBNF `grammar` field introduced in the previous commit.
Mirrors what `--json-schema` does at load time: the addon parses the
JSON Schema and converts it to GBNF natively via llama.cpp's
`json_schema_to_grammar()` (already linked through `llama::common`),
so callers never have to ship a JSON-Schema-to-GBNF converter on the
JS side.
```ts
await model.run(prompt, {
  generationParams: {
    json_schema: {
      type: 'object',
      properties: { name: { type: 'string' }, age: { type: 'integer' } },
      required: ['name', 'age']
    }
  }
})
```
Changes:
- `vcpkg.json` / `vcpkg-configuration.json`: pull `nlohmann-json` from
the upstream microsoft/vcpkg registry (header-only). Required to
call `json_schema_to_grammar(nlohmann::ordered_json, ...)`, whose
signature lives in `llama/common/json-schema-to-grammar.h` but only
ships a forward decl by default.
- `CMakeLists.txt`: `find_package(nlohmann_json CONFIG REQUIRED)` and
link `nlohmann_json::nlohmann_json` into the addon.
- `GenerationParams` (C++): new `std::optional<std::string> json_schema`
alongside `grammar`, included in `hasOverrides()`.
- `AddonJs::runJob` `parseText`: read the optional `json_schema` string
and reject requests that set both `grammar` and `json_schema`.
- `TextLlmContext::applyGenerationParams`: when `json_schema` is set,
parse with `nlohmann::ordered_json::parse` and convert via
`json_schema_to_grammar()`, then assign to `params_.sampling.grammar`
and re-init the sampler. Parse / conversion errors surface as
`InvalidArgument` StatusError. The existing save/restore lambda
already covers `params_.sampling.grammar`, so the prior grammar is
reverted automatically after the request.
- `index.d.ts`: expose `json_schema?: string | Record<string, unknown>`
with mutual-exclusion docs against `grammar`.
- `index.js`: `normalizeGenerationParams()` accepts a plain object
(the common ergonomic shape) and JSON-stringifies it before handing
to the binding; also enforces the grammar/json_schema mutual
exclusion at the JS boundary so callers get a clearer TypeError.
- `test/integration/grammar.test.js`: three new tests covering
(1) `json_schema` (object form) constraining output to schema-valid
JSON, (2) `json_schema` (string form) accepted equivalently, and
(3) passing both `grammar` and `json_schema` in one request throws.
Tested locally: `bare-make generate && bare-make build && bare-make
install` succeeds (vcpkg fetched `nlohmann-json` 3.12.0), `tsc -p
tsconfig.dts.json` and `standard` lint clean, mobile integration test
config still validates.
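A minimal sketch of the JS-boundary normalization described above, assuming the function only stringifies object schemas and rejects the `grammar` + `json_schema` combination. The real `normalizeGenerationParams()` in `index.js` may differ in details; the types and the error text here are hypothetical.

```ts
type JsonSchema = string | Record<string, unknown>

interface GenerationParamsInput {
  grammar?: string
  json_schema?: JsonSchema
  [key: string]: unknown
}

// Hypothetical re-creation of the normalization step: enforce mutual exclusion
// first, then turn an object schema into the string form the binding expects.
function normalizeGenerationParams (input: GenerationParamsInput): Record<string, unknown> {
  if (input.grammar !== undefined && input.json_schema !== undefined) {
    throw new TypeError('generationParams: set either grammar or json_schema, not both')
  }
  const out: Record<string, unknown> = { ...input }
  if (typeof input.json_schema === 'object') {
    out.json_schema = JSON.stringify(input.json_schema)
  }
  return out
}

// Usage: the object form is stringified, the string form passes through.
const normalized = normalizeGenerationParams({
  json_schema: { type: 'object', properties: { name: { type: 'string' } }, required: ['name'] }
})
const passthrough = normalizeGenerationParams({ json_schema: '{"type":"object"}' })
```

Checking mutual exclusion before any conversion means a caller who sets both fields gets the `TypeError` regardless of whether the schema itself is valid.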
* fix[api]: address review feedback on per-request grammar/json_schema
Addresses review comments on PR #1787:
- **Gianni**: wire grammar/json_schema through to `MtmdLlmContext` so
multimodal models get the same per-request hook as text-only ones.
`MtmdLlmContext::applyGenerationParams` had near-identical body to
`TextLlmContext::applyGenerationParams`, so factor the grammar/
json_schema/sampling-overrides logic into a small free function in
the new `GenerationParamsApply.{hpp,cpp}` and call it from both.
- **Jesús (doc)**: fix the misleading "Empty string disables" comment
on `GenerationParams.grammar` in `index.d.ts`. Empty string and
`undefined` both fall through to the load-time grammar — clarify
that explicitly.
- **Jesús (safety)**: handle `common_sampler_init` returning nullptr.
This happens on invalid GBNF (and therefore also on a `json_schema`
whose conversion produces a grammar the sampler rejects). Both
contexts now check the result, restore the saved sampling block,
re-init with the known-good params, and throw `InvalidArgument`.
Without this guard the addon would carry a null `smpl_` into the
next sample call and crash.
The `cli_tool` target picks up the new `GenerationParamsApply.cpp`
source and `nlohmann_json::nlohmann_json` link dependency so it stays
buildable.
Tested locally: `bare-make build && bare-make install` clean,
`tsc -p tsconfig.dts.json` and `standard` lint clean, mobile
integration test config still valid.
* release(llamacpp-llm): v0.17.1 — back-port per-request grammar / json_schema (QVAC-17939)
Cherry-picks the per-request `grammar` / `json_schema` change from
PR #1787 onto the 0.17.0 release line so SDK consumers still pinned
to `@qvac/llm-llamacpp@0.17.x` can pick up structured-output support
without having to migrate to 0.18.x first.
Bumps `package.json` to `0.17.1`, adds the matching changelog block,
and refreshes the mobile integration auto-runner (the new
`grammar.test.js` is now wired in for both ios `lightB` and android
`groupA` mobile groups). Same source changes as #1787 plus its review
fixes, no new code.
Targets the `release-llamacpp-llm-0.17.1` branch on tetherto/qvac
(branched from the `llamacpp-llm-v0.17.0` tag); merging triggers the
GPR publish for `@tetherto/llm-llamacpp@0.17.1`.
* fix[api]: apply generationParams overrides atomically (QVAC-17939)
Build the new sampling block + sampler against local copies and only
commit them onto the live `params_` / `smpl_` once the json_schema
parse/convert and `common_sampler_init()` have both succeeded. Without
this, an invalid `json_schema` paired with another override (`temp`,
`seed`, …) would write the numeric overrides into `params_` and then
throw before `applyGenerationParams()` could return its restore lambda,
leaving those mutations to leak into subsequent requests.
The helper now takes the two fields it actually mutates by reference
(`common_params_sampling&`, `int& nPredict`) so callers can pass copies
trivially. Behaviour for happy-path requests is unchanged.
Per gianni-cor review on #1787.
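The build-then-commit idea can be sketched as follows. This is a TypeScript illustration under stated assumptions, not the C++ helper: `Context`, `applyAtomically`, and `schemaToGrammar` are hypothetical, and the converter is a placeholder that only checks the schema parses as JSON.

```ts
interface Sampling {
  temp: number
  seed: number
  grammar: string
}

interface Overrides {
  temp?: number
  seed?: number
  json_schema?: string
}

interface Context {
  sampling: Sampling
}

// Hypothetical stand-in for the native JSON-Schema-to-GBNF conversion.
function schemaToGrammar (schema: string): string {
  JSON.parse(schema) // throws SyntaxError on invalid input
  return 'root ::= object' // placeholder text, not real converter output
}

function applyAtomically (ctx: Context, o: Overrides): () => void {
  const next: Sampling = { ...ctx.sampling } // work on a local copy
  if (o.temp !== undefined) next.temp = o.temp
  if (o.seed !== undefined) next.seed = o.seed
  if (o.json_schema !== undefined) next.grammar = schemaToGrammar(o.json_schema) // may throw
  const saved = ctx.sampling
  ctx.sampling = next // commit only after every fallible step succeeded
  return () => { ctx.sampling = saved }
}

// Usage: an invalid schema paired with a temp override leaves the context untouched.
const ctx: Context = { sampling: { temp: 0.8, seed: 1, grammar: '' } }
let failed = false
try {
  applyAtomically(ctx, { temp: 0.1, json_schema: '{not json' })
} catch (err) {
  failed = err instanceof SyntaxError
}
const tempAfterFailure = ctx.sampling.temp
```

Since nothing is written to the live state before the throw, the failure path needs no restore function at all.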
* fix[notask]: address CI failures on per-request grammar PR (QVAC-17939)
- test/unit/CMakeLists.txt: include `GenerationParamsApply.cpp` in the
`addon-test` source list and link `nlohmann_json::nlohmann_json` so
the new helper resolves at link time. Without these the cpp-tests
target failed with `undefined symbol:
applyGenerationOverridesToSampling(...)` after the helper was
extracted from the inline `Text/Mtmd LlmContext` paths.
- LlmContext.hpp: collapse the trailing `grammar || json_schema;` onto
one line in `GenerationParams::hasOverrides()` to satisfy
clang-format-19's wrap rules (cpp-lint).
- test/integration/grammar.test.js: pass the `model.run()` promise
directly to `t.exception(...)` instead of wrapping it in an inner
`async () => { await ... }`. Bare's runtime aborts on unhandled
rejections; the IIFE form created a small window where `model.run()`'s
rejection landed before brittle's catch handler attached, producing
`Uncaught (in promise)` and exit code 134. Direct-promise form
matches the existing pattern in `finetuning.test.js`.
* fix[notask]: use t.exception.all for native-error rejection (QVAC-17939)
`normalizeGenerationParams` throws a `TypeError` when both `grammar`
and `json_schema` are set, and brittle's plain `t.exception`
deliberately re-raises native error subclasses (TypeError,
ReferenceError, RangeError, etc.) on the basis that those "tend to be
unintentional". The result is the rejection escapes brittle's catch,
trips Bare's unhandled-rejection guard, and the test runner aborts
with exit 134 — across every integration platform plus the on-device
mobile e2e (where the WDIO crash monitor reports it as a background
crash). The earlier IIFE-→-direct-promise change didn't help because
this isn't a microtask-timing race; it's intentional brittle policy.
`t.exception.all` is the documented escape hatch (per brittle's
README) for asserting on a native-error rejection.
* fix[notask]: tolerate Vulkan teardown SIGSEGV on ai-run-linux-gpu
Mirrors commit dbad904 in integration-test-qvac-lib-infer-vla.yml.
The linux-x64 integration matrix runs on the self-hosted ai-run-linux-gpu
(Tesla T4 + Vulkan) runner. After every test in the suite passes, the
bare process crashes with SIGSEGV (exit 139) ~1s into static-destructor
teardown — inside ggml-vulkan's destructor chain interacting with the
NVIDIA Vulkan ICD. Same upstream issue already worked around for the VLA
addon.
Wrap the integration test invocation so exit 139 is tolerated IFF the
captured TAP output shows the run completed cleanly (the '# ok' end
marker AND a '# tests = N/N pass' summary). Any other non-zero exit, or
a missing TAP pass marker, still fails the job.
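The tolerance rule above can be expressed as a small predicate. This TypeScript sketch is illustrative only: the actual workaround lived in the CI workflow, and `isAcceptableExit` is a hypothetical name. It assumes the TAP markers appear on their own lines exactly as quoted above.

```ts
// Decide whether a test run's exit status should count as success.
// Exit 139 (128 + SIGSEGV) is tolerated only when the captured TAP output
// shows both the '# ok' end marker and an N/N pass summary.
function isAcceptableExit (exitCode: number, tapOutput: string): boolean {
  if (exitCode === 0) return true
  if (exitCode !== 139) return false
  const lines = tapOutput.split('\n').map(l => l.trim())
  const sawOk = lines.some(l => l === '# ok')
  const summary = lines
    .map(l => /^# tests = (\d+)\/(\d+) pass$/.exec(l))
    .find(m => m !== null)
  const allPassed = summary != null && summary[1] === summary[2]
  return sawOk && allPassed
}

// Usage: a clean run followed by a teardown crash is accepted.
const cleanRunThenCrash = isAcceptableExit(139, 'ok 1 - grammar\n# tests = 3/3 pass\n# ok')
const failedRunThenCrash = isAcceptableExit(139, 'not ok 1 - grammar\n# tests = 2/3 pass\n# not ok')
```

Requiring both markers keeps a crash that happens mid-suite from being mistaken for a teardown crash.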
This is purely a CI workaround; no addon code changes.
* fix[notask]: extract per-request override helper + warn on grammar/json_schema clash (QVAC-17939)
Per jesusmb1995 review on #1788:
1. The full applyGenerationParams body in TextLlmContext.cpp was a verbatim
copy of the body in MtmdLlmContext.cpp (~50 lines each: local-copy of
sampling/n_predict, helper call, sampler init + null-check, snapshot,
commit, restore lambda). Hoist into a free function
`applyGenerationParamsToContext(common_params&, CommonSamplerPtr&,
llama_model*, const GenerationParams&)` in GenerationParamsApply.cpp
that returns the restore lambda. Both contexts collapse to a single
forwarding line.
2. The clang/JSON helper already had a "schema wins" precedence branch
for the (theoretically unreachable) case where both `grammar` and
`json_schema` are set, but no log. The JS and AddonJs paths both
reject that combination, so reaching it means a direct C++ caller
(unit tests or `cli_tool`) bypassed the boundary. Add a `LOG_WRN`
stating which field is being applied so the issue is visible when
debugging from C++.
No behaviour change for normal callers; the lambda's capture mode
switches from `[this, ...]` (method context) to
`[&params, &smpl, model, ...]` (free function), with identical
lifetime guarantees — the owning context outlives any single request.
Locally clang-format-19 clean, tsc + standard clean, all addon TUs
compile.
* Revert "fix[notask]: tolerate Vulkan teardown SIGSEGV on ai-run-linux-gpu"
This reverts commit eeff742.
1 parent 60f70ef commit c11577d
17 files changed
Lines changed: 542 additions & 67 deletions
File tree
- packages/qvac-lib-infer-llamacpp-llm
- addon/src
- addon
- model-interface
- test
- integration
- mobile
- unit