chore[skiplog|notask]: backmerge release-sdk-0.13.1 — bare-fetch ^3.0.1, decoder-audio ^0.4.0, decoder.ts sync run() by lauripiisang · Pull Request #2579 · tetherto/qvac

lauripiisang · 2026-06-12T19:28:58Z

What this PR does

Backmerges the @qvac/sdk + @qvac/bare-sdk 0.13.1 release onto main, per gitflow's "Keep main aligned". Tagged chore[skiplog].

Companion release PR

chore[notask]: release sdk + bare-sdk 0.13.1 — finish bare-* cleanup (bare-fetch ^3.0.1, decoder-audio ^0.4.0) #2578

Files / delta vs main

- packages/sdk/package.json + packages/bare-sdk/package.json: version 0.13.0 → 0.13.1; bare-fetch ^2.9.1 → ^3.0.1; @qvac/decoder-audio ^0.3.7 → ^0.4.0; sdk dev bare-subprocess ^5.2.3 → ^6.1.0.
- packages/sdk/server/utils/audio/decoder.ts: drop the await on decoder.run() (decoder-audio@0.4.0 returns QvacResponse synchronously).
- packages/sdk/CHANGELOG.md + packages/sdk/changelog/0.13.1/: changelog.
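
Each dependency bump in the list above leaves its previous caret range, so none of them could be picked up by a lockfile refresh alone. A minimal sketch of npm-style caret matching follows, assuming plain `major.minor.patch` versions with no prerelease or build tags. `satisfiesCaret` and its helpers are hypothetical names written for illustration and are not part of the SDK.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Parses a plain "major.minor.patch" string; prerelease tags are out of scope here.
function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version: ${v}`);
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// A caret range allows changes that keep the left-most non-zero component fixed.
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`not a caret range: ${range}`);
  const min = parseVersion(range.slice(1));
  const v = parseVersion(version);
  let upper: Version;
  if (min.major > 0) upper = { major: min.major + 1, minor: 0, patch: 0 };
  else if (min.minor > 0) upper = { major: 0, minor: min.minor + 1, patch: 0 };
  else upper = { major: 0, minor: 0, patch: min.patch + 1 };
  return compareVersions(v, min) >= 0 && compareVersions(v, upper) < 0;
}

console.log(satisfiesCaret("3.0.1", "^2.9.1")); // false: new major
console.log(satisfiesCaret("0.4.0", "^0.3.7")); // false: in 0.x, a minor bump leaves the range
console.log(satisfiesCaret("0.13.1", "^0.13.0")); // true: patch bump stays inside
```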

Cherry-picked cleanly onto main (no conflicts). This is what lands decoder-audio ^0.4.0 + the decoder.ts fix on main (the 0.12.3 backmerge #2565 that would otherwise have carried the decoder-audio bump is being closed).
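
The shape of the decoder.ts change can be sketched as below. `FakeResponse`, `FakeDecoder`, and the `done()` method are hypothetical stand-ins: the real `QvacResponse` and decoder types are not shown in this PR, so nothing here should be read as the actual `@qvac/decoder-audio` or `@qvac/infer-base` API.

```typescript
// Hypothetical stand-in for QvacResponse; the real class is not shown in this PR.
class FakeResponse {
  constructor(private readonly chunks: number[][]) {}

  // Assumed method for illustration only: resolves once all chunks are collected.
  async done(): Promise<number[]> {
    return ([] as number[]).concat(...this.chunks);
  }
}

// Hypothetical decoder whose run() hands back the response object synchronously,
// mirroring the behaviour the PR describes for decoder-audio@0.4.0.
class FakeDecoder {
  run(input: number[]): FakeResponse {
    return new FakeResponse([input.slice(0, 2), input.slice(2)]);
  }
}

async function decodeAudio(decoder: FakeDecoder, input: number[]): Promise<number[]> {
  // Before this PR the call site read: const response = await decoder.run(input);
  const response = decoder.run(input); // no await: run() is not async in this sketch
  return response.done(); // the asynchronous part is consuming the response
}

decodeAudio(new FakeDecoder(), [1, 2, 3, 4]).then((samples) => {
  console.log(samples.length);
});
```

In this sketch the asynchronous work moves to consuming the response, so the call site only needs to await the result it reads from it.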

Sequencing: merge after the 0.13.1 release (#2578) has published.

… dev bare-subprocess ^6.1.0 (sdk + bare-sdk 0.13.1) decoder-audio@0.4.0 drops the deprecated @qvac/response (consolidated into @qvac/infer-base) and returns QvacResponse synchronously from run(), so server/utils/audio/decoder.ts no longer awaits decoder.run(). (cherry picked from commit ca8b494)

(cherry picked from commit 3f6ac86)

github-actions · 2026-06-15T07:30:00Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

NamelsKing · 2026-06-15T09:59:48Z

/review

….1, decoder-audio ^0.4.0, decoder.ts sync run() (#2579)

* fix[notask]: adopt @qvac/decoder-audio 0.4 + bump bare-fetch ^3.0.1 / dev bare-subprocess ^6.1.0 (sdk + bare-sdk 0.13.1)

  decoder-audio@0.4.0 drops the deprecated @qvac/response (consolidated into @qvac/infer-base) and returns QvacResponse synchronously from run(), so server/utils/audio/decoder.ts no longer awaits decoder.run().

  (cherry picked from commit ca8b494)

* chore[notask]: sdk + bare-sdk 0.13.1 changelog

  (cherry picked from commit 3f6ac86)

---------

Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>

) * QVAC-18929 test: add teardown/lifecycle coverage for llm-llamacpp Adds integration coverage for the addon teardown contract that was previously untested: - unload() during active inference must not crash and the model must be reusable after a reload (AddonJs.hpp documents a use-after-free risk here) - run() after unload() must surface a clean error, not segfault - cancel() then immediate unload() must not race into a use-after-free These run on desktop (on-pr-llm-llamacpp) and the mobile Device Farm pools (scheduled via test-groups.json). Assertions are non-empty / type / clean-error only, never exact generated text. Also documents why a JS multi-cycle "RSS leak tripwire" was intentionally NOT added to model-loading.test.js: a 6-cycle load/unload test already exists (multi-instance.test.js) and the native ASan/LSAN job (cpp-tests-llm.yml) is the precise leak detector. The addon exposes no backend/state observable, so an NMT-style post-unload assertion is not expressible here. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-18929 chore: trim verbose comments in llm teardown tests Remove the long rationale block in model-loading.test.js (the "why we didn't add an RSS tripwire" explanation lives in the PR description, not the test file) and tighten the comments in api-behavior.test.js to the essential non-obvious intent. No behavior change — api-behavior.test.js still 8/8. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-18929 test: parse and verify tool_call structure in tools-compact Strengthens the two behavioral tests that already check for <tool_call> presence. Adds a parseToolCalls helper that extracts and JSON-parses the blocks, then verifies: - the tool_call has a non-empty name - the name matches a declared tool - required argument keys are present Structural/behavioral checks only - no model-quality assertions. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat[api]: DocTR depthwise convs via direct Metal CONV_2D_DW kernel (+ all-consumer fabric overlay) (#2536) * feat[api]: fold DocTR detection BatchNorm into conv weights The DBNet detection graph applied BatchNorm as a runtime scale/shift after every conv: `conv + add(bias_br) + mul(scale) + add(shift) + act`, where bias_br is a zero tensor for the (bias=False) BN convs. That is three full-tensor elementwise passes per conv on top of the conv itself. Fold the per-channel BN scale into the F16 conv weights and combine the conv bias and BN shift into a single bias at load time: out = scale*(W*x + bias) + shift = (scale*W)*x + (scale*bias + shift) so the runtime graph collapses to `conv + add(combined) + act` (one pass). This removes ~60 elementwise passes from the detection graph, which matters most on bandwidth/dispatch-constrained mobile GPUs where detection is ~55% of the DocTR pipeline. foldScaleIntoConv() folds scale into the preceding conv weight (per output channel, ne[3]) and absorbs bias_br into the shift tensor; it handles both normal BN (running stats present) and the offline-folded identity path. The sub-pixel transposed conv in the prob head is left as a runtime scale/shift since its weight is reshaped at graph build. Numerically exact: region counts unchanged (365/197/187/197) and all DocTR integration quality tests pass on Apple Metal (M4); ct_scan 1173->1157ms, clinical 858->841ms. * feat[api]: fold DocTR recognizer BatchNorm into conv weights Apply the same BN-into-weights fold to the CRNN MobileNetV3-small feature extractor: fold each BN's per-channel scale into the preceding conv's F16 weights at load time so applyBn drops the runtime multiply and keeps only the shift add. The feature-extractor graph runs once per recognition batch (dozens of times on a dense page), so removing a full-tensor multiply per conv is amplified across the page. 
Numerically exact: region counts unchanged and all DocTR integration quality tests pass on Apple Metal (M4). Combined with the detection fold: ct_scan 1173->1134ms, clinical 858->829ms. * feat[api]: use direct depthwise kernel for DocTR conv2d-dw on GPU The DBNet detector and CRNN recognizer feature extractors are dominated by depthwise convolutions. ggml's `ggml_conv_2d_dw` lowers each depthwise conv to im2col + a per-channel batched matmul (C tiny matmuls), which is pathologically slow on Metal — a skip-test (replacing every depthwise with a cheap same-shape op) showed recognition was ~entirely depthwise (rec 0.7s -> ~0 on ct_scan, M4). Switch both feature extractors to `ggml_conv_2d_dw_direct` (GGML_OP_CONV_2D_DW), which runs as a single fused kernel (one read, one write, no im2col buffer). This requires the companion Metal kernel for GGML_OP_CONV_2D_DW in the ggml fork (qvac-ext-ggml); CPU and Vulkan already implement it. Depthwise weights ([KW,KH,1,C], KW>1) are promoted to F32 at load so the op runs on every backend (CPU's conv_2d_dw_direct requires F32; Metal/Vulkan accept F16 too but CPU does not). F32 is perf-neutral on the GPU — the per-channel K*K weights are register-resident and the F32 activations dominate bandwidth either way (measured identical). The load-time BN-scale fold now folds into F16 or F32 weights, and the recognizer weight upload converts F16 GGUF tensors to F32 for depthwise. Result on Apple M4 Metal (warm, vs the BN-fold baseline -> with both detection and recognizer depthwise kernels): clinical 858->584ms, ct_scan 1173->754ms, lab 838->579ms, liver 841->569ms (~31-36%). All DocTR integration quality tests pass on Metal AND forced-CPU (region counts identical, keyword asserts intact). 
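
The BatchNorm fold identity quoted in the detection commit, out = scale*(W*x + bias) + shift = (scale*W)*x + (scale*bias + shift), can be checked on a toy case. The sketch below replaces the real convolution with a single dot product per output channel; all names are illustrative and none come from ocr-ggml.

```typescript
// One output channel: a flat weight vector stands in for the conv kernel.
type Channel = { w: number[]; bias: number; scale: number; shift: number };
type FoldedChannel = { w: number[]; bias: number };

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * (b[i] ?? 0), 0);

// Runtime BatchNorm: scale and shift are applied after the conv output.
function unfolded(c: Channel, x: number[]): number {
  return c.scale * (dot(c.w, x) + c.bias) + c.shift;
}

// Load-time fold: scale goes into the weights; bias and shift merge into one bias.
function fold(c: Channel): FoldedChannel {
  return { w: c.w.map((wi) => c.scale * wi), bias: c.scale * c.bias + c.shift };
}

// After the fold only one multiply-accumulate pass and one add remain.
function folded(f: FoldedChannel, x: number[]): number {
  return dot(f.w, x) + f.bias;
}

const channel: Channel = { w: [0.5, -1.25, 2], bias: 0.25, scale: 1.5, shift: -0.75 };
const x = [1, 2, 3];
console.log(unfolded(channel, x), folded(fold(channel), x)); // 5.625 5.625
```

The values here are chosen to be exactly representable in binary floating point. With arbitrary weights the two forms can differ by rounding, so a comparison with a small tolerance is the safer check.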
* test[notask]: overlay qvac-fabric at the Metal CONV_2D_DW merge commit (all consumers) Add a vcpkg overlay port to every qvac-fabric consumer (classification-ggml, embed-llamacpp, llm-llamacpp, ocr-ggml, translation-nmtcpp, vla-ggml) that builds fabric from the temp-8828 merge commit of the depthwise-conv kernel (qvac-fabric-llm.cpp#148, commit 7bcd140f, version 8828.1.1). This validates the new fabric across all consumers — and gives ocr-ggml the kernel its ggml_conv_2d_dw_direct path needs — before fabric is tagged 8828.1.1 and a registry port is cut. Revert these overlays + bump to the tagged port before merge. * style: clang-format DocTR depthwise + BatchNorm fold * fix[api]: clearer BN-fold dtype errors + comment on F16 scale round-trip Address review feedback on the DocTR BatchNorm fold: - The unsupported-dtype guard now names the actual ggml type and states that quantized conv weights are not supported by the fold (was a generic "unexpected conv weight dtype"). - output-channel mismatch now reports conv oc vs BN scale size. - Comment the F16 weight scale: decode->scale->re-encode is required because F16 has no arithmetic; it is not an f16->f16 copy. No behavior change; messages/comments only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * revert[api]: drop DocTR depthwise code changes, keep only fabric overlay ports Scope this PR to the all-consumer qvac-fabric overlay-port bump (which validates that the new fabric does not regress any consumer). The DocTR depthwise kernel switch (conv_2d_dw -> conv_2d_dw_direct + F32 depthwise weight promotion + BN-fold dtype handling) will land separately. Restores both files under packages/ocr-ggml/addon/src/model-interface/doctr/ to the base branch state. * Revert "revert[api]: drop DocTR depthwise code changes, keep only fabric overlay ports" This reverts commit 8c753e8. 
* chore[notask]: migrate qvac-fabric 8828.1.1 from overlay to registry 8828.1.1 (Metal CONV_2D_DW kernel) is now published in qvac-registry-vcpkg (tetherto/qvac-registry-vcpkg#189). Bump each consumer's qvac-fabric version>= 8828.1.0 -> 8828.1.1 and drop the temporary vcpkg-overlay-ports that pinned the unreleased temp-8828 commit. The default-registry baseline is intentionally unchanged: vcpkg resolves version>= against the registry HEAD, so a fixed baseline still picks up the new tagged version. Consumers: classification-ggml, embed-llamacpp, llm-llamacpp, ocr-ggml, translation-nmtcpp, vla-ggml. * chore[notask]: bump addon versions for qvac-fabric 8828.1.1 Minor-bump each consumer (major for ocr-ggml) and add a CHANGELOG entry for the qvac-fabric 8828.1.1 dependency (direct Metal CONV_2D_DW depthwise kernel). Minor bumps keep these out of the SDK 0.13.0 caret ranges so they are not auto-picked. - classification-ggml 0.3.1 -> 0.4.0 - embed-llamacpp 0.19.1 -> 0.20.0 - llm-llamacpp 0.24.0 -> 0.25.0 - ocr-ggml 0.1.1 -> 1.0.0 (major; cuts the prior Unreleased section) - translation-nmtcpp 5.0.1 -> 5.1.0 - vla-ggml 0.3.2 -> 0.4.0 * style: clang-format DocTR recognizer BN-fold tensor get/set Wrap the two over-long ggml_backend_tensor_get/set calls in the F16 BN-fold branch to satisfy the lint-cpp clang-format config (AlignAfterOpenBracket: AlwaysBreak). No behavior change. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * chore: drop bare-runtime and bare-pack from bare-sdk deps (#2585) bare-runtime is only reachable via node-rpc-client (the Node host path that spawns bare); bare-sdk pins #rpc to bare-client, so it is never imported on bare. Removing it also avoids pulling ~80MB of per-platform bare prebuilds at install time. bare-pack is only used by the Node-side bundle command, lazily resolved with a graceful BarePackNotInstalledError. Neither is reachable on bare. 
Both stay in @qvac/sdk and are added to SDK_ONLY_PACKAGES so check-deps-vs-sdk allows the intentional divergence. * chore[notask]: bump bare-fetch to ^3.0.1 in ocr-ggml and translation-nmtcpp (#2584) Aligns both addons with the latest bare-fetch major already used by rag (0.6.3) and ocr-onnx, removing the duplicate older bare-fetch major from the dependency tree. - @qvac/ocr-ggml: 0.2.0 -> 0.2.1 - @qvac/translation-nmtcpp: 6.0.0 -> 6.0.1 * doc[notask]: release docs v0.13.0 (minor) (#2573) Co-authored-by: NamelsKing <18405840+NamelsKing@users.noreply.github.com> Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com> * chore[skiplog|notask]: backmerge release-sdk-0.13.1 — bare-fetch ^3.0.1, decoder-audio ^0.4.0, decoder.ts sync run() (#2579) * fix[notask]: adopt @qvac/decoder-audio 0.4 + bump bare-fetch ^3.0.1 / dev bare-subprocess ^6.1.0 (sdk + bare-sdk 0.13.1) decoder-audio@0.4.0 drops the deprecated @qvac/response (consolidated into @qvac/infer-base) and returns QvacResponse synchronously from run(), so server/utils/audio/decoder.ts no longer awaits decoder.run(). (cherry picked from commit ca8b494) * chore[notask]: sdk + bare-sdk 0.13.1 changelog (cherry picked from commit 3f6ac86) --------- Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com> * QVAC-19368 infra: rebalance Android Device Farm shards + faster mobile CI for LLM (#2466) * perf(ci): split Android Device Farm shards to match iOS heavy/light pattern Android groupB (11 tests, 49 min) and groupImagesPerf (3 VLM tests, 69 min) were serialising heavy tests on a single device — hitting the 2h job timeout on Pixel. Mirror the iOS strategy: isolate each heavy test into its own group (heavy1–heavy10) and bundle fast tests into lightA/lightB (12 groups total). Longest single shard drops from ~69 min to ~23 min; pool recycles devices across groups dynamically. 
Co-authored-by: Cursor <cursoragent@cursor.com> * perf(ci): reduce Android shards from 12 to 6 to avoid PENDING_CONCURRENCY The 12-group mirror of iOS overwhelmed the Device Farm account concurrency limit (24 total runs: 12 iOS + 12 Android). Groups queued up to 12.5 min on Android and 28 min on iOS waiting for a slot, making the monitor step slower than the original 3-group layout. Revised to 6 Android groups (18 total with iOS): - heavyA/heavyB: split the old groupB heavy tests into 2 balanced shards - imagePerfA/imagePerfB: split VLM tests 2+1 to avoid the 69-min single-group bottleneck - lightA/lightB: fast tests bundled Expected critical path: ~40-50 min (vs 69 min old, 87 min with 12 groups). Co-authored-by: Cursor <cursoragent@cursor.com> * perf(ci): parallelize Device Farm log downloads across runs With 6 Android groups × 3 devices each = 18 device-jobs, the serial log download took 52 min (each device-job ~3-7 min of API calls + artifact downloads). Process each run's logs in parallel (up to 4 concurrent), so the total is bounded by the slowest single run (~18 min) rather than the sum of all runs. Combined with the 6-group monitor improvement (57 min vs old 69 min), the estimated total Android job time drops to ~86 min — well within the 120 min timeout. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(ci): descriptive group names + test legend in Device Farm monitor Rename test groups to be self-documenting: - iOS: heavy1..heavy10 → finetuning, toolCalling, reasoning, etc. - Android: heavyA → heavyA-finetune-reason-ocr, imagePerfB → imagePerf-fruitPlate, etc. Add test-specs passthrough to the monitor step so it can print: - A "Run → tests" legend at the start (which tests are in each run) - Test names in the final results section next to each run link Now when a run fails you can immediately see which test(s) it contained without cross-referencing test-groups.json. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(ci): wire test-specs to monitor step for all addons Pass test-specs from upload-to-devicefarm through to the monitor step in all 12 addon integration workflows. Gives every addon the run-to-tests legend and test names in final results — not just LLM. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(ci): extend VLM pre-warmup to Android + add heavyC group Two issues from the Android shard split: 1. Image test instability: the fruit-plate test relied on elephant running first in the same group to warm up the VLM model. With split groups each image test cold-starts alone, causing crashes on Android. Extended the iosWarmupImage pre-warmup to all mobile platforms (isMobile) so fruit-plate gets the elephant pre-warmup on Android too. 2. Heavy group imbalance: heavyA (4 tests, ~44 min) and heavyB (3 tests, ~45 min) were both too slow. Split into 3 balanced groups of 2-3 tests each: - heavyA-finetune-reasoning (2 tests) - heavyB-toolCall-gemma (2 tests) - heavyC-ocr-sliding (3 tests) Android now has 7 groups (19 total with iOS 12). Co-authored-by: Cursor <cursoragent@cursor.com> * perf(ci): skip Setup/Teardown suite artifacts + raise parallel limit Two optimizations for Device Farm log collection: 1. Skip 'Setup Test' and 'Teardown Test' suites — they only contain framework bookkeeping (home screen screenshots, install logs), not test output. Saves 2 list-artifacts API calls + downloads per device-job (21 Android device-jobs × 2 = 42 fewer API round-trips). 2. Raise MAX_PARALLEL from 4 to 8 so all runs (up to 7 Android + 12 iOS) download simultaneously instead of in waves. AWS Device Farm API handles this fine — the bottleneck was I/O wait, not CPU. Target: Android log collection from 25 min → ~12-15 min. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 perf(ci): skip VLM aurora image on normal on-PR runs The 3-image VLM perf (gemma4 + qwen3-5) made the Android on-PR leg run too long. 
aurora is the heaviest image, so skip it when QVAC_PERF_RUNS is at the on-PR default (<=1); the benchmark (QVAC_PERF_RUNS>1) still runs all 3. On-PR now covers elephant + fruit-plate, keeping the Android run under ~1h. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 feat(ci): show test list in monitor for non-grouped addons Default single-spec mode (addons without test-groups.json, e.g. NMT) runs with an empty grep, so the monitor's "Run → tests" legend showed nothing. Enumerate the addon's generated mobile runners (integration.auto.cjs) and emit them as a display-only `tests` field on each spec; the monitor prefers `tests` and falls back to `grep`. Grep stays empty so run behaviour is unchanged — this only enriches the legend. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 feat(ci): per-test-case results summary JSON + run-summary link Consolidate the per-device *_test-results.json into one test-results-summary.json (each test case with status + duration per device, gate-skips surfaced as 'skipped'), ship it inside the console-logs artifact, and write a compact ✅/❌/⏭️ table + an artifact link to the GitHub Step Summary. Makes it easy to see whether each case ran, passed, failed, or was skipped (e.g. the VLM aurora on-PR gate). Mobile only for now; desktop to follow. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 perf(ci): consolidate Android shards by measured runtime (10 -> 6) The Android job was hitting the 120-min cap because ~15 Device Farm runs queued behind the account concurrency limit (9-20min wait each), starving the downstream collect/extract steps. Using measured worst-case per-test runtimes, rebalance into 6 groups: toolCalling (~30m) and gemma4 (~29m) run solo (each near the per-test cap), the other functional tests pack into two ~50m shards, and the vlmPerf groups stay dedicated (so the benchmark perf-only filter still isolates them). Fewer runs = less queue contention = shorter monitor wait. 
All 30 functions stay covered; iOS unchanged. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 perf(ci): download only essential Device Farm artifacts Per team agreement, remove the download + upload of the full Device Farm log tree (screenshots, XML, install logs, videos) — nobody uses it and it adds significant download time to the already-tight Android job. Only Customer_Artifacts.zip (bare_console.log, test-results.json, logcat_full, perf data) and Logcat files (C++ logs) are kept. The extracted console-logs and perf-report artifacts are unchanged. Raw Device Farm artifacts are still accessible via the AWS console links in the monitor output. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 fix(ci): match Device Farm 'Customer Artifacts' name with space in download filter Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 fix(ci): revert to sequential download loop with name filter The parallelized download_run_logs with export -f was crashing on both iOS and Android (exit code 1 before downloading any artifacts). Revert to main's proven sequential loop structure and add the name filter there instead. The filter still skips TCP dump (624MB), screenshots, XML, videos — only Customer Artifacts + Logcat are downloaded. Job-level artifacts restored too (iOS needs the job-level Customer_Artifacts.zip). Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 feat(ci): parse individual test() cases from TAP output Parse brittle's TAP output (ok N / not ok N lines) from logcat_full.txt (Android) and bare_console.log (iOS) to surface every individual test() case with its status (passed/failed/skipped) and timing per device. Produces test-case-details.json in the console-logs artifact with both runner-function-level and per-test() detail. The GitHub Step Summary gets a runner table + a collapsible per-test-case table so reviewers can see at a glance whether a newly added test() actually ran. 
Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 fix(ci): clean device labels in test-case-details (strip log-type suffix) Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 fix(ci): normalize test names + escape markdown in test-case summary Two fixes for the test-case-details step summary: 1. Normalize dynamic values in test names so the same logical test merges across devices — e.g. 'CacheTokens (53) > 0' and 'CacheTokens (55) > 0' both become 'CacheTokens (N) > 0'. This was inflating Android's count (1240) vs iOS (633) because each device produced slightly different token counts in assertion names. 2. Escape markdown-special characters (|, <, >, backtick) in test names before writing to the step summary table, so test descriptions containing these characters don't break the table layout. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 fix(ci): strip trailing quotes from TAP test names (ReactNativeJS echo) Android logcat echoes TAP lines twice: once from bare, once from ReactNativeJS wrapped in single quotes ('ok 1 - name'). The trailing quote made every test appear as a duplicate (e.g. 'All models available' vs 'All models available' with trailing quote), inflating Android OCR from 790 to 1461 test cases. Strip trailing quotes before deduplication. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 fix(ci): dedup TAP by normalized name + truncate assertion details Two fixes for correct test case counts across platforms: 1. Deduplicate TAP results by normalized name (not num+name) so perf iterations that reuse the same test name at different TAP numbers don't inflate the count. NMT was 364/546 → now 182=182. 2. Truncate test names at assertion detail markers ('. Found ', ': "', ', got ') so variable model output embedded in assertion messages doesn't create per-device duplicates. LLM elephant tests with 'Found keywords: elephant' in different phrasing now merge. 
Verified across all addons with real data: OCR: Android=669, iOS=669 (exact match) NMT: Android=182, iOS=182 (exact match) LLM: Android=622, iOS=608 (16 A-only = bitnet/Android-only tests, 2 I-only = Metal/iOS-only tests — all genuinely platform-specific) Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 fix(ci): address review — Android-scoped aurora skip + revert pre-warmup Two fixes per Dima's review: 1. Aurora skip is now Android-scoped using the explicit QVAC_PERF_ONLY flag (already plumbed to the device via the testspec config) instead of proxying off PERF_RUNS. iOS + desktop always run aurora. The benchmark (QVAC_PERF_ONLY=true) runs all 3 images on all platforms, even with runs=1. 2. Revert the Android pre-warmup extension back to iOS-only. The change was silently altering what Android perf numbers measure (cold first- run vs warm steady-state) and doesn't fix the crash it targeted (the large buffer allocation still happens on the first real-image pass). Restores historical comparability of Android perf data. Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 perf(ci): split cacheStateMachine to solo + rebalance to 3 func shards cacheStateMachine takes 30m on Pixel (hit the per-test Mocha timeout in funcShardB). Move it to a solo group (like toolCalling and gemma) and rebalance the remaining functional tests into 3 shards (~25-29m each on Pixel worst-case). Total Android groups: 8 (3 solo + 3 func + 2 vlmPerf). 
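
The TAP name normalization and dedup described in the commits above can be sketched roughly as follows. This is not the CI script itself: it assumes each line has already been stripped of any logcat prefix, and the rule that a deduplicated name passes only if every occurrence passed is an assumption made for this sketch.

```typescript
// Markers after which assertion messages embed variable model output (per the commit text).
const DETAIL_MARKERS = [". Found ", ': "', ", got "];

function normalizeTestName(raw: string): string {
  // Strip the trailing quote left by the ReactNativeJS echo of a TAP line.
  let name = raw.trim().replace(/'+$/, "");
  for (const marker of DETAIL_MARKERS) {
    const at = name.indexOf(marker);
    if (at !== -1) name = name.slice(0, at);
  }
  // 'CacheTokens (53) > 0' and 'CacheTokens (55) > 0' both become 'CacheTokens (N) > 0'.
  return name.replace(/\(\d+\)/g, "(N)");
}

type TapResult = { name: string; passed: boolean };

// Parses 'ok N - name' / 'not ok N - name' lines, keeping one entry per normalized name.
function parseTap(lines: string[]): TapResult[] {
  const byName = new Map<string, TapResult>();
  for (const line of lines) {
    const m = /^'?(not ok|ok) \d+ - (.+)$/.exec(line.trim());
    if (!m) continue; // not a TAP result line
    const name = normalizeTestName(m[2] ?? "");
    const passed = m[1] === "ok";
    const existing = byName.get(name);
    // Assumption: a merged name counts as passed only if all occurrences passed.
    byName.set(name, { name, passed: existing ? existing.passed && passed : passed });
  }
  return Array.from(byName.values());
}

const results = parseTap([
  "ok 1 - CacheTokens (53) > 0",
  "'ok 1 - CacheTokens (55) > 0'",
  "not ok 2 - Output mentions animal. Found keywords: elephant",
  "ok 3 - Output mentions animal. Found keywords: an elephant",
  "unrelated log noise",
]);
console.log(results.length); // 2
```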
Co-authored-by: Cursor <cursoragent@cursor.com> * QVAC-19368 infra: bump LLM mobile job timeout to 150min (from 120min) Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(review): address Ian's feedback on test assertions - parseToolCalls: surface malformed JSON as t.fail() instead of silent catch - api-behavior: use instanceof Error to reject undefined/string rejections Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: olyasir <sirkinolya@gmail.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Opanin Akuffo <46673050+opaninakuffo@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: NamelsKing <18405840+NamelsKing@users.noreply.github.com> Co-authored-by: Bruno Campana <7632562+BrunoCampana@users.noreply.github.com> Co-authored-by: Lauri Piisang <lauri.piisang@gmail.com> Co-authored-by: Dmytro Medvinskyi <functionsilence@gmail.com>
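
A helper in the spirit of the `parseToolCalls` checks mentioned above might look like the sketch below. The `{"name": ..., "arguments": {...}}` payload shape inside `<tool_call>` blocks is an assumption, as are `verifyToolCall` and the `declared` map; the real test helper reports malformed JSON through `t.fail()`, which is modelled here as an `errors` array.

```typescript
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ParseResult = { calls: ToolCall[]; errors: string[] };

// Extracts <tool_call>...</tool_call> blocks and JSON-parses each one.
// Malformed JSON is surfaced in `errors` rather than silently swallowed.
function parseToolCalls(output: string): ParseResult {
  const calls: ToolCall[] = [];
  const errors: string[] = [];
  const pattern = /<tool_call>([\s\S]*?)<\/tool_call>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    const body = (match[1] ?? "").trim();
    try {
      const parsed = JSON.parse(body) as { name?: unknown; arguments?: unknown } | null;
      const args = parsed?.arguments;
      calls.push({
        name: typeof parsed?.name === "string" ? parsed.name : "",
        arguments: typeof args === "object" && args !== null ? (args as Record<string, unknown>) : {},
      });
    } catch {
      errors.push(`malformed tool_call JSON: ${body}`);
    }
  }
  return { calls, errors };
}

// Structural checks only: non-empty name, declared tool, required argument keys present.
function verifyToolCall(call: ToolCall, declared: Map<string, string[]>): string[] {
  const problems: string[] = [];
  if (call.name.length === 0) problems.push("empty tool name");
  const required = declared.get(call.name);
  if (required === undefined) {
    problems.push(`undeclared tool: ${call.name}`);
    return problems;
  }
  for (const key of required) {
    if (!(key in call.arguments)) problems.push(`missing argument: ${key}`);
  }
  return problems;
}
```

As in the commits, nothing here asserts on the quality of the generated arguments, only on their structure.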

lauripiisang added 2 commits June 12, 2026 23:28

chore[notask]: sdk + bare-sdk 0.13.1 changelog

a85b1cd

(cherry picked from commit 3f6ac86)

lauripiisang requested review from a team as code owners June 12, 2026 19:28

lauripiisang added the tier1 label Jun 12, 2026

This was referenced Jun 12, 2026

chore[notask]: release sdk + bare-sdk 0.12.3 — bump bare-fetch ^3.0.1 #2562

Closed

chore[skiplog|notask]: backmerge release-sdk-0.12.3 — version bump, bare-fetch ^3.0.1, dev bare-subprocess ^6.1.0, changelog #2565

Closed

lauripiisang changed the title ~~QVAC-17357 chore[skiplog|notask]: backmerge release-sdk-0.13.1 — bare-fetch ^3.0.1, decoder-audio ^0.4.0, decoder.ts sync run()~~ chore[skiplog|notask]: backmerge release-sdk-0.13.1 — bare-fetch ^3.0.1, decoder-audio ^0.4.0, decoder.ts sync run() Jun 12, 2026

arun-mani-j approved these changes Jun 15, 2026

NamelsKing approved these changes Jun 15, 2026

Merge branch 'main' into backmerge/release-sdk-0.13.1

5f7b8f7

NamelsKing merged commit 3002eea into main Jun 15, 2026
25 checks passed

NamelsKing deleted the backmerge/release-sdk-0.13.1 branch June 15, 2026 10:00

NamelsKing merged 3 commits into main from backmerge/release-sdk-0.13.1
