Fix/ci hardening #48
Pull request overview
This PR hardens CI/release workflows and improves robustness of the Candle KV-cache implementations by replacing panic-based validation with fallible APIs, plus a small CUDA kernel maintainability refactor and additional safety guards.
Changes:
- Harden the GitHub Actions workflows (pin action SHAs, reduce default permissions, add `cargo audit`, pin `rustqual`) and add a Dependabot config.
- Make `PqoCache::new`, `TqCache::new`, and `compute_qjl_signs` return `Result` instead of panicking, and update tests accordingly (the API shape is sketched below).
- Add a CUDA head-dim guard in `GpuPrecomputed::new` and deduplicate CUDA bit-packing logic via shared helpers.
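To make the panic-to-`Result` change concrete, here is a minimal sketch of the fallible-constructor pattern. `CacheConfig`, the `String` error type, and the specific validation rule are illustrative assumptions, not the crate's actual API; the real `PqoCache::new` takes its own config type and returns the crate's error type.

```rust
struct CacheConfig {
    head_dim: usize,
}

struct PqoCache {
    config: CacheConfig,
}

impl PqoCache {
    /// After the change: fallible, so callers decide how to handle a bad
    /// config instead of the constructor panicking on an assertion.
    fn new(config: CacheConfig) -> Result<Self, String> {
        if config.head_dim == 0 {
            return Err("head_dim must be non-zero".to_string());
        }
        Ok(Self { config })
    }
}

fn main() {
    // Call sites (including the updated tests) now unwrap explicitly
    // or propagate the error with `?`.
    let cache = PqoCache::new(CacheConfig { head_dim: 128 }).expect("valid config");
    assert_eq!(cache.config.head_dim, 128);
    assert!(PqoCache::new(CacheConfig { head_dim: 0 }).is_err());
}
```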
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| tests/cache_type_correctness.rs | Update tests to use fallible cache constructors (Result-returning new). |
| tests/cache_storage_tests.rs | Update storage roundtrip test to handle fallible PqoCache::new. |
| tests/cache_pqo_tests.rs | Update PQO tests (CPU/GPU) to handle fallible PqoCache::new. |
| src/qjl.rs | Change compute_qjl_signs to return Result on dimension mismatch; propagate at call sites. |
| src/cache/tq.rs | Change TqCache::new to return Result; adapt QJL sign computation to new error type. |
| src/cache/precomputed/mod.rs | Add guard against large head_dim to prevent CUDA shared memory overflow. |
| src/cache/pqo.rs | Change PqoCache::new to return Result instead of panicking. |
| src/cache/cuda/kernels/tq_quant_kernel.cu | Replace duplicated packing logic with shared helper functions. |
| src/cache/cuda/kernels/tq_common.h | Add atomic byte-OR helper and pack helpers for 2/3/4-bit packing (see the sketch after this table). |
| CHANGELOG.md | Document CI hardening, API changes, head-dim guard, and CUDA helper refactor. |
| .github/workflows/release.yml | Harden/pin actions, set permissions, add audit step, pin rustqual. |
| .github/workflows/ci.yml | Harden/pin actions, set permissions, add audit step, pin rustqual. |
| .github/dependabot.yml | Add weekly Dependabot updates for Cargo and GitHub Actions. |
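A byte-level atomic-OR helper like the one in `tq_common.h` is typically needed because CUDA's `atomicOr` works on word-sized integers, so OR-ing one packed byte means shifting a byte-wide mask into the aligned 32-bit word. The sketch below mirrors that technique in Rust purely for illustration; `or_byte_atomic` and the 2-bit packing demo are hypothetical, not the CUDA source.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// OR `value` into byte `byte_idx` of a packed buffer by masking it into
/// the containing 32-bit word -- the same trick a CUDA byte-wise atomicOr
/// helper uses, since hardware atomics operate on whole words.
fn or_byte_atomic(words: &[AtomicU32], byte_idx: usize, value: u8) {
    let word_idx = byte_idx / 4;
    let shift = (byte_idx % 4) * 8; // byte lane within the word
    words[word_idx].fetch_or(u32::from(value) << shift, Ordering::Relaxed);
}

fn main() {
    // Pack four 2-bit codes into byte 0: concurrent writers OR their code
    // into place, so the OR must be atomic to avoid clobbering neighbours.
    let packed: Vec<AtomicU32> = (0..2).map(|_| AtomicU32::new(0)).collect();
    for (i, code) in [0b01u8, 0b10, 0b11, 0b00].into_iter().enumerate() {
        or_byte_atomic(&packed, 0, code << (i * 2));
    }
    assert_eq!(packed[0].load(Ordering::Relaxed) & 0xFF, 0b00_11_10_01);
}
```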
The guard as added in `src/cache/precomputed/mod.rs`:

```rust
/// Maximum head dimension supported by the current CUDA kernel launch
/// configuration. The kernels launch with `head_dim` threads per block,
/// so this is bounded by the device's max threads-per-block limit.
const CUDA_MAX_THREADS_PER_BLOCK: usize = 1024;
if device.is_cuda() && config.head_dim > CUDA_MAX_THREADS_PER_BLOCK {
    return Err(super::cache_err(format!(
        "head_dim {} exceeds the CUDA kernel thread-block limit \
         ({CUDA_MAX_THREADS_PER_BLOCK}); launching a kernel with \
         head_dim threads per block would exceed the device maximum.",
        config.head_dim
    )));
}
```
The new CUDA `head_dim > 1024` guard doesn't match the actual constraints of the CUDA fast paths. The fused attention kernel has compile-time limits (`TQ_MAX_QBLOCKS=8`, `TQ_NUM_WARPS=8`) and effectively requires `num_qblocks = head_dim / 32 <= 8` (i.e., `head_dim <= 256`); a larger `head_dim` will make `warps_per_token` exceed the available warps and can lead to incorrect behavior or out-of-bounds accesses. Consider tightening this validation (and/or gating the fused CUDA decode path to supported `head_dim`/`num_qblocks` values and falling back to CPU dequant decode when unsupported).
Suggested change:

```diff
-/// Maximum head dimension supported by the current CUDA kernel launch
-/// configuration. The kernels launch with `head_dim` threads per block,
-/// so this is bounded by the device's max threads-per-block limit.
-const CUDA_MAX_THREADS_PER_BLOCK: usize = 1024;
-if device.is_cuda() && config.head_dim > CUDA_MAX_THREADS_PER_BLOCK {
-    return Err(super::cache_err(format!(
-        "head_dim {} exceeds the CUDA kernel thread-block limit \
-         ({CUDA_MAX_THREADS_PER_BLOCK}); launching a kernel with \
-         head_dim threads per block would exceed the device maximum.",
-        config.head_dim
+/// Maximum number of quantization blocks supported by the fused CUDA
+/// fast path. The kernel is compiled with a fixed qblock/warp layout,
+/// so larger heads are unsupported even if they fit within the device's
+/// generic threads-per-block limit.
+const CUDA_MAX_QBLOCKS: usize = 8;
+const CUDA_MAX_HEAD_DIM: usize = QUANT_BLOCK_SIZE * CUDA_MAX_QBLOCKS;
+if device.is_cuda() && config.head_dim > CUDA_MAX_HEAD_DIM {
+    return Err(super::cache_err(format!(
+        "head_dim {} exceeds the CUDA fused-kernel limit (max {} = \
+         {} qblocks * block size {}); larger heads are unsupported on \
+         the CUDA fast path.",
+        config.head_dim,
+        CUDA_MAX_HEAD_DIM,
+        CUDA_MAX_QBLOCKS,
+        QUANT_BLOCK_SIZE
```
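Beyond tightening the constructor check, the reviewer's fallback suggestion could look roughly like the dispatch below. `supports_fused_cuda` is a hypothetical helper, the constants mirror the limits named in the comment (`TQ_MAX_QBLOCKS = 8`, 32-wide qblocks), and the real decode paths in `src/cache/precomputed/mod.rs` may differ.

```rust
/// Lanes per quantization block (the `/ 32` in the reviewer's comment).
const QUANT_BLOCK_SIZE: usize = 32;
/// Compile-time qblock limit of the fused kernel (TQ_MAX_QBLOCKS).
const CUDA_MAX_QBLOCKS: usize = 8;

/// Hypothetical gate: the fused CUDA fast path only handles heads whose
/// qblock count fits the kernel's fixed warp layout.
fn supports_fused_cuda(head_dim: usize) -> bool {
    head_dim % QUANT_BLOCK_SIZE == 0 && head_dim / QUANT_BLOCK_SIZE <= CUDA_MAX_QBLOCKS
}

fn main() {
    for head_dim in [64, 128, 256, 512] {
        // 256 (= 8 qblocks * 32) is the largest fused-path head_dim;
        // 512 would need 16 qblocks and must take the CPU dequant route.
        if supports_fused_cuda(head_dim) {
            println!("head_dim {head_dim}: fused CUDA decode");
        } else {
            println!("head_dim {head_dim}: CPU dequant fallback");
        }
    }
}
```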