
Fix/ci hardening #48

Merged

SaschaOnTour merged 9 commits into main from fix/ci-hardening on Apr 12, 2026

Conversation

@SaschaOnTour
Owner

No description provided.

Copilot AI review requested due to automatic review settings April 12, 2026 00:24
Contributor

Copilot AI left a comment


Pull request overview

This PR hardens CI/release workflows and improves robustness of the Candle KV-cache implementations by replacing panic-based validation with fallible APIs, plus a small CUDA kernel maintainability refactor and additional safety guards.

Changes:

  • Harden GitHub Actions (pinned action SHAs, reduced default permissions, add cargo audit, pin rustqual) and add Dependabot config.
  • Make PqoCache::new, TqCache::new, and compute_qjl_signs return Result instead of panicking; update tests accordingly.
  • Add a CUDA head-dim guard in GpuPrecomputed::new and deduplicate CUDA bit-packing logic via shared helpers.
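The panic-to-`Result` migration described above follows a common Rust pattern. The sketch below is illustrative only, assuming a simplified `PqoCache` with a single `head_dim` field and a hypothetical `CacheError` type; the crate's actual constructors and validation rules are not shown in this PR page.

```rust
// Illustrative sketch of the PR's pattern: replace panic-based validation
// in a constructor with a fallible API that returns Result.
// `CacheError` and the multiple-of-32 rule are assumptions for this example.
#[derive(Debug)]
struct CacheError(String);

struct PqoCache {
    head_dim: usize,
}

impl PqoCache {
    /// Returns `Err` instead of panicking on an invalid configuration,
    /// so callers can propagate the failure with `?`.
    fn new(head_dim: usize) -> Result<Self, CacheError> {
        if head_dim == 0 || head_dim % 32 != 0 {
            return Err(CacheError(format!(
                "head_dim {head_dim} must be a non-zero multiple of 32"
            )));
        }
        Ok(Self { head_dim })
    }
}

fn main() {
    assert!(PqoCache::new(128).is_ok());
    assert!(PqoCache::new(0).is_err());
    assert!(PqoCache::new(100).is_err());
}
```

Tests that previously asserted on panics are then updated to match on the `Err` variant instead.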

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
| --- | --- |
| tests/cache_type_correctness.rs | Update tests to use fallible cache constructors (`Result`-returning `new`). |
| tests/cache_storage_tests.rs | Update storage roundtrip test to handle fallible `PqoCache::new`. |
| tests/cache_pqo_tests.rs | Update PQO tests (CPU/GPU) to handle fallible `PqoCache::new`. |
| src/qjl.rs | Change `compute_qjl_signs` to return `Result` on dimension mismatch; propagate at call sites. |
| src/cache/tq.rs | Change `TqCache::new` to return `Result`; adapt QJL sign computation to the new error type. |
| src/cache/precomputed/mod.rs | Add guard against large `head_dim` to prevent CUDA shared-memory overflow. |
| src/cache/pqo.rs | Change `PqoCache::new` to return `Result` instead of panicking. |
| src/cache/cuda/kernels/tq_quant_kernel.cu | Replace duplicated packing logic with shared helper functions. |
| src/cache/cuda/kernels/tq_common.h | Add atomic byte-OR helper and pack helpers for 2/3/4-bit packing. |
| CHANGELOG.md | Document CI hardening, API changes, head-dim guard, and CUDA helper refactor. |
| .github/workflows/release.yml | Harden/pin actions, set permissions, add audit step, pin rustqual. |
| .github/workflows/ci.yml | Harden/pin actions, set permissions, add audit step, pin rustqual. |
| .github/dependabot.yml | Add weekly Dependabot updates for Cargo and GitHub Actions. |
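The tq_common.h pack helpers for 2/3/4-bit packing are CUDA device code and are not shown on this page. As a rough illustration of what such a helper does, here is a hedged Rust sketch of 2-bit packing; the function name, LSB-first layout, and 4-codes-per-byte choice are assumptions for the example, not the kernel's actual layout.

```rust
// Illustrative sketch (not the CUDA kernel's code): pack 2-bit quantization
// codes into bytes, four codes per byte, least-significant bits first.
// The CUDA helpers additionally use an atomic byte-OR so concurrent threads
// can write into the same byte; this serial version omits that concern.

/// Pack a slice of 2-bit codes (values 0..=3) into bytes.
fn pack_2bit(codes: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; (codes.len() + 3) / 4];
    for (i, &c) in codes.iter().enumerate() {
        debug_assert!(c < 4, "code out of 2-bit range");
        // OR the code into its 2-bit slot within the target byte.
        out[i / 4] |= (c & 0b11) << ((i % 4) * 2);
    }
    out
}

fn main() {
    // Codes 0,1,2,3 pack LSB-first into 0b11_10_01_00 = 0xE4.
    assert_eq!(pack_2bit(&[0, 1, 2, 3]), vec![0xE4]);
}
```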


Comment thread .github/workflows/ci.yml
Comment thread .github/workflows/release.yml
Comment thread src/cache/tq.rs Outdated
Comment thread src/cache/precomputed/mod.rs Outdated
Comment thread src/qjl.rs
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.



Comment thread CHANGELOG.md Outdated
Comment thread .github/workflows/ci.yml
Comment thread .github/workflows/release.yml
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.



Comment thread src/cache/precomputed/mod.rs Outdated
Copilot AI review requested due to automatic review settings April 12, 2026 01:08
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.



Comment thread src/cache/precomputed/mod.rs Outdated
Comment thread .github/workflows/ci.yml
Comment thread .github/workflows/release.yml
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.



Comment thread CHANGELOG.md Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 12, 2026 01:25
@SaschaOnTour SaschaOnTour merged commit 17da4df into main Apr 12, 2026
3 checks passed
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.



Comment on lines +54 to +63
```rust
/// Maximum head dimension supported by the current CUDA kernel launch
/// configuration. The kernels launch with `head_dim` threads per block,
/// so this is bounded by the device's max threads-per-block limit.
const CUDA_MAX_THREADS_PER_BLOCK: usize = 1024;
if device.is_cuda() && config.head_dim > CUDA_MAX_THREADS_PER_BLOCK {
    return Err(super::cache_err(format!(
        "head_dim {} exceeds the CUDA kernel thread-block limit \
         ({CUDA_MAX_THREADS_PER_BLOCK}); launching a kernel with \
         head_dim threads per block would exceed the device maximum.",
        config.head_dim
```

Copilot AI Apr 12, 2026


The new CUDA head_dim > 1024 guard doesn’t match the actual constraints of the CUDA fast paths. The fused attention kernel has compile-time limits (TQ_MAX_QBLOCKS=8, TQ_NUM_WARPS=8) and effectively requires num_qblocks = head_dim / 32 <= 8 (i.e., head_dim <= 256); larger head_dim will make warps_per_token exceed available warps and can lead to incorrect behavior or OOB accesses. Consider tightening this validation (and/or gating the fused CUDA decode path to supported head_dim/num_qblocks and falling back to CPU dequant decode when unsupported).

Suggested change

```diff
-/// Maximum head dimension supported by the current CUDA kernel launch
-/// configuration. The kernels launch with `head_dim` threads per block,
-/// so this is bounded by the device's max threads-per-block limit.
-const CUDA_MAX_THREADS_PER_BLOCK: usize = 1024;
-if device.is_cuda() && config.head_dim > CUDA_MAX_THREADS_PER_BLOCK {
-    return Err(super::cache_err(format!(
-        "head_dim {} exceeds the CUDA kernel thread-block limit \
-         ({CUDA_MAX_THREADS_PER_BLOCK}); launching a kernel with \
-         head_dim threads per block would exceed the device maximum.",
-        config.head_dim
+/// Maximum number of quantization blocks supported by the fused CUDA
+/// fast path. The kernel is compiled with a fixed qblock/warp layout,
+/// so larger heads are unsupported even if they fit within the device's
+/// generic threads-per-block limit.
+const CUDA_MAX_QBLOCKS: usize = 8;
+const CUDA_MAX_HEAD_DIM: usize = QUANT_BLOCK_SIZE * CUDA_MAX_QBLOCKS;
+if device.is_cuda() && config.head_dim > CUDA_MAX_HEAD_DIM {
+    return Err(super::cache_err(format!(
+        "head_dim {} exceeds the CUDA fused-kernel limit (max {} = \
+         {} qblocks * block size {}); larger heads are unsupported on \
+         the CUDA fast path.",
+        config.head_dim,
+        CUDA_MAX_HEAD_DIM,
+        CUDA_MAX_QBLOCKS,
+        QUANT_BLOCK_SIZE
```

@SaschaOnTour SaschaOnTour linked an issue Apr 12, 2026 that may be closed by this pull request
10 tasks


Development

Successfully merging this pull request may close these issues.

CI hardening + code safety quick wins
