candle-flash-attn: remove duplicate `softcap: f32` in run_mha FFI decl by lukekim · Pull Request #12 · spiceai/candle

lukekim · 2026-04-17T16:49:45Z

The Rust extern declaration of run_mha in candle-flash-attn/src/ffi.rs listed softcap: f32 twice — once between softmax_scale and seqlen_q, and again at the end of the parameter list (after window_size_right).

The C definition in candle-flash-attn/kernels/flash_api.cu only has a single softcap (at the end):

extern "C" void run_mha(
    ...
    float softmax_scale,
    uint32_t seqlen_q,
    ...
    int window_size_left,
    int window_size_right,
    float softcap
) { ... }

so the Rust-side signature declared 38 parameters while the call sites in src/lib.rs pass 37:

error[E0061]: this function takes 38 arguments but 37 arguments were supplied
   --> candle-flash-attn/src/lib.rs:169:13
error[E0061]: this function takes 38 arguments but 37 arguments were supplied
   --> candle-flash-attn/src/lib.rs:634:13

Drop the spurious copy between softmax_scale and seqlen_q so the Rust FFI matches the C ABI.
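To illustrate why the parameter list has to line up, here is a minimal sketch in plain Rust. It is not the real declaration: `run_mha_demo`, `RunMhaDemoFn`, `CHECKED`, and `call_demo` are hypothetical names, only the five parameters quoted above are kept (the real function has 37), and the stand-in returns a value purely so the call is observable. The Rust types `u32` and `i32` are assumed to correspond to the C `uint32_t` and `int` shown above. A real `extern` block is never checked against the C definition by the compiler, so this sketch uses a function-pointer type to get a comparable compile-time check.

```rust
/// Hypothetical, shortened stand-in for the C-side `run_mha` definition.
/// Only the parameters quoted in this PR are kept. The real function returns
/// nothing; this one returns `softcap` so the result can be inspected.
extern "C" fn run_mha_demo(
    softmax_scale: f32,
    seqlen_q: u32,
    window_size_left: i32,
    window_size_right: i32,
    softcap: f32,
) -> f32 {
    // The other arguments are unused in this sketch.
    let _ = (softmax_scale, seqlen_q, window_size_left, window_size_right);
    softcap
}

/// Rust-side view of the signature: `softcap` appears once, at the end.
type RunMhaDemoFn = extern "C" fn(f32, u32, i32, i32, f32) -> f32;

// A declaration with the duplicated `softcap` would instead read
//     extern "C" fn(f32, f32, u32, i32, i32, f32) -> f32
// and assigning `run_mha_demo` to that type would fail to compile.
const CHECKED: RunMhaDemoFn = run_mha_demo;

/// Calls the stand-in with illustrative values and returns whatever
/// arrived in the `softcap` position.
fn call_demo() -> f32 {
    CHECKED(0.125, 128, -1, -1, 30.0)
}
```

Calling `call_demo()` returns the value passed in the last position (`30.0` here), which is where the C definition expects `softcap`.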

Validation

Reproduced while building spiced (spiceai/spiceai#10278) with --features cuda on CUDA 12.6. With this fix applied, the build completes.

lukekim self-assigned this Apr 17, 2026

sgrebnov approved these changes Apr 17, 2026

lukekim changed the base branch from lukim/spiceai-0.10.1-lockstep to spiceai April 17, 2026 17:40

Refactor group size calculation to use checked division in FastViT and Stella v5 models; add clippy allow for explicit counter loops in PaddleOCR-VL and quantized LFM2 examples.

c4d47ce

lukekim merged commit c87b9bc into spiceai Apr 17, 2026
12 checks passed

This was referenced Apr 17, 2026

Bump spiceai/candle-* fork revs to candle 0.10.1 / cudarc 0.19 ports spiceai/text-embeddings-inference#21

Merged

Pin spiceai candle / TEI forks to merged revs; drop local [patch] overrides spiceai/spiceai#10362

Merged

lukekim merged 2 commits into spiceai from lukim/fix-flash-attn-run-mha-ffi

2 participants
