
candle-flash-attn: remove duplicate softcap: f32 in run_mha FFI decl #12

Merged
lukekim merged 2 commits into spiceai from lukim/fix-flash-attn-run-mha-ffi on Apr 17, 2026

lukekim commented Apr 17, 2026

The Rust extern declaration of run_mha in candle-flash-attn/src/ffi.rs listed softcap: f32 twice — once between softmax_scale and seqlen_q, and again at the end of the parameter list (after window_size_right).

The C definition in candle-flash-attn/kernels/flash_api.cu only has a single softcap (at the end):

extern "C" void run_mha(
    ...
    float softmax_scale,
    uint32_t seqlen_q,
    ...
    int window_size_left,
    int window_size_right,
    float softcap
) { ... }

so the Rust-side signature declared 38 parameters while the call sites in src/lib.rs pass 37:

error[E0061]: this function takes 38 arguments but 37 arguments were supplied
   --> candle-flash-attn/src/lib.rs:169:13
error[E0061]: this function takes 38 arguments but 37 arguments were supplied
   --> candle-flash-attn/src/lib.rs:634:13

Drop the spurious copy between softmax_scale and seqlen_q so the Rust FFI matches the C ABI.
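
To illustrate the corrected shape, here is a minimal sketch rather than the actual candle code. `run_mha_reduced`, `Received`, `RunMhaReduced`, and `RUN_MHA_REDUCED` are hypothetical names, and the stand-in keeps only the five parameters named in this description (the real `run_mha` takes 37 and returns nothing). It pins the expected signature as a function-pointer type, so an extra `softcap` between `softmax_scale` and `seqlen_q` would fail to compile. The check compares two Rust-side signatures only; it cannot see the C definition in `flash_api.cu`.

```rust
use std::os::raw::c_int;

/// What the stand-in callee received in each position.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Received {
    pub softmax_scale: f32,
    pub seqlen_q: u32,
    pub window_size_left: c_int,
    pub window_size_right: c_int,
    pub softcap: f32,
}

/// Hypothetical stand-in for the C-side `run_mha`, reduced to the parameters
/// named in this PR. Unlike the real function, it returns what it received so
/// the positional mapping can be inspected.
pub extern "C" fn run_mha_reduced(
    softmax_scale: f32,
    seqlen_q: u32,
    window_size_left: c_int,
    window_size_right: c_int,
    softcap: f32, // single softcap, last, mirroring the order in flash_api.cu
) -> Received {
    Received {
        softmax_scale,
        seqlen_q,
        window_size_left,
        window_size_right,
        softcap,
    }
}

/// Corrected declaration shape: no `softcap` between `softmax_scale` and
/// `seqlen_q`.
pub type RunMhaReduced = extern "C" fn(f32, u32, c_int, c_int, f32) -> Received;

// Compile-time check: this stops compiling if the definition and the declared
// type disagree in arity or parameter types.
pub const RUN_MHA_REDUCED: RunMhaReduced = run_mha_reduced;

fn main() {
    // Five arguments for five parameters; a sixth would trigger E0061.
    let got = RUN_MHA_REDUCED(0.125, 128, -1, -1, 0.0);
    println!("{:?}", got);
}
```

Because a duplicated `f32` changes the arity, the mismatch surfaces as a compile error (as it did here) rather than as silently shifted arguments at runtime.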

Validation

Reproduced while building spiced (spiceai/spiceai#10278) with --features cuda on CUDA 12.6; the build completes once this fix is applied.

lukekim self-assigned this Apr 17, 2026
lukekim changed the base branch from lukim/spiceai-0.10.1-lockstep to spiceai April 17, 2026 17:40
…d Stella v5 models; add clippy allow for explicit counter loops in PaddleOCR-VL and quantized LFM2 examples.