Closed
315 commits
0af3e42
fix: place `ug` dep behind `not wasm32` flag (#2760)
DougAnderson444 Feb 1, 2025
7c2449f
Metal: Improved reduce and softmax (#1819)
ivarflakstad Feb 8, 2025
2423d63
add dynamic position encoding to Siglip (#2770)
ameroyer Feb 14, 2025
3ddd20a
update to cudarc to v0.13.5 to support cuda 12.8 (#2771)
MichaelMcCulloch Feb 15, 2025
fd7f724
Bump the crate version to 0.8.3 (#2772)
LaurentMazare Feb 15, 2025
e6cc76f
Implement DeepSeek V2 (#2744)
EricLBuehler Feb 19, 2025
ac9cdbd
Refactor From<Tuple> implementations by using macros, add tests (#2762)
philipfabianek Feb 19, 2025
9e8bf70
Avoid some clippy lints on 1.85. (#2778)
LaurentMazare Feb 22, 2025
26c1692
Make sorted_nodes pub function (#2780)
viirya Feb 22, 2025
add3a71
phi-4-mini (#2790)
janimo Mar 1, 2025
37db86f
Allow ModernBert to be used to generate embeddings. (#2791)
rectalogic Mar 3, 2025
e4ffb85
Add ModernBert sentency classifier (#2796)
Mnwa Mar 8, 2025
e286cf7
Parse the json config for siglip models. (#2800)
LaurentMazare Mar 9, 2025
111edbc
Gemma 3 initial setup (text only). (#2802)
LaurentMazare Mar 14, 2025
c930ab7
upgrade half library to fix rand (#2806)
seddonm1 Mar 14, 2025
468d1d5
Bump the crate version to 0.8.4. (#2808)
LaurentMazare Mar 15, 2025
cbf5fc8
Add Gemma 3 1b IT toe Gemma examples (#2809)
andreban Mar 16, 2025
3afb049
Allow for growing the default KV cache when needed. (#2810)
LaurentMazare Mar 16, 2025
0b24f7f
Fix for whisper example. rand::distribution is now rand::distr (#2811)
beurdouche Mar 16, 2025
67b85f7
Pickle decoder fix and Long1 opcode addition. (#2824)
computer-whisperer Mar 23, 2025
f3d4729
fix: `candle-flash-attn` linux and `msvc` build (#2829)
xkeyC Mar 25, 2025
10853b8
fixed rand imports for whisper-microphone example (#2834)
greenrazer Mar 26, 2025
0d40970
fixed rand import for mnist-training (#2833)
greenrazer Mar 26, 2025
cb02b38
Fix reinforcement learning example (#2837)
Brooooooklyn Mar 26, 2025
59c2619
Fix CIFAR10 dataset types and dimension ordering (#2845)
brylee10 Mar 30, 2025
ba47329
Added DeepseekR1 Qwen7B variant to quantized-qwen2-instruct example (…
greenrazer Mar 30, 2025
6429609
Added Deepseekr1 Llama8b variant to quantized example (#2842)
greenrazer Mar 30, 2025
9541467
Add `flip` to `tensor` (#2855)
brylee10 Apr 1, 2025
b4daa03
add as_cuda_slice_mut to CudaStorage and CudaDType (#2859)
zackangelo Apr 1, 2025
d6db305
Added new language pairs to marian-mt example. (#2860)
greenrazer Apr 2, 2025
d9904a3
Update to cudarc 0.14 (breaking change). (#2858)
LaurentMazare Apr 3, 2025
648596c
Added readmes to examples (#2835)
greenrazer Apr 3, 2025
9d31361
Fix for clippy 1.86. (#2864)
LaurentMazare Apr 3, 2025
cf9d7bf
Add the CSM model. (#2862)
LaurentMazare Apr 4, 2025
bc33df7
Add the missing voices for CSM. (#2867)
LaurentMazare Apr 5, 2025
338f6a1
Clippy 1.86 fixes for cuda. (#2868)
LaurentMazare Apr 5, 2025
e3370c6
Add the SNAC audio tokenizer. (#2869)
LaurentMazare Apr 6, 2025
2f3bf42
Support more snac variants. (#2871)
LaurentMazare Apr 7, 2025
d339b01
Fix hardcoded f32 dtype for attention_mask. Use the model dtype for c…
msminhas93 Apr 8, 2025
eb478ec
Implementing DistilBertForMaskedLM. (#2866)
greenrazer Apr 11, 2025
acc5bd3
Cuda cleanup. (#2880)
LaurentMazare Apr 11, 2025
19fb6da
Bump the crate version. (#2881)
LaurentMazare Apr 11, 2025
d7b7ce1
Upgrade ug. (#2882)
LaurentMazare Apr 12, 2025
34505fd
Avoid using batched-matmul in nn::Linear. (#2883)
LaurentMazare Apr 12, 2025
15ed0b1
Optimize the batched matmul for the cpu backend. (#2884)
LaurentMazare Apr 12, 2025
d9198de
Im2col cuda optimization. (#2885)
LaurentMazare Apr 13, 2025
b44d38d
Add the Orpheus TTS. (#2886)
LaurentMazare Apr 13, 2025
f3a73f8
Support for cudnn conv1d. (#2888)
LaurentMazare Apr 13, 2025
2f9606b
Exclude candle-book to avoid some CI failures. (#2889)
LaurentMazare Apr 13, 2025
fb660b8
Add a cudnn feature to candle-nn/candle-transformers. (#2890)
LaurentMazare Apr 13, 2025
a52b76a
Expose the cudnn algo in the conv ops. (#2892)
LaurentMazare Apr 14, 2025
2653002
Gumbel-Softmax sampling. (#2894)
LaurentMazare Apr 14, 2025
1d1d6d4
Bump the crate version. (#2895)
LaurentMazare Apr 14, 2025
b01ebba
Use cudarc 0.15.2. (#2896)
LaurentMazare Apr 14, 2025
e4e7b0b
Use cudarc 0.16. (#2900)
LaurentMazare Apr 15, 2025
76e565c
Updated candle-book: Introduction, Installation, MNIST guide, and add…
greenrazer Apr 15, 2025
7f0f83a
Rotating kv cache positions (#2901)
LaurentMazare Apr 15, 2025
9954981
Allow from_vec/from_slice to use a ShapeWithOneHole as shape. (#2905)
LaurentMazare Apr 17, 2025
ce5f8dd
Check the bounds in the cuda indexing kernels. (#2908)
LaurentMazare Apr 18, 2025
9dbaf95
Add an enum for scalar values. (#2909)
LaurentMazare Apr 18, 2025
21055b5
Add PRelu operation (#2904)
A2va Apr 19, 2025
b2904a8
implemented quantized-gemma3 (#2902)
greenrazer Apr 19, 2025
a4c56a9
Add the const-set op. (#2910)
LaurentMazare Apr 19, 2025
99bd69f
fixed quantized-gemma example (#2914)
greenrazer Apr 23, 2025
82def7a
Cudarc update. (#2915)
LaurentMazare Apr 23, 2025
6ff0a69
Fixed Gemma3 model and example (#2917)
greenrazer Apr 25, 2025
3aeb957
Fixed Quantized Gemma3 Model and example (#2918)
greenrazer Apr 25, 2025
3827685
Add the scatter op. (#2921)
LaurentMazare Apr 25, 2025
a2e9254
Add the scatter in place ops. (#2923)
LaurentMazare Apr 26, 2025
fbaf0b0
Bump the crate version to 0.9.0. (#2924)
LaurentMazare Apr 26, 2025
6e0646c
Remove redundant mlx gemm dtype check (#2925)
ivarflakstad Apr 27, 2025
e3db300
Support for "unbatched" rope. (#2926)
LaurentMazare Apr 27, 2025
e98754f
Optimize Tensor::new when called on nested Vec<..>. (#2927)
LaurentMazare Apr 28, 2025
d4bac37
Fix the gumbel softmax by casting to f32. (#2928)
LaurentMazare Apr 28, 2025
de23d34
Switch Tensor::full to return a contiguous tensor. (#2929)
LaurentMazare Apr 28, 2025
5029ac5
Added tracing page to the candle book. (#2922)
greenrazer Apr 29, 2025
38fc866
Add support for Helium-v1. (#2932)
LaurentMazare Apr 30, 2025
8a19bb7
Bump the candle version to 0.9.1. (#2935)
LaurentMazare May 1, 2025
cd96fa8
Add a scattered kv cache. (#2936)
LaurentMazare May 1, 2025
66be13b
fixed quantized_phi3 implementation
ljt019 May 1, 2025
1fdfb58
Updating `Add qwen3` (PR 2903) to use HF weights (#2930)
greenrazer May 2, 2025
e27b470
Indexing with max-value results in zero/no-op. (#2940)
LaurentMazare May 3, 2025
637473c
Bump cudarc to 0.16.3. (#2942)
LaurentMazare May 4, 2025
3d05f5c
Qwen3 quantized implementation (#2939)
ljt019 May 8, 2025
36508a2
Add Resize to onnx ops (#2946)
greenrazer May 10, 2025
485ddf2
Fixed Quantized Qwen3 Model (#2951)
nosnakeob May 13, 2025
6bd6172
Make tensor contiguous before the repeat_kv calls to avoid strided co…
b0r3k May 14, 2025
450a49e
Olmo 2 model (#2954)
janimo May 14, 2025
9ce4fe6
Fix docs quantized qwen3 (#2955)
maximizemaxwell May 15, 2025
92106c8
Fixes for clippy 1.87. (#2956)
LaurentMazare May 15, 2025
9a62c91
Proper support for phi-4 (#2960)
LaurentMazare May 21, 2025
61ddb95
Use a tanh activation in the xlm-roberta classification head. (#2968)
LaurentMazare May 26, 2025
cac51fe
(hotfix) fix the doc test for indexer (#2970)
klion26 May 28, 2025
1a183c9
Add fine-tuned text classifier to xlm roberta example (#2969)
jpe90 May 28, 2025
5aed817
feat: enhance linear algebra operations (#2972)
ssfdust May 29, 2025
cd7b877
candle-onnx: Implement Trilu and ScatterND ops (#2952)
greenrazer May 30, 2025
0224a74
Add Qwen3 MoE (#2934)
greenrazer May 31, 2025
17313a4
Fix cuda memory error for Qwen3 non-quantized (#2987)
akshayballal95 Jun 7, 2025
407c667
candle-onnx: Implement RNN operator (#2964)
BrunoSienkiewicz Jun 24, 2025
23968db
Fix typos (#2958)
omahs Jun 24, 2025
2e5dbc7
candle-onnx: Implement Hard Swish operator (#2980)
Michall00 Jun 24, 2025
a6e8aae
fixed errors with hardswish merge (#3006)
greenrazer Jun 26, 2025
0cd4fc4
Fixed Failing CI (#3007)
greenrazer Jun 26, 2025
ab14581
Qwen3: fix quality loss due to rope freq precision (#3005)
zackangelo Jun 26, 2025
d0a3b33
fixed ring mac error (#3008)
greenrazer Jun 27, 2025
317a3ae
Support new arch of GLM4 models (#2991)
guoqingbao Jul 7, 2025
be411aa
candle-onnx: Implement One Hot operator (#2979)
Michall00 Jul 7, 2025
9c8a02f
fix (candle-datasets): re-export FileReader and simplify from_hub ite…
xavierforge Jul 16, 2025
16b7b77
candle-datasets: add fashion-mnist (#3021)
slckl Jul 16, 2025
1f07074
candle-onnx: Implement Selu operator (#2978)
Michall00 Jul 16, 2025
6c95317
fix: DAC model prefix (#3020)
piedshag Jul 17, 2025
1ef1341
*Major T/s improvement* Use the Metal qmatmul MM kernels (#2615)
EricLBuehler Jul 18, 2025
42bd33e
Fix discord badge (#3033)
strickvl Jul 23, 2025
da5498c
Added GradStore::insert_id(id, grad)
EthanAlmloff Jul 29, 2025
26a3222
Support building on CPUs with AVX but not AVX2 (#3040)
jncraton Jul 31, 2025
21032cb
[FEAT] Voxtral Support (#3036)
jorge-menjivar Aug 4, 2025
96415a4
ignored url that was interpreted as a secret by trufflehog (#3046)
greenrazer Aug 4, 2025
af5a69e
fp8 support (#2989)
zackangelo Aug 4, 2025
86bcf1e
Load safetensors i8 (#3042)
chadvoegele Aug 5, 2025
1829812
Fix sort kernel launch bug when nrows exceed gridDim.y limit (65535) …
guoqingbao Aug 11, 2025
be4f920
clippy fixes (#3053)
greenrazer Aug 12, 2025
d7c5c8a
Add timestamp rules and constraints to decoder in Whisper example (#3…
rsb-tbg Aug 18, 2025
f1286e6
Fix wasm build by enabling getrandom wasm_js backend (#3055)
lucky-bai Aug 18, 2025
16e1d73
pick seed <= u32::MAX when using metal (#3045)
rgbkrk Aug 20, 2025
730fa9c
Fix broken slice_scatter example in basics.rs
davenpi Aug 21, 2025
5d6407f
Run cargo fmt on basics.rs
davenpi Aug 22, 2025
98c64c0
Metal device.set_seed full u64 support (#3067)
ivarflakstad Aug 25, 2025
03e9ce0
disable affine fp8 bench on metal as it is not supported yet (#3065)
ivarflakstad Aug 25, 2025
02cf3eb
Bench using chosen device only (#3066)
ivarflakstad Aug 26, 2025
fd350c4
Fixes metal randn determinism. Ensure we use the 2 atomic_uints buffe…
ivarflakstad Aug 27, 2025
bf82629
build: Make build.rs candle-kernels compatible with Nix and sandboxed…
joeldsouzax Aug 28, 2025
06387ae
[Metal] update to objc2_metal (#3064)
ivarflakstad Aug 29, 2025
d4a9179
Fused CPU attention kernels (~4x performance increase) (#2973)
EricLBuehler Aug 29, 2025
41b1e95
Fix typos
szepeviktor Aug 30, 2025
93845ed
Merge pull request #3072 from szepeviktor/typos
ivarflakstad Aug 30, 2025
390b87a
Fix iOS app store validation issues (#3071)
greenrazer Sep 3, 2025
402782c
Merge pull request #3038 from NoodlesOfWrath/gradstore_insert_id
ivarflakstad Sep 6, 2025
f62e725
clean candle-core typos.
zhanluxianshen Sep 7, 2025
0bbf9c7
Ensure metal tensors are send/sync via thread isolated command buffer…
ivarflakstad Sep 8, 2025
3b35cfc
Update kv_cache.rs (#3035)
jhqxxx Sep 8, 2025
0cf516d
[Metal] Refactor (#3070)
ivarflakstad Sep 8, 2025
87fadf6
Merge pull request #3077 from zhanluxianshen/typo-candle-core
ivarflakstad Sep 8, 2025
0950959
Fix metal exports (#3081)
ivarflakstad Sep 8, 2025
a7fbc63
Merge branch 'main' into metal-tensor-fix-send-sync
ivarflakstad Sep 9, 2025
65055f6
Merge pull request #3079 from huggingface/metal-tensor-fix-send-sync
ivarflakstad Sep 9, 2025
b1dbce0
Merge pull request #3062 from davenpi/fix/core-basics-example
ivarflakstad Sep 9, 2025
8045af9
Add CUDA 13 support (#3078)
jfernandez Sep 9, 2025
97594d2
Fix indentation
ivarflakstad Sep 9, 2025
038e28b
Fix indentation (ok but for real)
ivarflakstad Sep 9, 2025
372c9cf
Merge pull request #2937 from ljt019/fix-phi3-kv-cache-reset
ivarflakstad Sep 9, 2025
41a674c
add impl for mish activation function (#3051)
oa-root Sep 12, 2025
dd12467
Upgrade ug dep for CUDA 13 support
grahamking Sep 18, 2025
1a699fb
Merge pull request #3089 from grahamking/main
ivarflakstad Sep 20, 2025
ec3d92e
Various minor improvements, some suggested by clippy
ivarflakstad Sep 22, 2025
f583891
Merge pull request #3023 from xavierforge/bug/metadata-method-not-found
ivarflakstad Sep 22, 2025
944947a
Add command buffer thread map. Remove unecessary failure points
ivarflakstad Sep 30, 2025
b06d2fd
Merge pull request #3092 from huggingface/metal-clippy-fixes
ivarflakstad Sep 30, 2025
bc13c4b
Merge branch 'main' into improve-metal-command-buffer-map
ivarflakstad Sep 30, 2025
d205fb4
Fix multiple clippy warnings (#3101)
ivarflakstad Sep 30, 2025
d16eaf5
Merge branch 'main' into improve-metal-command-buffer-map
ivarflakstad Oct 1, 2025
7bfc5af
Wait until completed on command buffer status: scheduled as well
ivarflakstad Oct 1, 2025
df50343
Add metal conv for more dtypes
ivarflakstad Oct 2, 2025
c16785b
Allow based to run with bf16 on metal
ivarflakstad Oct 2, 2025
26c7868
Add backtracing to metal kernel errors for clarity
ivarflakstad Oct 2, 2025
7c5a8f2
Merge pull request #3103 from huggingface/metal-fix-conv
ivarflakstad Oct 2, 2025
e3fd0da
bump gemm dependency to 0.18.2 to match ug
slckl Oct 2, 2025
0ad167d
Merge pull request #3100 from huggingface/improve-metal-command-buffe…
ivarflakstad Oct 3, 2025
58811e8
Merge pull request #3105 from slckl/gemm-bump
ivarflakstad Oct 3, 2025
e677576
[Metal] Buffer improvements (#3093)
ivarflakstad Oct 3, 2025
a708b7a
Various quantization improvements. Direct copy. Verified block sizes.…
ivarflakstad Oct 3, 2025
742dfef
make cuda benches run again (#3111)
slckl Oct 4, 2025
9b476b2
Capture command buffer errors if they exist (#3106)
ivarflakstad Oct 4, 2025
716e126
[Metal] Improve wait_for_completed command buffers locking (#3107)
ivarflakstad Oct 4, 2025
671de1d
Skip unsupported quantized matmul tests for metal (#3115)
ivarflakstad Oct 5, 2025
bcc34bc
Fix beit on metal by adding additional affine implementations (#3116)
ivarflakstad Oct 6, 2025
a1350d6
Rough example of inlining model files into binary (#3104)
matthewhaynesonline Oct 7, 2025
ca35cf9
Where cond get_strided_index conditionally based on function constant…
ivarflakstad Oct 7, 2025
0374ff3
feat(stable-diffusion): add build_unet_sharded method (#3118)
hoodiecollin Oct 8, 2025
ad1da34
Fix metal get_function error (#3114)
ivarflakstad Oct 8, 2025
256c4e2
Quantization use debug_assert in hot paths (#3109)
ivarflakstad Oct 8, 2025
6fb56c3
Adding inference for GraniteMoeHybrid models from IBM (#3117)
atilag Oct 8, 2025
7b8f2b4
Fix failing `cuda` build (#3121)
LLukas22 Oct 9, 2025
cc967fc
feat: add metal_if_available method for graceful Metal fallback (#3041)
xavierforge Oct 9, 2025
bffa5e1
Fix metal quantized to_float calls (#3123)
ivarflakstad Oct 9, 2025
41fa5f1
Add more conv2d bench cases to candle-nn benches (#3131)
slckl Oct 13, 2025
9fe6232
Fix single file binary builder to only run when env var is set (#3126)
ivarflakstad Oct 13, 2025
f601fd8
Update modernbert.rs (#3010)
whitebox2 Oct 16, 2025
701205a
Update dependencies (#3135)
ivarflakstad Oct 16, 2025
1febb7b
Ensure output of Transpose is contiguous to prevent downstream MatMul…
kshitijl Oct 17, 2025
2bce4e5
In the BERT example: apply the attention mask from tokenization durin…
kshitijl Oct 18, 2025
a52f22f
Skip q8k and q8_1 tests on cuda (#3140)
ivarflakstad Oct 20, 2025
36b7517
Implement qwen3 vl
EricLBuehler Oct 23, 2025
fd379c5
Clippy
EricLBuehler Oct 23, 2025
59aeed4
Bump candle version to 0.9.2-alpha.1 (#3146)
ivarflakstad Oct 23, 2025
5b7858c
Remove unused
EricLBuehler Oct 23, 2025
e3228c1
Add Qwen 3 VL to candle-transformers
EricLBuehler Oct 23, 2025
d312da2
Improve candle example buildtime downloader (#3147)
ivarflakstad Oct 23, 2025
a23a48f
CPU Conv2d: separate module, tiled im2col, specialization (#3136)
slckl Oct 25, 2025
31d6698
rust-ci: add --benches to clippy, fix warnings (#3148)
slckl Oct 25, 2025
df618f8
candle-core: add `broadcast_add` benches (#3149)
slckl Oct 25, 2025
fab0c45
fix: build errors for compute cap 7.5 (#3142)
neksodebe Oct 28, 2025
a05b549
Update cargo build instructions to use double colon syntax (#3132)
matthewhaynesonline Oct 28, 2025
8f27f5c
Add flash attn v3: `candle-flash-attn-v3` (#3152)
EricLBuehler Oct 28, 2025
7669ed1
Add nccl feature to candle-core (#3155)
EricLBuehler Oct 30, 2025
3c7a63d
clippy default fixes (#3160)
ivarflakstad Oct 31, 2025
b8c2ee8
Fix Metal matmul failure in `ModernBertHead::forward` by ensuring con…
whitebox2 Oct 31, 2025
ca3aee8
Add varbuilder get_unchecked methods (#3157)
EricLBuehler Oct 31, 2025
d4545eb
Add unsafe from_storage apis (#3156)
EricLBuehler Nov 1, 2025
26af167
Upstream merge Nov 1, 2025
lukekim Nov 1, 2025
4cc94c2
Formatting fixes
lukekim Nov 1, 2025
b06a02c
[Metal] Ensure metal backend is send/sync via status semaphore (#3164)
ivarflakstad Nov 6, 2025
ade0918
Add sqrt2 as constant for gelu_erf and use `libm` erf (#3168)
vrdn-23 Nov 7, 2025
4ff99ba
candle-core: strided-index inline next + size_hint + exact size itera…
slckl Nov 8, 2025
836540f
Fix DINOv2 no-interpolation shortcut (#3172)
pcuenca Nov 8, 2025
bf3d3f2
Use Tensor::argmax instead of manual cpu impl (#3173)
ivarflakstad Nov 9, 2025
87653ca
Fix argmax. Higher index should also be taken into account (#3179)
ivarflakstad Nov 11, 2025
db08cc0
Add command buffer pool for improved multi-threaded Metal performance…
anonenity Nov 11, 2025
60252cc
feat(candle-nn) ConcatKvCache for 2-5x GPU speedup on autoregressive …
DrJesseGlass Nov 14, 2025
2ae7bbe
Merge remote-tracking branch 'upstream/main' into lukim/upgrade-candle
lukekim Nov 15, 2025
feca3fc
Merge remote-tracking branch 'origin/main' into lukim/upgrade-candle
lukekim Nov 15, 2025
8ebfc22
Add `cublas_handle` api, update safetensors (#3192)
EricLBuehler Nov 17, 2025
ab56dfe
Update CI (#3194)
ivarflakstad Nov 17, 2025
549eacb
Add initial support for imatrix quantization (#3193)
EricLBuehler Nov 18, 2025
eb651c8
add clear kv cache to quantized qwen3 weights (#3189)
anonenity Nov 18, 2025
3390caa
fix typo preventing usage on mac (#3201)
amritsingh183 Nov 20, 2025
27cd43c
CUDA: Fix integer reductions by removing +/-INF initialization (#3200)
TimmyOVO Nov 20, 2025
9ca71de
fix for https://github.com/huggingface/candle/issues/3203 (#3204)
amritsingh183 Nov 20, 2025
b801ef6
Add lld installation and test steps for Linux (#3213)
haricot Nov 25, 2025
01bea21
Add dummy dtypes (#3195)
EricLBuehler Nov 25, 2025
95ea453
Add more misc. changes from candle fork (#3196)
EricLBuehler Nov 25, 2025
2ac3fe0
.gitignore: add .zed to ignored editor configs (#3218)
slckl Nov 30, 2025
c39d5f0
chore(dep): bump cudarc to 0.18.1 (#3219)
mayocream Dec 2, 2025
08d7b64
Hotfix: Bump float8 to 0.5.0 (#3223)
EricLBuehler Dec 3, 2025
2664a21
[Metal] Make fast math mode optional (#3205)
ivarflakstad Dec 4, 2025
9ede204
Update pyo3 (#3202)
ivarflakstad Dec 4, 2025
3d3cc49
[Metal] unary and affine improvements (#3230)
ivarflakstad Dec 6, 2025
72238a7
[Metal] binary improvements (#3231)
ivarflakstad Dec 8, 2025
d91be02
fix(metal): add missing softcapping field to AttnParams struct (#3233)
amritsingh183 Dec 8, 2025
2a797ea
Format sdpa (#3235)
EricLBuehler Dec 8, 2025
d23664f
Fix metal argmax (#3238)
EricLBuehler Dec 9, 2025
73fd9c3
[Metal] further improve unary and binary (#3239)
ivarflakstad Dec 10, 2025
e33d776
[Metal] cast improvements (#3241)
ivarflakstad Dec 10, 2025
4b46187
[Metal] Improve ternary further (#3242)
ivarflakstad Dec 14, 2025
8839457
Bump candle version to 0.9.2-alpha.2 (#3248)
ivarflakstad Dec 16, 2025
689d255
add candle flash attention 3 copyright markers (#3256)
michaelfeil Dec 21, 2025
590d2ad
WIP
lukekim Dec 22, 2025
4d42f63
Merge remote-tracking branch 'origin/main' into lukim/upgrade-candle
lukekim Dec 23, 2025
beeadf9
Formatting
lukekim Dec 23, 2025
0710642
Fixes
lukekim Dec 23, 2025
ceec35c
Formatting
lukekim Dec 23, 2025
2 changes: 1 addition & 1 deletion .cargo/config.toml
@@ -2,7 +2,7 @@
  rustflags = ["-C", "target-cpu=native"]

  [target.wasm32-unknown-unknown]
- rustflags = ["-C", "target-feature=+simd128"]
+ rustflags = ["-C", "target-feature=+simd128", "--cfg", 'getrandom_backend="wasm_js"']

  [target.x86_64-apple-darwin]
  rustflags = ["-C", "target-feature=-avx,-avx2"]
40 changes: 0 additions & 40 deletions .github/workflows/book-cd.yml

This file was deleted.

29 changes: 0 additions & 29 deletions .github/workflows/book.yml

This file was deleted.

13 changes: 7 additions & 6 deletions .github/workflows/ci_cuda.yaml
@@ -10,10 +10,9 @@ jobs:
  group: ${{ github.workflow }}-${{ github.job }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
  runs-on:
- group: aws-g4dn-2xlarge
+ group: aws-g5-4xlarge-cache
  container:
- image: nvidia/cuda:12.3.1-devel-ubuntu22.04
- options: --gpus 0
+ image: nvidia/cuda:13.0.2-cudnn-devel-ubuntu24.04
  if: ${{ github.event.pull_request.head.repo.full_name == github.event.pull_request.base.repo.full_name }}
  permissions:
  contents: write
@@ -22,13 +21,15 @@ jobs:
  # with sigstore/fulcio when running outside of PRs.
  id-token: write
  security-events: write
+ env:
+ CUDA_COMPUTE_CAP: 86
  steps:
  - name: Checkout repository
- uses: actions/checkout@v3
+ uses: actions/checkout@v5
  - name: Install dependencies
- run: apt-get update && apt install curl build-essential libssl-dev protobuf-compiler pkg-config -y
+ run: apt update && apt install curl build-essential libssl-dev protobuf-compiler pkg-config -y
  - name: Install Rust Stable
- uses: actions-rust-lang/setup-rust-toolchain@v1
+ uses: dtolnay/rust-toolchain@stable
  - uses: Swatinem/rust-cache@v2
  - name: Test (cuda)
  run: cargo test --features cuda
Binary file modified .github/workflows/maturin.yml
Binary file not shown.
16 changes: 7 additions & 9 deletions .github/workflows/python.yml
@@ -20,30 +20,28 @@ jobs:
  os: [ubuntu-latest] # For now, only test on Linux
  steps:
  - name: Checkout repository
- uses: actions/checkout@v4
+ uses: actions/checkout@v5

  - name: Install Rust
- uses: actions-rs/toolchain@v1
- with:
- toolchain: stable
+ uses: dtolnay/rust-toolchain@stable

  - name: Install Python
- uses: actions/setup-python@v4
+ uses: actions/setup-python@v6
  with:
- python-version: 3.11
+ python-version: 3.13
  architecture: "x64"

  - name: Cache Cargo Registry
- uses: actions/cache@v1
+ uses: actions/cache@v4
  with:
  path: ~/.cargo/registry
  key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

  - name: Install Protoc
  uses: arduino/setup-protoc@v2
  with:
- version: "25.0"
- repo-token: ${{ secrets.GITHUB_TOKEN }}
+ version: "25.0"
+ repo-token: ${{ secrets.GITHUB_TOKEN }}

  - name: Install
  working-directory: ./candle-pyo3
80 changes: 41 additions & 39 deletions .github/workflows/rust-ci.yml
@@ -13,66 +13,68 @@ jobs:
  strategy:
  matrix:
  os: [ubuntu-latest, windows-latest, macOS-latest]
- rust: [stable]
  steps:
- - uses: actions/checkout@v4
- - uses: actions-rs/toolchain@v1
- with:
- profile: minimal
- toolchain: ${{ matrix.rust }}
- override: true
- - uses: actions-rs/cargo@v1
+ - uses: actions/checkout@v5
+ - uses: actions/setup-python@v6
  with:
- command: check
- args: --workspace
+ python-version: "3.13"
+ - name: Remove cargo config (macOS ring crate fix)
+ if: runner.os == 'macOS'
+ run: rm -f .cargo/config.toml
+ - uses: dtolnay/rust-toolchain@stable
+ - run: cargo check --workspace

  test:
  name: Test Suite
  runs-on: ${{ matrix.os }}
  strategy:
  matrix:
  os: [ubuntu-latest, windows-latest, macOS-latest]
- rust: [stable]
  steps:
- - uses: actions/checkout@v4
- - uses: actions-rs/toolchain@v1
- with:
- profile: minimal
- toolchain: ${{ matrix.rust }}
- override: true
- - uses: actions-rs/cargo@v1
+ - name: Free disk space (Linux)
+ if: runner.os == 'Linux'
+ run: |
+ sudo rm -rf /opt/hostedtoolcache
+ sudo rm -rf /usr/share/dotnet
+ sudo rm -rf /usr/local/lib/android
+ sudo rm -rf /opt/ghc
+ df -h
+ - uses: actions/checkout@v5
+ - uses: actions/setup-python@v6
  with:
- command: test
- args: --workspace
+ python-version: "3.13"
+ - name: Remove cargo config (macOS ring crate fix)
+ if: runner.os == 'macOS'
+ run: rm -f .cargo/config.toml
+ - uses: dtolnay/rust-toolchain@stable
+ - name: Install lld (Linux only)
+ if: runner.os == 'Linux'
+ run: sudo apt-get update && sudo apt-get install -y lld
+ - name: Run tests (with lld on Linux)
+ if: runner.os == 'Linux'
+ env:
+ RUSTFLAGS: "-C link-arg=-fuse-ld=lld"
+ run: cargo test --workspace
+ - name: Run tests (Windows & macOS)
+ if: runner.os != 'Linux'
+ run: cargo test --workspace

  fmt:
  name: Rustfmt
  runs-on: ubuntu-latest
  steps:
- - uses: actions/checkout@v4
- - uses: actions-rs/toolchain@v1
- with:
- profile: minimal
- toolchain: stable
- override: true
- - run: rustup component add rustfmt
- - uses: actions-rs/cargo@v1
+ - uses: actions/checkout@v5
+ - uses: dtolnay/rust-toolchain@stable
  with:
- command: fmt
- args: --all -- --check
+ components: rustfmt
+ - run: cargo fmt --all -- --check

  clippy:
  name: Clippy
  runs-on: ubuntu-latest
  steps:
  - uses: actions/checkout@v4
- - uses: actions-rs/toolchain@v1
- with:
- profile: minimal
- toolchain: stable
- override: true
- - run: rustup component add clippy
- - uses: actions-rs/cargo@v1
+ - uses: dtolnay/rust-toolchain@stable
  with:
- command: clippy
- args: --workspace --tests --examples -- -D warnings
+ components: clippy
+ - run: cargo clippy --workspace --tests --examples --benches -- -D warnings
12 changes: 6 additions & 6 deletions .github/workflows/trufflehog.yml
@@ -7,9 +7,9 @@ jobs:
  trufflehog:
  runs-on: ubuntu-latest
  steps:
- - name: Checkout code
- uses: actions/checkout@v4
- with:
- fetch-depth: 0
- - name: Secret Scanning
- uses: trufflesecurity/trufflehog@main
+ - name: Checkout code
+ uses: actions/checkout@v5
+ with:
+ fetch-depth: 0
+ - name: Secret Scanning
+ uses: trufflesecurity/trufflehog@main
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ Cargo.lock
  # editor config
  .helix
  .vscode
+ .zed

  # These are backup files generated by rustfmt
  **/*.rs.bk
3 changes: 0 additions & 3 deletions .gitmodules
@@ -2,8 +2,5 @@
path = candle-flash-attn/cutlass
url = https://github.com/NVIDIA/cutlass.git
[submodule "candle-flash-attn-v3/cutlass"]
url = https://github.com/NVIDIA/cutlass.git
path = candle-flash-attn-v3/cutlass
[submodule "candle-flash-mla/cutlass"]
path = candle-flash-mla/cutlass
url = https://github.com/NVIDIA/cutlass