Add more misc. changes from candle fork #3196

EricLBuehler · 2025-11-17T20:10:11Z

indexed_moe_forward (fast path for ggml quants)
Improved usability of Context
Add full attn support for Metal SDPA
Fix bug w/ FlashAttn f16
Add necessary metal Device apis

ivarflakstad

Excellent stuff

ivarflakstad · 2025-11-19T10:03:28Z

candle-core/src/metal_backend/device.rs

+    /// Creates a new private buffer (not necessarily zeroed).
+    ///
+    /// This is intentionally not in the Metal buffer pool to allow the efficient implementation of persistent buffers.
+    pub fn new_private_buffer(


I agree that this is nice to have, but I think we should name it something other than "private buffer", since "private" already has a specific meaning for Metal buffers (storage that is only accessible from the GPU).
We don't want to use actual Metal private buffers, as that isn't supported on iOS.

How about new_unpooled_buffer or new_persistent_buffer? :)

EricLBuehler · 2025-11-20

Actually, this was a mistake on my part. The behavior I intended for this function is:

private if not on iOS

shared/RESOURCE_OPTIONS if on iOS
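The platform split described above can be sketched as follows. This is only an illustration under stated assumptions: `StorageMode`, `storage_mode_for` and `persistent_buffer_storage_mode` are hypothetical names invented here, not the candle or `metal` crate API, and the real method would pass the chosen option to the Metal device when allocating.

```rust
/// Hypothetical stand-in for Metal's storage modes (not the real `metal` crate type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StorageMode {
    /// Visible to both CPU and GPU.
    Shared,
    /// Visible to the GPU only.
    Private,
}

/// Chooses the storage mode for a persistent, unpooled buffer.
/// Taking the platform as a parameter keeps the decision easy to test.
const fn storage_mode_for(is_ios: bool) -> StorageMode {
    if is_ios {
        StorageMode::Shared
    } else {
        StorageMode::Private
    }
}

/// Resolves the platform at compile time via `cfg!`.
const fn persistent_buffer_storage_mode() -> StorageMode {
    storage_mode_for(cfg!(target_os = "ios"))
}

fn main() {
    println!("storage mode: {:?}", persistent_buffer_storage_mode());
}
```

Separating the decision from the `cfg!` check means both branches can be exercised on any host, which is useful when CI does not run on iOS.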

ivarflakstad · 2025-11-20

I see. Could I ask why you want it to be private?
According to Apple's documentation there is no performance benefit, so private is usually used when you want to ensure that the CPU does not have access to the buffer for some specific reason. I'd guess this kind of behaviour is frequently used in gaming.

ivarflakstad · 2025-11-19T10:08:49Z

candle-core/src/quantized/cuda.rs

+            crate::bail!(
+                "The given quantized dtype {:?} is not supported for indexed_moe_forward!",
+                self.dtype()
+            );


Just thinking out loud here. It would be nice to have automatic fallback to an approach that isn't as optimized, but still valid. Perhaps returning Result<Option<(CudaStorage, crate::Shape)>> is a decent starting point?
If None then fallback?

Not thinking we add this in this PR ofc.
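A minimal sketch of that idea, assuming a fast path that reports "no specialized kernel" as `Ok(None)` instead of an error. Every name here (`QuantDtype`, `fast_path`, `slow_path`, `forward`) is hypothetical, and the arithmetic is a placeholder for the real kernels.

```rust
/// Hypothetical set of quantized dtypes; which ones have a fast kernel is made up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantDtype {
    Q4K,
    Q8K,
    Q2K,
}

/// Fast path: `Ok(Some(..))` on success, `Ok(None)` when the dtype has no
/// specialized kernel, `Err(..)` for genuine failures.
fn fast_path(dtype: QuantDtype, xs: &[f32]) -> Result<Option<Vec<f32>>, String> {
    if xs.is_empty() {
        return Err("empty input".to_string());
    }
    match dtype {
        QuantDtype::Q4K | QuantDtype::Q8K => Ok(Some(xs.iter().map(|x| x * 2.0).collect())),
        QuantDtype::Q2K => Ok(None),
    }
}

/// Slower but always available path (placeholder computation with the same result).
fn slow_path(xs: &[f32]) -> Result<Vec<f32>, String> {
    Ok(xs.iter().map(|x| x + x).collect())
}

/// Tries the fast path first and falls back only when it returns `None`.
fn forward(dtype: QuantDtype, xs: &[f32]) -> Result<Vec<f32>, String> {
    match fast_path(dtype, xs)? {
        Some(out) => Ok(out),
        None => slow_path(xs),
    }
}

fn main() {
    println!("{:?}", forward(QuantDtype::Q2K, &[1.0, 2.5]));
}
```

Real errors still propagate through `?`; only the "unsupported here" case is turned into a fallback.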

EricLBuehler · 2025-11-20

This might work. The issue is that indexed_moe_forward is effectively a grouped GEMM, so a fallback would need existing infrastructure to run a grouped GEMM.

Regardless, providing grouped GEMM functionality will be very useful!
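For readers unfamiliar with the term, here is a naive CPU reference of what a grouped GEMM over per-token expert indices computes. It is a sketch under assumptions: the `Matrix` type, the `grouped_gemm` signature and the one-expert-per-token routing are invented for illustration and do not reflect the actual indexed_moe_forward kernel or its tensor layout.

```rust
/// Dense row-major matrix of f32 values; `data.len()` is assumed to be `rows * cols`.
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

/// Naive reference: out[t] = experts[ids[t]] * xs[t].
/// Each expert is an (n x k) matrix and each row of `xs` is a k-vector.
fn grouped_gemm(xs: &Matrix, experts: &[Matrix], ids: &[usize]) -> Result<Matrix, String> {
    if ids.len() != xs.rows {
        return Err(format!("expected {} expert ids, got {}", xs.rows, ids.len()));
    }
    let n = experts.first().ok_or("no experts given")?.rows;
    if experts.iter().any(|w| w.rows != n || w.cols != xs.cols) {
        return Err("expert shape mismatch".to_string());
    }
    if ids.iter().any(|&e| e >= experts.len()) {
        return Err("expert id out of range".to_string());
    }
    let mut out = vec![0f32; xs.rows * n];
    // One group per expert: all tokens routed to an expert share its weights.
    for (e, w) in experts.iter().enumerate() {
        for t in (0..xs.rows).filter(|&t| ids[t] == e) {
            let x = &xs.data[t * xs.cols..(t + 1) * xs.cols];
            for r in 0..n {
                let w_row = &w.data[r * w.cols..(r + 1) * w.cols];
                out[t * n + r] = w_row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>();
            }
        }
    }
    Ok(Matrix { rows: xs.rows, cols: n, data: out })
}

fn main() {
    let identity = Matrix { rows: 2, cols: 2, data: vec![1.0, 0.0, 0.0, 1.0] };
    let scale = Matrix { rows: 2, cols: 2, data: vec![2.0, 0.0, 0.0, 3.0] };
    let xs = Matrix { rows: 3, cols: 2, data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0] };
    let out = grouped_gemm(&xs, &[identity, scale], &[1, 0, 1]).unwrap();
    println!("{:?}", out.data);
}
```

An optimized implementation would batch each group into a single matrix multiply; the loop structure above only pins down the expected result.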

candle-core/src/error.rs

candle-core/src/tensor.rs

candle-flash-attn/src/lib.rs

candle-kernels/src/quantized.cu

candle-metal-kernels/src/kernels/sdpa.rs

candle-metal-kernels/src/metal/device.rs

EricLBuehler · 2025-11-20T01:17:43Z

Addressed the review comments; the new_private_buffer method is now implemented correctly.

EricLBuehler marked this pull request as ready for review November 17, 2025 22:53

ivarflakstad reviewed Nov 19, 2025

EricLBuehler requested a review from ivarflakstad November 20, 2025 01:16

ivarflakstad reviewed Nov 20, 2025

EricLBuehler and others added 14 commits November 21, 2025 06:20

178987a  Merge with fork
         Co-authored-by: Guoqing Bao
d4dab0c  Update sdpa
0ee2bc8  Fix flash attn bf16 case
bc9030c  Metal fixes
fd2b563  Add metal methods
00689f5  Add new_private_buffer
60e297a  Fix metal tests
dc80e40  Format
15591ff  Apply review comments
5d1dbd6  Update CI (#3194)
         * Update CI
         * I have no clue what was going on with this maturin file, but I don't like it
         * update cuda container options
         * Add compute cap to cuda wf
         * Fix rust toolchain call
         * update cuda ci runner and bindgen_cuda
a372a14  Add initial support for imatrix quantization (#3193)
1bb1c93  add clear kv cache to quantized qwen3 weights (#3189)
cb4a042  Fix metal bug
bdb66f2  Apply review comments

EricLBuehler force-pushed the misc_fork_updates branch from 4c3f2be to bdb66f2 Compare November 21, 2025 11:22

EricLBuehler requested a review from ivarflakstad November 21, 2025 11:23

EricLBuehler added 2 commits November 21, 2025 06:24

2536e75  Merge branch 'main' into misc_fork_updates
d21b0a7  Fix merge
