@AlpineVibrations

Add recent HF Candle changes

EricLBuehler and others added 6 commits July 21, 2025 06:01
* F8E4M3 dtype

* Add test

* Fix

* Add i16/i32

* Add f6e2m3, f6e3m2, f4, f8e8m0

* Add f6e2m3, f6e3m2, f4, f8e8m0 ST loading

* Compiles on metal

* Clippy and format

* Fix cuda

* Cuda build fixes

* Use float8 0.3.0

* Remove

* Updated quantized api

* Updated varbuilder api

* Metal updates

* Export dummy dtype

* Better error type

* Add unfold

* Add new_buffer_private for metal

* Add empty

* Add get_current_seed

* Add flash attn v3

* Update deps

* Updated v3 FA

* Fix cuda build

* Support loading new dtypes

* Fix cuda

* Fix cpu

* Add i32 cuda dtype

* Expose cublas handle

* Fix flash attn v2

* Add cutlass

* Fix v3 build
…ricLBuehler#94)

* Fix extra-long context kernel launch issue for indexed moe forward

* Fix all entries for input quant with quantize_q8_1
@ivarflakstad
Member

There's a lot of stuff here that we would want to get into candle, but unfortunately I think we would have to do it the boring way where we split this up into several separate PRs.

@AlpineVibrations
Author

Yeah, makes sense at this point. EricLBuehler has a fork on the dev_main branch with some features like FlashAttn v3 and maybe others.