aleozlx commented Dec 5, 2025

For #2845

Added spin_lock_atom_cas_acquire_wait function to handle spin lock acquisition with atomic compare-and-swap.
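For readers unfamiliar with the pattern, here is a minimal host-side sketch of acquiring a spin lock with an atomic compare-and-swap that has acquire semantics. It is not the code added by this PR: the `SpinLock` type, its `acquire_wait` and `release` methods, and `run_counter_demo` are hypothetical names chosen for illustration, and the sketch uses C++ `std::atomic` on the CPU rather than the CuTe DSL.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical illustration only; not the function added by this PR.
struct SpinLock {
    std::atomic<int> flag{0};  // 0 = free, 1 = held

    // Spin until the CAS 0 -> 1 succeeds. Acquire ordering on success makes
    // writes published by the previous holder's release visible to us.
    void acquire_wait() {
        int expected = 0;
        while (!flag.compare_exchange_weak(expected, 1,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed)) {
            expected = 0;  // a failed CAS stores the observed value in expected
            std::this_thread::yield();
        }
    }

    // Release ordering publishes writes made while the lock was held.
    void release() { flag.store(0, std::memory_order_release); }
};

// Each thread increments a plain (non-atomic) counter under the lock.
long run_counter_demo(int num_threads, int iters_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.acquire_wait();
                ++counter;
                lock.release();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

If the lock is correct, no increments are lost, so `run_counter_demo(4, 10000)` returns the full count of 40000.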

aleozlx (Author) commented Dec 5, 2025

This is functional; see flashinfer-ai/flashinfer#2171.

I am raising it as a proposed solution for what we needed when upgrading to nvidia-cutlass-dsl 4.3.1 (#2845).

Kind regards from FlashInfer & cuDNN :)

XiaoSong9905 (Member) commented Dec 12, 2025

The acquire wait is not needed. Message Xiao Song on Slack and we can schedule a meeting to explain this.

XiaoSong9905 (Member) commented

The two-shot all-reduce .py failure is related to something else; let's discuss it in the meeting.

shubaoyu2 (Contributor) commented Dec 12, 2025

You can use the new two-shot GEMM+AR kernel in the CuTeDSL examples. The one in FlashInfer is probably an old version.

Adding something to the CuTeDSL wheel package will take some time, so I would recommend using the new kernel.

aleozlx (Author) commented Dec 16, 2025

Sounds good, I will discuss this with you over Slack. I will learn about the new kernel example and bring action items back to FlashInfer.
