Open
Description
🐛 Describe the bug
Hi maintainers,
I find that, Torchchat uses MATH as SDPA backend in https://github.com/pytorch/torchchat/blob/main/torchchat/generate.py#L542. However, for other libs like vllm, they all accept flash attention as default backend.
So why Torchchat uses MATH as a default backend? Is this required for accuracy? If not, I can help to add an argument to let user set the backend. Thanks!