We still don't use BFDOT on macOS CPU because the Apple compiler used to build the PyTorch wheels is outdated #1444
Description
🚀 The feature, motivation and pitch
See pytorch/pytorch#143913. This blocks improved CPU performance for bfloat16 decoding on Mac; this issue tracks it on the torchchat side.
Alternatives
A robust setup for decoding using accelerators on Mac.
Additional context
No response
RFC (Optional)
No response