Open
Description
Defaulting to no GPU usage was the moderate workaround I came up with for the case where the Flux Clip_l is loaded on the CPU: if we cannot load it into VRAM, we can at least speed it up.
But, as this issue describes, that disables ROCm and leaves only OpenBLAS in use.
Why OpenBLAS?
Unlike years ago, current OpenBLAS is much faster than avx2, and faster than BLIS as well. (For BLIS vs. OpenBLAS, I have only tested on ARM.)