Feature request
Hi, thanks for the library! It would be great if the optimizers could also run on the CPU. For example, I would like to use adamw_8bit to full-finetune an 8B model on a 24 GB GPU (RTX 4090). With DeepSpeed offload the GPU memory is fine, but the CPU memory requirement is still very large, partly because the offloaded optimizer is standard AdamW: it keeps two fp32 states per parameter (8 bytes), so an 8B model needs 8 × 8 = 64 GB for the optimizer states alone.
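To make the arithmetic explicit, here is the back-of-the-envelope calculation (the byte counts are my own assumptions about the state layout, not taken from the library's docs):

```python
# Rough optimizer-state memory for an 8B-parameter model.
# Standard AdamW keeps two fp32 states (momentum and variance) per parameter.
params = 8e9
fp32_adamw = params * 2 * 4  # 2 states x 4 bytes each -> 64 GB
adamw_8bit = params * 2 * 1  # 2 states x 1 byte each  -> 16 GB
print(f"fp32 AdamW: {fp32_adamw / 1e9:.0f} GB, adamw_8bit: {adamw_8bit / 1e9:.0f} GB")
```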
This package provides the super helpful adamw_8bit, so I would appreciate it if it could be used in the setting above, hopefully reducing the optimizer state from 64 GB to 8 × 2 = 16 GB.
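For reference, a minimal sketch of the usage I have in mind, assuming the existing `bnb.optim.AdamW8bit` class simply gained CPU support (as I understand it, its fused 8-bit kernels are implemented in CUDA today, so the `step()` below fails when the model lives on the CPU):

```python
import torch
import bitsandbytes as bnb

# Stand-in for an 8B model whose parameters and gradients live on the CPU,
# as they would under DeepSpeed optimizer offload.
model = torch.nn.Linear(4096, 4096)

# Existing bitsandbytes 8-bit AdamW; the request is for its step() to
# work when parameters and gradients are CPU tensors.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 4096)).sum()
loss.backward()
optimizer.step()  # currently requires CUDA tensors; CPU support is the ask
optimizer.zero_grad()
```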
Motivation
(see above)
Your contribution
Yes