Description
System Info
Latest releases of transformers, bitsandbytes (bnb), peft, and accelerate; PyTorch 2.5.1; deepspeed 0.16.1
Reproduction
See the axolotl config used for the run: https://wandb.ai/axolotl-ai/lora-3b-ds-zero3/runs/c4b1agng/files/tmp/axolotl_config_az8clerk.yml
Expected behavior
The loss under ZeRO-3 should match ZeRO-1 and ZeRO-2. Instead it is off by an order of magnitude, at ~13, whereas ZeRO-2 and ZeRO-1 give correct values. I also tried setting llm_int8_threshold in the bnb config to 0.0 and to 1.0. A threshold of 0.0 results in a loss of 0.0, and 1.0 reproduces the original defect.
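
For anyone trying to narrow this down, here is a minimal sketch of the settings matrix described above. `reproduction_matrix` is a hypothetical helper, not part of axolotl, DeepSpeed, or bitsandbytes. It only enumerates the ZeRO stages and `llm_int8_threshold` values mentioned in this report, and it assumes the remaining settings come from the linked config unchanged.

```python
from itertools import product

# Values taken from the report: ZeRO stages 1-3 and the two thresholds tried.
ZERO_STAGES = (1, 2, 3)
INT8_THRESHOLDS = (0.0, 1.0)


def reproduction_matrix():
    """Hypothetical helper: enumerate the stage/threshold combinations to compare."""
    runs = []
    for stage, threshold in product(ZERO_STAGES, INT8_THRESHOLDS):
        runs.append({
            # Fragment of a DeepSpeed JSON config; only the ZeRO stage is varied.
            "deepspeed": {"zero_optimization": {"stage": stage}},
            # Keyword argument accepted by transformers.BitsAndBytesConfig.
            "bnb": {"llm_int8_threshold": threshold},
        })
    return runs


if __name__ == "__main__":
    for run in reproduction_matrix():
        print(run)
```

Logging the first few training losses for each combination should show whether the discrepancy is specific to stage 3 or also depends on the threshold.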