
LoRA + deepspeed zero3 finetuning using 8bit quantization of base weights results in increased loss #1451

Open
@winglian

Description


System Info

latest releases of transformers, bitsandbytes, peft, and accelerate; pytorch 2.5.1, deepspeed 0.16.1

Reproduction

see axolotl config here: https://wandb.ai/axolotl-ai/lora-3b-ds-zero3/runs/c4b1agng/files/tmp/axolotl_config_az8clerk.yml

Expected behavior

With zero3, the loss value is off by an order of magnitude at ~13, whereas zero2 and zero1 produce correct loss. I also tried changing the llm_int8_threshold in the bnb config to 0.0 and 1.0: 0.0 results in a loss of 0.0, and 1.0 reproduces the original defect.
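For context, a minimal sketch of the quantization config being varied (the actual training settings live in the linked axolotl config; the values here are only the ones mentioned in this report). In transformers, `llm_int8_threshold` on `BitsAndBytesConfig` sets the outlier threshold for LLM.int8(): activations whose magnitude exceeds it are kept in fp16 rather than quantized to int8.

```python
from transformers import BitsAndBytesConfig

# Default threshold is 6.0. Per the report above, setting it to 0.0
# produced a loss of 0.0, and 1.0 reproduced the ~13 loss under zero3.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)
```

This config is then passed to the model via `quantization_config` when loading the base weights for LoRA finetuning.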

Assignees

No one assigned
Labels

bug (Something isn't working), contributions-welcome (We welcome contributions to fix this issue!)
