Hello, this is an excellent piece of work. I am running SFT on Qwen1.5-0.5B with four 24GB RTX 4090 GPUs, and I noticed that the loss stays at 0.000000 throughout training. Is this expected?
{'loss': '0.000000', 'learning_rate': 1.9998178635346865e-05, 'epoch': 0.01}
{'loss': '0.000000', 'learning_rate': 1.9992715204861295e-05, 'epoch': 0.01}
{'loss': '0.000000', 'learning_rate': 1.9983611698723126e-05, 'epoch': 0.02}
{'loss': '0.000000', 'learning_rate': 1.9970871433093214e-05, 'epoch': 0.02}
{'loss': '0.000000', 'learning_rate': 1.9954499048905464e-05, 'epoch': 0.03}
{'loss': '0.000000', 'learning_rate': 1.9934500510176242e-05, 'epoch': 0.04}
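For reference, here is a quick check I plan to run on my side (a minimal sketch, assuming the standard Hugging Face convention of masking non-target tokens with -100 and that batches are PyTorch tensors; `check_batch_labels` is my own helper, not from this repo). If the data collator masks every label in a batch, the loss carries no learning signal:

```python
import torch

def check_batch_labels(batch: dict) -> None:
    """Report how many label tokens actually contribute to the loss."""
    labels = batch["labels"]
    active = int((labels != -100).sum())
    total = labels.numel()
    print(f"{active}/{total} label tokens are unmasked")
    if active == 0:
        print("Every label is masked (-100); the loss carries no learning "
              "signal and typically logs as 0 or NaN.")

# Example: a fully masked batch, one common cause of a flat loss.
check_batch_labels({"labels": torch.full((2, 16), -100)})
```

In my actual run I would call this on `next(iter(train_dataloader))` to inspect a real batch.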
Additionally, could you help clarify whether it is expected that 4x24GB of GPU memory is not enough to fine-tune a 3B model with LoRA, even with batch size = 1?
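For what it's worth, here is the kind of memory-lean setup I had in mind (a rough sketch, assuming the `transformers` and `peft` libraries; the checkpoint name and LoRA hyperparameters are illustrative, not taken from this repo). With the base weights frozen in bf16 and gradient checkpointing enabled, my understanding is that a 3B-class model should fit on a single 24GB card:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-4B",            # illustrative 3B-class checkpoint
    torch_dtype=torch.bfloat16,   # halves weight memory vs fp32
)
model.gradient_checkpointing_enable()  # trade compute for activation memory
model.enable_input_require_grads()     # needed for checkpointing with a frozen base

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters should be trainable
```

If even a configuration like this runs out of memory, my first suspects would be long-sequence activations or the base model being loaded in fp32, but I would appreciate your take.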
Looking forward to your reply, and thank you in advance!