Description
Hi, when running the following command:
tune run lora_finetune_single_device --config llama3/8B_lora_single_device model.lora_rank=16 optimizer=bitsandbytes.optim.AdamW8bit gradient_accumulation_steps=4 tokenizer.max_seq_len=2048 max_steps_per_epoch=100 model.lora_attn_modules="['q_proj','k_proj','v_proj','output_proj']" model.apply_lora_to_mlp=True log_peak_memory_stats=True compile=True checkpointer.checkpoint_dir=checkpoints/original tokenizer.path=checkpoints/original/tokenizer.model checkpointer.output_dir=checkpoints/original
I get the stack trace below.
It looks like we unconditionally pass `fused` as a kwarg to the optimizer, even though the bitsandbytes optimizers don't accept it.
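For context, here is a minimal sketch of the kind of guard I'd expect; `build_optimizer` is a hypothetical helper (not torchtune's actual recipe code) that only forwards `fused` when the optimizer's constructor actually declares that parameter:

```python
import inspect

import bitsandbytes as bnb  # assumes bitsandbytes is installed
import torch


def build_optimizer(optimizer_cls, params, lr, fused=True):
    """Hypothetical helper: only forward `fused` if the optimizer supports it.

    bitsandbytes optimizers (e.g. AdamW8bit) have no `fused` kwarg, so
    passing it unconditionally raises a TypeError.
    """
    kwargs = {"lr": lr}
    if "fused" in inspect.signature(optimizer_cls).parameters:
        kwargs["fused"] = fused
    return optimizer_cls(params, **kwargs)


model = torch.nn.Linear(8, 8)
# torch.optim.AdamW declares `fused`, so it gets the kwarg.
opt_fused = build_optimizer(torch.optim.AdamW, model.parameters(), lr=1e-4)
# bnb.optim.AdamW8bit does not, so the kwarg is dropped instead of erroring.
opt_8bit = build_optimizer(bnb.optim.AdamW8bit, model.parameters(), lr=1e-4)
```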
Related issue: #1998
Version info:
PyTorch: 1b3f8b75896720e88362cbec7db32abc52afa83e
Torchtune: f2bd4bc
Torchao: 039cef4ad546716aa04cd54c461feb173f7fe403