v0.6.0: Paper Release, GaLore and FSDP+QLoRA #2969
hiyouga announced in Announcements
Hello @hiyouga, I just read the paper. Regarding the training details in D.1, what is the input length in the pre-training step? I couldn't find it anywhere.

We released our paper on arXiv! Thanks to all co-authors and to AK for the recommendation.
New features
- Support `--infer_backend vllm`
- Support `apply_chat_template` by adding a chat template to the tokenizer after fine-tuning

New models
New datasets
Bug fixes
- `offload_dir` to dispatch this model according to this `device_map` #2802
- Add `pip install fire` to requirements.txt #2803
- Web UI: running the preview command for eval & predict, or starting it, raises KeyError: 'dropdown' #2817
- LoRA+ with DeepSpeed #2895
- When fine-tuning with LoRA while also training some layers' parameters, merging and validation raise an error #2928
- AttributeError: 'torch.dtype' object has no attribute 'itemsize' #2936
- LongLoRA: SFT with `--shift_attn` raises an error #2941

This discussion was created from the release v0.6.0: Paper Release, GaLore and FSDP+QLoRA.
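The `apply_chat_template` feature above works by storing a Jinja2 chat template on the tokenizer. As a minimal sketch of what such a template does (the template string and message roles here are illustrative, not LLaMA-Factory's actual template), rendering with `jinja2` directly mirrors how `tokenizer.apply_chat_template` formats a conversation:

```python
from jinja2 import Template

# A toy chat template (hypothetical; real templates are model-specific).
chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}assistant:{% endif %}"
)

messages = [{"role": "user", "content": "Hello!"}]

# Render the conversation into a single prompt string,
# appending the assistant cue for generation.
prompt = Template(chat_template).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)  # user: Hello!\nassistant:
```

In practice, the same template string would be assigned to `tokenizer.chat_template` so that `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` produces the same prompt for any downstream user of the fine-tuned model.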