Description
Thanks for your awesome work enabling the community to train LLMs on very long contexts! However, I found a problem in the `preprocess` function. Line 125 of `LongChat/longchat/train/fine_tune/train.py` (commit a824bda) is:

`cur_len = 1`

and line 137 is:

`target[cur_len : cur_len + instruction_len] = IGNORE_TOKEN_ID`

Together, these lines set `target` to `[1, -100, -100, ...]`, so the first element is never ignored. I think FastChat has the correct code. It first sets `target[:cur_len] = IGNORE_TOKEN_ID`, so the target becomes `[-100, -100, -100, ...]`. Am I right? @DachengLi1
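To make the difference concrete, here is a minimal sketch using plain Python lists and made-up token ids. `mask_targets` is a hypothetical, simplified stand-in for the masking loop in `preprocess` (single turn, no padding handling). It is not the actual LongChat or FastChat code. It assumes the first token is BOS with id 1. The only difference between the two runs is whether `target[:cur_len]` is masked before the loop.

```python
IGNORE_TOKEN_ID = -100  # the ignore_index default of PyTorch's CrossEntropyLoss


def mask_targets(input_ids, instruction_lens, turn_lens, mask_prefix):
    """Hypothetical, simplified version of the label-masking loop in preprocess.

    input_ids: token ids with BOS (assumed id 1) at position 0.
    instruction_lens / turn_lens: per-turn lengths (toy values, assumed).
    mask_prefix: if True, also mask target[:cur_len] first, as FastChat does.
    """
    target = list(input_ids)  # labels start as a copy of the inputs
    cur_len = 1  # skip the BOS token, as on line 125
    if mask_prefix:
        target[:cur_len] = [IGNORE_TOKEN_ID] * cur_len
    for instruction_len, turn_len in zip(instruction_lens, turn_lens):
        # Mask the instruction part of the turn (line 137); clamp to the list end
        end = min(cur_len + instruction_len, len(target))
        target[cur_len:end] = [IGNORE_TOKEN_ID] * max(end - cur_len, 0)
        cur_len += turn_len
    return target


if __name__ == "__main__":
    ids = [1, 10, 11, 12, 20, 21]  # toy ids: BOS, 3 instruction tokens, 2 response tokens
    print(mask_targets(ids, [3], [5], mask_prefix=False))  # [1, -100, -100, -100, 20, 21]
    print(mask_targets(ids, [3], [5], mask_prefix=True))   # [-100, -100, -100, -100, 20, 21]
```

Without the prefix mask, the BOS id `1` survives as a label. With it, only the response tokens remain unmasked.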