
Maybe a bug in the preprocess? #26

@Richar-Du


Thanks for your awesome work, which lets the community train LLMs on very long contexts! However, I found that in the `preprocess` function, the line


and the line:
target[cur_len : cur_len + instruction_len] = IGNORE_TOKEN_ID

will set `target` to [1, -100, -100, ...], so the first element is not ignored. I think FastChat has the correct code: it first sets `target[:cur_len] = IGNORE_TOKEN_ID`, so the target becomes [-100, -100, -100, ...]. Am I right?
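The masking behavior described above can be reproduced with a toy example. This is a minimal sketch in plain Python, not the actual LongChat or FastChat code: `mask_targets`, the `(instruction_len, turn_len)` turn encoding, and the token ids are hypothetical, and it assumes `cur_len` starts at 1 so that position 0 holds the BOS token (id 1 in the LLaMA tokenizer).

```python
from typing import List, Tuple

IGNORE_TOKEN_ID = -100  # label value that PyTorch's cross-entropy ignores by default


def mask_targets(
    input_ids: List[int],
    turns: List[Tuple[int, int]],
    mask_prefix: bool,
) -> List[int]:
    """Hypothetical toy version of the label masking in a `preprocess` function.

    Each entry in `turns` is (instruction_len, turn_len): the first
    `instruction_len` tokens of the turn are the instruction (masked out),
    the rest of the turn is the response (kept as labels).
    """
    target = list(input_ids)
    cur_len = 1  # assumption: position 0 is the BOS token
    if mask_prefix:
        # FastChat-style step: mask everything before the first turn (the BOS token)
        target[:cur_len] = [IGNORE_TOKEN_ID] * cur_len
    for instruction_len, turn_len in turns:
        # Mask only the instruction part of this turn
        target[cur_len : cur_len + instruction_len] = [IGNORE_TOKEN_ID] * instruction_len
        cur_len += turn_len
    return target


# Toy sequence: BOS=1, turn 1 = instruction [11, 12] + response [13],
# turn 2 = instruction [21] + response [22]
ids = [1, 11, 12, 13, 21, 22]
turns = [(2, 3), (1, 2)]
print(mask_targets(ids, turns, mask_prefix=False))  # [1, -100, -100, 13, -100, 22]
print(mask_targets(ids, turns, mask_prefix=True))   # [-100, -100, -100, 13, -100, 22]
```

With `mask_prefix=False` the BOS label survives as `1`, matching the `[1, -100, -100, ...]` pattern in the question. One caveat worth checking: Hugging Face causal-LM models such as `LlamaForCausalLM` shift the labels left by one position before computing the loss, so in that setup the label at position 0 never contributes to the loss; masking it explicitly is still the safer convention.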
@DachengLi1
