Zero-shot: in the load_${task}_data method, should the last token of prompt_tokens be masked? #41

Description

@uk9921
prompt = "这是关于{}的文章:".format(label)  # "This is an article about {}:"
prompt_tokens = tokenizer.encode(prompt)
prompt_len = len(prompt_tokens)
...
second_mask = [0] * (args.seq_length - 1)
for idx in range(prompt_len - 1, len(tokens) - 1):
  second_mask[idx] = 1

The last token of prompt_tokens is the colon ':'. Shouldn't second_mask[prompt_len - 1] be set to 0?

Below are some pdb printouts for reference:

(Pdb) p prompt
'这是关于news_story的文章:'
(Pdb) p prompt_tokens
[621, 671, 14464, 555, 27743, 11, 1630, 8, 17]
(Pdb) p prompt_len - 1
8
(Pdb) p prompt_tokens[8]
17
(Pdb) p tokenizer.decode(17)
':'
(Pdb) p second_mask[8]
1
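For reference, here is a minimal standalone sketch of the mask computation, filled in with the prompt_tokens from the pdb output above. The trailing tokens appended after the prompt and the seq_length value are hypothetical, just to make the snippet runnable; they are not from the original code.

```python
# Token ids for '这是关于news_story的文章:' as shown in the pdb output.
prompt_tokens = [621, 671, 14464, 555, 27743, 11, 1630, 8, 17]
prompt_len = len(prompt_tokens)           # 9; index 8 holds the colon (id 17)

tokens = prompt_tokens + [100, 200, 300]  # hypothetical: 3 content tokens follow
seq_length = 16                           # hypothetical, stands in for args.seq_length

# Mask construction as quoted from load_${task}_data.
second_mask = [0] * (seq_length - 1)
for idx in range(prompt_len - 1, len(tokens) - 1):
    second_mask[idx] = 1

# Position prompt_len - 1 = 8 is where the colon sits as *input*; marking
# it 1 keeps the loss on predicting the first token *after* the colon.
print(second_mask[prompt_len - 1])  # 1
```

Whether this is a bug depends on how second_mask is consumed: if position idx masks the loss for predicting token idx + 1 (the usual shifted-label convention), then second_mask[prompt_len - 1] = 1 is what scores the first content token, and setting it to 0 would drop that token from the loss instead of masking the colon.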
