Matching the response tokens #197

Hello, thanks for the nice reference code! I noticed that the following code tries to match the response tokens, but it might match the instruction tokens instead:

https://github.com/databrickslabs/dolly/blob/aaa0ecb5a5555f99e57e6582f1fb3d289f31940f/training/trainer.py#L60-L63

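This is because the loop breaks as soon as the first token matches. `'### Response:\n'` is encoded as `[21017, 18261, 25, 198]` and `'### Instruction:\n'` is encoded as `[21017, 46486, 25, 198]`. Both start with the same token (`21017`), so the loop stops at `### Instruction:\n` instead of `### Response:\n`.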
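Here is a minimal sketch of the failure, using only the token IDs quoted above. The filler IDs (`1` to `6`) and the `labels` layout are made-up placeholders for illustration, not real tokenizer output:

```python
import numpy as np

# Token IDs quoted in this issue.
instruction_ids = [21017, 46486, 25, 198]  # '### Instruction:\n'
response_ids = [21017, 18261, 25, 198]     # '### Response:\n'

# Hypothetical example: placeholder IDs 1..6 stand in for the actual text.
labels = np.array(instruction_ids + [1, 2, 3] + response_ids + [4, 5, 6])

# Matching only the first token and taking the first hit (what `break` does)
# selects the instruction header.
first_token_hits = np.where(labels == response_ids[0])[0]
start_idx = first_token_hits[0]

print(first_token_hits)  # [0 7]
print(start_idx)         # 0, the instruction header, not the response header
```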

If you did intend to match the response tokens, consider the following snippet instead :)

```
for idx in np.where(batch["labels"][i] == response_token_ids[0])[0]:
    # `response_token_ids` is `'### Response:\n'`; here we make sure that
    # the full sequence of token IDs matches, not just the first token
    if response_token_ids == examples[i]["input_ids"][idx:idx+len(response_token_ids)]:
        response_token_ids_start_idx = idx
```
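For anyone who wants to try the idea outside the trainer, here is a self-contained sketch of the same full-sequence check. The helper name `find_response_start` and the sample `labels` are hypothetical, not part of the Dolly code. This version also returns on the first full match, whereas the snippet above keeps the last one; that is just a choice made for this sketch:

```python
import numpy as np

def find_response_start(labels, response_token_ids):
    """Return the index where the full response header begins, or None."""
    labels = np.asarray(labels)
    n = len(response_token_ids)
    # Candidate positions: every place the first header token occurs.
    for idx in np.where(labels == response_token_ids[0])[0]:
        # Compare the whole window, not just the first token.
        if labels[idx:idx + n].tolist() == list(response_token_ids):
            return int(idx)
    return None

# Instruction header, placeholder IDs, response header, placeholder IDs.
labels = [21017, 46486, 25, 198, 1, 2, 3, 21017, 18261, 25, 198, 4, 5, 6]
print(find_response_start(labels, [21017, 18261, 25, 198]))  # 7
```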

Our related discussion: https://github.com/lvwerra/trl/pull/445#issuecomment-1595331363


Referenced code (`training/trainer.py`, lines 60-63):

	response_token_ids_start_idx = None
	for idx in np.where(batch["labels"][i] == response_token_ids[0])[0]:
	    response_token_ids_start_idx = idx
	    break
