Commit 52f3986

fix num_tokens in JsonlDataset (InternLM#1567)
1 parent 0404287 commit 52f3986

1 file changed

Lines changed: 2 additions & 0 deletions

xtuner/v1/datasets/jsonl.py

@@ -400,6 +400,8 @@ def __init__(
                 else:
                     self.sampled.extend(random.sample(_sampled, _target_num_samples - len(self.sampled)))
 
+        if num_tokens is not None:
+            num_tokens = num_tokens[self.sampled]
         self.num_tokens = num_tokens
         self.offsets = offsets[self.sampled]
 
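The two added lines make `num_tokens` go through the same `self.sampled` indexing that `offsets` already received. Before the change, `self.num_tokens` kept one entry per line of the file while `self.offsets` kept one entry per sampled item. The sketch below illustrates that alignment in plain Python. It is not the code from `jsonl.py`: `take` and `subsample` are hypothetical helpers, `take` stands in for the array-style indexing `values[indices]` used in the commit, and the offsets and token counts are made-up values.

```python
import random


def take(values, indices):
    """Pure-Python stand-in for array-style indexing `values[indices]`."""
    return [values[i] for i in indices]


def subsample(offsets, num_tokens, target_num_samples, seed=0):
    """Hypothetical helper: draw sample indices, then subset both sequences with them."""
    rng = random.Random(seed)
    sampled = rng.sample(range(len(offsets)), target_num_samples)
    # Mirrors the fix: index num_tokens only when it was actually provided.
    if num_tokens is not None:
        num_tokens = take(num_tokens, sampled)
    return sampled, take(offsets, sampled), num_tokens


offsets = [0, 120, 310, 480, 700]   # made-up byte offsets, one per JSONL line
num_tokens = [12, 19, 17, 22, 9]    # made-up token counts, one per JSONL line

sampled, sub_offsets, sub_tokens = subsample(offsets, num_tokens, 3)

# After subsetting, position k in both results refers to the same original line.
for k, idx in enumerate(sampled):
    assert sub_offsets[k] == offsets[idx]
    assert sub_tokens[k] == num_tokens[idx]
assert len(sub_tokens) == len(sub_offsets) == 3
```

The `None` guard matters because indexing `None` raises a `TypeError`; the diff suggests `num_tokens` is optional at this point in `__init__`, so it is left untouched when absent.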
