Commit 48a1edb

Revert naming change.
1 parent e53b794 commit 48a1edb

1 file changed: +1 -1 lines changed

open_lm/datapreprocess/ray/tokenize_shuffle.py

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ def _flush_buffer(self, folder, counter):
 tokens = [int(x) for x in self.buffer[i]["tokens"]]
 token_count += len(tokens)
 json_string = json.dumps(tokens)
-uid = f"{tar_index_str}_{i:0{digits}}"
+uid = hashlib.md5(json_string.encode()).hexdigest()
 sample = {"__key__": uid, "json.gz": json_string}
 sink.write(sample)
 bio.seek(0)
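
For context, the restored line derives each sample's `__key__` from the MD5 digest of its JSON-serialized tokens, so the key depends only on the sample's content rather than on the tar index or the position in the buffer. The following is a minimal standalone sketch of that idea, assuming the tokens are a plain list of integers; the helper name `content_uid` is hypothetical and does not appear in `tokenize_shuffle.py`.

```python
import hashlib
import json


def content_uid(tokens):
    """Hypothetical helper: hex MD5 digest of the JSON-serialized token list."""
    # Mirror the diff: cast to int, serialize with json.dumps, hash the UTF-8 bytes.
    json_string = json.dumps([int(x) for x in tokens])
    return hashlib.md5(json_string.encode()).hexdigest()


uid_a = content_uid([1, 2, 3])
uid_b = content_uid([1, 2, 3])  # same content as uid_a
uid_c = content_uid([1, 2, 4])  # different content
```

In this sketch, identical token sequences map to the same key, while the positional scheme removed by this commit (`tar_index_str` plus a zero-padded index) would give each buffer slot its own key regardless of content.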
