Open
Description
I tried to load all documents for each data sample and got the following error:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'benchmarks/officeqa/treasury_bulletins_parsed/transformed_page_level/treasury_bulletin_1982_3.txt'

For your reference, here is my Python code:
from datasets import load_dataset
dataset = load_dataset('csv', data_files='benchmarks/officeqa/officeqa.csv')['train']
import tiktoken
tiktoken_encoder = tiktoken.get_encoding("o200k_base")
count_tokens = lambda text: len(tiktoken_encoder.encode(text, disallowed_special=()))
context_lens = []
for datapoint in dataset:
    context = [open(f"benchmarks/officeqa/treasury_bulletins_parsed/transformed_page_level/{f}", 'r').read()
               for f in datapoint['source_files'].split('\r\n')]
    context_len = 0
    for doc in context:
        context_len += count_tokens(doc)
    context_lens.append(context_len)

Same error occurs when I use documents in benchmarks/officeqa/treasury_bulletins_parsed/transformed instead.
P.S. It is the only missing document.
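In case it helps with reproducing this, here is a minimal sketch of how the missing files can be listed up front instead of failing on the first `open`. The helper name `find_missing_docs` is hypothetical (it is not part of the repository), and it assumes each `source_files` value is a newline-separated list of file names, as in the loop above.

```python
from pathlib import Path


def find_missing_docs(source_file_fields, doc_dir):
    """Return the sorted file names referenced in the dataset but absent from doc_dir.

    Hypothetical helper for illustration only, not part of the benchmark code.
    """
    doc_dir = Path(doc_dir)
    missing = set()
    for field in source_file_fields:
        # splitlines() accepts both '\r\n' and '\n' separators (assumption:
        # one file name per line, as in the original split on '\r\n').
        for name in field.splitlines():
            name = name.strip()
            if name and not (doc_dir / name).is_file():
                missing.add(name)
    return sorted(missing)
```

Calling it with the `source_files` values from the dataset and the `transformed_page_level` (or `transformed`) directory should return every referenced file that is not on disk, which makes it easy to confirm whether `treasury_bulletin_1982_3.txt` is the only one.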