no main-memory cache to loaders #1624
# log only once, here:
# log once for all splits, as they are limited the same
if self.get_limit() is not None:
    self.log_limited_loading()
This is unnecessary, as log_limited_loading already contains a mechanism to log only once. That means it logs only once, and only when the data is actually loaded.
I think that mechanism hides problems, just as it hid the loop in LoadCSV from you.
This logging is cleaner, happens just once, and does not introduce new variables.
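The two approaches discussed in this thread can be sketched side by side. This is only an illustration: `SketchLoader`, its `_already_logged` flag, the `messages` list, and all method bodies are hypothetical stand-ins, not the actual unitxt code.

```python
import logging

logger = logging.getLogger("loader_sketch")


class SketchLoader:
    """Hypothetical stand-in for a loader; not the unitxt implementation."""

    def __init__(self, limit=None):
        self.limit = limit
        self._already_logged = False  # extra state needed by the guard approach
        self.messages = []  # records what was logged, for inspection

    def get_limit(self):
        return self.limit

    def _emit(self):
        msg = f"Loading limited to {self.limit} instances per split"
        self.messages.append(msg)
        logger.info(msg)

    def log_limited_loading_guarded(self):
        # Approach A: the method itself suppresses repeated messages.
        if self.get_limit() is not None and not self._already_logged:
            self._already_logged = True
            self._emit()

    def load_guarded(self, splits):
        for _ in splits:
            # Called once per split; the guard hides the repetition.
            self.log_limited_loading_guarded()

    def load_explicit(self, splits):
        # Approach B: log once, up front, before touching any split.
        if self.get_limit() is not None:
            self._emit()
        for _ in splits:
            pass  # per-split loading would go here
```

Both variants emit a single message, but approach A only does so because the flag masks the repeated calls inside the loop, which is the concern raised above.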
src/unitxt/loaders.py
Outdated
  if isoftype(iterables, MultiStream):
      return iterables
- return MultiStream.from_iterables(iterables, copying=True)
+ return MultiStream.from_iterables(iterables)
In the case of a lazy loader we never reach that point, so for most of the loaders copying=True does not have much effect, no?
Currently, we only reach this line. No Loader returns a MultiStream from its load_iterables.
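For readers unfamiliar with the flag, here is a minimal sketch of what is at stake. It assumes `copying=True` means that each instance is copied before being handed downstream. `stream_from_iterable` is a hypothetical helper written for this illustration, not the `MultiStream.from_iterables` implementation.

```python
import copy


def stream_from_iterable(iterable, copying=False):
    """Yield instances, deep-copying each one when copying=True (hypothetical helper)."""
    for instance in iterable:
        yield copy.deepcopy(instance) if copying else instance


# Without copying, a downstream edit leaks back into the loaded data.
source = [{"text": "a"}, {"text": "b"}]
for instance in stream_from_iterable(source):
    instance["text"] = instance["text"].upper()
leaked = [row["text"] for row in source]

# With copying, the loaded data stays as it was.
source = [{"text": "a"}, {"text": "b"}]
for instance in stream_from_iterable(source, copying=True):
    instance["text"] = instance["text"].upper()
protected = [row["text"] for row in source]
```

The copy costs time and memory per instance, so dropping it is only safe when the loaded data cannot be modified through the yielded instances.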
ff9a435 to f8775b9
…rue for LoadH, LoadCSV, and LoadSKLearn, which watch out and do not let anyone modify their loaded dataset
Signed-off-by: dafnapension <[email protected]>
f8775b9 to 6b29d00
And no copying=True for the streams generated by loaders that guard the datasets they load, letting no one modify them. This is the case for LoadHF, LoadCSV, and LoadSKLearn. For the rest, copying=True remains.
E.g., the HF dataset returned by LoadHF is not modifiable.
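The idea of a dataset that cannot be modified through its rows can be illustrated with a self-contained sketch. `ColumnarDataset` is a hypothetical stand-in that builds a new dict for every row it yields. It does not use the real `datasets` library, and it does not verify how LoadHF behaves in every configuration.

```python
class ColumnarDataset:
    """Hypothetical stand-in for a dataset that builds a new dict per row."""

    def __init__(self, columns):
        self._columns = columns  # column name -> tuple of values

    def __len__(self):
        return len(next(iter(self._columns.values())))

    def __iter__(self):
        for i in range(len(self)):
            # A fresh dict on every access: edits to it never reach the columns.
            yield {name: values[i] for name, values in self._columns.items()}


dataset = ColumnarDataset({"text": ("a", "b"), "label": (0, 1)})

for row in dataset:
    row["text"] = "changed"  # modifies only the temporary dict

# A second pass still sees the original values.
rows_after = list(dataset)
```

When the underlying storage behaves like this, copying each instance again in the stream adds cost without adding protection.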