Big Dataset Examples #163
Conversation
I actually haven't yet figured out how HF allows using a custom download of The Pile dataset, but I plan to add another example with that dataset.
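For reference, one way to point HuggingFace `datasets` at a manually downloaded copy of The Pile instead of letting it download anything is the generic `json` builder over local files. A minimal sketch, assuming the shards were already downloaded somewhere (the path and shard pattern below are made up, and reading `.jsonl.zst` shards requires the `zstandard` package):

```python
import glob
import os

from datasets import load_dataset

# Location of a pre-downloaded copy of The Pile (assumed path, adjust as needed).
data_dir = os.environ.get("PILE_DIR", "/network/scratch/datasets/pile")
shards = sorted(glob.glob(os.path.join(data_dir, "*.jsonl.zst")))

# The generic "json" builder only reads the local files it is given,
# so no download from the Hub is triggered.
pile = load_dataset("json", data_files={"train": shards}, split="train")
print(pile)
```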
Wouldn't we want to extract the archives into SLURM_TMPDIR in the
Yes, I was also thinking about that, and the current strategy in
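For illustration, a minimal sketch of extracting an archive into the node-local `$SLURM_TMPDIR` (the archive path below is an assumption for the example, not something taken from this PR):

```python
import os
import tarfile

# Archive on the shared filesystem (assumed path, for illustration only).
archive = "/network/datasets/imagenet/ILSVRC2012_img_train.tar"

# Node-local fast storage provided by SLURM for the duration of the job.
extract_dir = os.path.join(os.environ["SLURM_TMPDIR"], "train")
os.makedirs(extract_dir, exist_ok=True)

# Extract directly into SLURM_TMPDIR so training reads from local storage.
with tarfile.open(archive) as tar:
    tar.extractall(path=extract_dir)
```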
Waiting for merge of #161.
docs/Minimal_examples.rst (Outdated)
.. include:: examples/frameworks/index.rst
.. include:: examples/distributed/index.rst
.. include:: examples/data/index.rst
This might fit nicely in good_practices, what do you think?
**job.sh**

.. literalinclude:: examples/data/torchvision/job.sh.diff
Suggested change:
- .. literalinclude:: examples/data/torchvision/job.sh.diff
+ .. literalinclude:: job.sh.diff
**main.py**

.. literalinclude:: examples/data/torchvision/main.py.diff
Suggested change:
- .. literalinclude:: examples/data/torchvision/main.py.diff
+ .. literalinclude:: main.py.diff
**data.py**

.. literalinclude:: examples/data/torchvision/data.py
Suggested change:
- .. literalinclude:: examples/data/torchvision/data.py
+ .. literalinclude:: data.py
@lebrice did you have time to check the recent updates to this PR?
Not fully, but at a glance my comment here doesn't seem to have been addressed: #163 (comment)

Edit: Okay, I've looked at it now; my previous comments about the content are still relevant (for the most part).
Sorry, same comment (third time I'm making it): #163 (comment)
Let me know what you think.
Co-authored-by: Fabrice Normandin <[email protected]>
So I think the only issues remaining were the
Let me clarify the comment #163 (comment): What I'm saying is that I don't really see the value in having the main.py file included in this example, or in showing a diff with respect to the single-GPU job's main.py (you did address this part by removing the diff, thanks!). In my opinion, the main "body" of the example is data.py, and showing how to use

What do you think?
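To illustrate what that "core" of the example could look like on its own, here is a hypothetical sketch (not the PR's actual data.py) of consuming the data prepared in `$SLURM_TMPDIR` with torchvision; the directory name is an assumption:

```python
import os

from torchvision import datasets, transforms

# Data previously copied/extracted into node-local storage by the preparation step.
data_root = os.path.join(os.environ["SLURM_TMPDIR"], "train")

# Standard torchvision dataset reading the locally prepared files.
train_dataset = datasets.ImageFolder(
    data_root,
    transform=transforms.Compose(
        [transforms.RandomResizedCrop(224), transforms.ToTensor()]
    ),
)
print(f"{len(train_dataset)} training images")
```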
To be clear, if you feel like you want to merge this, then sure, it's fine as-is. I was just hoping that perhaps we could re-focus the example a bit so it doesn't dilute or mix up the important part of the content with what's already in the GPU job example.

One other thing: why do we allow customizing the number of workers for data preparation? Is there a context in which we wouldn't want the number of data-preparation workers to equal the number of CPUs per node?
Nah, not on the cluster; people will use all the CPUs available. This is mostly a leftover from the scripts I'm personally using to preprocess datasets (at least the bash version). We're also showing a very good practice, which is to not override environment variables if they already exist, but I'm OK with removing it. I agree for the
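For reference, the pattern under discussion looks roughly like this in Python: respect the variable if the user already set it, otherwise fall back to the CPUs SLURM reports on the node (`N_WORKERS` is an assumed variable name, not necessarily the one used in the PR):

```python
import multiprocessing
import os

# Respect an explicitly set N_WORKERS; otherwise use the CPUs on the node as
# reported by SLURM, falling back to the machine's CPU count outside SLURM.
n_workers = int(
    os.environ.get("N_WORKERS")
    or os.environ.get("SLURM_CPUS_ON_NODE")
    or multiprocessing.cpu_count()
)
print(f"Using {n_workers} data preparation workers")
```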