Hi, and thank you for your support so far.
After setting `tokens_per_batch` to 8192, training runs smoothly. However, the source code appears to constrain each batch to a size of 1, which also implies that training is limited to a single GPU; otherwise, an assertion error is triggered.
This restriction results in a large number of batches being processed. For example, pre-training on the MIMIC-IV dataset currently takes approximately 42–45 days to complete.
Given that Appendix C of your paper mentions using “24 Intel Xeon 2.70GHz CPU cores and 8 Nvidia V100 GPUs,” I’m wondering:
- Does the current version of the code support multi-GPU training and larger batch sizes?
- If not directly, are there recommended changes or workarounds to enable this?
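For context on what a larger-batch workaround might look like: below is a minimal, hypothetical sketch of token-budget batching, where variable-length sequences are greedily packed into batches whose total token count stays within `tokens_per_batch` (8192 here). This is not the repository's actual batching code, and `make_token_batches` and `seq_lengths` are names I made up for illustration.

```python
def make_token_batches(seq_lengths, tokens_per_batch=8192):
    """Greedily group sequence indices into batches whose total
    token count stays within tokens_per_batch.

    Hypothetical helper for illustration only; the real code may
    batch differently (e.g. sorting by length first).
    """
    batches, current, current_tokens = [], [], 0
    for i, n in enumerate(seq_lengths):
        # Start a new batch when adding this sequence would exceed the budget.
        if current and current_tokens + n > tokens_per_batch:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Three short sequences share one batch; the long one starts a new batch.
print(make_token_batches([3000, 3000, 2000, 5000], tokens_per_batch=8192))
# → [[0, 1, 2], [3]]
```

If batching like this were possible, each batch could then be sharded across GPUs with something like PyTorch's `DistributedDataParallel`, but whether that fits this codebase is exactly the question above.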
Any guidance or suggestions would be greatly appreciated.
Thank you again for your time and work on this project.