
Scaling up Batch Size and GPU Usage to Accelerate Training #249

@yoyowang0109

Description


Hi, and thank you for your support so far.

After setting tokens_per_batch to 8192, training runs smoothly. However, from the source code it appears that each batch is constrained to a size of 1, which also limits training to a single GPU; otherwise, the following assertion error is triggered:
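To illustrate why I think the size-1 constraint is not fundamental (this is a toy example of mine, nothing here is from the repo): accumulating gradients across several size-1 batches before each optimizer step produces exactly the same parameter update as one larger batch, shown below for a one-parameter least-squares model.

```python
# Hypothetical sketch: gradient accumulation makes k micro-batches of
# size 1 mathematically equivalent to one batch of size k, demonstrated
# on the model y = w * x with squared-error loss.

def grad(w, x, y):
    """d/dw of 0.5 * (w*x - y)**2 for a single sample."""
    return (w * x - y) * x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
lr = 0.01
w0 = 0.5

# (a) one "large batch" step: average gradient over all samples at once
big = w0 - lr * sum(grad(w0, x, y) for x, y in zip(xs, ys)) / len(xs)

# (b) accumulate size-1 gradients, then take a single averaged step
acc = 0.0
for x, y in zip(xs, ys):
    acc += grad(w0, x, y)
small = w0 - lr * acc / len(xs)

assert abs(big - small) < 1e-12  # the two updates are identical
```

So even without touching the multi-GPU path, an accumulation loop around the existing size-1 batches should reduce optimizer-step overhead without changing the math.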

[Screenshot: assertion error traceback]

This restriction results in a very large number of batches being processed; for example, pre-training on the MIMIC-IV dataset currently takes approximately 42–45 days to complete.

Given that Appendix C of your paper mentions using “24 Intel Xeon 2.70GHz CPU cores and 8 Nvidia V100 GPUs,” I’m wondering:

  1. Does the current version of the code support multi-GPU training and larger batch sizes?
  2. If not directly, are there recommended changes or workarounds to enable this?
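To make question 1 concrete (purely a sketch of mine; `pack_batches` and the padded-footprint rule are my assumptions, not the repo's actual logic), a token-budget batching scheme would let a tokens_per_batch of 8192 hold several sequences per step instead of one:

```python
# Hypothetical sketch: greedily pack variable-length sequences into
# batches whose padded footprint (batch_size * max_len_in_batch) stays
# within tokens_per_batch, instead of one sequence per batch.

def pack_batches(seq_lengths, tokens_per_batch):
    """Group sequence indices (shortest first) so each batch's padded
    token footprint fits within tokens_per_batch."""
    order = sorted(range(len(seq_lengths)), key=lambda i: seq_lengths[i])
    batches, current = [], []
    for i in order:
        candidate = current + [i]
        max_len = max(seq_lengths[j] for j in candidate)
        if len(candidate) * max_len <= tokens_per_batch:
            current = candidate
        else:
            batches.append(current)
            current = [i]
    if current:
        batches.append(current)
    return batches

lengths = [100, 2000, 150, 120, 3000, 90]
batches = pack_batches(lengths, tokens_per_batch=8192)
# every batch respects the token budget
assert all(len(b) * max(lengths[j] for j in b) <= 8192 for b in batches)
# every sequence appears exactly once
assert sorted(i for b in batches for i in b) == list(range(len(lengths)))
```

If something like this is compatible with the model's forward pass, it would also make data-parallel training across GPUs more natural, since each rank could draw its own packed batches.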

Any guidance or suggestions would be greatly appreciated.
Thank you again for your time and work on this project.
