❓ Questions and Help
Hey all,
I'm looking to solidify my understanding and would like clarification on a point in the SPMD user guide: https://github.com/pytorch-tpu/transformers/blob/llama2-google-next-training/SPMD_USER_GUIDE.md
I see it says:
> `global_batch_size`: The global batch size to use. Note that this value is supplied to the `per_device_train_batch_size` flag, since currently HuggingFace treats SPMD as a single-device program. This will change in future releases.
I'd like to ask two questions to make sure my understanding is correct:
- With respect to the blog https://pytorch.org/blog/high-performance-llama-2/ and its Figure 2, which for the v4-32 case lists "Per Device Batch" = 16 and Global Batch = 256: what argument was actually passed to run_clm.py? Was it `--per_device_train_batch_size 256`?
  If it was indeed `--per_device_train_batch_size 256`, is the "Per Device Batch" in Figure 2 simply the calculation of 256 divided by the 16 chips in a TPU v4-32, and NOT an actual argument to run_clm.py? (See the sketch after this list.)
- Relatedly, I'm looking to understand which (future-release) project or issue is tracking the refinement of how the global batch size is specified for a multi-device configuration.
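
For concreteness, here is a minimal sketch of the arithmetic I'm assuming for question 1. The value 256 for `--per_device_train_batch_size` and the 16-chip count for a v4-32 slice are my assumptions based on Figure 2, not something I've confirmed against run_clm.py:

```python
# Sketch of my assumed relationship (not confirmed): under SPMD, HuggingFace
# sees one logical device, so --per_device_train_batch_size carries the
# *global* batch, and the per-chip batch falls out by division.
global_batch_size = 256   # assumed value passed to --per_device_train_batch_size
num_chips = 16            # assumed chip count of a TPU v4-32 slice
per_chip_batch = global_batch_size // num_chips
print(per_chip_batch)     # -> 16, matching "Per Device Batch" in Figure 2
```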
Many thanks,
Isaac