❓ Questions and Help
Hey all,
I'm looking to solidify my understanding and would like clarification on a point in the SPMD user guide: https://github.com/pytorch-tpu/transformers/blob/llama2-google-next-training/SPMD_USER_GUIDE.md
I see it says:
> `global_batch_size`: The global batch size to use. Note that this value is supplied to the `per_device_train_batch_size` flag, since currently HuggingFace treats SPMD as a single-device program. This will change in future releases.
I'd like to ask two questions to make sure my understanding is correct:
- With respect to the blog https://pytorch.org/blog/high-performance-llama-2/ and its Figure 2, which for the v4-32 case lists "Per Device Batch" = 16 and Global Batch = 256: what argument was actually passed to run_clm.py? Was it `--per_device_train_batch_size 256`?
  If it was indeed `--per_device_train_batch_size 256`, is the "Per Device Batch" in Figure 2 simply the calculation of 256 divided by the 16 chips in a TPU v4-32, and NOT an actual argument to run_clm.py? (See the sketch after this list.)
- Relatedly, I'm looking to understand which (future-release) project or issue is tracking the refinement of how the global batch size is specified for a multi-device configuration.
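
For concreteness, here is a minimal sketch of the arithmetic I'm assuming for question 1. The value 256 for `--per_device_train_batch_size` and the 16-chip count for a v4-32 slice are my assumptions based on Figure 2, not something I've confirmed against run_clm.py:

```python
# Sketch of my assumed relationship (not confirmed): under SPMD, HuggingFace
# sees one logical device, so --per_device_train_batch_size carries the
# *global* batch, and the per-chip batch falls out by division.
global_batch_size = 256   # assumed value passed to --per_device_train_batch_size
num_chips = 16            # assumed chip count of a TPU v4-32 slice
per_chip_batch = global_batch_size // num_chips
print(per_chip_batch)     # -> 16, matching "Per Device Batch" in Figure 2
```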
Many thanks,
Isaac