SPMD Global Batch size vs. --per_device_train_batch_size #6411

Open
@isaacr

Description

❓ Questions and Help

Hey all,

I'm looking to solidify my understanding and would like clarification on the SPMD user guide: https://github.com/pytorch-tpu/transformers/blob/llama2-google-next-training/SPMD_USER_GUIDE.md

I see it says:

global_batch_size: The global batch size to use. Note that this value is supplied to the per_device_train_batch_size flag, since currently HuggingFace treats SPMD as a single-device program. This will change in future releases.
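If I understand this correctly, the global value simply lands in the per-device field of HuggingFace's training arguments. Here is a minimal sketch of my reading (the 256 and the output_dir are just illustrative placeholders, not values from the guide):

```python
# My reading of the guide: under SPMD, HuggingFace currently treats the whole
# slice as a single "device", so the *global* batch size is what gets supplied
# to the per_device_train_batch_size flag / field.
from transformers import TrainingArguments

global_batch_size = 256  # illustrative value, not taken from the guide

args = TrainingArguments(
    output_dir="/tmp/llama2-spmd",  # placeholder path
    per_device_train_batch_size=global_batch_size,  # holds the *global* batch under SPMD
)
print(args.per_device_train_batch_size)  # 256
```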

I'd like to ask 2 questions here, to ensure my understanding is correct:

  1. With respect to the blog https://pytorch.org/blog/high-performance-llama-2/ and Figure 2, which for the v4-32 case lists "Per Device Batch" = 16 and Global Batch = 256: what was the argument passed to run_clm.py? Was it
    --per_device_train_batch_size 256 ?

If it was indeed --per_device_train_batch_size 256, is the "Per Device Batch" in Figure 2 simply the result of 256 / 16 chips in a TPU v4-32 (see the sketch after this list), and NOT an actual argument to run_clm.py?

  2. Relatedly, I'm looking to understand which (future release) project is tracking the refinement of how the global batch size is specified for a multi-device configuration.
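
To make the arithmetic behind question 1 explicit, this is the calculation I am assuming (16 chips for a TPU v4-32 slice is my reading of the blog's hardware setup):

```python
# The calculation I'm assuming underlies the "Per Device Batch" column in Figure 2.
global_batch_size = 256   # the value passed to --per_device_train_batch_size under SPMD
num_chips = 16            # my assumption: a TPU v4-32 slice has 16 chips
per_device_batch = global_batch_size // num_chips
print(per_device_batch)   # 16, matching Figure 2's "Per Device Batch" for the v4-32 run
```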

Many thanks,
Isaac
