How can I measure memory profile with tools? #95

@robotsp

How can I measure the actual activation memory used by each module of the model, for example the MLP, attention, and MoE layers?
Could you recommend some profiling tools or torch.cuda.memory APIs for this?
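One possible approach with the torch.cuda memory API, sketched here under stated assumptions rather than as an established recipe, is to wrap the modules of interest in forward hooks and record how much `torch.cuda.memory_allocated()` grows across each module's forward pass. The helper name `profile_forward_memory` and the default `names` tuple are hypothetical; the names are taken from the module tree printed in this issue. On a machine without CUDA the reader falls back to 0 so the sketch still runs.

```python
import torch
import torch.nn as nn


def _allocated_bytes() -> int:
    # Bytes currently held by live CUDA tensors; 0 when CUDA is unavailable.
    return torch.cuda.memory_allocated() if torch.cuda.is_available() else 0


def profile_forward_memory(model: nn.Module, *inputs, names=("mlp", "self_attention")):
    """Return {qualified_module_name: bytes still allocated after its forward}.

    Hypothetical helper: a module is selected when the last component of its
    qualified name is in `names` (e.g. "decoder.layers.0.mlp").
    """
    stats, starts, handles = {}, {}, []

    def make_pre_hook(name):
        def pre_hook(module, args):
            starts[name] = _allocated_bytes()
        return pre_hook

    def make_post_hook(name):
        def post_hook(module, args, output):
            # Growth covers the output plus tensors saved for backward.
            stats[name] = stats.get(name, 0) + _allocated_bytes() - starts[name]
        return post_hook

    for name, module in model.named_modules():
        if name and name.split(".")[-1] in names:
            handles.append(module.register_forward_pre_hook(make_pre_hook(name)))
            handles.append(module.register_forward_hook(make_post_hook(name)))
    try:
        model(*inputs)  # run with grad enabled so saved activations stay alive
    finally:
        for handle in handles:
            handle.remove()
    return stats
```

The deltas only count memory that is still allocated when the module returns, so temporary buffers freed inside the module are not included. For peaks, `torch.cuda.reset_peak_memory_stats()` followed by `torch.cuda.max_memory_allocated()` may be a better fit, and `torch.cuda.memory_summary()` prints an overall allocator report.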

Thanks, @ISEEKYAN

input_shape=[1, 4096]
Number of parameters in every GPU in billions: 1.25
Number of activation in every GPU in billions: 1.31
num_bytes_per_parameter=7.5
GPTModel /* n_params=1,250,961,408 n_act=1,308,100,608 */ (
  (embedding): LanguageModelEmbedding /* n_params=65,536,000 n_act=8,390,656 */ (
    (word_embeddings): VocabParallelEmbedding /* n_params=65,536,000 n_act=2,048 */ ()
    (embedding_dropout): Dropout /* n_params=0 n_act=8,388,608 */ ()
  )
  (decoder): TransformerBlock /* n_params=1,119,889,408 n_act=1,168,637,952 */ (
    (layers): ModuleList /* n_params=1,119,887,360 n_act=1,160,249,344 */ (
      (0-3): 4 x TransformerLayer /* n_params=279,971,840 n_act=290,062,336 */ (
        (input_layernorm): IdentityOp /* n_params=0 n_act=0 */ ()
        (self_attention): SelfAttention /* n_params=12,582,912 n_act=25,296,896 */ (
          (core_attention): TEDotProductAttention /* n_params=0 n_act=131,072 */ ()
          (linear_qkv): ColumnParallelLinear /* n_params=8,388,608 n_act=16,777,216 */ ()
          (q_layernorm): IdentityOp /* n_params=0 n_act=0 */ ()
          (k_layernorm): IdentityOp /* n_params=0 n_act=0 */ ()
          (linear_proj): RowParallelLinear /* n_params=4,194,304 n_act=8,388,608 */ ()
        )
        (pre_cross_attn_layernorm): IdentityOp /* n_params=0 n_act=0 */ ()
        (cross_attention): IdentityOp /* n_params=0 n_act=0 */ ()
        (cross_attn_bda): IdentityOp /* n_params=0 n_act=0 */ ()
        (pre_mlp_layernorm): RMSNorm /* n_params=2,048 n_act=8,388,608 */ ()
        (mlp): MoELayer /* n_params=267,386,880 n_act=256,376,832 */ (
          (router): TopKRouter /* n_params=0 n_act=16,777,216 */ ()
          (experts): SequentialMLP /* n_params=267,386,880 n_act=111,411,200 */ (
            (local_experts): ModuleList /* n_params=267,386,880 n_act=0 */ (
              (0-7): 8 x MLP /* n_params=33,423,360 n_act=13,926,400 */ (
                (linear_fc1): ColumnParallelLinear /* n_params=22,282,240 n_act=5,570,560 */ ()
                (linear_fc2): RowParallelLinear /* n_params=11,141,120 n_act=2,785,280 */ ()
              )
            )
          )
        )
      )
    )
    (final_layernorm): RMSNorm /* n_params=2,048 n_act=8,388,608 */ ()
  )
  (output_layer): ColumnParallelLinear /* n_params=65,536,000 n_act=131,072,000 */ ()
)
Theoretical memory footprints: weight and optimizer=8947.57 MB, activation=2495.00 MB, total=11442.58 MB
Theoretical memory footprints: weight and optimizer=8.74 GB, activation=2.44 GB, total=11.17 GB
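For comparing measured numbers against the estimate, the theoretical footprints above can be reproduced from the printed counts. The sketch below uses num_bytes_per_parameter=7.5 from the output; the 2 bytes per activation element is an assumption (16-bit activations) that happens to match the printed 2495.00 MB, and MB is taken as 1024 * 1024 bytes.

```python
MIB = 1024 ** 2  # the printed "MB" values match 2**20 bytes

n_params = 1_250_961_408   # GPTModel n_params from the output
n_act = 1_308_100_608      # GPTModel n_act from the output
bytes_per_param = 7.5      # num_bytes_per_parameter from the output
bytes_per_act = 2          # assumption: 16-bit activations

weight_opt_mb = n_params * bytes_per_param / MIB
act_mb = n_act * bytes_per_act / MIB
total_mb = weight_opt_mb + act_mb

print(f"weight and optimizer={weight_opt_mb:.2f} MB, "
      f"activation={act_mb:.2f} MB, total={total_mb:.2f} MB")
print(f"weight and optimizer={weight_opt_mb / 1024:.2f} GB, "
      f"activation={act_mb / 1024:.2f} GB, total={total_mb / 1024:.2f} GB")
```

The same conversion applies per module: for example, a single TransformerLayer with n_act=290,062,336 would correspond to 290,062,336 * 2 / 1024**2 MB of activations under the same 2-byte assumption.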
