Before monkey patch:
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 1024)
    (layers): ModuleList(
      (0-3): 4 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=1024, out_features=1024, bias=False)
          (k_proj): Linear(in_features=1024, out_features=256, bias=False)
          (v_proj): Linear(in_features=1024, out_features=256, bias=False)
          (o_proj): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=1024, out_features=2048, bias=False)
          (up_proj): Linear(in_features=1024, out_features=2048, bias=False)
          (down_proj): Linear(in_features=2048, out_features=1024, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((1024,), eps=1e-06)
        (post_attention_layernorm): LlamaRMSNorm((1024,), eps=1e-06)
      )
    )
    (norm): LlamaRMSNorm((1024,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1024, out_features=32000, bias=False)
)
===============================================
After monkey patch
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 1024)
    (layers): ModuleList(
      (0-3): 4 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=1024, out_features=1024, bias=False)
          (k_proj): Linear(in_features=1024, out_features=256, bias=False)
          (v_proj): Linear(in_features=1024, out_features=256, bias=False)
          (o_proj): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=1024, out_features=2048, bias=False)
          (up_proj): Linear(in_features=1024, out_features=2048, bias=False)
          (down_proj): Linear(in_features=2048, out_features=1024, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((1024,), eps=1e-06, offset=0.0, in_place=True)
        (post_attention_layernorm): LlamaRMSNorm((1024,), eps=1e-06, offset=0.0, in_place=True)
      )
    )
    (norm): LlamaRMSNorm((1024,), eps=1e-06, offset=0.0, in_place=True)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1024, out_features=32000, bias=False)
)
🐛 Describe the bug
In #524, @jp1924 found that the model doesn't show Liger module names when the monkey patch is applied to a model instance. As the output above shows, the patched RMSNorm modules still print as LlamaRMSNorm even though their extra_repr now includes Liger's offset and in_place fields. This is because the current implementation only binds Liger's forward and extra_repr methods to the instance without touching anything else, so the class name printed by nn.Module's __repr__ stays the original one. Note that this bug doesn't affect model training, but addressing it can be helpful for debugging.
See: torch.nn.Module, torch.nn.Module.__repr__, and the Liger monkey patch implementation.
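To illustrate the mechanism, here is a minimal sketch (OriginalNorm and PatchedNorm are hypothetical stand-ins, not Liger's actual classes): binding a new extra_repr to an instance changes the printed arguments but not the class name in the header, because nn.Module.__repr__ builds the header from self._get_name(). Also overriding _get_name on the instance is one possible direction for a fix.

```python
import types

import torch.nn as nn


class OriginalNorm(nn.Module):
    def extra_repr(self):
        return "eps=1e-06"


class PatchedNorm(nn.Module):
    def extra_repr(self):
        return "eps=1e-06, offset=0.0, in_place=True"


m = OriginalNorm()

# Bind only the patched extra_repr to the instance, mirroring what the
# current monkey patch does with forward and extra_repr.
m.extra_repr = types.MethodType(PatchedNorm.extra_repr, m)
print(m)  # OriginalNorm(eps=1e-06, offset=0.0, in_place=True) -- header keeps the old class name

# nn.Module.__repr__ uses self._get_name() for the header, so also binding
# _get_name (or swapping __class__) makes the patched name show up.
m._get_name = types.MethodType(lambda self: "PatchedNorm", m)
print(m)  # PatchedNorm(eps=1e-06, offset=0.0, in_place=True)
```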
Reproduce
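A minimal reproduction sketch along these lines is below. The tiny LlamaConfig values are chosen here only to match the shapes in the printed output, and it assumes apply_liger_kernel_to_llama accepts a model argument for instance-level (post-init) patching, as described above.

```python
from transformers import LlamaConfig, LlamaForCausalLM

from liger_kernel.transformers import apply_liger_kernel_to_llama

# Small randomly initialized Llama model (config values assumed for illustration).
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=1024,
    intermediate_size=2048,
    num_hidden_layers=4,
    num_attention_heads=16,
    num_key_value_heads=4,
)
model = LlamaForCausalLM(config)

print("Before monkey patch:")
print(model)
print("=" * 47)
print("After monkey patch")
# Patch the already-instantiated model in place (assumes the model= argument
# of Liger's apply_liger_kernel_to_* functions).
apply_liger_kernel_to_llama(model=model)
print(model)
```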
output (the before/after model repr shown at the top of this report)
Versions
Operating System: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python version: 3.10.12
Liger Kernel version: 0.5.4
PyTorch version: 2.5.1+cu124
CUDA version: 12.4
HIP(ROCm) version: Not available
Triton version: 3.1.0
Transformers version: 4.49.0
XPU version: XPU Not Available