What is the proper way to convert a Llama-2 checkpoint from the Hugging Face format to the Megatron format? I followed the instructions in docs/llama2.md but got the errors below. I don't understand why transformer_engine.py in core/transformer/custom_layers seems to import itself as te at line 6, when that module has no attribute named pytorch.
MODEL_SIZE=7B
TP=1
TOP=/mnt
MEGATRON_DIR=$TOP/Megatron/Megatron-LM
HF_FORMAT_DIR=$TOP/LLaMa/llama_workarea/hf_llama_models/$MODEL_SIZE
MEGATRON_FORMAT_DIR=$TOP/Megatron/workspace.Megatron-LM/weights/$MODEL_SIZE
TOKENIZER_MODEL=$TOP/LLaMa/llama_workarea/hf_llama_models/7B/$MODEL_SIZE/tokenizer.model
export PYTHONPATH="$PWD:$PWD/tools/checkpoint"
echo $PYTHONPATH
python3 tools/checkpoint/util.py \
    --model-type GPT \
    --loader llama2_hf \
    --saver megatron \
    --target-tensor-parallel-size ${TP} \
    --load-dir ${HF_FORMAT_DIR} \
    --save-dir ${MEGATRON_FORMAT_DIR} \
    --tokenizer-model ${TOKENIZER_MODEL}
--
Loaded loader_llama2_hf as the loader.
Loaded saver_megatron as the saver.
Starting saver...
Starting loader...
Zarr-based strategies will not be registered because of missing packages
Zarr-based strategies will not be registered because of missing packages
  File "/home/aae14935wb/Share/Megatron/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 15, in <module>
    from megatron.core.transformer.transformer_block import TransformerBlock
  File "/home/aae14935wb/Share/Megatron/Megatron-LM/megatron/core/transformer/transformer_block.py", line 13, in <module>
    from megatron.core.transformer.custom_layers.transformer_engine import TENorm
  File "/home/aae14935wb/Share/Megatron/Megatron-LM/megatron/core/transformer/custom_layers/transformer_engine.py", line 71, in <module>
    class TELinear(te.pytorch.Linear):
AttributeError: module 'transformer_engine' has no attribute 'pytorch'
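
One way to narrow this down is to check what the name transformer_engine actually resolves to in the environment that runs util.py. The sketch below is only a diagnostic aid, not a fix, and it rests on an assumption: that the error comes from the top-level name resolving to something other than a complete Transformer Engine install (for example a same-named directory on PYTHONPATH, or a partial install). The helper names describe_module and has_submodule are hypothetical and exist only for this illustration; they use nothing beyond the standard library.

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module name resolves from, without importing it."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return {"found": False, "origin": None,
                "namespace_package": False, "locations": []}
    locations = list(spec.submodule_search_locations or [])
    # A directory without __init__.py is picked up as a namespace package:
    # it has search locations but no origin file. Such a "package" has no
    # attributes of its own, which would match the AttributeError above.
    namespace = (spec.submodule_search_locations is not None
                 and spec.origin in (None, "namespace"))
    return {"found": True, "origin": spec.origin,
            "namespace_package": namespace, "locations": locations}


def has_submodule(name, sub):
    """Return True if `name.sub` can be located (this imports the parent `name`)."""
    try:
        return importlib.util.find_spec(f"{name}.{sub}") is not None
    except (ImportError, ValueError):
        # Raised when the parent is missing, fails to import,
        # or is a plain module rather than a package.
        return False
```

Running describe_module("transformer_engine") and has_submodule("transformer_engine", "pytorch") from the same working directory and with the same PYTHONPATH as the conversion command shows whether the name points into site-packages or somewhere unexpected, and whether a pytorch subpackage is present at all.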