预训练的差异

GPU(brightmart版, tiny模型):
export BERT_BASE_DIR=./albert_tiny_zh
nohup python3 run_pretraining.py --input_file=./data/tf*.tfrecord  \
--output_dir=./my_new_model_path --do_train=True --do_eval=True --bert_config_file=$BERT_BASE_DIR/albert_config_tiny.json \
--train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=51 \
--num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176    \
--save_checkpoints_steps=2000  --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &

GPU(Google版本, small模型):
export BERT_BASE_DIR=./albert_small_zh_google
nohup python3 run_pretraining_google.py --input_file=./data/tf*.tfrecord --eval_batch_size=64 \
--output_dir=./my_new_model_path --do_train=True --do_eval=True --albert_config_file=$BERT_BASE_DIR/albert_config_small_google.json  --export_dir=./my_new_model_path_export \
--train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=20 \
--num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176   \
--save_checkpoints_steps=2000 --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt

TPU, add something like this:
    --use_tpu=True  --tpu_name=grpc://10.240.1.66:8470 --tpu_zone=us-central1-a
    

@brightmart 您好，但我看modeling_google 与 modeling 似乎前者仍是bert embedding方式，后者才是加上了因式分解，那为什么small进行预训练要用与bert一样的方式呢，若理解不对还请指正


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

预训练的差异 #171

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

预训练的差异 #171

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions