PaddleMaterials provides multiple pre-trained models and standard datasets for material property prediction, material structure generation, and interatomic potential tasks. This document shows how to perform common tasks with these existing models and standard datasets.
Training workflows are parameterized through structured configuration files, allowing end-to-end model training with simple parameter adjustments. You can refer to the PaddleMaterials Configuration section for detailed configuration information.
Each model's README file lists the commands for training, evaluation, testing, and inference, so you can also work directly from those README files to complete the corresponding tasks.
You can perform inference using either built-in models or local models.
PaddleMaterials offers multiple built-in models that can be directly used for inference. Taking the megnet_mp2018_train_60k_e_form model as an example (a MEGNet model trained on the MP2018 dataset for material formation energy prediction), use the following command for inference:
python property_prediction/predict.py --model_name='megnet_mp2018_train_60k_e_form' --weights_name='best.pdparams' --cif_file_path='./property_prediction/example_data/cifs/' --save_path='result.csv'

| Parameter | Description |
|---|---|
| --model_name | Name of the built-in model |
| --weights_name | Weights file name |
| --cif_file_path | Path to CIF files for prediction |
| --save_path | Path to save prediction results |
In addition to built-in models, you can also use your own locally trained models for inference. Taking the megnet_mp2018_train_60k_e_form model as an example (assuming you've trained it locally), use the following command:
python property_prediction/predict.py --config_path='property_prediction/configs/megnet/megnet_mp2018_train_60k_e_form.yaml' --checkpoint_path='your_checkpoint_path.pdparams' --cif_file_path='./property_prediction/example_data/cifs/' --save_path='result.csv'

| Parameter | Description |
|---|---|
| --config_path | Configuration file path |
| --checkpoint_path | Model weights file path |
| --cif_file_path | Path to CIF files for prediction |
| --save_path | Path to save prediction results |
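
When predictions are launched from a script rather than typed by hand, the two modes above can be wrapped in one helper. The sketch below is only an illustration: `build_predict_command` is a hypothetical function, not part of PaddleMaterials, and it assumes that `--weights_name` may be omitted. It assembles the flags listed in the tables above and leaves execution to the caller.

```python
import sys
from typing import List, Optional


def build_predict_command(
    cif_file_path: str,
    save_path: str,
    model_name: Optional[str] = None,
    weights_name: Optional[str] = None,
    config_path: Optional[str] = None,
    checkpoint_path: Optional[str] = None,
    script: str = "property_prediction/predict.py",
) -> List[str]:
    """Assemble the argv list for predict.py (hypothetical helper)."""
    builtin = model_name is not None
    local = config_path is not None or checkpoint_path is not None
    # Exactly one of the two modes must be selected.
    if builtin == local:
        raise ValueError("choose either a built-in model or a local config/checkpoint")

    cmd = [sys.executable, script]
    if builtin:
        cmd.append(f"--model_name={model_name}")
        # Assumption: the weights file name is optional on the command line.
        if weights_name is not None:
            cmd.append(f"--weights_name={weights_name}")
    else:
        if config_path is None or checkpoint_path is None:
            raise ValueError("local inference needs both config_path and checkpoint_path")
        cmd.append(f"--config_path={config_path}")
        cmd.append(f"--checkpoint_path={checkpoint_path}")

    cmd.append(f"--cif_file_path={cif_file_path}")
    cmd.append(f"--save_path={save_path}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building a list instead of a shell string avoids quoting problems with paths that contain spaces.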
To test the megnet_mp2018_train_60k_e_form model (assuming you've trained it locally) on the MP2018 test set, use:
python property_prediction/train.py -c property_prediction/configs/megnet/megnet_mp2018_train_60k_e_form.yaml Global.do_test=True Global.do_train=False Global.do_eval=False Trainer.pretrained_model_path='your_checkpoint_path(*.pdparams)' Trainer.output_dir='your_output_dir'

| Parameter | Description |
|---|---|
| -c | Configuration file path |
| Global.do_train | Set to False for testing |
| Global.do_eval | Whether to evaluate on validation set |
| Global.do_test | Whether to evaluate on test set |
| Trainer.pretrained_model_path | Your model weights path |
| Trainer.output_dir | Output directory for log files |
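
The test command combines a configuration file with `key=value` overrides. As a rough sketch of how such a command could be generated programmatically, the helpers below (`format_overrides` and `build_test_command`, both hypothetical names that are not part of PaddleMaterials) turn a dictionary of overrides into the tokens shown above.

```python
import sys
from typing import Dict, List


def format_overrides(overrides: Dict[str, object]) -> List[str]:
    """Turn {'Global.do_test': True} into ['Global.do_test=True'] tokens."""
    return [f"{key}={value}" for key, value in overrides.items()]


def build_test_command(
    config_path: str,
    checkpoint_path: str,
    output_dir: str,
    script: str = "property_prediction/train.py",
) -> List[str]:
    """Assemble a test-only invocation of train.py (hypothetical helper)."""
    overrides = {
        "Global.do_test": True,    # evaluate on the test set
        "Global.do_train": False,  # skip training
        "Global.do_eval": False,   # skip validation
        "Trainer.pretrained_model_path": checkpoint_path,
        "Trainer.output_dir": output_dir,
    }
    return [sys.executable, script, "-c", config_path] + format_overrides(overrides)
```

Keeping the overrides in a dictionary makes it easy to toggle a single switch, for example enabling `Global.do_eval`, without editing a long command string.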
You can train models using PaddleMaterials's standard datasets and predefined configurations. For the megnet_mp2018_train_60k_e_form model:
# Single-GPU training for formation energy per atom
python property_prediction/train.py -c property_prediction/configs/megnet/megnet_mp2018_train_60k_e_form.yaml

This command uses the -c parameter to specify the model configuration file. Training will be performed on the MP2018 training set, with logs saved to Trainer.output_dir by default (you can modify this path in the configuration file).
PaddleMaterials also supports multi-GPU training using paddle.distributed.launch:
# Multi-GPU training with 4 GPUs
python -m paddle.distributed.launch --gpus="0,1,2,3" property_prediction/train.py -c property_prediction/configs/megnet/megnet_mp2018_train_60k_e_form.yaml

The --gpus parameter lists the IDs of the GPUs to use; the number of IDs determines how many GPUs take part in training.
PaddleMaterials supports training with custom datasets. If your dataset format matches the standard format, you can directly use the provided configurations by modifying the dataset paths:
...
Dataset:
  train:
    dataset:
      __class_name__: MP2018Dataset
      __init_params__:
        path: "your_train_data.json"
        ...
  val:
    dataset:
      __class_name__: MP2018Dataset
      __init_params__:
        path: "your_val_data.json"
        ...
  test:
    dataset:
      __class_name__: MP2018Dataset
      __init_params__:
        path: "your_test_data.json"

For datasets with different formats, you can either:
- Create a custom dataset class, import it in ppmat/datasets/__init__.py, and modify the configuration
- Convert your dataset to PaddleMaterials's supported format (recommended for convenience)
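
For the simpler case where the dataset already matches the standard format, only the three `path` entries change. The sketch below shows one way to apply that change to a configuration that has already been loaded into a nested dictionary (for instance with a YAML parser, which is an assumption here). `set_dataset_paths` is a hypothetical helper, not a PaddleMaterials function.

```python
import copy
from typing import Any, Dict


def set_dataset_paths(config: Dict[str, Any], paths: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of config with Dataset.<split>.dataset.__init_params__.path replaced.

    `paths` maps split names ("train", "val", "test") to data files.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for split, path in paths.items():
        try:
            init_params = updated["Dataset"][split]["dataset"]["__init_params__"]
        except KeyError as exc:
            raise KeyError(
                f"config has no Dataset.{split}.dataset.__init_params__ entry"
            ) from exc
        init_params["path"] = path
    return updated
```

Working on a copy keeps the standard configuration intact, so the same base dictionary can be reused for several custom datasets.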
To train a custom model on a standard dataset:

- Implement your custom model class (inheriting from nn.Layer) and import it in ppmat/models/__init__.py. Your model must implement the __init__ and forward methods. The forward method should return a dictionary containing model outputs and losses.
- Copy the configuration file of the standard dataset you want to use (e.g., megnet_mp2018_train_60k_e_form.yaml for MP2018)
- Modify the Model section in the configuration to use your custom model:

      Model:
        __class_name__: your_model_class_name
        __init_params__: your_model_parameters

- Adjust other hyperparameters (learning rate, batch size, etc.) as needed
- Start training with the modified configuration file
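
The Model-section change can also be applied to a configuration that has been loaded into a dictionary. This is a minimal sketch under that assumption; `use_custom_model` is a hypothetical helper, and the class name and parameters in the usage note are placeholders rather than real PaddleMaterials models.

```python
import copy
from typing import Any, Dict


def use_custom_model(
    config: Dict[str, Any],
    class_name: str,
    init_params: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a copy of config whose Model section points at a custom class."""
    updated = copy.deepcopy(config)  # keep the standard config unchanged
    updated["Model"] = {
        "__class_name__": class_name,
        "__init_params__": copy.deepcopy(init_params),
    }
    return updated
```

For example, `use_custom_model(config, "MyModel", {"hidden_dim": 64})` replaces only the Model section and leaves the dataset and trainer settings of the copied configuration as they were. The class named here must still be imported in ppmat/models/__init__.py, as described in the first step.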
PaddleMaterials supports model fine-tuning. Follow these steps, starting from a standard configuration (the main change is the pretrained model path):
- Prepare your custom dataset (refer to Section 4)
- Copy the original model configuration file (e.g., megnet_mp2018_train_60k_e_form.yaml)
- Modify dataset paths in the copied configuration to point to your custom data
- Configure pretrained model parameters:
  - For local models: Set Trainer.pretrained_model_path to your local path
  - For built-in models:
    - Set Trainer.pretrained_model_path to the built-in model URL
    - Set Trainer.pretrained_weight_name to the weights file name (e.g., latest.pdparams)
- Adjust training parameters (learning rate, batch size, log directory, etc.)
- Execute training with the updated configuration

The message "Finish loading pretrained model from: xxx.pdparams" indicates successful model loading.
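
In an automated fine-tuning run it can be useful to check for this message rather than read the log by eye. The sketch below searches log text for the message quoted above. `find_loaded_checkpoint` is a hypothetical helper, and the exact layout of the surrounding log line (timestamps, log level) is an assumption.

```python
import re
from typing import Optional

# Matches the success message and captures the path that follows it.
_LOADED = re.compile(r"Finish loading pretrained model from:\s*(\S+)")


def find_loaded_checkpoint(log_text: str) -> Optional[str]:
    """Return the checkpoint path reported in the log, or None if absent."""
    match = _LOADED.search(log_text)
    return match.group(1) if match else None
```

A return value of `None` means the message was not found, which suggests the pretrained weights were not loaded and the path settings should be checked.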