PaddleMaterials implements full lifecycle management for model training, covering core stages such as training, fine-tuning, and prediction. It includes standardized datasets and built-in pre-trained model libraries, supporting one-click prediction. Training workflows are parameterized through structured configuration files, allowing end-to-end model training with simple parameter adjustments.
| Field Name | Description |
|---|---|
| Global | System-level parameters for centralized management of public configurations and cross-module shared settings. |
| Trainer | Defines core training parameters including epoch count, checkpoint saving policies, and distributed training configurations. |
| Model | Neural network architecture definition module with initialization parameters and loss function configurations. |
| Dataset | Standardized data loading with integrated preprocessing, batching, and multi-process reading mechanisms. |
| Metric | Evaluation metric functions for performance assessment during training and testing. |
| Optimizer | Optimizer configuration interface supporting learning rate scheduling, weight decay, and gradient clipping parameters. |
| Predict | Configuration parameters for prediction workflows. |
Next, we demonstrate the configuration structure using MegNet training on the mp2018.6.1 dataset. The complete configuration file is available at megnet_mp2018_train_60k_e_form.yaml. This configuration enables training of the MegNet model on mp2018.6.1 for formation energy, with the trained model capable of predicting formation energy for input structures.
```yaml
Global:
  # For mp2018 dataset, property names include:
  # "formation_energy_per_atom", "band_gap", "G", "K"
  label_names: ["formation_energy_per_atom"]
  do_train: True
  do_eval: False
  do_test: False
  graph_converter:
    __class_name__: FindPointsInSpheres
    __init_params__:
      cutoff: 4.0
      num_cpus: 10
```

| Field Name | Type | Description |
|---|---|---|
| label_names | List[str] | Defines model training targets (must match dataset column names exactly). This example enables only formation energy prediction. |
| do_train | Bool | Enables/disables training loop execution. |
| do_eval | Bool | Enables/disables standalone evaluation process (independent of periodic validation during training). |
| do_test | Bool | Enables/disables inference testing (disabled by default). |
| graph_converter | Class Config | Material structure to graph conversion configuration for data loading and prediction stages. |
PaddleMaterials uses `__class_name__` and `__init_params__` for flexible class instantiation without hardcoding, enabling different graph construction methods through configuration changes.
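The general pattern behind `__class_name__`/`__init_params__` can be sketched in plain Python. The registry and toy class below are illustrative assumptions, not the actual PaddleMaterials internals:

```python
# Hypothetical sketch of config-driven instantiation. The registry and
# the toy converter class are illustrative, not PaddleMaterials code.

class FindPointsInSpheres:
    """Toy stand-in for a graph-converter class."""
    def __init__(self, cutoff, num_cpus):
        self.cutoff = cutoff
        self.num_cpus = num_cpus

# Maps class names appearing in configs to importable classes.
REGISTRY = {"FindPointsInSpheres": FindPointsInSpheres}

def build_from_config(cfg):
    """Instantiate the class named by __class_name__ with __init_params__."""
    cls = REGISTRY[cfg["__class_name__"]]
    return cls(**cfg.get("__init_params__", {}))

# Mirrors the graph_converter entry from the Global section above.
converter = build_from_config({
    "__class_name__": "FindPointsInSpheres",
    "__init_params__": {"cutoff": 4.0, "num_cpus": 10},
})
```

Swapping the graph construction method is then just a matter of changing the `__class_name__` string in the YAML, with no code changes.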
The Trainer section initializes a BaseTrainer object controlling training, evaluation, and testing workflows:
```yaml
Trainer:
  max_epochs: 2000
  seed: 42
  output_dir: ./output/megnet_mp2018_train_60k_e_form
  save_freq: 100
  log_freq: 20
  start_eval_epoch: 1
  eval_freq: 1
  pretrained_model_path: null
  pretrained_weight_name: null
  resume_from_checkpoint: null
  use_amp: False
  amp_level: 'O1'
  eval_with_no_grad: True
  gradient_accumulation_steps: 1
  best_metric_indicator: 'eval_metric'
  name_for_best_metric: "formation_energy_per_atom"
  greater_is_better: False
  compute_metric_during_train: True
  metric_strategy_during_eval: 'epoch'
  use_visualdl: False
  use_wandb: False
  use_tensorboard: False
```

| Field Name | Type | Description |
|---|---|---|
| max_epochs | int | Maximum training epochs. |
| seed | int | Random seed for reproducibility (controls numpy/paddle/random libraries). |
| output_dir | str | Output directory for model weights and logs. |
| save_freq | int | Checkpoint saving interval (epochs). Set to 0 for final epoch-only saving. |
| log_freq | int | Training log interval (steps). |
| start_eval_epoch | int | Epoch to begin evaluation (avoids early-stage fluctuations). |
| eval_freq | int | Evaluation interval (epochs). Set to 0 to disable periodic validation. |
| pretrained_model_path | str/None | Pre-trained model path (None = no pre-training). |
| pretrained_weight_name | str/None | When using the built-in model, specify the exact weight file name (e.g., latest.pdparams). |
| resume_from_checkpoint | str/None | Checkpoint path for training resumption (requires optimizer state and training metadata). |
| use_amp | bool | Enables automatic mixed precision training. |
| amp_level | str | Mixed precision mode ('O1'=partial FP32, 'O2'=FP16 optimization). |
| eval_with_no_grad | bool | Disables gradient computation during evaluation (set to False for models with higher-order derivatives). |
| gradient_accumulation_steps | int | Gradient accumulation steps for large batch simulation. |
| best_metric_indicator | str | Metric for best model selection (train/eval loss/metric). |
| name_for_best_metric | str | Specific metric name (must match Metric configuration). |
| greater_is_better | bool | Metric optimization direction (False = lower is better). |
| compute_metric_during_train | bool | Enables training set metric computation. |
| metric_strategy_during_eval | str | When metrics are computed during evaluation: 'epoch' computes them once after a full pass over the dataset, while 'step' computes them incrementally on each batch. |
| use_visualdl/wandb/tensorboard | bool | Enables specific training logging tools. |
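As a quick check on `gradient_accumulation_steps`: when gradients are accumulated over k forward/backward passes before each optimizer update, the effective batch size per update is the per-step batch size times k (times the device count under data parallelism). A minimal sketch:

```python
def effective_batch_size(per_step_batch, accumulation_steps, num_devices=1):
    """Samples contributing to each optimizer update when gradients are
    accumulated over several forward/backward passes."""
    return per_step_batch * accumulation_steps * num_devices

# With batch_size=128 and gradient_accumulation_steps=1 (as in the example
# config), each update sees 128 samples per device.
single = effective_batch_size(128, 1)      # 128
# Setting gradient_accumulation_steps=4 would simulate a batch of 512
# without the memory cost of loading 512 samples at once.
accumulated = effective_batch_size(128, 4)  # 512
```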
Defines model architecture and hyperparameters. Example for MEGNetPlus:
```yaml
Model:
  __class_name__: MEGNetPlus
  __init_params__:
    dim_node_embedding: 16
    dim_edge_embedding: 100
    dim_state_embedding: 2
    nblocks: 3
    nlayers_set2set: 1
    niters_set2set: 2
    bond_expansion_cfg:
      rbf_type: "Gaussian"
      initial: 0.0
      final: 5.0
      num_centers: 100
      width: 0.5
    property_name: ${Global.label_names}
    data_mean: -1.6519
    data_std: 1.0694
```

| Field Name | Type | Description |
|---|---|---|
| `__class_name__` | str | Model class name. |
| `__init_params__` | dict | Initialization parameters (e.g., node embedding dimension). |
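References like `${Global.label_names}` let one section reuse a value defined elsewhere in the config, keeping targets consistent across Global, Model, and Dataset. A minimal resolver sketch for intuition (the resolution logic here is an assumption about the general pattern, not the actual PaddleMaterials implementation):

```python
import re

def resolve_refs(cfg, root=None):
    """Recursively replace '${A.B}' strings with the value at root[A][B]."""
    if root is None:
        root = cfg
    if isinstance(cfg, dict):
        return {k: resolve_refs(v, root) for k, v in cfg.items()}
    if isinstance(cfg, list):
        return [resolve_refs(v, root) for v in cfg]
    if isinstance(cfg, str):
        match = re.fullmatch(r"\$\{([\w.]+)\}", cfg)
        if match:
            value = root
            for key in match.group(1).split("."):
                value = value[key]
            return value
    return cfg

# Mirrors the ${Global.label_names} reference in the Model section above.
cfg = {
    "Global": {"label_names": ["formation_energy_per_atom"]},
    "Model": {"__init_params__": {"property_name": "${Global.label_names}"}},
}
resolved = resolve_refs(cfg)
```

After resolution, `property_name` holds the same list as `Global.label_names`, so changing the training target in one place updates every section that references it.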
Defines evaluation metrics. Example:
```yaml
Metric:
  formation_energy_per_atom:
    __class_name__: paddle.nn.L1Loss
    __init_params__: {}
```

This specifies metrics for specific properties (e.g., MAE for formation energy).
| Field Name | Type | Description |
|---|---|---|
| `__class_name__` | str | Metric class name (supports PaddlePaddle APIs). |
| `__init_params__` | dict | Initialization parameters (empty dict if none). |
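With its default 'mean' reduction, `paddle.nn.L1Loss` computes mean absolute error (MAE). A pure-Python equivalent, with toy formation-energy values, for intuition:

```python
def mean_absolute_error(preds, targets):
    """MAE: the quantity paddle.nn.L1Loss reports under its default
    'mean' reduction."""
    assert len(preds) == len(targets)
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

# Predicted vs. reference formation energies (eV/atom); values are
# illustrative only.
mae = mean_absolute_error([-1.6, -2.1], [-1.5, -2.0])  # ≈ 0.1
```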
Defines optimizer and learning rate parameters. Example:
```yaml
Optimizer:
  __class_name__: Adam
  __init_params__:
    beta1: 0.9
    beta2: 0.999
    lr:
      __class_name__: Cosine
      __init_params__:
        learning_rate: 0.001
        eta_min: 0.0001
        by_epoch: True
```

| Field Name | Type | Description |
|---|---|---|
| `__class_name__` | str | Optimizer class name (e.g., Adam). |
| `__init_params__` | dict | Optimizer parameters (e.g., beta1/beta2 for Adam). |
| `lr.__class_name__` | str | Learning rate scheduler class name (e.g., Cosine). |
| `lr.__init_params__` | dict | Scheduler parameters (e.g., initial/minimum learning rates). |
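The Cosine scheduler decays the learning rate from `learning_rate` down to `eta_min` along a half cosine over training. Assuming standard cosine annealing (the usual formula; the actual scheduler implementation may differ in details), a sketch with the config values above:

```python
import math

def cosine_lr(epoch, max_epochs, base_lr=0.001, eta_min=0.0001):
    """Standard cosine annealing from base_lr down to eta_min."""
    progress = epoch / max_epochs
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))

start = cosine_lr(0, 2000)     # 0.001   (full learning rate at epoch 0)
mid = cosine_lr(1000, 2000)    # 0.00055 (halfway between base_lr and eta_min)
end = cosine_lr(2000, 2000)    # 0.0001  (eta_min at the final epoch)
```

With `by_epoch: True`, the schedule is stepped once per epoch rather than once per batch.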
Defines dataset classes and parameters. Example:
```yaml
Dataset:
  train:
    dataset:
      __class_name__: MP2018Dataset
      __init_params__:
        path: "./data/mp2018_train_60k/mp.2018.6.1_train.json"
        property_names: ${Global.label_names}
        build_structure_cfg:
          format: cif_str
          num_cpus: 10
        build_graph_cfg: ${Global.graph_converter}
        cache_path: "./data/mp2018_train_60k_cache_find_points_in_spheres_cutoff_4/mp.2018.6.1_train"
    num_workers: 4
    use_shared_memory: False
    sampler:
      __class_name__: BatchSampler
      __init_params__:
        shuffle: True
        drop_last: True
        batch_size: 128
  val:
    # Similar structure to train with validation-specific parameters
  test:
    # Similar structure to train with test-specific parameters
```

| Field Name | Type | Description |
|---|---|---|
| `train.dataset.__class_name__` | str | Dataset class name (e.g., MP2018Dataset). |
| `train.dataset.__init_params__.path` | str | Data file path. |
| `train.dataset.__init_params__.property_names` | List[str] | Target properties (references Global label_names). |
| `train.dataset.__init_params__.build_structure_cfg` | dict | Material structure construction parameters. |
| `train.sampler.__init_params__.batch_size` | int | Training batch size (per GPU). |
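With `drop_last: True` as above, the sampler discards the final partial batch, so optimizer steps per epoch are simply the floor of the sample count over the batch size. A quick sketch (the 60,000-sample figure is taken from the dataset name and is illustrative):

```python
def steps_per_epoch(num_samples, batch_size, drop_last=True):
    """Batches yielded per epoch by a BatchSampler-style sampler."""
    if drop_last:
        return num_samples // batch_size
    return -(-num_samples // batch_size)  # ceiling division

# ~60k training structures with batch_size=128:
full_batches = steps_per_epoch(60000, 128)                   # 468
with_partial = steps_per_epoch(60000, 128, drop_last=False)  # 469
```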
Defines prediction parameters. Example:
Predict:
graph_converter: ${Global.graph_converter}
eval_with_no_grad: TrueReferences global graph converter and disables gradient computation during prediction (set to False for models with higher-order derivatives).