# About Configs 🧩

PaddleMaterials implements full lifecycle management for model training, covering core stages such as training, fine-tuning, and prediction. It ships standardized datasets and built-in pre-trained model libraries that support one-click prediction. Training workflows are parameterized through structured configuration files, so end-to-end model training only requires simple parameter adjustments.

| Field Name | Description |
| --- | --- |
| Global | System-level parameters for centralized management of public configurations and cross-module shared settings. |
| Trainer | Core training parameters, including epoch count, checkpoint-saving policy, and distributed training configuration. |
| Model | Neural network architecture definition, with initialization parameters and loss function configuration. |
| Dataset | Standardized data loading with integrated preprocessing, batching, and multi-process reading. |
| Metric | Evaluation metric functions for performance assessment during training and testing. |
| Optimizer | Optimizer configuration, supporting learning rate scheduling, weight decay, and gradient clipping. |
| Predict | Configuration parameters for prediction workflows. |

Next, we demonstrate the configuration structure using MegNet training on the mp2018.6.1 dataset. The complete configuration file is available at `megnet_mp2018_train_60k_e_form.yaml`. It trains the MegNet model on mp2018.6.1 for formation energy; the trained model can then predict the formation energy of input structures.

## 1. Global Configuration

```yaml
Global:
  # For the mp2018 dataset, property names include:
  # "formation_energy_per_atom", "band_gap", "G", "K"
  label_names: ["formation_energy_per_atom"]
  do_train: True
  do_eval: False
  do_test: False

  graph_converter:
    __class_name__: FindPointsInSpheres
    __init_params__:
      cutoff: 4.0
      num_cpus: 10
```
| Field Name | Type | Description |
| --- | --- | --- |
| `label_names` | List[str] | Model training targets (must match dataset column names exactly). This example enables only formation energy prediction. |
| `do_train` | bool | Enables/disables the training loop. |
| `do_eval` | bool | Enables/disables standalone evaluation (independent of periodic validation during training). |
| `do_test` | bool | Enables/disables inference testing (disabled by default). |
| `graph_converter` | Class Config | Configuration for converting material structures to graphs during data loading and prediction. |

PaddleMaterials uses `__class_name__` and `__init_params__` for flexible class instantiation without hardcoding, so switching graph construction methods only requires a configuration change.
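As a rough illustration of this pattern, the sketch below turns such a config entry into an object via a name-to-class registry. The registry, the `register` decorator, and the simplified `FindPointsInSpheres` stub are all hypothetical; PaddleMaterials' actual builder may differ in detail.

```python
# Hypothetical sketch of __class_name__ / __init_params__ instantiation.
REGISTRY = {}

def register(cls):
    # Map the class name (as it appears in the YAML) to the class itself.
    REGISTRY[cls.__name__] = cls
    return cls

@register
class FindPointsInSpheres:
    # Stub standing in for the real graph converter.
    def __init__(self, cutoff, num_cpus):
        self.cutoff = cutoff
        self.num_cpus = num_cpus

def build(cfg):
    # Look up the class by name and call it with the configured kwargs.
    cls = REGISTRY[cfg["__class_name__"]]
    return cls(**cfg.get("__init_params__", {}))

converter = build({
    "__class_name__": "FindPointsInSpheres",
    "__init_params__": {"cutoff": 4.0, "num_cpus": 10},
})
```

Swapping in a different converter then only requires changing `__class_name__` in the YAML, not the training code.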

## 2. Trainer Configuration

The Trainer section initializes a `BaseTrainer` object controlling training, evaluation, and testing workflows:

```yaml
Trainer:
  max_epochs: 2000
  seed: 42
  output_dir: ./output/megnet_mp2018_train_60k_e_form
  save_freq: 100
  log_freq: 20
  start_eval_epoch: 1
  eval_freq: 1
  pretrained_model_path: null
  pretrained_weight_name: null
  resume_from_checkpoint: null
  use_amp: False
  amp_level: 'O1'
  eval_with_no_grad: True
  gradient_accumulation_steps: 1
  best_metric_indicator: 'eval_metric'
  name_for_best_metric: "formation_energy_per_atom"
  greater_is_better: False
  compute_metric_during_train: True
  metric_strategy_during_eval: 'epoch'
  use_visualdl: False
  use_wandb: False
  use_tensorboard: False
```
| Field Name | Type | Description |
| --- | --- | --- |
| `max_epochs` | int | Maximum training epochs. |
| `seed` | int | Random seed for reproducibility (controls numpy/paddle/random libraries). |
| `output_dir` | str | Output directory for model weights and logs. |
| `save_freq` | int | Checkpoint-saving interval (epochs). Set to 0 to save only at the final epoch. |
| `log_freq` | int | Training log interval (steps). |
| `start_eval_epoch` | int | Epoch at which to begin evaluation (avoids early-stage fluctuations). |
| `eval_freq` | int | Evaluation interval (epochs). Set to 0 to disable periodic validation. |
| `pretrained_model_path` | str/None | Pre-trained model path (None = no pre-training). |
| `pretrained_weight_name` | str/None | When using a built-in model, the exact weight file name (e.g., latest.pdparams). |
| `resume_from_checkpoint` | str/None | Checkpoint path for resuming training (requires optimizer state and training metadata). |
| `use_amp` | bool | Enables automatic mixed precision training. |
| `amp_level` | str | Mixed precision mode ('O1' = partial FP32, 'O2' = FP16 optimization). |
| `eval_with_no_grad` | bool | Disables gradient computation during evaluation (set to False for models requiring higher-order derivatives). |
| `gradient_accumulation_steps` | int | Gradient accumulation steps for simulating large batches. |
| `best_metric_indicator` | str | Metric used for best-model selection (train/eval loss/metric). |
| `name_for_best_metric` | str | Specific metric name (must match the Metric configuration). |
| `greater_is_better` | bool | Metric optimization direction (False = lower is better). |
| `compute_metric_during_train` | bool | Enables metric computation on the training set. |
| `metric_strategy_during_eval` | str | Evaluation granularity: 'epoch' computes metrics once per full pass over the dataset; 'step' updates them incrementally per batch. |
| `use_visualdl`/`use_wandb`/`use_tensorboard` | bool | Enables the corresponding training-logging tool. |
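To make `gradient_accumulation_steps` concrete, the toy sketch below (not PaddleMaterials code) shows why averaging the gradients of several equal-size micro-batches reproduces the gradient of one large batch; the loss and `grad_mean` helper are illustrative assumptions.

```python
# Toy model: loss per sample is 0.5 * (x - w)^2, so dLoss/dw at w = 0
# is -x; a batch gradient is the mean of the per-sample gradients.
def grad_mean(xs):
    return sum(-x for x in xs) / len(xs)

def accumulated_grad(micro_batches):
    # Gradient accumulation: compute each micro-batch's mean gradient,
    # then average across accumulation steps before one optimizer step.
    grads = [grad_mean(mb) for mb in micro_batches]
    return sum(grads) / len(grads)

full_batch = [1.0, 2.0, 3.0, 4.0]
micro = [[1.0, 2.0], [3.0, 4.0]]  # gradient_accumulation_steps = 2
assert abs(accumulated_grad(micro) - grad_mean(full_batch)) < 1e-12
```

With `gradient_accumulation_steps: k` and per-GPU batch size `b`, the effective batch size is `k * b`.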

## 3. Model Configuration

Defines model architecture and hyperparameters. Example for `MEGNetPlus`:

```yaml
Model:
  __class_name__: MEGNetPlus
  __init_params__:
    dim_node_embedding: 16
    dim_edge_embedding: 100
    dim_state_embedding: 2
    nblocks: 3
    nlayers_set2set: 1
    niters_set2set: 2
    bond_expansion_cfg:
      rbf_type: "Gaussian"
      initial: 0.0
      final: 5.0
      num_centers: 100
      width: 0.5
    property_name: ${Global.label_names}
    data_mean: -1.6519
    data_std: 1.0694
```
| Field Name | Type | Description |
| --- | --- | --- |
| `__class_name__` | str | Model class name. |
| `__init_params__` | dict | Initialization parameters (e.g., node embedding dimension). |
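Note that `property_name: ${Global.label_names}` references a value defined in the Global section. A minimal sketch of how such `${Section.key}` references can be resolved is shown below; this assumes a simple one-level lookup, whereas a full interpolation engine (OmegaConf-style) supports much more.

```python
import re

def resolve(value, root):
    # If the value is a ${Section.key} reference, look it up in the
    # config tree; otherwise return it unchanged.
    m = re.fullmatch(r"\$\{(\w+)\.(\w+)\}", value)
    if m:
        section, key = m.groups()
        return root[section][key]
    return value

cfg = {"Global": {"label_names": ["formation_energy_per_atom"]}}
print(resolve("${Global.label_names}", cfg))
# -> ['formation_energy_per_atom']
```

This keeps the label list defined once in Global and shared consistently by the Model, Dataset, and Metric sections.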

## 4. Metric Configuration

Defines evaluation metrics. Example:

```yaml
Metric:
  formation_energy_per_atom:
    __class_name__: paddle.nn.L1Loss
    __init_params__: {}
```

Specifies metrics for specific properties (e.g., MAE for formation energy).

| Field Name | Type | Description |
| --- | --- | --- |
| `__class_name__` | str | Metric class name (supports PaddlePaddle APIs). |
| `__init_params__` | dict | Initialization parameters (empty dict if none). |
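With its default mean reduction, `paddle.nn.L1Loss` averages absolute errors, i.e., it computes the MAE. A plain-Python equivalent (an illustrative helper, not part of PaddleMaterials) makes the metric explicit:

```python
def mae(preds, labels):
    # Mean absolute error: average of |prediction - label|.
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(preds)

print(mae([0.1, -0.2, 0.05], [0.0, 0.0, 0.0]))
```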

## 5. Optimizer Configuration

Defines optimizer and learning rate parameters. Example:

```yaml
Optimizer:
  __class_name__: Adam
  __init_params__:
    beta1: 0.9
    beta2: 0.999
    lr:
      __class_name__: Cosine
      __init_params__:
        learning_rate: 0.001
        eta_min: 0.0001
        by_epoch: True
```
| Field Name | Type | Description |
| --- | --- | --- |
| `__class_name__` | str | Optimizer class name (e.g., Adam). |
| `__init_params__` | dict | Optimizer parameters (e.g., beta1/beta2 for Adam). |
| `lr.__class_name__` | str | Learning rate scheduler class name (e.g., Cosine). |
| `lr.__init_params__` | dict | Scheduler parameters (e.g., initial/minimum learning rates). |
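For intuition, the sketch below evaluates the standard cosine-annealing formula with this example's values, assuming the schedule decays from `learning_rate` to `eta_min` over the Trainer's 2000 epochs and is stepped per epoch (`by_epoch: True`); the exact scheduler implementation may differ.

```python
import math

def cosine_lr(epoch, max_epochs, learning_rate=0.001, eta_min=0.0001):
    # Standard cosine annealing: start at learning_rate, end at eta_min.
    t = min(epoch, max_epochs)
    return eta_min + 0.5 * (learning_rate - eta_min) * (
        1 + math.cos(math.pi * t / max_epochs)
    )

print(cosine_lr(0, 2000))     # starts at learning_rate
print(cosine_lr(2000, 2000))  # ends at eta_min
```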

## 6. Dataset Configuration

Defines dataset classes and parameters. Example:

```yaml
Dataset:
  train:
    dataset:
      __class_name__: MP2018Dataset
      __init_params__:
        path: "./data/mp2018_train_60k/mp.2018.6.1_train.json"
        property_names: ${Global.label_names}
        build_structure_cfg:
          format: cif_str
          num_cpus: 10
        build_graph_cfg: ${Global.graph_converter}
        cache_path: "./data/mp2018_train_60k_cache_find_points_in_spheres_cutoff_4/mp.2018.6.1_train"
      num_workers: 4
      use_shared_memory: False
    sampler:
      __class_name__: BatchSampler
      __init_params__:
        shuffle: True
        drop_last: True
        batch_size: 128
  val:
    # Similar structure to train with validation-specific parameters
  test:
    # Similar structure to train with test-specific parameters
```
| Field Name | Type | Description |
| --- | --- | --- |
| `train.dataset.__class_name__` | str | Dataset class name (e.g., MP2018Dataset). |
| `train.dataset.__init_params__.path` | str | Data file path. |
| `train.dataset.__init_params__.property_names` | str | Target properties (references Global labels). |
| `train.dataset.__init_params__.build_structure_cfg` | dict | Material structure construction parameters. |
| `train.sampler.__init_params__.batch_size` | int | Training batch size (per GPU). |
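A quick back-of-envelope check of what this sampler implies: assuming roughly 60,000 training samples (per the `mp2018_train_60k` split name), `batch_size: 128` with `drop_last: True` yields the following steps per epoch.

```python
num_samples = 60_000   # assumption based on the split name
batch_size = 128
drop_last = True

# With drop_last, the final partial batch is discarded; otherwise it is
# kept as a smaller batch (ceiling division).
steps = num_samples // batch_size if drop_last else -(-num_samples // batch_size)
print(steps)  # -> 468 batches per epoch; 60000 % 128 = 96 samples dropped
```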

## 7. Predict Configuration

Defines prediction parameters. Example:

```yaml
Predict:
  graph_converter: ${Global.graph_converter}
  eval_with_no_grad: True
```

This references the global graph converter and disables gradient computation during prediction (set `eval_with_no_grad` to False for models that require higher-order derivatives).