Expand README training docs, default to float32 matmul precision
- Document full training workflow (data prep, config, running)
- Explain $DATASETS environment variable
- Add collapsible reference table for all training settings
- Default matmul precision to float32 instead of the framework default
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Training uses `lorem-train`, which reads `model.yaml` and `settings.yaml` from the current directory.
Training a model involves three steps: preparing the data, configuring the model and training settings, and running the training script.

#### 1. Prepare data

Training data is stored in [marathon](https://github.com/sirmarcel/marathon) format. Convert your extended XYZ dataset using a preparation script (see `examples/train-mlp/prepare.py` for a template):

```python
from marathon.data import datasets, get_splits
from marathon.grain import prepare

# datasets is a Path resolved from the $DATASETS environment variable
```
The `$DATASETS` environment variable sets the root directory where prepared datasets are stored. All dataset paths in `settings.yaml` are resolved relative to this directory.
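For instance, the resolution of a relative dataset path against `$DATASETS` can be pictured like this (a minimal sketch of the idea, not marathon's actual implementation; the function name is hypothetical):

```python
import os
from pathlib import Path

# Hypothetical illustration: resolve a dataset path from settings.yaml
# against the $DATASETS root directory, as described above.
def resolve_dataset(relative_path: str) -> Path:
    root = Path(os.environ["DATASETS"])
    return root / relative_path

os.environ["DATASETS"] = "/data/prepared"  # example root
print(resolve_dataset("my_project/train"))  # /data/prepared/my_project/train
```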
#### 2. Configure the experiment

Each experiment lives in its own directory containing two YAML files:

**`model.yaml`** defines the model architecture:

```yaml
model:
  lorem.Lorem:
    cutoff: 5.0
    max_degree: 4
    max_degree_lr: 2
    num_features: 128
    num_spherical_features: 4
    num_message_passing: 1
```

Use `lorem.LoremBEC` instead of `lorem.Lorem` to train a model that additionally predicts Born effective charges.
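Conceptually, the outer YAML key names the model class and the inner mapping supplies its keyword arguments. A rough sketch of that resolution (the helper and its return shape are hypothetical, for illustration only):

```python
# Hypothetical sketch: the outer key under "model" selects the model
# class, and the nested mapping becomes its keyword arguments.
config = {
    "model": {
        "lorem.Lorem": {
            "cutoff": 5.0,
            "max_degree": 4,
            "max_degree_lr": 2,
            "num_features": 128,
            "num_spherical_features": 4,
            "num_message_passing": 1,
        }
    }
}

def read_model_config(config: dict):
    # Expect exactly one class entry under "model".
    (class_name, kwargs), = config["model"].items()
    return class_name, kwargs

name, kwargs = read_model_config(config)
print(name, kwargs["num_features"])  # lorem.Lorem 128
```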
**`settings.yaml`** configures training:

```yaml
train: "my_project/train" # path relative to $DATASETS
valid: "my_project/valid" # path relative to $DATASETS
seed: 23
batcher:
  batch_size: 4
loss_weights: {"energy": 0.5, "forces": 0.5}
optimizer: adam # adam or muon
start_learning_rate: 1e-3
min_learning_rate: 1e-6
max_epochs: 2000
valid_every_epoch: 2
decay_style: linear # linear, exponential, or warmup_cosine
use_wandb: True
```

<details>
<summary>All training settings</summary>

| Setting | Default | Description |
|---|---|---|
| `train` | *required* | Training dataset path (relative to `$DATASETS`) |

</details>
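As an illustration of the schedule settings, a linear decay from `start_learning_rate` down to `min_learning_rate` over `max_epochs` could be sketched as follows (this shows the `decay_style: linear` idea only, not the trainer's actual code):

```python
# Hypothetical sketch of decay_style: linear, interpolating from
# start_learning_rate at epoch 0 to min_learning_rate at max_epochs.
def linear_lr(epoch: int, start: float = 1e-3, minimum: float = 1e-6,
              max_epochs: int = 2000) -> float:
    fraction = min(epoch / max_epochs, 1.0)
    return start + (minimum - start) * fraction

print(linear_lr(0))     # 0.001 at the start of training
print(linear_lr(2000))  # reaches min_learning_rate at max_epochs
```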
Training writes checkpoints, logs, and plots to a `run/` directory inside the experiment folder. If a `run/` directory already exists, training resumes from the latest checkpoint.
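The resume behavior can be pictured as: if `run/` exists, continue from its newest checkpoint, otherwise start fresh. A simplified sketch (the function and checkpoint file names are hypothetical):

```python
from pathlib import Path
import tempfile

# Hypothetical sketch of resume-from-latest-checkpoint: if run/ exists
# inside the experiment directory, pick the newest checkpoint file;
# otherwise return None to signal a fresh training run.
def latest_checkpoint(experiment_dir: Path):
    run_dir = experiment_dir / "run"
    if not run_dir.is_dir():
        return None  # no run/ yet: start fresh
    checkpoints = sorted(run_dir.glob("checkpoint-*"))
    return checkpoints[-1] if checkpoints else None

# Usage example with a throwaway experiment directory.
with tempfile.TemporaryDirectory() as tmp:
    exp = Path(tmp)
    print(latest_checkpoint(exp))  # None: no run/ directory yet
    (exp / "run").mkdir()
    (exp / "run" / "checkpoint-0001").touch()
    (exp / "run" / "checkpoint-0002").touch()
    print(latest_checkpoint(exp).name)  # checkpoint-0002
```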
See `examples/train-mlp/` and `examples/train-bec/` for complete examples including data preparation and configuration files.