|
1 | | -# 🚀 ML Project Template |
| 1 | +# nanoTabPFN |
2 | 2 |
|
3 | | -A modern template for machine learning experimentation using **wandb**, **hydra-zen**, and **submitit** on a Slurm cluster with Docker/Apptainer containerization. |
| 3 | +Train your own small TabPFN in less than 500 LOC and a few minutes. |
4 | 4 |
|
5 | | -> **Note**: This template is optimized for the ML Group cluster setup but can be easily adapted to similar environments. |
| 5 | +This repository is meant as a starting point for students and researchers who want to learn how TabPFN works under the hood.
6 | 6 |
|
7 | | -<div align="center"> |
8 | | - |
9 | | -[](https://www.python.org/downloads/release/python-3120/) |
10 | | -[](https://www.docker.com/) |
11 | | -[](https://wandb.ai) |
12 | | -[](https://github.com/mit-ll-responsible-ai/hydra-zen) |
13 | | -[](https://github.com/facebookincubator/submitit) |
14 | | - |
15 | | -</div> |
16 | | - |
17 | | -## ✨ Key Features |
18 | | - |
19 | | -- 📦 Python environment in Docker via [uv](https://docs.astral.sh/uv/) |
20 | | -- 📊 Logging and visualizations via [Weights and Biases](https://wandb.com) |
21 | | -- 🧩 Reproducibility and modular type-checked configs via [hydra-zen](https://github.com/mit-ll-responsible-ai/hydra-zen) |
22 | | -- 🖥️ Submit Slurm jobs and parameter sweeps directly from Python via [submitit](https://github.com/facebookincubator/submitit) |
23 | | -- 🔄 No `.def` or `.sh` files needed for Apptainer/Slurm |
24 | | - |
25 | | -## 📋 Table of Contents |
26 | | - |
27 | | -- [🔑 Container Registry Authentication](#-container-registry-authentication) |
28 | | -- [🐳 Container Setup](#-container-setup) |
29 | | - - [Option 1: Apptainer (Cluster)](#option-1-apptainer-cluster) |
30 | | - - [Option 2: Docker (Local Machine)](#option-2-docker-local-machine) |
31 | | -- [📦 Package Management](#-package-management) |
32 | | -- [🛠️ Development Notes](#️-development-notes) |
33 | | -- [🧪 Running Experiments](#-running-experiments) |
34 | | - - [WandB Logging](#wandb-logging) |
35 | | - - [Example Project](#example-project) |
36 | | - - [Single Job](#single-job) |
37 | | - - [Distributed Sweep](#distributed-sweep) |
38 | | -- [👥 Contributions](#-contributions) |
39 | | -- [🙏 Acknowledgements](#-acknowledgements) |
40 | | - |
41 | | -## 🔑 Container Registry Authentication |
42 | | - |
43 | | -### Generate Token |
44 | | - |
45 | | -1. Create a new GitHub token at [Settings → Developer settings → Personal access tokens](https://github.com/settings/tokens) with: |
46 | | - - `read:packages` permission |
47 | | - - `write:packages` permission |
48 | | - |
49 | | -### Log In |
50 | | - |
51 | | -With Apptainer: |
52 | | -```bash |
53 | | -apptainer remote login --username <your GitHub username> docker://ghcr.io |
| 7 | +Clone the repository, then install the dependencies via:
54 | 8 | ``` |
55 | | - |
56 | | -With Docker: |
57 | | -```bash |
58 | | -docker login ghcr.io -u <your GitHub username> |
| 9 | +pip install numpy torch schedulefree h5py scikit-learn openml seaborn |
59 | 10 | ``` |
60 | 11 |
|
61 | | -When prompted, enter your token as the password. |
62 | | - |
63 | | -## 🐳 Container Setup |
64 | | - |
65 | | -Choose one of the following methods to set up your environment: |
66 | | - |
67 | | -### Option 1: Apptainer (Cluster) |
68 | | - |
69 | | -1. **Install VSCode Remote Tunnels Extension** |
70 | | - |
71 | | - First, install the [Remote Tunnels](https://marketplace.visualstudio.com/items?itemName=ms-vscode.remote-server) extension in VSCode. |
72 | | - |
73 | | -2. **Connect to compute resources** |
74 | | - |
75 | | - For CPU resources: |
76 | | - ```bash |
77 | | - srun --partition=cpu-2h --pty bash |
78 | | - ``` |
79 | | - |
80 | | - For GPU resources: |
81 | | - ```bash |
82 | | - srun --partition=gpu-2h --gpus-per-task=1 --pty bash |
83 | | - ``` |
84 | | - |
85 | | -3. **Launch container** |
86 | | - |
87 | | - To open a tunnel to connect your local VSCode to the container on the cluster: |
88 | | - ```bash |
89 | | - apptainer run --nv --writable-tmpfs oras://ghcr.io/marvinsxtr/ml-project-template:latest-sif code tunnel |
90 | | - ``` |
91 | | - |
92 | | - > 💡 You can specify a version tag (e.g., `v0.0.1`) instead of `latest`. Available versions are listed at [GitHub Container Registry](https://github.com/marvinsxtr/ml-project-template/pkgs/container/ml-project-template). |
93 | | -
|
94 | | - In VSCode press `Shift+Alt+P` (Windows/Linux) or `Shift+Cmd+P` (Mac), type "connect to tunnel", select GitHub and select your named node on the cluster. Your IDE is now connected to the cluster. |
95 | | - |
96 | | - To open a shell in the container on the cluster: |
97 | | - ```bash |
98 | | - apptainer run --nv --writable-tmpfs oras://ghcr.io/marvinsxtr/ml-project-template:latest-sif /bin/bash |
99 | | - ``` |
100 | | - |
101 | | - > 💡 This may take a few minutes on the first run as the container image is downloaded. |
102 | | -
|
103 | | -### Option 2: Docker (Local Machine) |
104 | | - |
105 | | -1. **Install VSCode Dev Containers Extension** |
106 | | - |
107 | | - First, install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension in VSCode. |
108 | | - |
109 | | -2. **Open the Repository in the Dev Container** |
110 | | - |
111 | | - Click the `Reopen in Container` button in the pop-up that appears once you open the repository in VSCode. |
112 | | - |
113 | | - Alternatively, open the command palette in VSCode by pressing `Shift+Alt+P` (Windows/Linux) or `Shift+Cmd+P` (Mac), and type `Dev Containers: Reopen in Container`. |
114 | | - |
115 | | -### Using Slurm within Apptainer |
116 | | - |
117 | | -In order to access Slurm with submitit from within the container, you first need to set up passwordless SSH to the login node. |
118 | | - |
119 | | -On the cluster, create a new SSH key pair in case you don't have one yet |
120 | | - |
121 | | -```bash |
122 | | -ssh-keygen -t ed25519 -C "your_email@example.com" |
123 | | -``` |
124 | | - |
125 | | -and add your public key to the `authorized_keys`: |
126 | | - |
127 | | -```bash |
128 | | -cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys |
129 | | -``` |
130 | | - |
131 | | -You can verify that this works by running |
132 | | - |
133 | | -```bash |
134 | | -ssh $USER@$HOST exit |
135 | | -``` |
136 | | - |
137 | | -which should return without any prompt. |
138 | | - |
139 | | -## 📦 Package Management |
140 | | - |
141 | | -1. **Update dependencies** |
142 | | - |
143 | | - This project uses [uv](https://docs.astral.sh/uv/) for Python dependency management. |
144 | | - |
145 | | - Inside the container (!): |
146 | | - ```bash |
147 | | - # Add a specific package |
148 | | - uv add <package-name> |
149 | | - |
150 | | - # Update all dependencies from pyproject.toml |
151 | | - uv sync |
152 | | - ``` |
153 | | - |
154 | | -2. **Commit changes** to the repository: |
155 | | - |
156 | | - Use tags for versioning: |
157 | | - |
158 | | - ```bash |
159 | | - git add pyproject.toml uv.lock |
160 | | - git commit -m "Updated dependencies" |
161 | | - git tag v0.0.1 |
162 | | - git push && git push --tags |
163 | | - ``` |
164 | | - |
165 | | -3. **Use the updated image**: |
166 | | - |
167 | | - The GitHub Actions workflow automatically builds a new image when changes are pushed. |
168 | | - |
169 | | - With Apptainer: |
170 | | - ```bash |
171 | | - apptainer run --nv --writable-tmpfs oras://ghcr.io/marvinsxtr/ml-project-template:v0.0.1-sif /bin/bash |
172 | | - ``` |
| 12 | +### Our Code |
173 | 13 |
|
174 | | - With Docker: |
175 | | - ```bash |
176 | | - docker run -it --rm --platform=linux/amd64 ghcr.io/marvinsxtr/ml-project-template:v0.0.1 /bin/bash |
177 | | - ``` |
| 14 | +- `model.py` contains the implementation of the architecture and a sklearn-like interface in fewer than 200 lines of code.
| 15 | +- `train.py` implements a simple training loop and a prior dump data loader in fewer than 200 lines of code.
| 16 | +- `experiment.ipynb` recreates the experiment from the paper.
178 | 17 |
|
179 | | -## 🛠️ Development Notes |
180 | 18 |
|
181 | | -### Building Locally for Testing |
| 19 | +### Pretrain your own nanoTabPFN |
182 | 20 |
|
183 | | -Test your Dockerfile locally before pushing: |
| 21 | +To pretrain your own nanoTabPFN, first download a prior data dump from [here](http://ml.informatik.uni-freiburg.de/research-artifacts/nanoTabPFN/300k_150x5_2.h5), then run `train.py`:
184 | 22 |
|
185 | 23 | ```bash |
186 | | -docker buildx build -t ml-project-template . |
| 24 | +cd nanoTabPFN |
| 25 | + |
| 26 | +# download data dump |
| 27 | +curl http://ml.informatik.uni-freiburg.de/research-artifacts/nanoTabPFN/300k_150x5_2.h5 --output 300k_150x5_2.h5 |
| 28 | + |
| 29 | +python train.py |
| 30 | +``` |
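
If `curl` is not available, the same download can be sketched in Python using only the standard library. The helper name `download_if_missing` is hypothetical (it is not part of the repository); the URL and file name are the ones given above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL of the prior data dump, as given in this README.
PRIOR_DUMP_URL = (
    "http://ml.informatik.uni-freiburg.de/research-artifacts/"
    "nanoTabPFN/300k_150x5_2.h5"
)


def download_if_missing(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless a file already exists there.

    Hypothetical helper for illustration; not part of nanoTabPFN.
    """
    path = Path(dest)
    if not path.exists():
        # Download to a temporary name first so that an interrupted
        # transfer does not leave a truncated file under the final name.
        tmp = path.with_name(path.name + ".part")
        urlretrieve(url, str(tmp))
        tmp.replace(path)
    return path
```

Calling `download_if_missing(PRIOR_DUMP_URL, "300k_150x5_2.h5")` before `python train.py` would then take the place of the `curl` step.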
| 31 | + |
| 32 | +#### Step-by-step explanation
| 33 | + |
| 34 | +First, we import our code from `model.py` and `train.py`:
| 35 | +```py |
| 36 | +from model import NanoTabPFNModel |
| 37 | +from model import NanoTabPFNClassifier |
| 38 | +from train import PriorDumpDataLoader |
| 39 | +from train import train, get_default_device |
| 40 | +``` |
| 41 | +Then we instantiate our model:
| 42 | +```py |
| 43 | +model = NanoTabPFNModel( |
| 44 | + embedding_size=96, |
| 45 | + num_attention_heads=4, |
| 46 | + mlp_hidden_size=192, |
| 47 | + num_layers=3, |
| 48 | + num_outputs=2 |
| 49 | +) |
| 50 | +``` |
| 51 | +and our data loader:
| 52 | +```py |
| 53 | +prior = PriorDumpDataLoader( |
| 54 | + "300k_150x5_2.h5", |
| 55 | + num_steps=2500, |
| 56 | + batch_size=32, |
| 57 | +) |
| 58 | +``` |
| 59 | +Now we can train our model: |
| 60 | +```py |
| 61 | +device = get_default_device() |
| 62 | +model, _ = train( |
| 63 | + model, |
| 64 | + prior, |
| 65 | + lr=4e-3,
| 66 | + device=device
| 67 | +) |
| 68 | +``` |
| 69 | +and finally we can instantiate our classifier: |
| 70 | +```py |
| 71 | +clf = NanoTabPFNClassifier(model, device) |
| 72 | +``` |
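
TabPFN-style models learn in context: there is no gradient update on the downstream dataset, so `.fit` usually only stores the training data, and the actual computation happens in `.predict_proba`, which looks at the training and test rows together. The sketch below illustrates that interface pattern with a toy stand-in (a softmax over negative distances to class centroids) in place of the transformer. The class `ToyInContextClassifier` is hypothetical, and how `NanoTabPFNClassifier` works internally is an assumption here; see `model.py` for the real code.

```python
import math


class ToyInContextClassifier:
    """Hypothetical stand-in showing a fit/predict_proba/predict interface."""

    def fit(self, X, y):
        # "Fitting" only memorizes the training set; nothing is optimized.
        self.X_train = [list(row) for row in X]
        self.y_train = list(y)
        self.classes_ = sorted(set(self.y_train))
        return self

    def predict_proba(self, X):
        # Toy replacement for the forward pass: one centroid per class,
        # then a softmax over negative squared distances to the centroids.
        centroids = []
        for c in self.classes_:
            rows = [r for r, label in zip(self.X_train, self.y_train) if label == c]
            centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        probs = []
        for x in X:
            logits = [-sum((a - b) ** 2 for a, b in zip(x, cen)) for cen in centroids]
            top = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(value - top) for value in logits]
            total = sum(exps)
            probs.append([e / total for e in exps])
        return probs  # one row per sample, one column per class

    def predict(self, X):
        # The predicted label is the class with the highest probability.
        return [
            self.classes_[max(range(len(p)), key=p.__getitem__)]
            for p in self.predict_proba(X)
        ]


toy = ToyInContextClassifier().fit(
    [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]], [0, 0, 1, 1]
)
print(toy.predict([[0.1, 0.0], [1.0, 0.9]]))  # prints [0, 1]
```

The point of the pattern is that swapping the toy centroid logic for a pretrained network changes nothing about how the classifier is called.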
| 73 | +and use its `.fit`, `.predict` and `.predict_proba`: |
| 74 | +```py |
| 75 | +from sklearn.datasets import load_breast_cancer |
| 76 | +from sklearn.metrics import roc_auc_score, accuracy_score |
| 77 | +from sklearn.model_selection import train_test_split |
| 78 | + |
| 79 | +X, y = load_breast_cancer(return_X_y=True) |
| 80 | +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5) |
| 81 | + |
| 82 | +clf.fit(X_train, y_train) |
| 83 | +prob = clf.predict_proba(X_test) |
| 84 | +pred = clf.predict(X_test) |
| 85 | +print('ROC AUC', roc_auc_score(y_test, prob[:, 1]))  # binary task: score with the positive-class column
| 86 | +print('Accuracy', accuracy_score(y_test, pred)) |
| 87 | +``` |
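
For a binary task, ROC AUC is computed from a single score per sample, usually the predicted probability of the positive class. As a rough illustration of what the metric measures: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function `pairwise_auc` below is a hypothetical, quadratic-time sketch for intuition only; in practice, use `roc_auc_score` from scikit-learn.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of correctly ordered (positive, negative) pairs.

    Hypothetical helper for illustration; labels must be 0 (negative) or 1 (positive).
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # correctly ordered pair
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Positive-class probabilities for four samples (made-up numbers).
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```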
| 88 | + |
| 89 | +### TFM-Playground |
| 90 | + |
| 91 | +The nanoTabPFN repository is meant to stay ultra small and simple. For more features, we created a separate repository,
| 92 | +the [TFM-Playground](https://github.com/automl/TFM-Playground/), which we are extending with
| 93 | +regression, multiple prior interfaces, multiple architectures, ensembling of different preprocessings, and more,
| 94 | +so check it out if you are interested!
| 95 | + |
| 96 | +### BibTeX Citation
| 97 | + |
| 98 | +``` |
| 99 | +@article{pfefferle2025nanotabpfn, |
| 100 | + title={nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN}, |
| 101 | + author={Pfefferle, Alexander and Hog, Johannes and Purucker, Lennart and Hutter, Frank}, |
| 102 | + journal={arXiv preprint arXiv:2511.03634}, |
| 103 | + year={2025} |
| 104 | +} |
187 | 105 | ``` |
188 | | - |
189 | | -Run the container directly with: |
190 | | - |
191 | | -```bash |
192 | | -docker run -it --rm --platform=linux/amd64 ml-project-template /bin/bash |
193 | | -``` |
194 | | - |
195 | | -## 🧪 Running Experiments |
196 | | - |
197 | | -### WandB Logging |
198 | | - |
199 | | -Logging to WandB is optional for local jobs but mandatory for jobs submitted to the cluster. |
200 | | - |
201 | | -Create a `.env` file in the root of the repository with: |
202 | | - |
203 | | -```bash |
204 | | -WANDB_API_KEY=your_api_key |
205 | | -WANDB_ENTITY=your_entity |
206 | | -WANDB_PROJECT=your_project_name |
207 | | -``` |
208 | | - |
209 | | -### Example Project |
210 | | - |
211 | | -The folder `example` contains an example project which can serve as a starting point for ML experimentation. Configuring a function |
212 | | -```python |
213 | | -from ml_project_template.utils import logger |
214 | | - |
215 | | -def main(foo: int = 42, bar: int = 3) -> None: |
216 | | - """Run a main function from a config.""" |
217 | | - logger.info(f"Hello World! cfg={cfg}, bar={bar}, foo={foo}") |
218 | | - |
219 | | -if __name__ == "__main__": |
220 | | - main() |
221 | | -``` |
222 | | - |
223 | | -is as easy as adding (1) a `Run` as the first argument, (2) importing the config stores and (3) wrapping the `main` function with `run`: |
224 | | - |
225 | | -```python |
226 | | -from ml_project_template.config import run |
227 | | -from ml_project_template.runs import Run |
228 | | -from ml_project_template.utils import logger |
229 | | - |
230 | | -def main(cfg: Run, foo: int = 42, bar: int = 3) -> None: |
231 | | - """Run a main function from a config.""" |
232 | | - logger.info(f"Hello World! cfg={cfg}, bar={bar}, foo={foo}") |
233 | | - |
234 | | -if __name__ == "__main__": |
235 | | - from example import stores # noqa: F401 |
236 | | - run(main) |
237 | | -``` |
238 | | - |
239 | | -You can try running this example with: |
240 | | - |
241 | | -```bash |
242 | | -python example/main.py |
243 | | -``` |
244 | | - |
245 | | -Hydra will automatically generate a `config.yaml` in the `outputs/<date>/<time>/.hydra` folder which you can use to reproduce the same run later. |
246 | | - |
247 | | -Try overriding the values passed to the `main` function and see how it changes the output (config): |
248 | | - |
249 | | -```bash |
250 | | -python example/main.py foo=123 |
251 | | -``` |
252 | | - |
253 | | -Reproduce the results of a previous run/config: |
254 | | - |
255 | | -```bash |
256 | | -python example/main.py -cp outputs/<date>/<time>/.hydra -cn config.yaml |
257 | | -``` |
258 | | - |
259 | | -Enabling WandB logging: |
260 | | - |
261 | | -```bash |
262 | | -python example/main.py cfg/wandb=base |
263 | | -``` |
264 | | - |
265 | | -Run WandB in offline mode: |
266 | | - |
267 | | -```bash |
268 | | -python example/main.py cfg/wandb=base cfg.wandb.mode=offline |
269 | | -``` |
270 | | - |
271 | | -### Single Job |
272 | | - |
273 | | -Run a job on the cluster: |
274 | | - |
275 | | -```bash |
276 | | -python example/main.py cfg/job=base |
277 | | -``` |
278 | | - |
279 | | -This will automatically enable WandB logging. See `example/configs.py` to configure the job settings. |
280 | | - |
281 | | -### Distributed Sweep |
282 | | - |
283 | | -Run a parameter sweep over multiple seeds using multiple nodes: |
284 | | - |
285 | | -```bash |
286 | | -python example/main.py cfg/job=sweep |
287 | | -``` |
288 | | - |
289 | | -This will automatically enable WandB logging. See `example/configs.py` to configure sweep parameters. |
290 | | - |
291 | | -## 👥 Contributions |
292 | | - |
293 | | -Contributions to this documentation and template are very welcome! Feel free to open a PR or reach out with suggestions. |
294 | | - |
295 | | -## 🙏 Acknowledgements |
296 | | - |
297 | | -This template is based on a [previous example project](https://github.com/mx-e/example_project_ml_cluster). |