Commit 59e8ded

Clean up image and readme
1 parent 16d9836 commit 59e8ded

2 files changed

Lines changed: 101 additions & 65 deletions

Dockerfile

Lines changed: 20 additions & 24 deletions
@@ -1,15 +1,5 @@
 FROM --platform=linux/amd64 python:3.12-slim AS linux-base
 
-# Environment variables
-ENV UV_PROJECT_ENVIRONMENT="/venv"
-ENV UV_PYTHON_INSTALL_DIR="/python"
-ENV UV_COMPILE_BYTECODE=1
-ENV UV_PYTHON=python3.12
-ENV PATH="$UV_PROJECT_ENVIRONMENT/bin:$PATH"
-
-# Clean up
-RUN rm -f /etc/apt/sources.list.d/*.list
-
 # Utilities
 RUN apt-get update && apt-get upgrade -y
 RUN apt-get install -y --no-install-recommends build-essential \
@@ -24,23 +14,29 @@ RUN wget -O /tmp/vscode-server-cli.tar.gz "https://update.code.visualstudio.com/
 
 # Slurm
 RUN COMMANDS="sacct sacctmgr salloc sattach sbatch sbcast scancel scontrol sdiag sgather sinfo smap sprio squeue sreport srun sshare sstat strigger sview" \
-    && for CMD in $COMMANDS; do \
-    echo '#!/bin/bash' > "/usr/local/bin/$CMD" \
-    && echo 'ssh $USER@$SLURM_CLUSTER_NAME "bash -l -c '\'''"$CMD"' \"$@\"'\''"' >> "/usr/local/bin/$CMD" \
-    && chmod +x "/usr/local/bin/$CMD"; \
-    done
+    && for CMD in $COMMANDS; do echo '#!/bin/bash' > "/usr/local/bin/$CMD" \
+    && echo 'ssh $USER@$SLURM_CLUSTER_NAME "bash -l -c '\'''"$CMD"' \"$@\"'\''"' >> "/usr/local/bin/$CMD" \
+    && chmod +x "/usr/local/bin/$CMD"; done
 
 FROM linux-base AS python-base
 
+# Workdir
+WORKDIR /srv/repo
+
+# Environment variables
+ENV UV_PROJECT_ENVIRONMENT="/venv"
+ENV UV_PYTHON_INSTALL_DIR="/python"
+ENV UV_COMPILE_BYTECODE=1
+ENV UV_LINK_MODE=copy
+ENV UV_PYTHON=python3.12
+ENV PATH="$UV_PROJECT_ENVIRONMENT/bin:$PATH"
+ENV PYTHONPATH="/srv/repo:$PYTHONPATH"
+
 # Install uv
-COPY --from=ghcr.io/astral-sh/uv:0.5.4@sha256:49934a7a2d0a2ddfda9ddb566d6ac2449cdf31c7ebfb56fe599e04057fddca58 /uv /usr/local/bin/uv
+COPY --from=ghcr.io/astral-sh/uv:0.6.6 /uv /usr/local/bin/uv
 
 # Environment
-COPY pyproject.toml ./
-COPY uv.lock ./
-RUN uv sync --frozen --no-dev --no-install-project
-
-# Workdir
-RUN mkdir /srv/repo/ && chmod 777 /srv/repo
-ENV PYTHONPATH=$PYTHONPATH:/srv/repo
-WORKDIR /srv/repo
+RUN --mount=type=cache,target=/root/.cache/uv \
+    --mount=type=bind,source=uv.lock,target=uv.lock \
+    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
+    uv sync --frozen --no-install-project --no-dev
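
The nested quoting in the `# Slurm` step of the Dockerfile is hard to read, so here is a minimal Python sketch of what each generated `/usr/local/bin/<command>` file should contain, as far as the shell quoting can be traced by hand. The names `WRAPPER_TEMPLATE` and `wrapper_script` are hypothetical and exist only for this illustration; the commit itself generates the files with the shell loop shown above.

```python
# Slurm client commands listed in the Dockerfile's COMMANDS variable.
COMMANDS = (
    "sacct sacctmgr salloc sattach sbatch sbcast scancel scontrol sdiag sgather "
    "sinfo smap sprio squeue sreport srun sshare sstat strigger sview"
).split()

# Hypothetical template: the two lines the `echo` calls append for each command.
# $USER, $SLURM_CLUSTER_NAME and "$@" are left unexpanded on purpose; they are
# resolved only when the wrapper runs inside the container.
WRAPPER_TEMPLATE = r"""#!/bin/bash
ssh $USER@$SLURM_CLUSTER_NAME "bash -l -c '{cmd} \"$@\"'"
"""


def wrapper_script(cmd: str) -> str:
    """Return the text of the wrapper that forwards `cmd` to the cluster over SSH."""
    return WRAPPER_TEMPLATE.format(cmd=cmd)


if __name__ == "__main__":
    print(wrapper_script("sbatch"), end="")
```

Read this way, every Slurm command inside the container is a thin shim that opens a login shell on `$SLURM_CLUSTER_NAME` and runs the real command there with the caller's arguments.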

README.md

Lines changed: 81 additions & 41 deletions
@@ -24,41 +24,32 @@ A modern template for machine learning experimentation using **wandb**, **hydra-
 
 ## 📋 Table of Contents
 
-- [Container Setup](#-container-setup)
-- [Option 1: Apptainer](#option-1-apptainer)
-- [Option 2: Docker](#option-2-docker)
-- [Package Management](#-package-management)
-- [Updating the Docker Image](#-updating-the-docker-image)
-- [Container Registry Authentication](#-container-registry-authentication)
-- [Development Notes](#-development-notes)
-- [Running Experiments](#-running-experiments)
+- [🐳 Container Setup](#-container-setup)
+- [Option 1: Apptainer (Cluster)](#option-1-apptainer-cluster)
+- [Option 2: Docker (Local Machine)](#option-2-docker-local-machine)
+- [📦 Package Management](#-package-management)
+- [🔄 Updating the Docker Image](#-updating-the-docker-image)
+- [🔑 Container Registry Authentication](#-container-registry-authentication)
+- [🛠️ Development Notes](#-development-notes)
+- [🧪 Running Experiments](#-running-experiments)
 - [WandB Logging](#wandb-logging)
-- [Local Execution](#local-execution)
+- [Example Project](#example-project)
 - [Single Job](#single-job)
 - [Distributed Sweep](#distributed-sweep)
-- [Contributions](#-contributions)
-- [Acknowledgements](#-acknowledgements)
+- [👥 Contributions](#-contributions)
+- [🙏 Acknowledgements](#-acknowledgements)
 
 ## 🐳 Container Setup
 
 Choose one of the following methods to set up your environment:
 
-### Option 1: Apptainer
+### Option 1: Apptainer (Cluster)
 
-1. **Configure environment bindings**
+1. **Install VSCode Remote Tunnels Extension**
 
-Add to your `.zshrc` or `.bashrc`:
-
-```bash
-export APPTAINER_BIND=/opt/slurm-23.2,/opt/slurm,/etc/slurm,/etc/munge,/var/log/munge,/var/run/munge,/lib/x86_64-linux-gnu
-export APPTAINERENV_APPEND_PATH=/opt/slurm/bin:/opt/slurm/sbin
-```
-
-2. **Install VSCode Command Line Interface (Optional)**
-
-This step is required if you plan to create a remote tunnel. First, install the [Remote Tunnels](https://marketplace.visualstudio.com/items?itemName=ms-vscode.remote-server) extension in VSCode.
+First, install the [Remote Tunnels](https://marketplace.visualstudio.com/items?itemName=ms-vscode.remote-server) extension in VSCode.
 
-3. **Connect to compute resources**
+2. **Connect to compute resources**
 
 For CPU resources:
 ```bash
@@ -70,14 +61,16 @@ Choose one of the following methods to set up your environment:
 srun --partition=gpu-2h --gpus-per-task=1 --pty bash
 ```
 
-4. **Launch container**
+3. **Launch container**
 
-To open a tunnel to connect you local VSCode to the container on the cluster:
+To open a tunnel to connect your local VSCode to the container on the cluster:
 ```bash
 apptainer run --nv --writable-tmpfs oras://ghcr.io/marvinsxtr/ml-project-template:latest-sif code tunnel
 ```
 
-In VSCode press `Shift+Alt+P` (Windows/Linux) or `Shift+Cmd+P` (Mac), type connect to tunnel, select GitHub and select your named node on the cluster. Your IDE is now connected to the cluster.
+> 💡 You can specify a version tag (e.g., `v0.0.1`) instead of `latest`. Available versions are listed at [GitHub Container Registry](https://github.com/marvinsxtr/ml-project-template/pkgs/container/ml-project-template).
+
+In VSCode press `Shift+Alt+P` (Windows/Linux) or `Shift+Cmd+P` (Mac), type "connect to tunnel", select GitHub and select your named node on the cluster. Your IDE is now connected to the cluster.
 
 To open a shell in the container on the cluster:
 ```bash
@@ -86,29 +79,30 @@ Choose one of the following methods to set up your environment:
 
 > 💡 This may take a few minutes on the first run as the container image is downloaded.
 
-### Option 2: Docker
+### Option 2: Docker (Local Machine)
 
-Run the container directly with:
+1. **Install VSCode Dev Containers Extension**
 
-```bash
-docker run -it --rm --platform=linux/amd64 ghcr.io/marvinsxtr/ml-project-template:latest /bin/bash
-```
+First, install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension in VSCode.
 
-> 💡 You can specify a version tag (e.g., `v0.0.1`) instead of `latest`. Available versions are listed at [GitHub Container Registry](https://github.com/marvinsxtr/ml-project-template/pkgs/container/ml-project-template).
+2. **Open the Repository in the Dev Container**
+
+Click the `Reopen in Container` button in the pop-up that appears once you open the repository in VSCode.
+
+Alternatively, open the command palette in VSCode by pressing `Shift+Alt+P` (Windows/Linux) or `Shift+Cmd+P` (Mac), and type `Dev Containers: Reopen in Container`.
 
 ## 📦 Package Management
 
 This project uses [uv](https://docs.astral.sh/uv/) for Python dependency management.
 
 ### Adding or Updating Dependencies
 
-Inside the container (e.g., [VSCode shell with Docker Container](https://code.visualstudio.com/docs/devcontainers/containers)):
-
+Inside the container:
 ```bash
 # Add a specific package
 uv add <package-name>
 
-# Update all dependencies from pyproject.toml or requirements.txt
+# Update all dependencies from pyproject.toml
 uv sync
 ```
 
@@ -173,6 +167,12 @@ Test your Dockerfile locally before pushing:
 docker buildx build -t ml-project-template .
 ```
 
+Run the container directly with:
+
+```bash
+docker run -it --rm --platform=linux/amd64 ml-project-template /bin/bash
+```
+
 ## 🧪 Running Experiments
 
 ### WandB Logging
@@ -187,31 +187,71 @@ WANDB_ENTITY=your_entity
 WANDB_PROJECT=your_project_name
 ```
 
-### Local Execution
+### Example Project
 
-Run a script locally with:
+The folder `src/example` contains an example project which can serve as a starting point for ML experimentation. Configuring a function
+```python
+from ml_project_template.utils import logger
+
+def main(foo: int = 42, bar: int = 3) -> None:
+    """Run a main function from a config."""
+    logger.info(f"Hello World! cfg={cfg}, bar={bar}, foo={foo}")
+
+if __name__ == "__main__":
+    main()
+```
+
+is as easy as adding (1) a `Run` as the first argument, (2) importing the config stores and (3) wrapping the `main` function with `run`:
+
+```python
+from ml_project_template.config import run
+from ml_project_template.runs import Run
+from ml_project_template.utils import logger
+
+def main(cfg: Run, foo: int = 42, bar: int = 3) -> None:
+    """Run a main function from a config."""
+    logger.info(f"Hello World! cfg={cfg}, bar={bar}, foo={foo}")
+
+if __name__ == "__main__":
+    from example import stores  # noqa: F401
+    run(main)
+```
+
+You can try running this example with:
 
 ```bash
 python src/example/main.py
 ```
 
 Hydra will automatically generate a `config.yaml` in the `outputs/<date>/<time>/.hydra` folder which you can use to reproduce the same run later.
 
-To enable WandB logging:
+Try overriding the values passed to the `main` function and see how it changes the output (config):
+
+```bash
+python src/example/main.py foo=123
+```
+
+Reproduce the results of a previous run/config:
+
+```bash
+python src/example/main.py -cp outputs/<date>/<time>/.hydra -cn config.yaml
+```
+
+Enabling WandB logging:
 
 ```bash
 python src/example/main.py cfg/wandb=base
 ```
 
-For WandB offline mode:
+Run WandB in offline mode:
 
 ```bash
 python src/example/main.py cfg/wandb=base cfg.wandb.mode=offline
 ```
 
 ### Single Job
 
-To run a job on the cluster:
+Run a job on the cluster:
 
 ```bash
 python src/example/main.py cfg/job=base
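
The README text added in this hunk shows `foo=123` changing an argument of `main`, but the `run` wrapper that makes this work is not part of the commit. As a rough mental model only, the sketch below maps `key=value` tokens onto a function's keyword arguments using its type annotations. Both `apply_overrides` and the stand-in `main` (which returns a string instead of logging) are hypothetical; this is not how Hydra or the template's `run` actually parse overrides.

```python
import inspect
from typing import Any, Callable


def main(foo: int = 42, bar: int = 3) -> str:
    """Stand-in for the README's example entry point (returns instead of logging)."""
    return f"Hello World! bar={bar}, foo={foo}"


def apply_overrides(func: Callable[..., Any], overrides: list[str]) -> Any:
    """Call `func` with `key=value` overrides coerced via its type annotations."""
    params = inspect.signature(func).parameters
    kwargs: dict[str, Any] = {}
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in params:
            raise ValueError(f"unknown or malformed override: {item!r}")
        annotation = params[key].annotation
        # Coerce only when the parameter is annotated with a plain type such as int.
        if annotation is not inspect.Parameter.empty and isinstance(annotation, type):
            kwargs[key] = annotation(raw)
        else:
            kwargs[key] = raw
    return func(**kwargs)


if __name__ == "__main__":
    # Mirrors `python src/example/main.py foo=123` from the README.
    print(apply_overrides(main, ["foo=123"]))
```

Arguments that are not overridden keep their defaults, which matches the behaviour the README describes for the example project.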
