@@ -37,7 +37,7 @@ We fine-tune [`PrimeIntellect/Qwen3-0.6B`](https://huggingface.co/PrimeIntellect
 To train on a single GPU with `medarc_train`:
 
 ```bash
-medarc_train sft examples/reverse_text/sft.toml \
+medarc_train sft --config examples/reverse_text/sft.toml \
   --output-dir outputs/examples/reverse-sft \
   --wandb.project reverse-text --wandb.name reverse-text-sft
 ```
@@ -52,7 +52,7 @@ sft @ examples/reverse_text/sft.toml \
 To train on multiple GPUs with `medarc_train`:
 
 ```bash
-medarc_train sft examples/reverse_text/sft.toml \
+medarc_train sft --config examples/reverse_text/sft.toml \
   --output-dir outputs/examples/reverse-sft \
   --gpus 2 \
   --wandb.project reverse-text --wandb.name reverse-text-sft
@@ -79,7 +79,7 @@ The RL config uses the published [`PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT`](
 Submit a 1-GPU SFT job via `medarc_slurm`:
 
 ```bash
-medarc_slurm sft examples/reverse_text/sft.toml \
+medarc_slurm sft --config examples/reverse_text/sft.toml \
   --output-dir outputs/examples/reverse-sft \
   --gpus 1 \
   --auto-auth \
@@ -89,7 +89,7 @@ medarc_slurm sft examples/reverse_text/sft.toml \
 Or preview without submitting:
 
 ```bash
-medarc_slurm sft examples/reverse_text/sft.toml \
+medarc_slurm sft --config examples/reverse_text/sft.toml \
   --output-dir outputs/examples/reverse-sft \
   --gpus 1 \
   --auto-auth \
@@ -106,7 +106,7 @@ For RL we do 20 steps with sequence length 128. All three RL configs in this exa
 Run RL locally on a single shared GPU (assumes a 24GB GPU like a 3090 or 4090):
 
 ```bash
-medarc_train rl examples/reverse_text/rl_single.toml \
+medarc_train rl --config examples/reverse_text/rl_single.toml \
   --output-dir outputs/examples/reverse-rl \
   --single-gpu \
   --wandb.project reverse-text --wandb.name reverse-text-rl
@@ -121,7 +121,7 @@ If you have 2 GPUs, you can dedicate one to inference and one to training. This
 With `medarc_train`:
 
 ```bash
-medarc_train rl examples/reverse_text/rl_multi.toml \
+medarc_train rl --config examples/reverse_text/rl_multi.toml \
   --output-dir outputs/examples/reverse-rl \
   --wandb.project reverse-text --wandb.name reverse-text-rl
 ```
@@ -137,7 +137,7 @@ rl @ examples/reverse_text/rl_multi.toml
 This example shares a single GPU between the trainer and vLLM inference server. The config lowers vLLM `gpu_memory_utilization` so the trainer has headroom — if you still see OOMs, reduce it further.
 
 ```bash
-medarc_slurm rl examples/reverse_text/rl_slurm.toml \
+medarc_slurm rl --config examples/reverse_text/rl_slurm.toml \
   --output-dir outputs/examples/reverse-rl \
   --single-gpu \
   --auto-auth \
@@ -147,7 +147,7 @@ medarc_slurm rl examples/reverse_text/rl_slurm.toml \
 Or preview without submitting:
 
 ```bash
-medarc_slurm rl examples/reverse_text/rl_slurm.toml \
+medarc_slurm rl --config examples/reverse_text/rl_slurm.toml \
   --output-dir outputs/examples/reverse-rl \
   --single-gpu \
   --auto-auth \