Commit b5304e7

Update README.md to replace Rclone download instructions with R2 Downloader.
1 parent 9939f13 commit b5304e7

1 file changed

+5
-14
lines changed

retired_benchmarks/gpt3/megatron-lm/README.md

Lines changed: 5 additions & 14 deletions
@@ -164,30 +164,21 @@ Evaluation on the validation subset that consists of 24567 examples.
 # 6. Other
 
 ### S3 artifacts download
-The dataset and the checkpoints are available to download from an S3 bucket. You can download this data from the bucket using Rclone as follows:
+The dataset and the checkpoints are available to download from an S3-compatible bucket. You can download this data from the bucket using the MLCommons R2 Downloader. More information about the MLCommons R2 Downloader, including how to run it on Windows and in the dedicated container image, can be found [here](https://training.mlcommons-storage.org).
 
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-training s3 provider=Cloudflare access_key_id=76ea42eadb867e854061a1806220ee1e secret_access_key=a53625c4d45e3ca8ac0df8a353ea3a41ffc3292aa25259addd8b7dc5a6ce2936 endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
-```
-You can then navigate in the terminal to your desired download directory and run the following commands to download the dataset and checkpoints:
+Navigate in the terminal to your desired download directory and run the following commands to download the dataset and checkpoints:
 
 **`dataset_c4_spm.tar`**
 ```
-rclone copy mlc-training:mlcommons-training-wg-public/gpt3/megatron-lm/dataset_c4_spm.tar ./ -P
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://training.mlcommons-storage.org/metadata/gpt-3-megatron-preprocessed-dataset.uri
 ```
 **`checkpoint_megatron_fp32.tar`**
 ```
-rclone copy mlc-training:mlcommons-training-wg-public/gpt3/megatron-lm/checkpoint_megatron_fp32.tar ./ -P
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://training.mlcommons-storage.org/metadata/gpt-3-megatron-fp32-checkpoint.uri
 ```
 **`checkpoint_nemo_bf16`**
 ```
-rclone copy mlc-training:mlcommons-training-wg-public/gpt3/megatron-lm/checkpoint_nemo_bf16.tar ./ -P
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) https://training.mlcommons-storage.org/metadata/gpt-3-megatron-bf16-checkpoint.uri
 ```
 
 ### Model conversion from Paxml checkpoints
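
The three download commands in the updated README differ only in the metadata file name, so they could be wrapped in a small loop. The following is a minimal sketch, not part of the commit: the URLs are copied from the README, while the `DRY_RUN` switch and the `download_all` function are hypothetical names introduced here. It assumes the downloader script takes the metadata URI as its single argument, as the README commands show. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: run the three README download commands in a loop.
set -euo pipefail

# URLs copied from the updated README.
DOWNLOADER_URL="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"
METADATA_BASE="https://training.mlcommons-storage.org/metadata"

# Metadata file names referenced in the README (dataset, fp32 and bf16 checkpoints).
ARTIFACTS=(
  "gpt-3-megatron-preprocessed-dataset.uri"
  "gpt-3-megatron-fp32-checkpoint.uri"
  "gpt-3-megatron-bf16-checkpoint.uri"
)

# DRY_RUN is a hypothetical switch of this sketch, not a downloader option.
# 1 (default) prints the commands; 0 actually runs them.
DRY_RUN="${DRY_RUN:-1}"

download_all() {
  local artifact
  for artifact in "${ARTIFACTS[@]}"; do
    if [ "$DRY_RUN" = "1" ]; then
      # Print the exact command the README documents for this artifact.
      echo "bash <(curl -s ${DOWNLOADER_URL}) ${METADATA_BASE}/${artifact}"
    else
      # Same invocation as in the README; downloads into the current directory.
      bash <(curl -s "${DOWNLOADER_URL}") "${METADATA_BASE}/${artifact}"
    fi
  done
}

download_all
```

Saved as, say, `download_all.sh` (a hypothetical file name), running `DRY_RUN=0 bash download_all.sh` from the desired download directory would perform the actual downloads one after another.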
