Commit 10e4847

Merge branch 'mlcommons:dev' into dev
2 parents f89cf2c + b616fe2 commit 10e4847

2 files changed

Lines changed: 87 additions & 0 deletions


script/get-dataset-cnndm/README.md

Lines changed: 86 additions & 0 deletions
# README for get-dataset-cnndm

This README is automatically generated. Add custom content in [info.md](info.md). Please follow the [script execution document](https://docs.mlcommons.org/mlcflow/targets/script/execution-flow/) to understand more about MLC script execution.

`mlcflow` stores all local data under `$HOME/MLC` by default. So, if there is a space constraint on the home directory and you have more space on, say, `/mnt/$USER`, you can do:

```bash
mkdir /mnt/$USER/MLC
ln -s /mnt/$USER/MLC $HOME/MLC
```
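
The two commands above assume `$HOME/MLC` does not exist yet. If it is already a directory, `ln -s` creates the link inside it (as `$HOME/MLC/MLC`) instead of replacing it. The following is a minimal sketch of a safer variant. `relocate_mlc_home` is a hypothetical helper and is not part of `mlcflow`. It moves an existing `$HOME/MLC` (repos and cache) to the new location before creating the symlink, and it refuses to touch a target that already exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mlcflow): move the MLC data directory
# to a larger disk and leave a symlink at the original path.
relocate_mlc_home() {
  local target="$1"                # new location, e.g. /mnt/$USER/MLC
  local link="${2:-$HOME/MLC}"     # path mlcflow uses by default

  if [ -z "$target" ]; then
    echo "usage: relocate_mlc_home TARGET [LINK]" >&2
    return 2
  fi
  if [ -L "$link" ]; then
    # Already relocated; nothing to do.
    echo "$link already points to $(readlink "$link")"
    return 0
  fi
  if [ -e "$target" ]; then
    # Never merge into or overwrite an existing directory.
    echo "refusing to overwrite existing $target" >&2
    return 1
  fi

  mkdir -p "$(dirname "$target")"
  if [ -d "$link" ]; then
    # Keep existing repos and cache entries by moving them across.
    mv "$link" "$target" || return 1
  else
    mkdir "$target" || return 1
  fi
  ln -s "$target" "$link"
}
```

For example, run `relocate_mlc_home "/mnt/$USER/MLC"`. Moving the directory instead of copying it leaves a single copy of the cache, which matters when the home directory is short on space.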

You can also use the environment variable `MLC_REPOS` to control this location, but it must be set again in each new shell session, for example after every system reboot.
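
One way to avoid re-setting `MLC_REPOS` is to export it from a shell startup file. The following is a minimal sketch that assumes your shell reads `~/.bashrc` at startup; you can pass another file as the second argument. `persist_mlc_repos` is a hypothetical helper. It takes the directory as an argument instead of guessing which path `MLC_REPOS` expects, so confirm the expected value in the mlcflow documentation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export MLC_REPOS now and persist it in a startup file.
# Assumption: your shell reads $HOME/.bashrc (override with the 2nd argument).
persist_mlc_repos() {
  local value="$1"                 # directory mlcflow should use (check the docs)
  local rc="${2:-$HOME/.bashrc}"   # startup file to append to
  local line="export MLC_REPOS=\"$value\""

  # Append only if this exact line is not already present, so reruns are safe.
  # An older line with a different value is not removed; the later export wins.
  if ! grep -qxF "$line" "$rc" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc"
  fi
  # Also set it for the current session.
  export MLC_REPOS="$value"
}
```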

## Setup

If you are not in a Python development environment, please refer to the [official docs](https://docs.mlcommons.org/mlcflow/install/) for installation instructions.

```bash
python3 -m venv mlcflow
. mlcflow/bin/activate
pip install mlcflow
```

- Using a virtual environment is recommended (per `pip` best practices), but you may skip it or use `--break-system-packages` if needed.

### Pull mlperf-automations

Once `mlcflow` is installed:

```bash
mlc pull repo mlcommons@mlperf-automations --pat=<Your Personal Access Token>
```

- `--pat` or `--ssh` is only needed if the repository is private.
- If `--pat` is omitted, you will be prompted for a password. Enter your Personal Access Token there.
- You can use the `--ssh` option instead of `--pat=<>` if you prefer to use SSH to access the GitHub repository.
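
If you script this step, you can choose between `--pat` and `--ssh` from the environment. The following is a minimal sketch. `pull_mlperf_automations` is a hypothetical wrapper, and `GH_PAT` and `USE_SSH=1` are assumed variable names, not mlcflow settings. The wrapper only uses the flags documented above. It builds the argument list with `set --` so that it also runs under a plain POSIX shell.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the documented `mlc pull repo` command.
# GH_PAT and USE_SSH are assumed names chosen for this sketch.
pull_mlperf_automations() {
  set -- pull repo mlcommons@mlperf-automations
  if [ "${USE_SSH:-0}" = "1" ]; then
    set -- "$@" --ssh              # authenticate with your SSH key
  elif [ -n "${GH_PAT:-}" ]; then
    set -- "$@" "--pat=$GH_PAT"    # Personal Access Token
  fi
  # With neither set, no credentials are passed; for a private repository
  # mlc then prompts for the password/token interactively.
  mlc "$@"
}
```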

## Run Commands

```bash
mlcr get,dataset,gpt-j,cnndm,cnn-dailymail,original
```

No script-specific inputs.

### Generic Script Inputs

| Name | Description | Choices | Default |
|------|-------------|---------|---------|
| `--input` | Input to the script, passed using the env key `MLC_INPUT` | | |
| `--output` | Output from the script, passed using the env key `MLC_OUTPUT` | | |
| `--outdirname` | The directory to store the script output | | Cache directory (`$HOME/MLC/repos/local/cache/<>`) if the script is cacheable, else the current directory |
| `--outbasename` | The output file/folder name | | |
| `--name` | | | |
| `--extra_cache_tags` | Extra cache tags to add to the cached entry when the script results are saved | | |
| `--skip_compile` | Skip compilation | | `False` |
| `--skip_run` | Skip run | | `False` |
| `--accept_license` | Accept the license required to run the script | | `False` |
| `--skip_system_deps` | Skip installing any system dependencies | | `False` |
| `--git_ssh` | Use SSH for git repos | | `False` |
| `--gh_token` | GitHub token | | |
| `--hf_token` | Hugging Face token | | |
| `--verify_ssl` | Verify SSL | | `False` |
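
The following sketch shows how generic inputs combine with the run command. The hypothetical wrapper `get_cnndm` passes `--outdirname` (for example, a directory on the larger disk) and, optionally, `--extra_cache_tags`. The `--flag=value` form mirrors the `--pat=<>` usage shown earlier and is an assumption about how `mlcr` takes values.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: fetch the dataset into a chosen output directory.
# Only inputs from the "Generic Script Inputs" table are used.
get_cnndm() {
  local outdir="$1"                # passed as --outdirname
  local extra_tags="${2:-}"        # optional, passed as --extra_cache_tags
  set -- get,dataset,gpt-j,cnndm,cnn-dailymail,original "--outdirname=$outdir"
  if [ -n "$extra_tags" ]; then
    set -- "$@" "--extra_cache_tags=$extra_tags"
  fi
  mlcr "$@"
}
```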

## Variations

### Category

- `datacenter`
- `edge`

### Dataset-type

- `calibration`
- `validation` (default)

### Download-src

- `mlc`

### Download-tool

- `r2-downloader`
- `rclone`

### Run-mode

- `dry-run`

### Ungrouped

- `intel`
- `llama3`
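
The following is a sketch of selecting variations from the lists above. It assumes the usual MLC convention that a variation is chosen by appending its name with a leading underscore to the tags. Verify this against the script's `meta.yaml`. The helper `get_cnndm_with_variations` is hypothetical. It rejects names that do not appear in the lists above but does not check whether two variations from the same group conflict.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append variations as `_name` tags (assumed MLC convention).
# The accepted names are copied from the Variations lists in this README.
get_cnndm_with_variations() {
  local tags="get,dataset,gpt-j,cnndm,cnn-dailymail,original"
  local v
  for v in "$@"; do
    case "$v" in
      datacenter|edge|calibration|validation|mlc|r2-downloader|rclone|dry-run|intel|llama3)
        tags="$tags,_$v" ;;
      *)
        echo "unknown variation: $v" >&2
        return 1 ;;
    esac
  done
  mlcr "$tags"
}
```

Under that assumption, `get_cnndm_with_variations calibration rclone` would run `mlcr get,dataset,gpt-j,cnndm,cnn-dailymail,original,_calibration,_rclone`.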

script/get-dataset-cnndm/meta.yaml

Lines changed: 1 addition & 0 deletions
```diff
@@ -12,6 +12,7 @@ deps:
 - python3
 tags: get,python3
 version_max: 3.9.999
+version_max_usable: 3.9.12
 skip_if_env:
 MLC_TMP_ML_MODEL:
 - llama3_1-8b
```
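
The added `version_max_usable` line pairs with `version_max`. As we understand MLC script metadata, `version_max_usable` names the version to fall back to when the detected Python is newer than `version_max`, and `skip_if_env` drops the dependency when `MLC_TMP_ML_MODEL` is `llama3_1-8b`. This reading is an assumption; the real resolution logic lives in mlcflow. The sketch below only mimics that decision. `pick_python_version` and `version_le` are hypothetical names, and version ordering uses `sort -V`, as provided by GNU coreutils.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the python3 dependency constraints above.

# Succeeds if version $1 <= version $2 (version ordering from `sort -V`).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Prints the Python version the dependency would settle on (assumed semantics).
pick_python_version() {
  local detected="$1"
  local version_max="3.9.999"          # from meta.yaml
  local version_max_usable="3.9.12"    # line added in this commit

  # skip_if_env: the dependency is not resolved for this model.
  if [ "${MLC_TMP_ML_MODEL:-}" = "llama3_1-8b" ]; then
    echo "skipped"
    return 0
  fi
  if version_le "$detected" "$version_max"; then
    echo "$detected"
  else
    echo "$version_max_usable"
  fi
}
```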
