Description
I am running the MLPerf inference benchmark for the Llama2-70b-99 model on a cluster with six MI210 GPUs. Below is the command I am using with CM:

```
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev --model=llama2-70b-99 --implementation=reference --framework=pytorch --category=datacenter --scenario=Offline --execution_mode=test --device=rocm --quiet --test_query_count=10 --env.LLAMA2_CHECKPOINT_PATH=/home/intern01/Llama-2-70b-chat-hf
```
When I try to run the script with the `--device=rocm` option, I get the error shown below. It seems that `rocm` is not recognized as a valid device option, as the script only accepts `cpu` or `cuda:0`. This is the full output:

```
CM script::benchmark-program/run.sh
Run Directory: /home/intern01/CM/repos/local/cache/12bee67ce1d840d4/inference/language/llama2-70b
CMD: /home/intern01/CM/repos/local/cache/def32291fe4247de/mlperf/bin/python3 main.py --scenario Offline --dataset-path /home/intern01/CM/repos/local/cache/b4603ed8799641d8/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl.gz --device rocm --total-sample-count 10 --user-conf '/home/intern01/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/8b4fd7479b754685ab7d620e3a9af93e.conf' --output-log-dir /home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1 --dtype float16 --model-path /home/intern01/Llama-2-70b-chat-hf 2>&1 | tee '/home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1/console.out'; echo \${PIPESTATUS[0]} > exitstatus
INFO:root: ! cd /home/intern01/CM/repos/local/cache/dd75d90466a24ac1
INFO:root: ! call /home/intern01/CM/repos/mlcommons@mlperf-automations/script/benchmark-program/run.sh from tmp-run.sh
/home/intern01/CM/repos/local/cache/def32291fe4247de/mlperf/bin/python3 main.py --scenario Offline --dataset-path /home/intern01/CM/repos/local/cache/b4603ed8799641d8/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl.gz --device rocm --total-sample-count 10 --user-conf '/home/intern01/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/8b4fd7479b754685ab7d620e3a9af93e.conf' --output-log-dir /home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1 --dtype float16 --model-path /home/intern01/Llama-2-70b-chat-hf 2>&1 | tee '/home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1/console.out'; echo ${PIPESTATUS[0]} > exitstatus
usage: main.py [-h] [--scenario {Offline,Server}] [--model-path MODEL_PATH]
               [--dataset-path DATASET_PATH] [--accuracy] [--dtype DTYPE]
               [--device {cpu,cuda:0}] [--audit-conf AUDIT_CONF]
               [--user-conf USER_CONF]
               [--total-sample-count TOTAL_SAMPLE_COUNT]
               [--batch-size BATCH_SIZE] [--output-log-dir OUTPUT_LOG_DIR]
               [--enable-log-trace] [--num-workers NUM_WORKERS] [--vllm]
               [--api-model-name API_MODEL_NAME] [--api-server API_SERVER]
main.py: error: argument --device: invalid choice: 'rocm' (choose from 'cpu', 'cuda:0')
CM error: Portable CM script failed (name = benchmark-program, return code = 512)
```
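From the usage text, the failure happens during argument parsing: `main.py` declares `--device` with an argparse `choices` list of `cpu` and `cuda:0`, so any other string is rejected before the benchmark starts. Also worth noting: ROCm builds of PyTorch expose GPUs through the `cuda` device string, so the value the script would ultimately need may just be `cuda:0`. Below is a minimal sketch of the restriction and a hypothetical way to widen it (`make_parser` and `extra_devices` are illustrative names, not from the MLPerf code):

```python
import argparse

def make_parser(extra_devices=()):
    """Minimal reproduction of main.py's --device restriction.

    argparse rejects any value outside `choices`, which is why
    --device rocm exits with "invalid choice" before the model loads.
    A hypothetical patch would simply extend the choices list.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--device",
        choices=["cpu", "cuda:0", *extra_devices],
        default="cpu",
    )
    return parser

# Stock parser: "rocm" is rejected (argparse calls sys.exit(2)).
# Patched parser: "rocm" is accepted.
args = make_parser(extra_devices=("rocm",)).parse_args(["--device", "rocm"])
print(args.device)  # rocm
```

Whether accepting the string is enough depends on what `main.py` then passes to `torch.device(...)`; on a ROCm PyTorch build, `cuda:0` is typically the working device string even on AMD hardware.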
Could you please advise on how to enable or fix ROCm support for this benchmark? Thanks!