
Error when running MLPerf inference with --device rocm = main.py: error: argument --device: invalid choice: 'rocm' (choose from 'cpu', 'cuda:0') #649

@altairBASIC


I am running the MLPerf inference benchmark for the Llama2-70b-99 model on a cluster with 6 MI210 GPUs. Below is the command I am using with CM:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev --model=llama2-70b-99 --implementation=reference --framework=pytorch --category=datacenter --scenario=Offline --execution_mode=test --device=rocm --quiet --test_query_count=10 --env.LLAMA2_CHECKPOINT_PATH=/home/intern01/Llama-2-70b-chat-hf

When I run the script with the --device rocm option, I get the error shown in the title. It seems that rocm is not recognized as a valid device option, because main.py only accepts cpu or cuda:0. This is the full output:

CM script::benchmark-program/run.sh

Run Directory: /home/intern01/CM/repos/local/cache/12bee67ce1d840d4/inference/language/llama2-70b

CMD: /home/intern01/CM/repos/local/cache/def32291fe4247de/mlperf/bin/python3 main.py  --scenario Offline --dataset-path /home/intern01/CM/repos/local/cache/b4603ed8799641d8/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl.gz --device rocm   --total-sample-count 10 --user-conf '/home/intern01/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/8b4fd7479b754685ab7d620e3a9af93e.conf' --output-log-dir /home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1 --dtype float16 --model-path /home/intern01/Llama-2-70b-chat-hf 2>&1 | tee '/home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1/console.out'; echo \${PIPESTATUS[0]} > exitstatus

INFO:root:         ! cd /home/intern01/CM/repos/local/cache/dd75d90466a24ac1
INFO:root:         ! call /home/intern01/CM/repos/mlcommons@mlperf-automations/script/benchmark-program/run.sh from tmp-run.sh

/home/intern01/CM/repos/local/cache/def32291fe4247de/mlperf/bin/python3 main.py  --scenario Offline --dataset-path /home/intern01/CM/repos/local/cache/b4603ed8799641d8/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl.gz --device rocm   --total-sample-count 10 --user-conf '/home/intern01/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/8b4fd7479b754685ab7d620e3a9af93e.conf' --output-log-dir /home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1 --dtype float16 --model-path /home/intern01/Llama-2-70b-chat-hf 2>&1 | tee '/home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1/console.out'; echo ${PIPESTATUS[0]} > exitstatus
usage: main.py [-h] [--scenario {Offline,Server}] [--model-path MODEL_PATH]
               [--dataset-path DATASET_PATH] [--accuracy] [--dtype DTYPE]
               [--device {cpu,cuda:0}] [--audit-conf AUDIT_CONF]
               [--user-conf USER_CONF]
               [--total-sample-count TOTAL_SAMPLE_COUNT]
               [--batch-size BATCH_SIZE] [--output-log-dir OUTPUT_LOG_DIR]
               [--enable-log-trace] [--num-workers NUM_WORKERS] [--vllm]
               [--api-model-name API_MODEL_NAME] [--api-server API_SERVER]
main.py: error: argument --device: invalid choice: 'rocm' (choose from 'cpu', 'cuda:0')

CM error: Portable CM script failed (name = benchmark-program, return code = 512)
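The message "argument --device: invalid choice: 'rocm' (choose from 'cpu', 'cuda:0')" appears to be Python argparse's standard rejection when a value is not in an argument's `choices`. The sketch below reproduces it with a parser reconstructed only from the usage text above, not from the real main.py source. `DEVICE_ALIASES`, `normalize_device`, `build_parser` and `parse_device` are hypothetical names for illustration. Mapping rocm onto cuda:0 assumes a ROCm build of PyTorch, which exposes AMD GPUs through the "cuda" device type.

```python
import argparse

# Choices as listed in the usage message printed by main.py.
ORIGINAL_CHOICES = ["cpu", "cuda:0"]

# Hypothetical alias table (assumption): ROCm builds of PyTorch address AMD
# GPUs through the "cuda" device type, so "rocm" is translated to "cuda:0".
DEVICE_ALIASES = {"rocm": "cuda:0"}


def normalize_device(value):
    """Translate a hypothetical alias before argparse checks `choices`."""
    return DEVICE_ALIASES.get(value, value)


def build_parser(choices, type_fn=str):
    """Build a minimal stand-in parser with only a --device option."""
    parser = argparse.ArgumentParser(prog="main.py")
    # argparse applies `type` first and then validates the result against `choices`.
    parser.add_argument("--device", type=type_fn, choices=choices, default="cpu")
    return parser


def parse_device(argv, parser):
    """Return the parsed device, or None if argparse rejects the value."""
    try:
        return parser.parse_args(argv).device
    except SystemExit:
        # On an invalid choice argparse prints usage plus the error to stderr
        # and exits with status 2, the same failure seen in the log above.
        return None


strict = build_parser(ORIGINAL_CHOICES)
print(parse_device(["--device", "cuda:0"], strict))  # cuda:0
print(parse_device(["--device", "rocm"], strict))    # None

lenient = build_parser(ORIGINAL_CHOICES, type_fn=normalize_device)
print(parse_device(["--device", "rocm"], lenient))   # cuda:0
```

Putting the alias translation in `type` keeps the original `choices` list unchanged. Whether main.py would then actually run on the MI210s depends on the installed PyTorch being a ROCm build.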

Could you please advise how to enable or fix ROCm support for this benchmark? Thanks!
