Description
Hi @arjunsuresh, I have run into an issue when migrating a Docker image between systems.
On system A, I ran the command below, which built the Docker container and ran the ResNet50 inference inside it successfully. I then saved the image as docker-with-test-successfully-1.tar.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=resnet50 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=1000
I loaded the image on another system, B, which has almost the same configuration, and ran the ResNet50 inference again as shown below, but it failed with the following log. Is there any limitation on migrating the Docker image that I should be aware of?
bob@Bob-Tomcat-Product:~$ docker images
REPOSITORY                        TAG       IMAGE ID       CREATED      SIZE
docker-with-test-successfully-1   latest    2b63d4ccc258   9 days ago   35.5GB
bob@Bob-Tomcat-Product:~$ docker run -it docker-with-test-successfully-1:latest /bin/bash
cmuser@d37b940a1f0a:~$ ls
CM cm-run-script-versions.json configs hardware version_info.json
cmuser@d37b940a1f0a:~$ cm run script --tags=run-mlperf,inference,_r4.1-dev \
> --model=resnet50 \
> --implementation=nvidia \
> --framework=tensorrt \
> --category=edge \
> --scenario=Offline \
> --execution_mode=valid \
> --device=cuda \
> --division=closed \
> --rerun \
> --quiet
INFO:root:* cm run script "run-mlperf inference _r4.1-dev"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: * cm run script "detect cpu"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get mlcommons inference src"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/21f79a83541549b7/cm-cached-state.json
INFO:root: * cm run script "get sut description"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: * cm run script "detect cpu"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get compiler"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/6285b87ff0f74d8a/cm-cached-state.json
INFO:root: * cm run script "get cuda-devices _with-pycuda"
INFO:root: * cm run script "get cuda _toolkit"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/b5a3a8af88c14cc7/cm-cached-state.json
INFO:root:ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: no
INFO:root:ENV[CM_CUDA_VERSION]: 12.2
INFO:root:ENV[CM_CUDA_VERSION_STRING]: cu122
INFO:root:ENV[CM_NVCC_BIN_WITH_PATH]: /usr/local/cuda/bin/nvcc
INFO:root:ENV[CUDA_HOME]: /usr/local/cuda
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get generic-python-lib _package.pycuda"
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/validate_cache.sh from tmp-run.sh
INFO:root: ! call "detect_version" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/customize.py
Detected version: 2022.2.2
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! load /home/cmuser/CM/repos/local/cache/a29ea6efe3564a4b/cm-cached-state.json
INFO:root: * cm run script "get generic-python-lib _package.numpy"
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/validate_cache.sh from tmp-run.sh
INFO:root: ! call "detect_version" from /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-generic-python-lib/customize.py
Detected version: 1.23.5
INFO:root: * cm run script "get python3"
INFO:root: ! load /home/cmuser/CM/repos/local/cache/bba8cf8097b64518/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: ! load /home/cmuser/CM/repos/local/cache/19ca7b3b57a74cd2/cm-cached-state.json
INFO:root: ! cd /home/cmuser
INFO:root: ! call /home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/detect.sh from tmp-run.sh
Traceback (most recent call last):
  File "/home/cmuser/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/detect.py", line 1, in <module>
    import pycuda.driver as cuda
  File "/home/cmuser/.local/lib/python3.8/site-packages/pycuda/driver.py", line 66, in <module>
    from pycuda._driver import * # noqa
ImportError: /lib/x86_64-linux-gnu/libcuda.so.1: file too short

CM error: Portable CM script failed (name = get-cuda-devices, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script wrapped and unified by this CM script (automation recipe). Please re-run this script with --repro flag and report this issue with the original command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts to make existing tools and native scripts more portable, interoperable and deterministic. Thank you!
cmuser@d37b940a1f0a:~$
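
For anyone trying to reproduce this: the dynamic loader generally reports "file too short" when a file is smaller than an ELF header, so it may be worth checking whether `/lib/x86_64-linux-gnu/libcuda.so.1` inside the migrated container is empty or truncated. Below is a minimal sketch of such a check. The helper name `inspect_shared_library` and its status strings are hypothetical and are not part of CM or pycuda, and the 64-byte threshold assumes a 64-bit ELF library.

```python
import os
import sys

ELF_MAGIC = b"\x7fELF"      # first four bytes of every ELF file
ELF64_HEADER_SIZE = 64      # size in bytes of a 64-bit ELF file header


def inspect_shared_library(path):
    """Return a short status string for the file at `path`.

    Hypothetical helper for illustration only.
    """
    # isfile() follows symlinks, so a dangling link counts as missing.
    if not os.path.isfile(path):
        return "missing"
    real_path = os.path.realpath(path)
    # A file smaller than the ELF header cannot be a loadable library.
    if os.path.getsize(real_path) < ELF64_HEADER_SIZE:
        return "too short"
    with open(real_path, "rb") as handle:
        if handle.read(len(ELF_MAGIC)) != ELF_MAGIC:
            return "not ELF"
    return "ok"


if __name__ == "__main__":
    # Default to the path reported in the ImportError above.
    default_path = "/lib/x86_64-linux-gnu/libcuda.so.1"
    target = sys.argv[1] if len(sys.argv) > 1 else default_path
    print(f"{target}: {inspect_shared_library(target)}")
```

Running this inside the container on system B, and again inside the original container on system A, would show whether the library file itself differs between the two environments.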