DCGM initialization error #910

Open
@minhhoai1001

I am running Docker on an A100 server:
docker run -it --rm --gpus all --net=host \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ${PWD}:/workspace/ --shm-size 8G \
  nvcr.io/nvidia/tritonserver:22.12-py3-sdk

Then I run:
model-analyzer profile --model-repository /workspace/model_repository --profile-models feature_extract --triton-launch-mode=docker --triton-docker-shm-size=8G --output-model-repository-path /workspace/model_optimizer/feature_extract --export-path ./report

I get this error:
[Model Analyzer] Initializing GPUDevice handles
CacheManager Init Failed. Error: -17
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 251, in main
    gpus = GPUDeviceFactory().verify_requested_gpus(config.gpus)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/device/gpu_device_factory.py", line 36, in __init__
    self.init_all_devices()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/device/gpu_device_factory.py", line 55, in init_all_devices
    dcgm_handle = dcgm_agent.dcgmStartEmbedded(
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/monitor/dcgm/dcgm_agent.py", line 41, in dcgmStartEmbedded
    dcgm_structs._dcgmCheckReturn(ret)
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/monitor/dcgm/dcgm_structs.py", line 646, in _dcgmCheckReturn
    raise DCGMError(ret)
model_analyzer.monitor.dcgm.dcgm_structs.DCGMError_InitError: DCGM initialization error
