Something wrong with --triton-launch-mode=remote #908

Open
@RheaRia

Description

Problem:

When using model-analyzer with --triton-launch-mode=remote, I encounter connectivity issues.

Context:

I started Triton Inference Server on the same machine, loaded the add model, and verified it works by sending inference requests and querying the metrics endpoint from the Triton SDK container. However, when I run performance analysis with model-analyzer, it fails with an error saying it cannot connect to Triton Server's GPU metrics monitor.

Steps to Reproduce:

1. Start Triton Server:

Version: 23.10
Loaded Model: add
docker run -it --gpus all --privileged \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    --rm --shm-size=1G --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /data/ti-platform/xury/triton_docker_test_file/model_analyzer_test/model_analyzer-main/examples/bak:/models \
    nvcr.io/nvidia/tritonserver:23.10-vllm-python-py3 /bin/bash

tritonserver --model-repository=/models --model-control-mode explicit --load-model add
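Before moving on, server and model readiness can be confirmed with Triton's standard HTTP endpoints (both should return HTTP 200):

curl -v http://localhost:8000/v2/health/ready
curl -v http://localhost:8000/v2/models/add/ready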

2. Start Triton SDK container:

docker run --gpus all -ti \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --net=host --privileged --rm \
    -v /data/reports:/data/reports \
    nvcr.io/nvidia/tritonserver:23.10-py3-sdk bash
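Note that because this container runs with --net=host, localhost inside it maps straight to the host's network namespace, so the server's ports 8000-8002 are reachable directly. A quick liveness probe from inside the SDK container:

curl -s http://localhost:8000/v2/health/live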

3. Test inference request in SDK container:

curl -X POST http://localhost:8000/v2/models/add/infer \
    -H "Content-Type: application/json" \
    -d '{
          "inputs": [
            {"name": "INPUT0", "datatype": "FP32", "shape": [4], "data": [1.0, 2.0, 3.0, 4.0]},
            {"name": "INPUT1", "datatype": "FP32", "shape": [4], "data": [5.0, 6.0, 7.0, 8.0]}
          ]
        }'

Successful response received.
{"model_name":"add","model_version":"1","outputs":[{"name":"OUTPUT","datatype":"FP32","shape":[4],"data":[6.0,8.0,10.0,12.0]}]}r

4. Test Triton Server metrics endpoint in SDK container:

curl http://localhost:8002/metrics

Successful response received.
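A 200 response alone does not guarantee that GPU gauges are present; the failure below suggests model-analyzer's GPU monitor expects the nv_gpu_* metric families (e.g. nv_gpu_utilization, nv_gpu_memory_used_bytes), which can be absent if DCGM cannot see the GPUs from inside the server container. Worth checking explicitly:

# should print nv_gpu_* lines; an empty result means GPU metrics are not being exported
curl -s http://localhost:8002/metrics | grep "^nv_gpu"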

5. Attempt to run model-analyzer for performance profiling:

model-analyzer profile \
    --profile-models add \
    --triton-launch-mode=remote \
    --output-model-repository-path /data/reports/add \
    --export-path profile_results \
    --triton-http-endpoint localhost:8000 \
    --triton-metrics-url http://localhost:8002/metrics \
    --run-config-search-max-concurrency 2 \
    --run-config-search-max-model-batch-size 2 \
    --run-config-search-max-instance-count 2 \
    --override-output-model-repository

Error encountered:
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/analyzer.py", line 123, in profile
    self._get_server_only_metrics(client, gpus)
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/analyzer.py", line 224, in _get_server_only_metrics
    self._metrics_manager.profile_server()
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/record/metrics_manager.py", line 188, in profile_server
    self._start_monitors(capture_gpu_metrics=capture_gpu_metrics)
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/record/metrics_manager.py", line 488, in _start_monitors
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Failed to connect to Tritonserver's GPU metrics monitor. Please check that the `triton_metrics_url` value is set correctly: http://localhost:8002/metrics.
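To narrow down whether this is a network failure or a timeout, it may help to probe the exact URL from the same shell that model-analyzer runs in, with a short deadline (these are plain curl options, not model-analyzer flags):

# verbose output shows whether the TCP connection itself succeeds
curl -v --max-time 3 http://localhost:8002/metrics 2>&1 | head -n 20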
