UCX cuda_ipc failed to activate #1195

@Archmilio

Description

hw = H100 80GB * 8
nvidia driver = 535.86.10

vllm = 0.13.0, 0.11.0
nixl = 0.8.0, 0.6.0
ucx = 1.20.0, 1.19.0

Regardless of which of these versions I use, I consistently see rc_mlx5 being used because cuda_ipc fails to activate.

#Docker Setting

docker rm vllm-prefill
docker run -it -d \
    --network host \
    --ipc host \
    --name vllm-prefill \
    --gpus 'all' \
    --shm-size=128GB \
    --ulimit memlock=-1 \
    -v "/mnt/logs:/logs" \
    --privileged \
    --entrypoint sleep \
    aiapi.com/genai/vllm:v0.13.0 infinity

#vLLM ENV / Configuration

export CUDA_VISIBLE_DEVICES=0
export UCX_TLS=cuda_ipc,cuda_copy,sm,self,rc
export UCX_PROTO_INFO=y

export NIXL_LOG_LEVEL=DEBUG
export UCX_LOG_LEVEL=DEBUG
export VLLM_NIXL_SIDE_CHANNEL_PORT=5600
vllm serve /logs/Meta-llama-Llama-3.1-8B-Instruct \
  --port 8100 \
  -tp 1 \
  --gpu-memory-utilization 0.90 \
  --kv-transfer-config '{
        "kv_connector":"NixlConnector",
        "kv_role":"kv_both"
        }'

#UCX build configuration and ucx_info output

configure: UCX build configuration:
configure:         Build prefix:   /usr/local/ucx
configure:    Configuration dir:   ${prefix}/etc/ucx
configure:                   CC:   gcc
configure:                  CXX:   g++
configure:             CPPFLAGS:   -DCPU_FLAGS="|avx" -I${abs_top_srcdir}/src -I${abs_top_builddir} -I${abs_top_builddir}/src
configure:               CFLAGS:   -O3 -g -Wall -Werror -mavx -funwind-tables -Wframe-larger-than=8192 -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch -Wno-pointer-sign -Werror-implicit-function-declaration -Wno-format-zero-length -Wnested-externs -Wshadow -Werror=declaration-after-statement
configure:             CXXFLAGS:   -O3 -g -Wall -Werror -mavx -funwind-tables -Wframe-larger-than=8192 -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch
configure:           ASAN check:   no
configure:         Multi-thread:   enabled
configure:            MPI tests:   disabled
configure:          VFS support:   no
configure:        Devel headers:   yes
configure: io_demo CUDA support:   no
configure:             Bindings:   < >
configure:          UCS modules:   < >
configure:          UCT modules:   < cuda ib rdmacm cma >
configure:         CUDA modules:   < gdrcopy >
configure:         ROCM modules:   < >
configure:           IB modules:   < mlx5 efa >
configure:          UCM modules:   < cuda >
configure:         Perf modules:   < cuda >
configure: =========================================================

$ ucx_info -v

Library version: 1.20.0
Library path: /lib/libucs.so.0
API headers version: 1.20.0
Git branch '', revision 4b7a6ca
Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --enable-mt --prefix=/usr --with-rdmacm --with-verbs --with-mlx5 --enable-cma --enable-examples --without-java --without-go --with-cuda=/usr/local/cuda --with-xpme
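The configure summary above reports a build prefix of /usr/local/ucx, while `ucx_info -v` reports `--prefix=/usr` and a library at /lib/libucs.so.0. In case more than one UCX build is present in the image, a minimal sketch like the following lists the UCX libraries known to the dynamic linker. The helper name `ucx_libs` is hypothetical, and libraries installed outside the linker cache (for example, under a prefix that is only reachable through `LD_LIBRARY_PATH`) would not show up here.

```shell
#!/usr/bin/env bash
# ucx_libs (hypothetical helper): reads `ldconfig -p` style output on stdin
# and prints the resolved paths of libucs/libucp/libuct entries, one per
# line, with duplicates removed.
ucx_libs() {
    awk '/libuc[spt]\.so/ && /=>/ { print $NF }' | sort -u
}

# Only query the linker cache when ldconfig is actually available.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | ucx_libs
fi
```

If this prints more than one libucs path, it may be worth confirming which one the vLLM process actually loads.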

$ ucx_info -d 

Memory domain: cuda_ipc
     Component: cuda_ipc
             register: unlimited, cost: 0 nsec
           remote key: 120 bytes
           memory invalidation is supported
         memory types: cuda (access,reg,cache)

      Transport: cuda_ipc
         Device: cuda
           Type: intra-node
  System device: <unknown>

      capabilities:
            bandwidth: 400000.00/ppn + 0.00 MB/sec
              latency: 1000 nsec
             overhead: 7000 nsec
            put_zcopy: unlimited, up to 1 iov
  put_opt_zcopy_align: <= 1
        put_align_mtu: <= 1
            get_zcopy: unlimited, up to 1 iov
  get_opt_zcopy_align: <= 1
        get_align_mtu: <= 1
           connection: to iface
      device priority: 0
     device num paths: 1
              max eps: inf
       device address: 8 bytes
        iface address: 4 bytes
       error handling: peer failure
   device mem_element: 8 bytes
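To repeat this check quickly inside each container, a small sketch along these lines tests whether a given transport is listed by `ucx_info -d`. The helper name `has_transport` is hypothetical. It only confirms that the transport is compiled in and detected, not that UCX will select it for a particular endpoint.

```shell
#!/usr/bin/env bash
# has_transport NAME (hypothetical helper): reads `ucx_info -d` output on
# stdin and succeeds if a "Transport: NAME" line is present. A leading '#'
# is tolerated because ucx_info prefixes its output lines with it.
has_transport() {
    grep -Eq "^[#[:space:]]*Transport:[[:space:]]*${1}[[:space:]]*\$"
}

# Only run the live check when ucx_info is installed.
if command -v ucx_info >/dev/null 2>&1; then
    if ucx_info -d | has_transport cuda_ipc; then
        echo "cuda_ipc is listed by ucx_info -d"
    else
        echo "cuda_ipc is NOT listed by ucx_info -d"
    fi
fi
```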

In the intra-node cfg, rc_mlx5 is being activated for zero-copy instead of cuda_ipc:

[1767930429.284399] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | remote memory write by ucp_put* from host memory to host    |
[1767930429.284399] [mngc-001:13245:0]   +--------------------------------+------------+------------------------------------------------+
[1767930429.284399] [mngc-001:13245:0]   |                          0..2K | short      | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284400] [mngc-001:13245:0]   |                      2049..inf | zero-copy  | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
[1767930429.284400] [mngc-001:13245:0]   +--------------------------------+------------+------------------------------------------------+
[1767930429.284483] [mngc-001:13245:0]   +--------------------------------+------------------------------------------------------------------------------+
[1767930429.284484] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | remote memory write by ucp_put*(fast-completion) from host memory to host    |
[1767930429.284484] [mngc-001:13245:0]   +--------------------------------+-----------------------------+------------------------------------------------+
[1767930429.284485] [mngc-001:13245:0]   |                          0..2K | short                       | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284485] [mngc-001:13245:0]   |                     2049..9383 | copy-in                     | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284485] [mngc-001:13245:0]   |                      9384..inf | zero-copy                   | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
[1767930429.284486] [mngc-001:13245:0]   +--------------------------------+-----------------------------+------------------------------------------------+
[1767930429.284553] [mngc-001:13245:0]   +--------------------------------+--------------------------------------------------------------------+
[1767930429.284553] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | remote memory write by ucp_put*(multi) from host memory to host    |
[1767930429.284554] [mngc-001:13245:0]   +--------------------------------+-------------------+------------------------------------------------+
[1767930429.284554] [mngc-001:13245:0]   |                        0..1395 | short             | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284554] [mngc-001:13245:0]   |                      1396..inf | zero-copy         | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
[1767930429.284555] [mngc-001:13245:0]   +--------------------------------+-------------------+------------------------------------------------+
[1767930429.284827] [mngc-001:13245:0]   +--------------------------------+--------------------------------------------------------------------+
[1767930429.284828] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | remote memory write by ucp_put* from host memory to cuda/dev[0]    |
[1767930429.284828] [mngc-001:13245:0]   +--------------------------------+-------------------+------------------------------------------------+
[1767930429.284828] [mngc-001:13245:0]   |                          0..2K | short             | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284829] [mngc-001:13245:0]   |                      2049..inf | zero-copy         | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
[1767930429.284829] [mngc-001:13245:0]   +--------------------------------+-------------------+------------------------------------------------+
[1767930429.284905] [mngc-001:13245:0]   +--------------------------------+-------------------------------------------------------------------------------------+
[1767930429.284906] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | remote memory write by ucp_put*(fast-completion) from host memory to cuda/dev[0]    |
[1767930429.284906] [mngc-001:13245:0]   +--------------------------------+------------------------------------+------------------------------------------------+
[1767930429.284906] [mngc-001:13245:0]   |                          0..2K | short                              | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284907] [mngc-001:13245:0]   |                     2049..9383 | copy-in                            | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284907] [mngc-001:13245:0]   |                      9384..inf | zero-copy                          | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
[1767930429.284908] [mngc-001:13245:0]   +--------------------------------+------------------------------------+------------------------------------------------+
[1767930429.284982] [mngc-001:13245:0]   +--------------------------------+---------------------------------------------------------------------------+
[1767930429.284983] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | remote memory write by ucp_put*(multi) from host memory to cuda/dev[0]    |
[1767930429.284983] [mngc-001:13245:0]   +--------------------------------+--------------------------+------------------------------------------------+
[1767930429.284984] [mngc-001:13245:0]   |                        0..1395 | short                    | rc_mlx5/mlx5_0:1/path0                         |
[1767930429.284984] [mngc-001:13245:0]   |                      1396..inf | zero-copy                | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
[1767930429.284984] [mngc-001:13245:0]   +--------------------------------+--------------------------+------------------------------------------------+
[1767930429.285191] [mngc-001:13245:0]   +--------------------------------+----------------------------------------------------+
[1767930429.285191] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | active message by ucp_am_send* from host memory    |
[1767930429.285191] [mngc-001:13245:0]   +--------------------------------+---------------------------+------------------------+
[1767930429.285192] [mngc-001:13245:0]   |                        0..2038 | short                     | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285192] [mngc-001:13245:0]   |                     2039..8184 | zero-copy                 | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285192] [mngc-001:13245:0]   |                      8185..inf | multi-frag zero-copy      | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285193] [mngc-001:13245:0]   +--------------------------------+---------------------------+------------------------+
[1767930429.285318] [mngc-001:13245:0]   +--------------------------------+---------------------------------------------------------------------+
[1767930429.285319] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | active message by ucp_am_send*(fast-completion) from host memory    |
[1767930429.285319] [mngc-001:13245:0]   +--------------------------------+--------------------------------------------+------------------------+
[1767930429.285319] [mngc-001:13245:0]   |                        0..2038 | short                                      | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285320] [mngc-001:13245:0]   |                     2039..8184 | copy-in                                    | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285320] [mngc-001:13245:0]   |                     8185..9279 | multi-frag copy-in                         | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285320] [mngc-001:13245:0]   |                      9280..inf | multi-frag zero-copy                       | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285321] [mngc-001:13245:0]   +--------------------------------+--------------------------------------------+------------------------+
[1767930429.285805] [mngc-001:13245:0]   +--------------------------------+-----------------------------------------------------------+
[1767930429.285806] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | active message by ucp_am_send*(multi) from host memory    |
[1767930429.285806] [mngc-001:13245:0]   +--------------------------------+----------------------------------+------------------------+
[1767930429.285806] [mngc-001:13245:0]   |                         0..514 | short                            | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285807] [mngc-001:13245:0]   |                      515..8184 | zero-copy                        | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285807] [mngc-001:13245:0]   |                      8185..inf | multi-frag zero-copy             | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285807] [mngc-001:13245:0]   +--------------------------------+----------------------------------+------------------------+
[1767930429.285933] [mngc-001:13245:0]   +--------------------------------+--------------------------------------------------------------------+
[1767930429.285933] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | active message by ucp_am_send* with reply flag from host memory    |
[1767930429.285934] [mngc-001:13245:0]   +--------------------------------+-------------------------------------------+------------------------+
[1767930429.285934] [mngc-001:13245:0]   |                        0..2030 | short                                     | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285934] [mngc-001:13245:0]   |                     2031..8176 | zero-copy                                 | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285935] [mngc-001:13245:0]   |                      8177..inf | multi-frag zero-copy                      | rc_mlx5/mlx5_0:1/path0 |
[1767930429.285935] [mngc-001:13245:0]   +--------------------------------+-------------------------------------------+------------------------+
[1767930429.286063] [mngc-001:13245:0]   +--------------------------------+-------------------------------------------------------------------------------------+
[1767930429.286063] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | active message by ucp_am_send* with reply flag(fast-completion) from host memory    |
[1767930429.286064] [mngc-001:13245:0]   +--------------------------------+------------------------------------------------------------+------------------------+
[1767930429.286064] [mngc-001:13245:0]   |                        0..2030 | short                                                      | rc_mlx5/mlx5_0:1/path0 |
[1767930429.286064] [mngc-001:13245:0]   |                     2031..8176 | copy-in                                                    | rc_mlx5/mlx5_0:1/path0 |
[1767930429.286065] [mngc-001:13245:0]   |                     8177..9279 | multi-frag copy-in                                         | rc_mlx5/mlx5_0:1/path0 |
[1767930429.286065] [mngc-001:13245:0]   |                      9280..inf | multi-frag zero-copy                                       | rc_mlx5/mlx5_0:1/path0 |
[1767930429.286065] [mngc-001:13245:0]   +--------------------------------+------------------------------------------------------------+------------------------+
[1767930429.286215] [mngc-001:13245:0]   +--------------------------------+---------------------------------------------------------------------------+
[1767930429.286216] [mngc-001:13245:0]   | ucp_context_0 intra-node cfg#1 | active message by ucp_am_send* with reply flag(multi) from host memory    |
[1767930429.286216] [mngc-001:13245:0]   +--------------------------------+--------------------------------------------------+------------------------+
[1767930429.286216] [mngc-001:13245:0]   |                         0..514 | short                                            | rc_mlx5/mlx5_0:1/path0 |
[1767930429.286217] [mngc-001:13245:0]   |                      515..8176 | zero-copy                                        | rc_mlx5/mlx5_0:1/path0 |
[1767930429.286217] [mngc-001:13245:0]   |                      8177..inf | multi-frag zero-copy                             | rc_mlx5/mlx5_0:1/path0 |
[1767930429.286220] [mngc-001:13245:0]   +--------------------------------+--------------------------------------------------+------------------------+
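The tables above are long, so a condensed view can make it easier to see which transport is chosen per operation. The following is a minimal sketch, assuming the `UCX_PROTO_INFO=y` output has been saved to a file (the file name `vllm.log` is hypothetical) and that the tables keep the `|`-separated layout shown above. Note that every table in this excerpt describes transfers "from host memory", so the full log may need to be checked for rows whose source is CUDA memory.

```shell
#!/usr/bin/env bash
# proto_summary (hypothetical helper): reads UCX_PROTO_INFO tables on stdin
# and prints one "operation => transport" line per unique combination.
proto_summary() {
    awk -F'|' '
        # Header row: remember the operation description (third field).
        /ucp_context_/ {
            op = $3
            sub(/^ +/, "", op); sub(/ +$/, "", op)
            next
        }
        # Data row: second field is a size range such as "0..2K".
        NF >= 4 && $2 ~ /[0-9]+\.\./ {
            tl = $4
            sub(/^ +/, "", tl); sub(/ +$/, "", tl)
            sub(/\/.*/, "", tl)   # keep only the transport name
            print op " => " tl
        }
    ' | sort -u
}

# Example usage (vllm.log is a hypothetical file name):
#   proto_summary < vllm.log
```

Applied to the excerpt above, every line of the summary would end in `rc_mlx5`, which matches the behavior described in this report.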
