
[Issue]: DeepSeek V3.2 on MI308x*8 with Atom-vLLM reaches only 200-300K TPM on a single machine #896

@LoadingZhang


Problem Description

Input/output token length: 8K/1K, with a KV cache hit rate of 80%+.
Launch command:

vllm serve $MODEL_PATH  --tensor-parallel-size 8 --kv-cache-dtype fp8 --gpu_memory_utilization 0.9 --async-scheduling --max-num-seqs 48 --tool-call-parser deepseek_v32 --enable-auto-tool-choice --reasoning-parser deepseek_v3 --max-model-len 32K

Are specialized optimizations required for the MI308x?
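
For context, the reported figures can be converted into per-second rates with some rough arithmetic. This is only a sketch under stated assumptions: it assumes TPM counts input and output tokens together, that 8K/1K means 8192/1024 tokens, and that cached prompt tokens need no prefill compute. None of these are confirmed above, and the helper name `tpm_breakdown` is hypothetical.

```python
# Rough arithmetic on the reported figures.
# ASSUMPTIONS (not confirmed in the report): TPM counts input + output tokens,
# 8K/1K means 8192/1024 tokens, and cache-hit tokens skip prefill entirely.

def tpm_breakdown(tpm, input_len=8 * 1024, output_len=1024,
                  cache_hit=0.8, num_gpus=8):
    """Convert a tokens-per-minute figure into per-second rates."""
    tokens_per_request = input_len + output_len
    requests_per_min = tpm / tokens_per_request
    tokens_per_sec = tpm / 60
    # Prompt tokens per request that still need prefill after cache hits.
    uncached_prefill_per_request = input_len * (1 - cache_hit)
    return {
        "tokens_per_sec": tokens_per_sec,
        "tokens_per_sec_per_gpu": tokens_per_sec / num_gpus,
        "requests_per_min": requests_per_min,
        "decode_tokens_per_sec": requests_per_min * output_len / 60,
        "uncached_prefill_tokens_per_sec":
            requests_per_min * uncached_prefill_per_request / 60,
    }

if __name__ == "__main__":
    # The two ends of the reported 200-300K TPM range.
    for tpm in (200_000, 300_000):
        print(tpm, tpm_breakdown(tpm))
```

Splitting the total into decode and uncached-prefill rates makes it easier to compare against a benchmark that reports those two phases separately.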

Operating System

CentOS 8

CPU

AMD EPYC 9K84 96-Core Processor

GPU

AMD MI308x*8

ROCm Version

722

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
