
Conversation

@jlonge4 commented Apr 9, 2025

Issue #, if available:
aws-neuron/neuron-workshops#7

Description of changes:
This PR adds support for Qwen2 models.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@FThompsonAWS (Contributor)

Thank you for your contribution! We are defining our process to review/accept contributions and will get back to you soon.

@jlonge4 (Author) commented May 14, 2025

Logit validation / benchmark command:

!inference_demo \
    --model-type qwen2 \
    --task-type causal-lm \
    run \
    --model-path /home/ubuntu/model_hf_qwen/qwen2 \
    --compiled-model-path /home/ubuntu/traced_model_qwen/qwen2/logit \
    --torch-dtype bfloat16 \
    --tp-degree 8 \
    --batch-size 1 \
    --max-context-length 16 \
    --seq-len 32 \
    --top-k 1 \
    --pad-token-id 151645 \
    --prompt "To be, or not to be" \
    --check-accuracy-mode logit-matching \
    --benchmark
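
In logit-matching mode, the demo checks the Neuron outputs against a reference run of the same checkpoint. A minimal sketch of how an equivalent CPU reference could be produced with Hugging Face transformers, purely for orientation (it reuses the path and greedy settings from the command above and is not the exact check inference_demo performs internally):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative CPU reference; model_path matches --model-path in the command above.
model_path = "/home/ubuntu/model_hf_qwen/qwen2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model.eval()

inputs = tokenizer("To be, or not to be", return_tensors="pt")
with torch.no_grad():
    # Greedy decoding (top-k 1) out to the same seq-len of 32.
    out = model.generate(
        **inputs,
        max_length=32,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
    )

print(tokenizer.decode(out.sequences[0, inputs["input_ids"].shape[1]:]))
# out.scores holds one [batch, vocab] logits tensor per generated token,
# i.e. 25 x 1 x 152064 here, which lines up with the expected logits shape below.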

Results:

Expected Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Expected Logits Shape:  torch.Size([25, 1, 152064])
Actual Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Actual Logits Shape:  torch.Size([25, 1, 152064])
Passed logits validation!

Generating outputs...
Prompts: ['To be, or not to be']
Generated outputs:
Output 0: To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune
Benchmark completed and its result is as following
{
    "e2e_model": {
        "latency_ms_p50": 556.2845468521118,
        "latency_ms_p90": 556.8541049957275,
        "latency_ms_p95": 556.9546341896057,
        "latency_ms_p99": 557.6868557929993,
        "latency_ms_p100": 557.8699111938477,
        "latency_ms_avg": 555.8973550796509,
        "throughput": 57.56458401464229
    },
    "context_encoding_model": {
        "latency_ms_p50": 41.66257381439209,
        "latency_ms_p90": 41.74163341522217,
        "latency_ms_p95": 41.74485206604004,
        "latency_ms_p99": 41.75789833068848,
        "latency_ms_p100": 41.761159896850586,
        "latency_ms_avg": 41.67191982269287,
        "throughput": 383.9515930170089
    },
    "token_generation_model": {
        "latency_ms_p50": 33.31899642944336,
        "latency_ms_p90": 33.50291252136231,
        "latency_ms_p95": 33.71169567108154,
        "latency_ms_p99": 33.97499084472656,
        "latency_ms_p100": 34.55924987792969,
        "latency_ms_avg": 33.36369752883911,
        "throughput": 31.970876901298336
    }
}
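
As a rough sanity check (assuming the tool reports throughput as tokens processed divided by average latency, which is not documented here), the numbers are self-consistent:

# Cross-check of the reported throughputs under that assumption.
seq_len, context_len = 32, 16
print(seq_len / (555.8973550796509 / 1000))      # ~57.56 tokens/s, matches e2e_model
print(context_len / (41.67191982269287 / 1000))  # ~383.95 tokens/s, matches context_encoding_model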

@jlonge4 closed this Jul 24, 2025
