Qwen3 model #14

jlonge4 · 2025-04-30T18:17:06Z

Issue #, if available:
N/A
Description of changes:
Add the Qwen3 model file and an inference notebook. Tested with Qwen/Qwen3-8B.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jlonge4 · 2025-05-14T14:26:04Z

Logit validation and benchmark command:

!inference_demo \
    --model-type qwen3 \
    --task-type causal-lm \
    run \
    --model-path /home/ubuntu/model_hf_qwen/qwen/ \
    --compiled-model-path /home/ubuntu/traced_model_qwen/qwen/logit \
    --torch-dtype bfloat16 \
    --tp-degree 8 \
    --batch-size 1 \
    --max-context-length 16 \
    --seq-len 32 \
    --enable-bucketing \
    --pad-token-id 151645 \
    --prompt "To be, or not to be" \
    --check-accuracy-mode logit-matching \
    --benchmark

Results:

Expected Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Expected Logits Shape:  torch.Size([25, 1, 151936])
Actual Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Actual Logits Shape:  torch.Size([25, 1, 151936])
Passed logits validation!
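
As a quick cross-check on the output above, here is a minimal sketch that compares the two token ID sequences and confirms their length matches the first dimension of the reported logits shape. It only looks at token IDs copied from this comment; it does not reproduce the tolerance check that `--check-accuracy-mode logit-matching` performs on the logits themselves. Reading the shape as (generated tokens, batch size, vocabulary size) is an assumption that happens to be consistent with `--batch-size 1` and the 25 generated tokens. The helper `first_mismatch` is a hypothetical name used for illustration.

```python
# Token IDs copied from the "Expected Output" and "Actual Output" tensors above.
expected_ids = [
    11, 429, 374, 279, 3405, 13, 13139, 364, 83, 285,
    13049, 1536, 304, 279, 3971, 311, 7676, 279, 1739, 819,
    323, 36957, 315, 54488, 32315,
]
actual_ids = [
    11, 429, 374, 279, 3405, 13, 13139, 364, 83, 285,
    13049, 1536, 304, 279, 3971, 311, 7676, 279, 1739, 819,
    323, 36957, 315, 54488, 32315,
]

# Shape reported for both expected and actual logits.
# Assumed layout: (generated tokens, batch size, vocabulary size).
logits_shape = (25, 1, 151936)


def first_mismatch(expected, actual):
    """Return the index of the first differing token, or None if they match."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One sequence is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


mismatch = first_mismatch(expected_ids, actual_ids)
print("first mismatch:", mismatch)
print("token count matches logits dim 0:", len(expected_ids) == logits_shape[0])
```
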

Generating outputs...
Prompts: ['To be, or not to be']
Generated outputs:
Output 0: To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune

Benchmark completed and its result is as following
{
    "e2e_model": {
        "latency_ms_p50": 156.56781196594238,
        "latency_ms_p90": 158.08086395263672,
        "latency_ms_p95": 158.1140637397766,
        "latency_ms_p99": 158.28602075576782,
        "latency_ms_p100": 158.32901000976562,
        "latency_ms_avg": 156.99772834777832,
        "throughput": 203.82460521412273
    },
    "context_encoding_model": {
        "latency_ms_p50": 10.202646255493164,
        "latency_ms_p90": 10.224390029907227,
        "latency_ms_p95": 10.22493839263916,
        "latency_ms_p99": 10.226750373840332,
        "latency_ms_p100": 10.227203369140625,
        "latency_ms_avg": 10.201811790466309,
        "throughput": 1568.348870634151
    },
    "token_generation_model": {
        "latency_ms_p50": 8.858323097229004,
        "latency_ms_p90": 8.903312683105469,
        "latency_ms_p95": 9.238588809967041,
        "latency_ms_p99": 9.264287948608398,
        "latency_ms_p100": 9.28950309753418,
        "latency_ms_avg": 8.88296922047933,
        "throughput": 120.07996877975322
    }
}
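
For anyone reading the numbers above, a small sketch of how the throughput figures relate to the average latencies. The token counts per run are an assumption on my part: `--seq-len 32` times `--batch-size 1` for the end-to-end model, and `--max-context-length 16` times `--batch-size 1` for context encoding. Under that assumption the reported throughput equals tokens divided by average latency in seconds. The token generation figure does not follow the same one-token-per-step formula (reported throughput times average latency comes out near 1.07 rather than 1), so the sketch only prints that ratio and does not try to explain it.

```python
# Values copied from the benchmark JSON above.
avg_latency_ms = {
    "e2e_model": 156.99772834777832,
    "context_encoding_model": 10.201811790466309,
    "token_generation_model": 8.88296922047933,
}
reported_throughput = {
    "e2e_model": 203.82460521412273,
    "context_encoding_model": 1568.348870634151,
    "token_generation_model": 120.07996877975322,
}

# Assumed tokens processed per run (not confirmed by the tool's documentation):
# seq_len * batch_size for e2e, max_context_length * batch_size for context encoding.
assumed_tokens_per_run = {
    "e2e_model": 32 * 1,
    "context_encoding_model": 16 * 1,
}


def tokens_per_second(tokens, latency_ms):
    """Convert a token count and a latency in milliseconds to tokens per second."""
    return tokens / (latency_ms / 1000.0)


derived = {
    name: tokens_per_second(tokens, avg_latency_ms[name])
    for name, tokens in assumed_tokens_per_run.items()
}

for name, value in derived.items():
    print(f"{name}: derived {value:.2f} tok/s, reported {reported_throughput[name]:.2f} tok/s")

# Implied tokens per step for token generation (reported throughput * avg latency).
tkg_ratio = (
    reported_throughput["token_generation_model"]
    * avg_latency_ms["token_generation_model"]
    / 1000.0
)
print(f"token_generation_model: implied tokens per step = {tkg_ratio:.4f}")
```
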

jlonge4 added 7 commits April 9, 2025 16:38

add qwen2 support

7b3ae19

update qwen file and add test nb

c6b43cf

add qwen3

0eaad5c

lint

e1611a1

add inference nb

176ced2

Remove .DS_Store files and add to gitignore

4d683ea

logit val / cleanup

beabb8c
