
Conversation

@jlonge4 commented Apr 9, 2025

Issue #, if available:
aws-neuron/neuron-workshops#7

Description of changes:
This PR adds support for Qwen2 models.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@FThompsonAWS (Contributor)

Thank you for your contribution! We are defining our process to review/accept contributions and will get back to you soon.

@jlonge4 (Author) commented May 14, 2025

Logit validation / benchmark command:

!inference_demo \
    --model-type qwen2 \
    --task-type causal-lm \
    run \
    --model-path /home/ubuntu/model_hf_qwen/qwen2 \
    --compiled-model-path /home/ubuntu/traced_model_qwen/qwen2/logit \
    --torch-dtype bfloat16 \
    --tp-degree 8 \
    --batch-size 1 \
    --max-context-length 16 \
    --seq-len 32 \
    --top-k 1 \
    --pad-token-id 151645 \
    --prompt "To be, or not to be" \
    --check-accuracy-mode logit-matching \
    --benchmark
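
In logit-matching mode, the demo checks the Neuron outputs against a reference run of the same checkpoint. A minimal sketch of how an equivalent CPU reference could be produced with Hugging Face transformers, purely for orientation (it reuses the path and greedy settings from the command above and is not the exact check inference_demo performs internally):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative CPU reference; model_path matches --model-path in the command above.
model_path = "/home/ubuntu/model_hf_qwen/qwen2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model.eval()

inputs = tokenizer("To be, or not to be", return_tensors="pt")
with torch.no_grad():
    # Greedy decoding (top-k 1) out to the same seq-len of 32.
    out = model.generate(
        **inputs,
        max_length=32,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
    )

print(tokenizer.decode(out.sequences[0, inputs["input_ids"].shape[1]:]))
# out.scores holds one [batch, vocab] logits tensor per generated token,
# i.e. 25 x 1 x 152064 here, which lines up with the expected logits shape below.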

Results:

Expected Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Expected Logits Shape:  torch.Size([25, 1, 152064])
Actual Output:  [", that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune"] tensor([[   11,   429,   374,   279,  3405,    13, 13139,   364,    83,   285,
         13049,  1536,   304,   279,  3971,   311,  7676,   279,  1739,   819,
           323, 36957,   315, 54488, 32315]])
Actual Logits Shape:  torch.Size([25, 1, 152064])
Passed logits validation!

Generating outputs...
Prompts: ['To be, or not to be']
Generated outputs:
Output 0: To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune
Benchmark completed and its result is as following
{
    "e2e_model": {
        "latency_ms_p50": 556.2845468521118,
        "latency_ms_p90": 556.8541049957275,
        "latency_ms_p95": 556.9546341896057,
        "latency_ms_p99": 557.6868557929993,
        "latency_ms_p100": 557.8699111938477,
        "latency_ms_avg": 555.8973550796509,
        "throughput": 57.56458401464229
    },
    "context_encoding_model": {
        "latency_ms_p50": 41.66257381439209,
        "latency_ms_p90": 41.74163341522217,
        "latency_ms_p95": 41.74485206604004,
        "latency_ms_p99": 41.75789833068848,
        "latency_ms_p100": 41.761159896850586,
        "latency_ms_avg": 41.67191982269287,
        "throughput": 383.9515930170089
    },
    "token_generation_model": {
        "latency_ms_p50": 33.31899642944336,
        "latency_ms_p90": 33.50291252136231,
        "latency_ms_p95": 33.71169567108154,
        "latency_ms_p99": 33.97499084472656,
        "latency_ms_p100": 34.55924987792969,
        "latency_ms_avg": 33.36369752883911,
        "throughput": 31.970876901298336
    }
}
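
As a rough sanity check (assuming the tool reports throughput as tokens processed divided by average latency, which is not documented here), the numbers are self-consistent:

# Cross-check of the reported throughputs under that assumption.
seq_len, context_len = 32, 16
print(seq_len / (555.8973550796509 / 1000))      # ~57.56 tokens/s, matches e2e_model
print(context_len / (41.67191982269287 / 1000))  # ~383.95 tokens/s, matches context_encoding_model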

@jlonge4 closed this Jul 24, 2025
