Benchmark collection for different AI models and devices #16

@aryoda

I want to collect reported benchmark results in this "issue" as a table.

Please feel free to contribute by posting your results as comments so that I can add them to this table.

Please do not close this issue until we have collected enough results.

| Device name | OS version | Box version | Gemma-4-E4B-it (on GPU) | Gemma-4-E4B-it (on NPU) | Gemma-4-E4B-it (on CPU) | Notes |
|---|---|---|---|---|---|---|
| Google Pixel 9 Pro XL | GrapheneOS build 2026042101 (Android 16) | v1.0.3custom | Prefill: 120 TPS<br>Decode: 9.54 TPS<br>Time to first token: 2.23 s | Prefill: 19.31 TPS<br>Decode: 8.11 TPS<br>Time to first token: 13.40 s | Prefill: 16.74 TPS<br>Decode: 7.40 TPS<br>Time to first token: 15.89 s | Inference on CPU causes the device to become hot (not observed with GPU or NPU) |

Legend:
TPS = tokens per second

Note: This table is maintained via tablesgenerator
