I want to collect reported benchmark results in this "issue" as a table.
Please feel free to contribute by posting your results as comments so that I can add them to this table.
Please do not close this issue until we have enough results.
| Device name | OS version | Box version | Gemma-4-E4B-it (on GPU) | Gemma-4-E4B-it (on NPU) | Gemma-4-E4B-it (on CPU) | Notes |
|---|---|---|---|---|---|---|
| Google Pixel 9 Pro XL | GrapheneOS build 2026042101 (Android 16) | v1.0.3custom | Prefill: 120 TPS<br>Decode: 9.54 TPS<br>Time to first token: 2.23 s | Prefill: 19.31 TPS<br>Decode: 8.11 TPS<br>Time to first token: 13.40 s | Prefill: 16.74 TPS<br>Decode: 7.40 TPS<br>Time to first token: 15.89 s | Inference on CPU causes the device to heat up (not observed with GPU or NPU) |
|  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
Legend:
TPS = tokens/second
Note: This table is maintained via tablesgenerator
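To compare backends at a glance, the throughput numbers in the table can be turned into speedup ratios relative to CPU. The following is a minimal Python sketch that uses only the values reported for the Google Pixel 9 Pro XL above; the `results` dictionary and the `speedup` helper are hypothetical names chosen for illustration. Time to first token is left out of the ratio because lower is better for it, unlike the TPS values.

```python
# Values copied from the Google Pixel 9 Pro XL row of the table (TPS = tokens/second).
# The `results` layout and `speedup` helper are illustrative assumptions, not part of any app.
results = {
    "GPU": {"prefill": 120.0, "decode": 9.54, "ttft_s": 2.23},
    "NPU": {"prefill": 19.31, "decode": 8.11, "ttft_s": 13.40},
    "CPU": {"prefill": 16.74, "decode": 7.40, "ttft_s": 15.89},
}


def speedup(backend: str, baseline: str, metric: str) -> float:
    """Ratio of a throughput metric (higher is better) between two backends."""
    return results[backend][metric] / results[baseline][metric]


if __name__ == "__main__":
    # Print prefill and decode speedups of GPU and NPU over CPU.
    for backend in ("GPU", "NPU"):
        for metric in ("prefill", "decode"):
            ratio = speedup(backend, "CPU", metric)
            print(f"{backend} vs CPU ({metric}): {ratio:.2f}x")
```

On this device the ratios show that the GPU's advantage is far larger for prefill than for decode.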