This guide focuses on using the application's own benchmarking and logging to evaluate performance.
Use consistent metrics so results are comparable across devices and builds.
| Metric | Definition |
|---|---|
| End-to-end latency | Time from 'Press to talk' button click to first spoken response |
| STT latency | Time from end of audio input to completed transcription |
| LLM TTFT | Time to first token from LLM |
| LLM TPOT | Average time per output token |
| TTS latency | Time from response text ready to start of audio playback |
| Throughput | Tokens/sec (LLM); seconds of audio transcribed per wall-clock second (STT) |
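
To make the LLM rows concrete, here is a minimal sketch of how TTFT, TPOT, and throughput fall out of raw timestamps. All names here are hypothetical, not the app's API; the app's built-in timing output may report these values directly.

```kotlin
// Hypothetical sketch (not the app's actual API): deriving the LLM metrics
// above from raw timestamps recorded around one generation request.
data class LlmRun(
    val requestSentMs: Long,   // prompt submitted to the LLM
    val firstTokenMs: Long,    // first output token received
    val lastTokenMs: Long,     // final output token received
    val outputTokens: Int      // total tokens generated
)

fun ttftMs(run: LlmRun): Long = run.firstTokenMs - run.requestSentMs

// TPOT excludes the first token: decode time divided by the remaining tokens.
fun tpotMs(run: LlmRun): Double =
    (run.lastTokenMs - run.firstTokenMs).toDouble() / (run.outputTokens - 1)

// Throughput in tokens per second over the whole generation.
fun tokensPerSec(run: LlmRun): Double =
    run.outputTokens * 1000.0 / (run.lastTokenMs - run.requestSentMs)
```
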
- Build a release binary:

  ```sh
  ./gradlew assembleRelease
  ```
- Use a physical device and disable battery saver.
- Reboot the device to clear background state.
- Warm up the app with 2-3 runs of the same prompt.
- Run 5-10 identical prompts and average the metrics (see the harness sketch after this list).
- Record model names, build flags, device model, and OS version.
- Use the app's built-in logging/timing output to capture the metrics listed above.
- Keep prompts and audio files fixed between runs.
- Avoid running with a debugger attached.
- Prefer release builds for performance numbers.
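
The warm-up and averaging steps above can be wrapped in a small harness. This is a sketch, not the app's code; `measureOnce` is a hypothetical stand-in for reading one run's latency from the app's timing output.

```kotlin
import kotlin.math.sqrt

// Hypothetical harness for the procedure above: warm up, then run the same
// prompt N times and report the mean and standard deviation in ms.
fun benchmark(
    warmupRuns: Int = 3,
    measuredRuns: Int = 10,
    measureOnce: () -> Double  // one end-to-end latency sample in ms (stand-in)
): Pair<Double, Double> {
    repeat(warmupRuns) { measureOnce() }               // discard warm-up samples
    val samples = List(measuredRuns) { measureOnce() } // measured samples only
    val mean = samples.average()
    val stdDev = sqrt(samples.sumOf { (it - mean) * (it - mean) } / samples.size)
    return mean to stdDev
}
```
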
Use the benchmark screens in the app to run repeatable tests and capture the results.
Capture the following in a short report or issue comment:
- Device model + OS version
- Build type + relevant flags, e.g. `-PkleidiAI=false` to benchmark with KleidiAI off
- Model names and sizes
- Prompt / audio used
- Table of metrics with averages and sample size
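
A small record type can help keep these reports uniform. This is a sketch with illustrative field names, not anything from the app:

```kotlin
// Illustrative record covering the reporting checklist above.
data class BenchmarkReport(
    val device: String,       // e.g. "Pixel 8, Android 15"
    val buildType: String,    // e.g. "release, -PkleidiAI=false"
    val models: String,       // STT / LLM / TTS model names and sizes
    val prompt: String,       // fixed prompt or audio file used
    val sampleSize: Int,      // number of measured runs
    val meanLatencyMs: Double,
    val stdDevMs: Double
) {
    // One row of a markdown results table for an issue comment.
    fun toMarkdownRow(): String =
        "| $device | $buildType | $models | $sampleSize | " +
            "%.0f ± %.0f ms |".format(meanLatencyMs, stdDevMs)
}
```
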
Avoid these common pitfalls:

- Comparing debug vs. release builds
- Changing model or prompt between runs
- Measuring first run only (cold start)
- Running on a thermal-throttled device
- Mixing runs with and without KleidiAI or SME enabled, which can significantly affect performance results
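
The last pitfall can also be caught mechanically: key every sample with its configuration and average per group, so runs with and without KleidiAI or SME never collapse into one number. A sketch:

```kotlin
// Sketch: average samples per configuration string (e.g. "release+KleidiAI")
// instead of pooling everything into a single mean.
fun averageByConfig(runs: List<Pair<String, Double>>): Map<String, Double> =
    runs.groupBy({ it.first }, { it.second })
        .mapValues { (_, samples) -> samples.average() }
```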