AIPerf provides comprehensive metrics organized into multiple functional categories. For detailed descriptions, requirements, and nuances of each metric, see the **[Complete Metrics Reference](docs/metrics_reference.md)**.
### Streaming Metrics

Metrics specific to streaming requests that measure real-time token generation characteristics. Requires `--streaming` flag.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
|[**Time to First Token (TTFT)**](docs/metrics_reference.md#time-to-first-token-ttft)|`time_to_first_token`|`content_responses[0].perf_ns - request.start_perf_ns`|`ms`|
|[**Time to Second Token (TTST)**](docs/metrics_reference.md#time-to-second-token-ttst)|`time_to_second_token`|`content_responses[1].perf_ns - content_responses[0].perf_ns`|`ms`|
|[**Inter Chunk Latency (ICL)**](docs/metrics_reference.md#inter-chunk-latency-icl)|`inter_chunk_latency`|`[content_responses[i].perf_ns - content_responses[i-1].perf_ns for i in range(1, len(content_responses))]`|`ms`|
|[**Output Token Throughput Per User**](docs/metrics_reference.md#output-token-throughput-per-user)|`output_token_throughput_per_user`|`1.0 / inter_token_latency_seconds`|`tokens/sec/user`|
|[**Time to First Output Token (TTFO)**](docs/metrics_reference.md#time-to-first-output-token-ttfo)|`time_to_first_output_token`|`first_non_reasoning_token_perf_ns - request.start_perf_ns`|`ms`|
|[**Prefill Throughput Per User**](docs/metrics_reference.md#prefill-throughput-per-user)|`prefill_throughput_per_user`|`input_sequence_length / time_to_first_token_seconds`|`tokens/sec/user`|

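
As a rough illustration of the streaming formulas in the table above, here is a minimal Python sketch. The `StreamedRequest` class and its field names are hypothetical stand-ins, not AIPerf's actual record types, and the nanosecond-to-millisecond conversion is an assumption based on the `perf_ns` inputs and the `ms` display unit.

```python
from dataclasses import dataclass

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


@dataclass
class StreamedRequest:
    """Hypothetical record holding perf-counter timestamps in nanoseconds."""
    start_perf_ns: int
    content_response_perf_ns: list[int]  # one timestamp per content chunk
    input_sequence_length: int


def time_to_first_token_ms(req: StreamedRequest) -> float:
    # First content chunk minus request start.
    return (req.content_response_perf_ns[0] - req.start_perf_ns) / NS_PER_MS


def time_to_second_token_ms(req: StreamedRequest) -> float:
    # Second content chunk minus first content chunk.
    ts = req.content_response_perf_ns
    return (ts[1] - ts[0]) / NS_PER_MS


def inter_chunk_latency_ms(req: StreamedRequest) -> list[float]:
    # Gap between each pair of consecutive content chunks.
    ts = req.content_response_perf_ns
    return [(ts[i] - ts[i - 1]) / NS_PER_MS for i in range(1, len(ts))]


def prefill_throughput_per_user(req: StreamedRequest) -> float:
    # Input tokens divided by TTFT expressed in seconds.
    ttft_s = (req.content_response_perf_ns[0] - req.start_perf_ns) / NS_PER_S
    return req.input_sequence_length / ttft_s


example = StreamedRequest(
    start_perf_ns=1_000_000_000,
    content_response_perf_ns=[1_250_000_000, 1_270_000_000, 1_300_000_000],
    input_sequence_length=500,
)
print(time_to_first_token_ms(example))       # 250.0
print(time_to_second_token_ms(example))      # 20.0
print(inter_chunk_latency_ms(example))       # [20.0, 30.0]
print(prefill_throughput_per_user(example))  # 2000.0
```

Note that TTST and ICL need at least two content chunks; a request that returns a single chunk would have no value for either.
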
### Token Based Metrics

Metrics for token-producing endpoints that track token counts and throughput. Requires text-generating endpoints (chat, completion, etc.).

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
|[**Output Sequence Length (OSL)**](docs/metrics_reference.md#output-sequence-length-osl)|`output_sequence_length`|`(output_token_count or 0) + (reasoning_token_count or 0)`|`tokens`|
|[**Total Output Tokens**](docs/metrics_reference.md#total-output-tokens)|`total_output_tokens`|`sum(r.output_token_count for r in records if r.valid)`|`tokens`|
|[**Total Output Sequence Length**](docs/metrics_reference.md#total-output-sequence-length)|`total_osl`|`sum(r.output_sequence_length for r in records if r.valid)`|`tokens`|
|[**Total Input Sequence Length**](docs/metrics_reference.md#total-input-sequence-length)|`total_isl`|`sum(r.input_sequence_length for r in records if r.valid)`|`tokens`|
|[**Total Reasoning Tokens**](docs/metrics_reference.md#total-reasoning-tokens)|`total_reasoning_tokens`|`sum(r.reasoning_token_count for r in records if r.valid)`|`tokens`|

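
The per-record and aggregate token formulas above can be sketched roughly as follows. The `Record` class is a hypothetical stand-in, and treating a missing count as `0` in the totals is an assumption of this sketch (the table only shows the `or 0` fallback for OSL).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical per-request record; not AIPerf's actual type."""
    input_sequence_length: int
    output_token_count: Optional[int]
    reasoning_token_count: Optional[int]
    valid: bool = True


def output_sequence_length(r: Record) -> int:
    # OSL counts both regular output tokens and reasoning tokens.
    return (r.output_token_count or 0) + (r.reasoning_token_count or 0)


def token_totals(records: list[Record]) -> dict[str, int]:
    # Aggregates only consider valid (non-error) records.
    valid = [r for r in records if r.valid]
    return {
        "total_output_tokens": sum(r.output_token_count or 0 for r in valid),
        "total_reasoning_tokens": sum(r.reasoning_token_count or 0 for r in valid),
        "total_osl": sum(output_sequence_length(r) for r in valid),
        "total_isl": sum(r.input_sequence_length for r in valid),
    }


records = [
    Record(input_sequence_length=100, output_token_count=40, reasoning_token_count=10),
    Record(input_sequence_length=200, output_token_count=60, reasoning_token_count=None),
    Record(input_sequence_length=300, output_token_count=5, reasoning_token_count=5, valid=False),
]
print(token_totals(records))
```

In this example the third record is invalid, so it contributes nothing to any total.
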
### Usage Field Metrics

Metrics tracking API-reported token counts from the `usage` field in responses. Useful for comparing client-side vs server-side token counts.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
|[**Total Usage Prompt Tokens**](docs/metrics_reference.md#total-usage-prompt-tokens)|`total_usage_prompt_tokens`|`sum(r.usage_prompt_tokens for r in records if r.valid)`|`tokens`|
|[**Total Usage Completion Tokens**](docs/metrics_reference.md#total-usage-completion-tokens)|`total_usage_completion_tokens`|`sum(r.usage_completion_tokens for r in records if r.valid)`|`tokens`|
|[**Total Usage Total Tokens**](docs/metrics_reference.md#total-usage-total-tokens)|`total_usage_total_tokens`|`sum(r.usage_total_tokens for r in records if r.valid)`|`tokens`|

### Usage Discrepancy Metrics

Metrics measuring differences between API-reported and client-computed token counts.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
|[**Usage Discrepancy Count**](docs/metrics_reference.md#usage-discrepancy-count)|`usage_discrepancy_count`|`sum(1 for r in records if r.any_diff > threshold)`|`requests`|

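
To make the discrepancy count concrete, here is a hedged Python sketch. How `any_diff` is actually defined is not stated here, so the sketch assumes it is the largest percentage difference between API-reported and client-computed counts across the prompt and completion fields. The `TokenCounts` class, `percent_diff`, and the 5% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenCounts:
    """Hypothetical record pairing API-reported and client-computed counts."""
    usage_prompt_tokens: int       # reported by the server in `usage`
    client_prompt_tokens: int      # computed on the client
    usage_completion_tokens: int
    client_completion_tokens: int


def percent_diff(reported: int, computed: int) -> float:
    # Relative difference in percent, guarding against division by zero.
    if computed == 0:
        return 0.0 if reported == 0 else float("inf")
    return abs(reported - computed) / computed * 100.0


def any_diff(r: TokenCounts) -> float:
    # Assumption: the worst-case difference across the compared fields.
    return max(
        percent_diff(r.usage_prompt_tokens, r.client_prompt_tokens),
        percent_diff(r.usage_completion_tokens, r.client_completion_tokens),
    )


def usage_discrepancy_count(records: list[TokenCounts], threshold: float) -> int:
    return sum(1 for r in records if any_diff(r) > threshold)


records = [
    TokenCounts(100, 100, 50, 50),  # exact match
    TokenCounts(110, 100, 50, 50),  # prompt count differs by 10%
    TokenCounts(100, 100, 52, 50),  # completion count differs by 4%
]
print(usage_discrepancy_count(records, threshold=5.0))  # 1
```
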
### Goodput Metrics

Metrics measuring throughput of requests meeting user-defined Service Level Objectives (SLOs).

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
|[**Good Request Count**](docs/metrics_reference.md#good-request-count)|`good_request_count`|`sum(1 for r in records if r.all_slos_met)`|`requests`|
|[**Total Error Input Sequence Length**](docs/metrics_reference.md#total-error-input-sequence-length)|`total_error_isl`|`sum(r.input_sequence_length for r in records if not r.valid)`|`tokens`|
|[**Error Request Count**](docs/metrics_reference.md#error-request-count)|`error_request_count`|`sum(1 for r in records if not r.valid)`|`requests`|

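
The good-request and error counts above can be illustrated with a small Python sketch. It assumes each SLO is an upper bound on a latency-style metric (lower is better) and that error records never count as good. The `Record` class, the `metrics` dictionary, and the example thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical per-request record; not AIPerf's actual type."""
    input_sequence_length: int
    valid: bool = True
    metrics: dict[str, float] = field(default_factory=dict)  # tag -> value in ms


def all_slos_met(record: Record, slos: dict[str, float]) -> bool:
    # Assumption: every SLO is a maximum allowed value for its metric tag.
    if not record.valid:
        return False
    return all(
        tag in record.metrics and record.metrics[tag] <= limit
        for tag, limit in slos.items()
    )


def summarize(records: list[Record], slos: dict[str, float]) -> dict[str, int]:
    return {
        "good_request_count": sum(1 for r in records if all_slos_met(r, slos)),
        "error_request_count": sum(1 for r in records if not r.valid),
        "total_error_isl": sum(r.input_sequence_length for r in records if not r.valid),
    }


slos = {"time_to_first_token": 200.0, "request_latency": 1000.0}
records = [
    Record(100, metrics={"time_to_first_token": 150.0, "request_latency": 900.0}),
    Record(200, metrics={"time_to_first_token": 250.0, "request_latency": 800.0}),
    Record(300, valid=False),
]
print(summarize(records, slos))
```

Here only the first record meets both thresholds, the second misses the TTFT bound, and the third is an error whose input length feeds `total_error_isl`.
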
### General Metrics

Metrics available for all benchmark runs with no special requirements.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
|[**Request Count**](docs/metrics_reference.md#request-count)|`request_count`|`sum(1 for r in records if r.valid)`|`requests`|
|[**Minimum Request Timestamp**](docs/metrics_reference.md#minimum-request-timestamp)|`min_request_timestamp`|`min(r.timestamp_ns for r in records)`|`datetime`|
|[**Maximum Response Timestamp**](docs/metrics_reference.md#maximum-response-timestamp)|`max_response_timestamp`|`max(r.timestamp_ns + r.request_latency for r in records)`|`datetime`|
|[**HTTP Request Data Sent**](docs/metrics_reference.md#http-request-data-sent)|`http_req_data_sent`|`sum(size for _, size in request_chunks)`|`bytes`|
|[**HTTP Request Data Received**](docs/metrics_reference.md#http-request-data-received)|`http_req_data_received`|`sum(size for _, size in response_chunks)`|`bytes`|
|[**HTTP Request Connection Reused**](docs/metrics_reference.md#http-request-connection-reused)|`http_req_connection_reused`|`1 if connection_reused_perf_ns is not None else 0`|`boolean`|

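
A minimal Python sketch of the request count and timestamp formulas above follows. It assumes `timestamp_ns` is wall-clock time in nanoseconds since the Unix epoch and that `request_latency` is also in nanoseconds, which is what adding the two implies. The `Record` class and `ns_to_datetime` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Record:
    """Hypothetical per-request record; not AIPerf's actual type."""
    timestamp_ns: int     # request start, ns since the Unix epoch (assumed)
    request_latency: int  # end-to-end latency in ns (assumed)
    valid: bool = True


def request_count(records: list[Record]) -> int:
    # Only valid (non-error) requests are counted.
    return sum(1 for r in records if r.valid)


def min_request_timestamp(records: list[Record]) -> int:
    # Earliest request start across all records.
    return min(r.timestamp_ns for r in records)


def max_response_timestamp(records: list[Record]) -> int:
    # Latest response end: start time plus latency.
    return max(r.timestamp_ns + r.request_latency for r in records)


def ns_to_datetime(ns: int) -> datetime:
    # Convert epoch nanoseconds to a timezone-aware UTC datetime for display.
    return datetime.fromtimestamp(ns / 1e9, tz=timezone.utc)


records = [
    Record(timestamp_ns=1_700_000_000_000_000_000, request_latency=500_000_000),
    Record(timestamp_ns=1_700_000_001_000_000_000, request_latency=2_000_000_000, valid=False),
]
print(request_count(records))
print(ns_to_datetime(min_request_timestamp(records)))
print(ns_to_datetime(max_response_timestamp(records)))
```

Unlike `request_count`, the two timestamp formulas in the table iterate over all records, so the invalid record still sets the maximum response timestamp in this example.
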
</br>

## Known Issues

- Output sequence length constraints (`--output-tokens-mean`) cannot be guaranteed unless you pass `ignore_eos` and/or `min_tokens` via `--extra-inputs` to an inference server that supports them.