
Commit 9d5a0c2

Merge branch 'main' into ajc/pynvml

2 parents: 3e5b612 + ca4b4c7

File tree: 3 files changed, +1305 −7 lines


README.md

Lines changed: 138 additions & 2 deletions
@@ -12,7 +12,7 @@ SPDX-License-Identifier: Apache-2.0

[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ai-dynamo/aiperf)

-**[Architecture](docs/architecture.md)**| **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Migrating from Genai-Perf](docs/migrating.md)** | **[CLI Options](docs/cli_options.md)**
+**[Architecture](docs/architecture.md)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Migrating from Genai-Perf](docs/migrating.md)** | **[CLI Options](docs/cli_options.md)** | **[Metrics Reference](docs/metrics_reference.md)**

AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
@@ -115,7 +115,6 @@ aiperf profile --benchmark-duration 300.0 --benchmark-grace-period 30.0 [other o

</br>

-
<!--
======================
INSTALLATION
@@ -185,6 +184,143 @@ NVIDIA AIPerf | LLM Metrics

</div>

<!--
======================
METRICS REFERENCE
======================
-->

## Metrics Reference

AIPerf provides comprehensive metrics organized into multiple functional categories. For detailed descriptions, requirements, and nuances of each metric, see the **[Complete Metrics Reference](docs/metrics_reference.md)**.
### Streaming Metrics

Metrics specific to streaming requests that measure real-time token generation characteristics. Requires the `--streaming` flag.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Time to First Token (TTFT)**](docs/metrics_reference.md#time-to-first-token-ttft) | `time_to_first_token` | `content_responses[0].perf_ns - request.start_perf_ns` | `ms` |
| [**Time to Second Token (TTST)**](docs/metrics_reference.md#time-to-second-token-ttst) | `time_to_second_token` | `content_responses[1].perf_ns - content_responses[0].perf_ns` | `ms` |
| [**Inter Token Latency (ITL)**](docs/metrics_reference.md#inter-token-latency-itl) | `inter_token_latency` | `(request_latency - time_to_first_token) / (output_sequence_length - 1)` | `ms` |
| [**Inter Chunk Latency (ICL)**](docs/metrics_reference.md#inter-chunk-latency-icl) | `inter_chunk_latency` | `[content_responses[i].perf_ns - content_responses[i-1].perf_ns for i in range(1, len(content_responses))]` | `ms` |
| [**Output Token Throughput Per User**](docs/metrics_reference.md#output-token-throughput-per-user) | `output_token_throughput_per_user` | `1.0 / inter_token_latency_seconds` | `tokens/sec/user` |
| [**Time to First Output Token (TTFO)**](docs/metrics_reference.md#time-to-first-output-token-ttfo) | `time_to_first_output_token` | `first_non_reasoning_token_perf_ns - request.start_perf_ns` | `ms` |
| [**Prefill Throughput Per User**](docs/metrics_reference.md#prefill-throughput-per-user) | `prefill_throughput_per_user` | `input_sequence_length / time_to_first_token_seconds` | `tokens/sec/user` |
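The streaming formulas above can be combined into a minimal sketch. The record layout here (a request start timestamp plus per-chunk `perf_ns` arrival timestamps) is illustrative, not AIPerf's internal types:

```python
NS_PER_MS = 1_000_000

def streaming_metrics(start_perf_ns, response_perf_ns, output_sequence_length):
    """Compute TTFT, TTST, ITL, and ICL (all in milliseconds) from
    hypothetical perf-counter timestamps in nanoseconds."""
    ttft = (response_perf_ns[0] - start_perf_ns) / NS_PER_MS
    ttst = (response_perf_ns[1] - response_perf_ns[0]) / NS_PER_MS
    request_latency = (response_perf_ns[-1] - start_perf_ns) / NS_PER_MS
    # ITL averages the latency over non-first tokens, hence the (OSL - 1) divisor.
    itl = (request_latency - ttft) / (output_sequence_length - 1)
    # ICL is a per-chunk list of gaps rather than a single scalar.
    icl = [
        (curr - prev) / NS_PER_MS
        for prev, curr in zip(response_perf_ns, response_perf_ns[1:])
    ]
    return {"ttft": ttft, "ttst": ttst, "itl": itl, "icl": icl}

# Example: request started at t=0; chunks arrived at 50 ms, 60 ms, 70 ms.
m = streaming_metrics(0, [50 * NS_PER_MS, 60 * NS_PER_MS, 70 * NS_PER_MS], 3)
# -> TTFT 50 ms, TTST 10 ms, ITL 10 ms, ICL [10 ms, 10 ms]
```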
### Token Based Metrics

Metrics for token-producing endpoints that track token counts and throughput. Requires text-generating endpoints (chat, completion, etc.).

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Output Token Count**](docs/metrics_reference.md#output-token-count) | `output_token_count` | `len(tokenizer.encode(content, add_special_tokens=False))` | `tokens` |
| [**Output Sequence Length (OSL)**](docs/metrics_reference.md#output-sequence-length-osl) | `output_sequence_length` | `(output_token_count or 0) + (reasoning_token_count or 0)` | `tokens` |
| [**Input Sequence Length (ISL)**](docs/metrics_reference.md#input-sequence-length-isl) | `input_sequence_length` | `len(tokenizer.encode(prompt, add_special_tokens=False))` | `tokens` |
| [**Total Output Tokens**](docs/metrics_reference.md#total-output-tokens) | `total_output_tokens` | `sum(r.output_token_count for r in records if r.valid)` | `tokens` |
| [**Total Output Sequence Length**](docs/metrics_reference.md#total-output-sequence-length) | `total_osl` | `sum(r.output_sequence_length for r in records if r.valid)` | `tokens` |
| [**Total Input Sequence Length**](docs/metrics_reference.md#total-input-sequence-length) | `total_isl` | `sum(r.input_sequence_length for r in records if r.valid)` | `tokens` |
| [**Output Token Throughput**](docs/metrics_reference.md#output-token-throughput) | `output_token_throughput` | `total_osl / benchmark_duration_seconds` | `tokens/sec` |
| [**Total Token Throughput**](docs/metrics_reference.md#total-token-throughput) | `total_token_throughput` | `(total_isl + total_osl) / benchmark_duration_seconds` | `tokens/sec` |
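The aggregate throughput formulas above sum over valid records only. A minimal sketch, with an illustrative record shape rather than AIPerf's internal types:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Illustrative per-request record; not AIPerf's internal type."""
    input_sequence_length: int
    output_sequence_length: int
    valid: bool = True

def token_throughputs(records, benchmark_duration_seconds):
    # Only valid (non-error) records count toward the totals.
    valid = [r for r in records if r.valid]
    total_isl = sum(r.input_sequence_length for r in valid)
    total_osl = sum(r.output_sequence_length for r in valid)
    return {
        "output_token_throughput": total_osl / benchmark_duration_seconds,
        "total_token_throughput": (total_isl + total_osl) / benchmark_duration_seconds,
    }

# Two valid records (100 output tokens total) plus one error record, over 10 s.
t = token_throughputs(
    [Record(100, 40), Record(200, 60), Record(50, 10, valid=False)],
    benchmark_duration_seconds=10.0,
)
# -> output throughput 10 tokens/sec, total throughput 40 tokens/sec
```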
### Image Metrics

Metrics for image processing endpoints. Requires image-capable endpoints.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Image Throughput**](docs/metrics_reference.md#image-throughput) | `image_throughput` | `num_images / request_latency_seconds` | `images/sec` |
| [**Image Latency**](docs/metrics_reference.md#image-latency) | `image_latency` | `request_latency_ms / num_images` | `ms/image` |
### Reasoning Metrics

Metrics specific to models that support reasoning/thinking tokens. Requires models with a separate `reasoning_content` field.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Reasoning Token Count**](docs/metrics_reference.md#reasoning-token-count) | `reasoning_token_count` | `len(tokenizer.encode(reasoning_content, add_special_tokens=False))` | `tokens` |
| [**Total Reasoning Tokens**](docs/metrics_reference.md#total-reasoning-tokens) | `total_reasoning_tokens` | `sum(r.reasoning_token_count for r in records if r.valid)` | `tokens` |
### Usage Field Metrics

Metrics tracking API-reported token counts from the `usage` field in responses. Useful for comparing client-side vs server-side token counts.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Usage Prompt Tokens**](docs/metrics_reference.md#usage-prompt-tokens) | `usage_prompt_tokens` | `response.usage.prompt_tokens` | `tokens` |
| [**Usage Completion Tokens**](docs/metrics_reference.md#usage-completion-tokens) | `usage_completion_tokens` | `response.usage.completion_tokens` | `tokens` |
| [**Usage Total Tokens**](docs/metrics_reference.md#usage-total-tokens) | `usage_total_tokens` | `response.usage.total_tokens` | `tokens` |
| [**Usage Reasoning Tokens**](docs/metrics_reference.md#usage-reasoning-tokens) | `usage_reasoning_tokens` | `response.usage.completion_tokens_details.reasoning_tokens` | `tokens` |
| [**Total Usage Prompt Tokens**](docs/metrics_reference.md#total-usage-prompt-tokens) | `total_usage_prompt_tokens` | `sum(r.usage_prompt_tokens for r in records if r.valid)` | `tokens` |
| [**Total Usage Completion Tokens**](docs/metrics_reference.md#total-usage-completion-tokens) | `total_usage_completion_tokens` | `sum(r.usage_completion_tokens for r in records if r.valid)` | `tokens` |
| [**Total Usage Total Tokens**](docs/metrics_reference.md#total-usage-total-tokens) | `total_usage_total_tokens` | `sum(r.usage_total_tokens for r in records if r.valid)` | `tokens` |
### Usage Discrepancy Metrics

Metrics measuring differences between API-reported and client-computed token counts.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Usage Prompt Tokens Diff %**](docs/metrics_reference.md#usage-prompt-tokens-diff-) | `usage_prompt_tokens_diff_pct` | `abs((usage_prompt_tokens - input_sequence_length) / input_sequence_length) * 100` | `%` |
| [**Usage Completion Tokens Diff %**](docs/metrics_reference.md#usage-completion-tokens-diff-) | `usage_completion_tokens_diff_pct` | `abs((usage_completion_tokens - output_sequence_length) / output_sequence_length) * 100` | `%` |
| [**Usage Reasoning Tokens Diff %**](docs/metrics_reference.md#usage-reasoning-tokens-diff-) | `usage_reasoning_tokens_diff_pct` | `abs((usage_reasoning_tokens - reasoning_token_count) / reasoning_token_count) * 100` | `%` |
| [**Usage Discrepancy Count**](docs/metrics_reference.md#usage-discrepancy-count) | `usage_discrepancy_count` | `sum(1 for r in records if r.any_diff > threshold)` | `requests` |
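The diff-percentage formulas above all share one shape: the absolute relative error of the API-reported count against the client-side count, scaled to percent. A minimal sketch (function name illustrative):

```python
def usage_diff_pct(usage_count: int, client_count: int) -> float:
    """Percentage difference between an API-reported usage count and the
    client-computed token count: abs((usage - client) / client) * 100."""
    return abs((usage_count - client_count) / client_count) * 100.0

# Server reports 105 prompt tokens, client tokenizer counted 100 -> ~5% diff.
diff = usage_diff_pct(105, 100)
```

Note the denominator is always the client-computed count, so a perfect match yields 0% regardless of magnitude.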
### Goodput Metrics

Metrics measuring throughput of requests meeting user-defined Service Level Objectives (SLOs).

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Good Request Count**](docs/metrics_reference.md#good-request-count) | `good_request_count` | `sum(1 for r in records if r.all_slos_met)` | `requests` |
| [**Goodput**](docs/metrics_reference.md#goodput) | `goodput` | `good_request_count / benchmark_duration_seconds` | `requests/sec` |
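The goodput computation above can be sketched with a single latency SLO; real runs may check several SLOs per request, and the single-threshold check here is an illustrative simplification:

```python
def goodput(latencies_ms, slo_ms, benchmark_duration_seconds):
    """Count requests meeting the latency SLO, then normalize by duration.
    A hypothetical single-SLO version of the good_request_count/goodput pair."""
    good_request_count = sum(1 for lat in latencies_ms if lat <= slo_ms)
    return good_request_count, good_request_count / benchmark_duration_seconds

# 3 of 4 requests meet a 250 ms latency SLO over a 2-second run.
count, gp = goodput([120.0, 240.0, 300.0, 90.0], slo_ms=250.0,
                    benchmark_duration_seconds=2.0)
# -> 3 good requests, goodput 1.5 requests/sec
```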
### Error Metrics

Metrics computed for failed/error requests.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Error Input Sequence Length**](docs/metrics_reference.md#error-input-sequence-length) | `error_isl` | `input_sequence_length` (for error requests) | `tokens` |
| [**Total Error Input Sequence Length**](docs/metrics_reference.md#total-error-input-sequence-length) | `total_error_isl` | `sum(r.input_sequence_length for r in records if not r.valid)` | `tokens` |
| [**Error Request Count**](docs/metrics_reference.md#error-request-count) | `error_request_count` | `sum(1 for r in records if not r.valid)` | `requests` |
### General Metrics

Metrics available for all benchmark runs with no special requirements.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Request Latency**](docs/metrics_reference.md#request-latency) | `request_latency` | `content_responses[-1].perf_ns - request.start_perf_ns` | `ms` |
| [**Request Throughput**](docs/metrics_reference.md#request-throughput) | `request_throughput` | `request_count / benchmark_duration_seconds` | `requests/sec` |
| [**Request Count**](docs/metrics_reference.md#request-count) | `request_count` | `sum(1 for r in records if r.valid)` | `requests` |
| [**Minimum Request Timestamp**](docs/metrics_reference.md#minimum-request-timestamp) | `min_request_timestamp` | `min(r.timestamp_ns for r in records)` | `datetime` |
| [**Maximum Response Timestamp**](docs/metrics_reference.md#maximum-response-timestamp) | `max_response_timestamp` | `max(r.timestamp_ns + r.request_latency for r in records)` | `datetime` |
| [**Benchmark Duration**](docs/metrics_reference.md#benchmark-duration) | `benchmark_duration` | `max_response_timestamp - min_request_timestamp` | `sec` |
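The last three rows above derive the benchmark window from per-record timestamps: earliest request start to latest response completion. A minimal sketch, with an illustrative record shape of `(timestamp_ns, request_latency_ns)` pairs:

```python
def benchmark_window(records):
    """records: list of (timestamp_ns, request_latency_ns) pairs (illustrative).
    Returns (min_request_timestamp, max_response_timestamp, duration_seconds)."""
    min_request_timestamp = min(ts for ts, _ in records)
    # A response completes at its request's start time plus its latency.
    max_response_timestamp = max(ts + lat for ts, lat in records)
    duration_seconds = (max_response_timestamp - min_request_timestamp) / 1e9
    return min_request_timestamp, max_response_timestamp, duration_seconds

# Two requests: one at t=0 taking 2 s, one at t=1 s taking 3 s -> 4 s window.
t0, t1, dur = benchmark_window([(0, 2_000_000_000), (1_000_000_000, 3_000_000_000)])
```

This is why `benchmark_duration` covers the full wall-clock span of the run, including any tail where only the slowest requests are still in flight.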
### HTTP Trace Metrics

Low-level HTTP timing metrics following k6 and HAR conventions. Requires HTTP trace data collection enabled.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**HTTP Request Blocked**](docs/metrics_reference.md#http-request-blocked) | `http_req_blocked` | `connection_pool_wait_end_perf_ns - connection_pool_wait_start_perf_ns` | `ms` |
| [**HTTP Request DNS Lookup**](docs/metrics_reference.md#http-request-dns-lookup) | `http_req_dns_lookup` | `dns_lookup_end_perf_ns - dns_lookup_start_perf_ns` | `ms` |
| [**HTTP Request Connecting**](docs/metrics_reference.md#http-request-connecting) | `http_req_connecting` | `tcp_connect_end_perf_ns - tcp_connect_start_perf_ns` | `ms` |
| [**HTTP Request Sending**](docs/metrics_reference.md#http-request-sending) | `http_req_sending` | `request_send_end_perf_ns - request_send_start_perf_ns` | `ms` |
| [**HTTP Request Waiting**](docs/metrics_reference.md#http-request-waiting) | `http_req_waiting` | `response_chunks[0][0] - request_send_end_perf_ns` | `ms` |
| [**HTTP Request Receiving**](docs/metrics_reference.md#http-request-receiving) | `http_req_receiving` | `response_chunks[-1][0] - response_chunks[0][0]` | `ms` |
| [**HTTP Request Duration**](docs/metrics_reference.md#http-request-duration) | `http_req_duration` | `response_receive_end_perf_ns - request_send_start_perf_ns` | `ms` |
| [**HTTP Request Connection Overhead**](docs/metrics_reference.md#http-request-connection-overhead) | `http_req_connection_overhead` | `http_req_blocked + http_req_dns_lookup + http_req_connecting` | `ms` |
| [**HTTP Request Total**](docs/metrics_reference.md#http-request-total) | `http_req_total` | `http_req_blocked + http_req_dns_lookup + http_req_connecting + http_req_sending + http_req_waiting + http_req_receiving` | `ms` |
| [**HTTP Request Data Sent**](docs/metrics_reference.md#http-request-data-sent) | `http_req_data_sent` | `sum(size for _, size in request_chunks)` | `bytes` |
| [**HTTP Request Data Received**](docs/metrics_reference.md#http-request-data-received) | `http_req_data_received` | `sum(size for _, size in response_chunks)` | `bytes` |
| [**HTTP Request Connection Reused**](docs/metrics_reference.md#http-request-connection-reused) | `http_req_connection_reused` | `1 if connection_reused_perf_ns is not None else 0` | `boolean` |
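The derived rows above compose the per-phase timings: connection overhead is the three pre-send phases, and `http_req_total` adds the send, wait, and receive phases on top. A minimal sketch with illustrative millisecond values:

```python
def http_totals(blocked, dns_lookup, connecting, sending, waiting, receiving):
    """Compose per-phase HTTP timings (ms) into the two derived metrics:
    connection overhead (pre-send phases) and total (all six phases)."""
    connection_overhead = blocked + dns_lookup + connecting
    total = connection_overhead + sending + waiting + receiving
    return connection_overhead, total

# 1 ms pool wait, 2 ms DNS, 3 ms TCP connect, 0.5 ms send,
# 40 ms server wait, 10 ms receive.
overhead, total = http_totals(1.0, 2.0, 3.0, 0.5, 40.0, 10.0)
# -> overhead 6.0 ms, total 56.5 ms
```

On a reused connection the pre-send phases are typically near zero, so comparing `http_req_connection_overhead` against `http_req_total` shows how much latency comes from connection setup rather than the request itself.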
</br>

## Known Issues

- Output sequence length constraints (`--output-tokens-mean`) cannot be guaranteed unless you pass `ignore_eos` and/or `min_tokens` via `--extra-inputs` to an inference server that supports them.
