Issues with inf generation throughput and 0 ttft #27

@Prgrmman

Summary

I've been working on a project that involves tuning LMCache, and we use this benchmark as part of our evaluation suite.
The metric we care most about is time to first token (TTFT). Many of the benchmark runs report a TTFT of 0, which doesn't seem correct.

Additionally, the average generation throughput is frequently reported as inf.

Is this working as designed? If so, could you offer some guidance on how to interpret these results?

Details

The command invoked was:

./run_benchmarks.sh "meta-llama/Llama-3.1-70B-Instruct" http://khanhtest-vllm-fs-lmcache.ibm-cas-red-stack.svc.cluster.local:8000 /tmp/benchmark/lmcache_off all 1.34 2.0 3.0

/tmp/benchmark/lmcache_off_long_input_output_1.34.csv
[2025-06-16 11:44:01,006] WARNING: Processing the existing summary file /tmp/benchmark/lmcache_off_long_input_output_1.34.csv, ignoring all the other arguments (multi-round-qa.py:722:__main__)
[2025-06-16 11:44:01,008] INFO: Calculating performance summary (multi-round-qa.py:568:__main__)


==================== Performance summary ======================
  QPS: 0.0000 reqs/s

  Processing speed: 1.3941 reqs/s

  Requests on-the-fly: 0

  Input tokens per second: 29451.9382 tokens/s

  Output tokens per second: 1.3941 tokens/s

  Average generation throughput (per request): inf tokens/req/s
                                               ^^^^^^^^ inf??

  Average TTFT: 0.0000s
                ^^^^^^^^^~ This takes no time at all?

Time range: 1750097945.6405346 - 1750098046.7831142 (101.14s)
===============================================================

lmcache_off_long_input_output_1.34.csv

The benchmark was run at commit 95b2939136ff003e2e4c67277ec82bf43ff8be34 on the main branch.
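
In case it helps with triage, here is a minimal sketch of how the attached summary CSV could be checked for zero values before they are averaged. The column names `ttft` and `generation_time` are assumptions on my part and are not confirmed against multi-round-qa.py; they would need to be swapped for whatever the real CSV header uses. The sample data is made up purely to show the call shape.

```python
import csv
import io


def count_zero_values(csv_text, columns=("ttft", "generation_time")):
    """Count rows whose value is exactly 0 for each requested column.

    The column names are hypothetical placeholders; replace them with the
    names found in the actual summary CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    zero_counts = {name: 0 for name in columns}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in columns:
            raw = row.get(name)
            if raw is None or raw == "":
                continue  # column absent or empty in this row
            if float(raw) == 0.0:
                zero_counts[name] += 1
    return total_rows, zero_counts


# Made-up sample data, only to illustrate usage.
sample = "ttft,generation_time\n0.0,0.0\n0.25,1.5\n0.0,2.0\n"
total, zeros = count_zero_values(sample)
print(total, zeros)
```

If most rows turn out to have a zero generation time, a per-request tokens-divided-by-time average computed with NumPy or pandas would evaluate to inf, which would match the summary. That is only a hypothesis. One more observation from the summary above: output tokens per second and processing speed are both 1.3941, which would be consistent with roughly one output token per completed request, assuming both figures are totals divided by the same time range.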
