
--rate-type concurrent CLI parameter is implemented #79


Closed

Conversation

@parfeniukink (Contributor) commented Feb 27, 2025

Execution example

(py38) ➜  guidellm git:(parfeniukink/concurrent-load-generation-v2) python src/guidellm/main.py --target http://localhost:8080/v1 --model Phi-3-mini-4k-instruct-q4.gguf --data 'prompt_tokens=128,generated_tokens=128' --data-type emulated --tokenizer "hf-internal-testing/llama-tokenizer" --max-requests 2 --rate-type concurrent --rate 2
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [14:08:32]   100% concurrent   (0.12 req/sec avg)                                                                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  Generating report... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (1/1) [ 0:00:16 < 0:00:00 ]
╭─ GuideLLM Benchmarks Report (stdout) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ╭─ Benchmark Report 1 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │
│ │ Backend(type=openai_server, target=http://localhost:8080/v1, model=Phi-3-mini-4k-instruct-q4.gguf)                                                                                   │ │
│ │ Data(type=emulated, source=prompt_tokens=128,generated_tokens=128, tokenizer=hf-internal-testing/llama-tokenizer)                                                                    │ │
│ │ Rate(type=concurrent, rate=(2.0,))                                                                                                                                                   │ │
│ │ Limits(max_number=2 requests, max_duration=120 sec)                                                                                                                                  │ │
│ │                                                                                                                                                                                      │ │
│ │                                                                                                                                                                                      │ │
│ │ Requests Data by Benchmark                                                                                                                                                           │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓                                                                              │ │
│ │ ┃ Benchmark                 ┃ Requests Completed ┃ Request Failed ┃ Duration  ┃ Start Time ┃ End Time ┃                                                                              │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩                                                                              │ │
│ │ │ concurrent@2.00 req/sec   │ 4/4                │ 0/4            │ 32.05 sec │ 14:08:32   │ 14:09:04 │                                                                              │ │
│ │ └───────────────────────────┴────────────────────┴────────────────┴───────────┴────────────┴──────────┘                                                                              │ │
│ │                                                                                                                                                                                      │ │
│ │ Tokens Data by Benchmark                                                                                                                                                             │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓                                                              │ │
│ │ ┃ Benchmark                 ┃ Prompt ┃ Prompt (1%, 5%, 50%, 95%, 99%)    ┃ Output ┃ Output (1%, 5%, 50%, 95%, 99%)    ┃                                                              │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩                                                              │ │
│ │ │ concurrent@2.00 req/sec   │ 129.00 │ 129.0, 129.0, 129.0, 129.0, 129.0 │ 128.00 │ 128.0, 128.0, 128.0, 128.0, 128.0 │                                                              │ │
│ │ └───────────────────────────┴────────┴───────────────────────────────────┴────────┴───────────────────────────────────┘                                                              │ │
│ │                                                                                                                                                                                      │ │
│ │ Performance Stats by Benchmark                                                                                                                                                       │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │
│ │ ┃                           ┃ Request Latency [1%, 5%, 10%, 50%, 90%, 95%,    ┃ Time to First Token [1%, 5%, 10%, 50%, 90%,     ┃ Inter Token Latency [1%, 5%, 10%, 50%, 90%, 95%, ┃ │ │
│ │ ┃ Benchmark                 ┃ 99%] (sec)                                      ┃ 95%, 99%] (ms)                                  ┃ 99%] (ms)                                        ┃ │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ │
│ │ │ concurrent@2.00 req/sec   │ 8.50, 9.46, 10.65, 20.20, 29.68, 30.85, 31.80   │ 1365.0, 2316.8, 3506.6, 13039.3, 22588.4,       │ 50.8, 51.4, 51.7, 54.7, 62.3, 68.1, 73.6         │ │ │
│ │ │                           │                                                 │ 23781.6, 24736.2                                │                                                  │ │ │
│ │ └───────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────────┴──────────────────────────────────────────────────┘ │ │
│ │                                                                                                                                                                                      │ │
│ │ Performance Summary by Benchmark                                                                                                                                                     │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓                                          │ │
│ │ ┃ Benchmark                 ┃ Requests per Second ┃ Request Latency ┃ Time to First Token ┃ Inter Token Latency ┃ Output Token Throughput ┃                                          │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩                                          │ │
│ │ │ concurrent@2.00 req/sec   │ 0.12 req/sec        │ 20.17 sec       │ 13045.15 ms         │ 56.12 ms            │ 15.98 tokens/sec        │                                          │ │
│ │ └───────────────────────────┴─────────────────────┴─────────────────┴─────────────────────┴─────────────────────┴─────────────────────────┘                                          │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
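As a quick sanity check on the summary above: 4 completed requests over 32.05 sec gives 4 / 32.05 ≈ 0.12 req/sec, matching the reported average, and holding 2 requests in flight at a mean latency of ~20.2 sec predicts roughly 2 / 20.2 ≈ 0.10 req/sec at steady state. (The 4/4 count with --max-requests 2 presumably reflects 2 requests per worker across the 2 concurrent workers.)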

tox local report

For some reason, GitHub is having issues with the quality checks. Here is the report from a local run:

[screenshot: local tox report, Alacritty, 2025-02-27]

Use the ``--rate`` CLI parameter to specify the number of concurrent workers
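To make those semantics concrete: with ``--rate-type concurrent``, ``--rate N`` keeps N request streams busy at once, each firing its next request as soon as the previous one returns. A minimal asyncio sketch of that pattern (hypothetical names and a simulated backend call, not the actual GuideLLM scheduler):

```python
# Conceptual sketch of the "concurrent" rate type: a fixed pool of N
# workers, each sending its next request as soon as the previous one
# finishes, so exactly N requests are in flight at any time.
import asyncio
import time


async def send_request(worker_id: int) -> None:
    # Stand-in for a real backend call (e.g., an OpenAI-compatible
    # /v1/completions request); simulated here with a short sleep.
    await asyncio.sleep(0.1)


async def worker(worker_id: int, deadline: float, completed: list) -> None:
    # Each worker issues requests back-to-back until the deadline.
    while time.monotonic() < deadline:
        await send_request(worker_id)
        completed[0] += 1


async def run_concurrent(rate: int, duration: float) -> int:
    completed = [0]
    deadline = time.monotonic() + duration
    await asyncio.gather(*(worker(i, deadline, completed) for i in range(rate)))
    return completed[0]


if __name__ == "__main__":
    # --rate-type concurrent --rate 2 corresponds to rate=2 here.
    total = asyncio.run(run_concurrent(rate=2, duration=1.0))
    print(f"completed {total} requests")
```

With rate=2 and a simulated 0.1 sec latency this completes roughly 20 requests in a second; throughput is about concurrency divided by per-request latency, which is why the real run above sits near 2 / 20.2 req/sec.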
@markurtz (Member) commented
Closing this out as it is being reworked and included in #96

@markurtz closed this Mar 10, 2025
github-project-automation bot moved this from In progress to Done in GuideLLM Kanban Board, Mar 10, 2025
markurtz added a commit that referenced this pull request Apr 11, 2025
…ation Refactor (#96)

Full refactor of GuideLLM, enabling better overall performance and minimal benchmarking overhead through a new multiprocess, threaded scheduler, along with significant updates to the output formats for better analysis, visibility, and clarity.

[screenshot: https://github.com/user-attachments/assets/a723854a-7fe0-4eb2-9408-f632e747c3c2]

Fixes:
- #92 
- #77 
- #47 
- #79

---------

Co-authored-by: Alexandre Marques <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>
Co-authored-by: David Gray <[email protected]>
@markurtz deleted the parfeniukink/concurrent-load-generation-v2 branch, April 21, 2025 15:02
Labels: load-request
Projects: GuideLLM Kanban Board (Status: Done)
3 participants