test: add asynchronous benchmark script to measure inference concurrency by GuilhermeGors · Pull Request #2185 · QwenLM/Qwen

GuilhermeGors · 2026-04-14T20:42:57Z

This tool tracks concurrent request handling, temporal overlap, and serialization vs parallelism behavior, specifically targeting the Qwen 3.5 GDN architectural bottlenecks.

Relates to #2155

Copilot

Pull request overview

Adds a standalone async benchmark script to probe whether Ollama serves multiple simultaneous inference requests in parallel or serializes them, with reporting and JSON export to help investigate Qwen 3.5 concurrency behavior (relates to #2155).

Changes:

Introduces an aiohttp-based async runner that fires N simultaneous /api/generate streaming requests and captures TTFT/total time.
Implements a concurrency analysis heuristic (overlap/serialization verdict) plus a textual timeline visualization.
Exports results and derived metrics to a timestamped JSON file for offline analysis.
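The PR page does not show the heuristic itself, so the following is only a minimal sketch of one way an overlap/serialization verdict could be derived from per-request start and end times. The names `overlap_ratio` and `verdict`, and the 0.5 threshold, are assumptions made for illustration and are not taken from the script.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one request, hypothetical shape


def overlap_ratio(intervals: List[Interval]) -> float:
    """Return 0.0 for back-to-back requests and 1.0 for fully overlapped ones."""
    n = len(intervals)
    if n < 2:
        return 0.0
    total = sum(end - start for start, end in intervals)
    if total <= 0:
        return 0.0
    wall = max(end for _, end in intervals) - min(start for start, _ in intervals)
    # Serialized: wall == total, so raw == 0.
    # Fully parallel with equal durations: wall == total / n, so raw == 1 - 1/n.
    raw = 1.0 - wall / total
    # Rescale by the best achievable value and clamp to [0, 1].
    return max(0.0, min(1.0, raw / (1.0 - 1.0 / n)))


def verdict(ratio: float, threshold: float = 0.5) -> str:
    """Map the ratio to a label; the threshold is an arbitrary illustrative cutoff."""
    return "parallel" if ratio >= threshold else "serialized"


if __name__ == "__main__":
    serialized = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
    parallel = [(0.0, 2.0), (0.0, 2.0), (0.0, 2.0)]
    print(verdict(overlap_ratio(serialized)))
    print(verdict(overlap_ratio(parallel)))
```

Comparing wall-clock time against the sum of individual durations keeps the measure independent of how many requests were fired, which is why the raw value is rescaled by `1 - 1/n`.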

fix: address copilot review comments regarding token counting, math normalization, and safe imports

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Replaced 'if value:' with 'if value is not None:' to prevent valid 0.0 metrics from being dropped.
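To illustrate why the truthiness check was a problem, here is a small sketch. The metric names and the two helper functions are hypothetical and only stand in for whatever filtering the script performs.

```python
from typing import Dict, Optional


def keep_truthy(metrics: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Old behavior: 'if value:' also drops a legitimate 0.0 measurement."""
    return {k: v for k, v in metrics.items() if v}


def keep_not_none(metrics: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Fixed behavior: only missing values (None) are dropped."""
    return {k: v for k, v in metrics.items() if v is not None}


# Hypothetical sample: an overlap of 0.0 is a real result, tokens_per_s is missing.
sample = {"overlap_s": 0.0, "total_s": 1.25, "tokens_per_s": None}

if __name__ == "__main__":
    print(keep_truthy(sample))    # overlap_s is lost
    print(keep_not_none(sample))  # overlap_s is kept
```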

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Applied Copilot review to prevent Path Traversal vulnerabilities by stripping illegal directory characters from --model strings and ensuring output is strictly contained within ./bench_results using os.path.realpath validation.
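A minimal sketch of the containment check described above, assuming a file name built from the model string and a timestamp. The function name `safe_output_path`, the allowed character set, and the file name pattern are illustrative assumptions; the actual script may differ.

```python
import os
import re


def safe_output_path(model: str, output_dir: str = "bench_results",
                     stamp: str = "20260414_204257") -> str:
    """Build an output path that cannot escape output_dir."""
    # Replace anything outside a conservative whitelist, including '/' and '\\'.
    safe_model = re.sub(r"[^A-Za-z0-9._-]", "_", model)
    filename = f"bench_{safe_model}_{stamp}.json"

    base = os.path.realpath(output_dir)
    candidate = os.path.realpath(os.path.join(base, filename))

    # Defense in depth: reject any resolved path outside the base directory.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"refusing to write outside {base!r}")
    return candidate


if __name__ == "__main__":
    print(safe_output_path("qwen3.5:9b"))
    print(safe_output_path("../../etc/passwd"))
```

Sanitizing the name and then validating the resolved path are two independent layers, so a gap in the character filter would still be caught by the `realpath` comparison.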

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

…_ttft display

Added an --output-dir CLI argument (defaults to bench_results), replaced the PEP 585 annotation list[str] with typing.List[str] plus from __future__ import annotations for Python 3.8 support, and fixed the misleading 0.000s avg TTFT display shown when no data exists.
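The avg TTFT fix can be pictured with the sketch below, which reports "n/a" instead of a fabricated 0.000s when no request produced a first token. The helper names are hypothetical. The annotations use `typing.List`, which works at runtime on Python 3.8, whereas `list[float]` in an evaluated annotation raises `TypeError` there unless annotations are deferred.

```python
from typing import List, Optional


def average(values: List[Optional[float]]) -> Optional[float]:
    """Mean of the values that are present, or None when there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def format_avg_ttft(values: List[Optional[float]]) -> str:
    """Render the average TTFT, making the 'no data' case explicit."""
    avg = average(values)
    return "n/a" if avg is None else f"{avg:.3f}s"


if __name__ == "__main__":
    print(format_avg_ttft([]))            # no requests completed
    print(format_avg_ttft([0.0, 0.5]))    # a real 0.0 still counts
```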

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

Removed all inline comments to avoid linter false positives and to clean up notes left over from earlier iterations.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

GuilhermeGors

All automated code review feedback has been resolved. The benchmark script has been fully refactored and is now production-ready.
Key fixes implemented:

Accuracy: Corrected the overlap formulas and switched token counting to eval_count so that parallelism detection is accurate.
Safety: Added input sanitization against Path Traversal and isolated JSON outputs to a dedicated ./bench_results/ directory.
Stability: Added errors="replace" to prevent UTF-8 streaming crashes and handled silent JSON decode failures.
Compatibility: Enforced Python 3.8 support by deferring type annotations with from __future__ import annotations.
The tool is safe, exact, and ready to be merged.
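As a rough illustration of the stability and accuracy points, the sketch below parses newline-delimited JSON chunks of the kind Ollama streams from /api/generate. It decodes with errors="replace", counts undecodable lines instead of silently ignoring them, and reads eval_count from the final chunk. The function name `parse_stream` and its return shape are assumptions, not the script's actual code.

```python
import json
from typing import Iterable, List, Optional, Tuple


def parse_stream(lines: Iterable[bytes]) -> Tuple[str, Optional[int], int]:
    """Return (generated_text, eval_count, skipped_line_count)."""
    parts: List[str] = []
    eval_count: Optional[int] = None
    skipped = 0
    for raw in lines:
        # Invalid UTF-8 bytes become U+FFFD instead of raising UnicodeDecodeError.
        line = raw.decode("utf-8", errors="replace").strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # surface the failure rather than hiding it
            continue
        if not isinstance(chunk, dict):
            skipped += 1
            continue
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            # The final chunk carries the server-side token count.
            eval_count = chunk.get("eval_count")
    return "".join(parts), eval_count, skipped


if __name__ == "__main__":
    demo = [
        b'{"response": "Hi", "done": false}\n',
        b'not json\n',
        b'{"response": "", "done": true, "eval_count": 2}\n',
    ]
    print(parse_stream(demo))
```

Taking the token count from eval_count avoids counting streamed chunks, which do not necessarily map one-to-one to tokens.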

test: add asynchronous benchmark script to measure inference concurrency

bc66238

Copilot AI review requested due to automatic review settings April 14, 2026 20:42

Copilot started reviewing on behalf of GuilhermeGors April 14, 2026 20:43

Copilot AI reviewed Apr 14, 2026

fix: address copilot review comments regarding token counting, math normalization, and safe imports

9baa8bd

GuilhermeGors mentioned this pull request Apr 14, 2026

[BUG] Poor single-server concurrency behavior in Qwen 3.5 under Ollama despite strong token speed #2155

Open

2 tasks

GuilhermeGors requested a review from Copilot April 15, 2026 17:01

Copilot started reviewing on behalf of GuilhermeGors April 15, 2026 17:02