
test: add asynchronous benchmark script to measure inference concurrency #2185

Open
GuilhermeGors wants to merge 6 commits into QwenLM:main from GuilhermeGors:test/ollama-concurrency

@GuilhermeGors

This tool tracks concurrent request handling, temporal overlap, and serialization vs parallelism behavior, specifically targeting the Qwen 3.5 GDN architectural bottlenecks.

Relates to #2155

Copilot AI review requested due to automatic review settings April 14, 2026 20:42

Copilot AI left a comment

Pull request overview

Adds a standalone async benchmark script to probe whether Ollama serves multiple simultaneous inference requests in parallel or serializes them, with reporting and JSON export to help investigate Qwen 3.5 concurrency behavior (relates to #2155).

Changes:

  • Introduces an aiohttp-based async runner that fires N simultaneous /api/generate streaming requests and captures TTFT/total time.
  • Implements a concurrency analysis heuristic (overlap/serialization verdict) plus a textual timeline visualization.
  • Exports results and derived metrics to a timestamped JSON file for offline analysis.
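The overlap/serialization verdict described in the second bullet can be illustrated with a minimal sketch. This is not the code from ollama_concurrency_bench.py: the names `overlap_fraction` and `verdict`, the `(start, end)` tuple layout, and the 0.5 threshold are all assumptions chosen for illustration. The idea is a sweep over request start/end events that measures how much of the busy wall-clock time had at least two requests in flight.

```python
from typing import List, Tuple

# Hypothetical layout: (start_seconds, end_seconds) for one request.
Interval = Tuple[float, float]


def overlap_fraction(intervals: List[Interval]) -> float:
    """Fraction of busy time during which two or more requests were active."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # request becomes active
        events.append((end, -1))    # request finishes
    # Tuples sort by time first; at equal times -1 sorts before +1, so
    # back-to-back requests that merely touch do not count as overlapping.
    events.sort()

    active = 0      # requests in flight before the current event
    busy = 0.0      # time with at least one request in flight
    shared = 0.0    # time with at least two requests in flight
    prev = None
    for t, delta in events:
        if prev is not None and active >= 1:
            busy += t - prev
            if active >= 2:
                shared += t - prev
        active += delta
        prev = t
    return shared / busy if busy > 0 else 0.0


def verdict(intervals: List[Interval], threshold: float = 0.5) -> str:
    """Assumed rule: call it parallel when most busy time is shared."""
    return "parallel" if overlap_fraction(intervals) >= threshold else "serialized"
```

Under these assumptions, three requests that run strictly one after another yield a fraction of 0.0, while two requests that start and finish together yield 1.0. A real server usually lands somewhere in between, which is why a threshold (or a richer report) is needed.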

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.


Replaced 'if value:' with 'if value is not None:' to prevent valid 0.0 metrics from being dropped.
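For context, a minimal sketch of why the truthiness check loses data. The function and metric names here are hypothetical, not taken from the script; the point is only that `0.0` is falsy in Python, so `if value:` cannot tell "measured as zero" apart from "not measured".

```python
from typing import Dict, Optional


def keep_truthy(metrics: Dict[str, Optional[float]]) -> Dict[str, float]:
    # Old behaviour: `if value` drops None, but also drops a valid 0.0.
    return {name: value for name, value in metrics.items() if value}


def keep_present(metrics: Dict[str, Optional[float]]) -> Dict[str, float]:
    # New behaviour: only a missing measurement (None) is dropped.
    return {name: value for name, value in metrics.items() if value is not None}


# Hypothetical sample: one zero metric, one normal metric, one missing metric.
sample = {"ttft_s": 0.0, "total_s": 1.5, "tokens_per_s": None}
truthy_result = keep_truthy(sample)
present_result = keep_present(sample)
```

With this sample, `keep_truthy` returns only `total_s`, while `keep_present` keeps both `ttft_s` and `total_s`.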

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


Applied Copilot review feedback to prevent path traversal: illegal directory characters are stripped from --model strings, and output is strictly contained within ./bench_results using os.path.realpath validation.
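A minimal sketch of the two defences named in this commit message, under stated assumptions: the helper names, the allowed-character rule, and the file name pattern are hypothetical and may differ from what the script does. Only `re.sub`, `os.path.realpath`, `os.path.join`, and `os.path.commonpath` are real standard-library calls.

```python
import os
import re


def sanitize_model_name(model: str) -> str:
    """Replace anything outside [A-Za-z0-9._-] with '_' (assumed rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", model)
    # Avoid names that are empty or consist only of dots.
    return cleaned.strip(".") or "model"


def ensure_within(base_dir: str, candidate: str) -> str:
    """Resolve both paths and refuse anything that escapes base_dir."""
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(candidate)
    if os.path.commonpath([base, resolved]) != base:
        raise ValueError(f"refusing to write outside {base}")
    return resolved


def safe_output_path(output_dir: str, model: str, stamp: str) -> str:
    # Hypothetical file name pattern, for illustration only.
    name = f"bench_{sanitize_model_name(model)}_{stamp}.json"
    return ensure_within(output_dir, os.path.join(output_dir, name))
```

Sanitizing alone would already remove path separators from the model name; the `realpath` check is a second layer that also covers symlinks and any future change to how the file name is built.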

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.


…_ttft display

Added --output-dir CLI argument (defaults to bench_results), replaced PEP 585 list[str] with typing.List[str] + from __future__ import annotations for Python 3.8 support, and fixed misleading 0.000s avg TTFT display when no data exists.
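A minimal sketch of the TTFT display fix, assuming a hypothetical helper named `format_avg_ttft` and an "n/a" placeholder (the script's actual wording is not shown in this thread). The annotations use `typing.List` and `typing.Optional`, which work on Python 3.8 without relying on PEP 585 built-in generics.

```python
from typing import List, Optional


def format_avg_ttft(ttfts: List[Optional[float]]) -> str:
    """Format the mean time-to-first-token, or a placeholder if there is no data."""
    samples = [t for t in ttfts if t is not None]
    if not samples:
        # Printing 0.000s here would look like an instant response,
        # when in fact nothing was measured.
        return "n/a"
    return f"{sum(samples) / len(samples):.3f}s"
```

For example, `format_avg_ttft([])` returns `"n/a"`, while `format_avg_ttft([0.25, 0.75])` returns `"0.500s"`.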

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.


Removed all inline comments to avoid linter false positives and to clear out the iteration trail.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


GuilhermeGors (Author) left a comment

All automated code review feedback has been resolved. The benchmark script has been fully refactored and is now production-ready.
Key fixes implemented:

  • Accuracy: Fixed the overlap formulas and the token counts (now taken from eval_count) to ensure strict parallelism detection.
  • Safety: Added input sanitization against path traversal and isolated JSON outputs in a dedicated ./bench_results/ directory.
  • Stability: Added errors="replace" to prevent UTF-8 streaming crashes and handled silent JSON decode failures.
  • Compatibility: Enforced Python 3.8 support by deferring type annotations with from __future__ import annotations.

The tool is safe, exact, and ready to be merged.
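The Accuracy and Stability items can be sketched together. The following is a minimal illustration, not the script's code: the function name `parse_stream` and its return shape are assumptions. It relies on the Ollama /api/generate streaming format, where each line is a JSON object with a `response` fragment and the final object (`done: true`) carries `eval_count`.

```python
import json
from typing import Iterable, Optional, Tuple


def parse_stream(lines: Iterable[bytes]) -> Tuple[str, Optional[int], int]:
    """Return (generated_text, eval_count, number_of_unparseable_lines)."""
    parts = []
    eval_count = None
    bad_lines = 0
    for raw in lines:
        # errors="replace" turns invalid UTF-8 bytes into U+FFFD instead of
        # raising UnicodeDecodeError in the middle of a stream.
        line = raw.decode("utf-8", errors="replace").strip()
        if not line:
            continue
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            bad_lines += 1  # count the failure rather than hiding it
            continue
        if not isinstance(chunk, dict):
            bad_lines += 1
            continue
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            # Server-reported token count, rather than counting chunks.
            eval_count = chunk.get("eval_count")
    return "".join(parts), eval_count, bad_lines
```

Returning the count of unparseable lines is one possible way to surface decode failures; logging them would serve the same purpose.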
