Feature/rename max concurrent to max concurrent trials #122
Conversation
Walkthrough

Rename of the concurrency identifier from `max_concurrent` to `max_concurrent_trials`.

Sequence Diagram(s)
(omitted — changes are naming-only; control flow unchanged)

Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
I think another PR's changes are also included in this PR.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/quick_start.md (1)

148-162: Update Quick Start commands to the renamed flag. These examples still show `--max-concurrent`, but the CLI now exposes `--max-concurrent-trials`. Following the doc as written will throw "no such option" errors. Please replace every remaining occurrence here with the new flag (and the YAML key that pairs with it) so the instructions remain accurate.
auto_tune_vllm/core/study_controller.py (1)

481-615: Fix the Ruff E501 failures. CI is red because these lines exceed 88 characters. Splitting the long strings and signature clears the lint errors.
```diff
-        msg = (
-            "❌ --max-concurrent-trials is required to prevent GPU memory conflicts!\n\n"
-            "Add to your YAML config:\n"
-            "  optimization:\n"
-            "    max_concurrent_trials: 2  # Match your GPU count\n\n"
-            "Or use CLI: --max-concurrent-trials 2"
-        )
+        msg = (
+            "❌ --max-concurrent-trials is required to prevent GPU memory "
+            "conflicts!\n\n"
+            "Add to your YAML config:\n"
+            "  optimization:\n"
+            "    max_concurrent_trials: 2  # Match your GPU count\n\n"
+            "Or use CLI: --max-concurrent-trials 2"
+        )

-        max_concurrent_str = (
-            max_concurrent_trials if max_concurrent_trials != float("inf") else "unlimited"
-        )
+        if max_concurrent_trials == float("inf"):
+            max_concurrent_str = "unlimited"
+        else:
+            max_concurrent_str = max_concurrent_trials

 ...

-    def _submit_available_trials(self, remaining_trials: int, max_concurrent_trials: float):
+    def _submit_available_trials(
+        self,
+        remaining_trials: int,
+        max_concurrent_trials: float,
+    ):
```
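For illustration, the wrapping above relies on Python's implicit concatenation of adjacent string literals, so splitting a long message across lines does not change its value. A minimal standalone example (not code from this repository):

```python
# Adjacent string literals are concatenated at compile time, so a long message
# can be split to stay under the 88-character Ruff limit without changing it.
msg = (
    "❌ --max-concurrent-trials is required to prevent GPU memory "
    "conflicts!\n\n"
    "Or use CLI: --max-concurrent-trials 2"
)
assert "memory conflicts" in msg  # the fragments join into one string
print(msg)
```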
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- auto_tune_vllm/cli/main.py (9 hunks)
- auto_tune_vllm/core/config.py (1 hunks)
- auto_tune_vllm/core/study_controller.py (5 hunks)
- docs/quick_start.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T20:40:13.235Z
Learnt from: ephoris
Repo: openshift-psap/auto-tuning-vllm PR: 104
File: auto_tune_vllm/core/study_controller.py:605-609
Timestamp: 2025-10-20T20:40:13.235Z
Learning: In the auto-tuning-vllm codebase, configuration validation is performed at entry points before reaching execution paths like trial submission in study_controller.py. Defensive try-except blocks around config value parsing (e.g., VLLM_STARTUP_TIMEOUT) are unnecessary and considered bloat since validation has already occurred upstream.
Applied to files:
- auto_tune_vllm/core/config.py
- auto_tune_vllm/core/study_controller.py
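To make the pattern in this learning concrete, here is a minimal standalone sketch (not code from this repository; VLLM_STARTUP_TIMEOUT is used only as the example value the learning mentions):

```python
# Validate configuration once at the entry point, then trust the value
# downstream instead of wrapping every use in defensive try/except blocks.
import os


def load_config() -> dict:
    # Entry-point validation: fail fast on a bad value.
    raw = os.environ.get("VLLM_STARTUP_TIMEOUT", "300")
    try:
        timeout = int(raw)
    except ValueError as exc:
        raise ValueError(
            f"VLLM_STARTUP_TIMEOUT must be an integer, got {raw!r}"
        ) from exc
    return {"vllm_startup_timeout": timeout}


def submit_trial(config: dict) -> None:
    # Downstream code uses the validated value directly; no try/except needed.
    timeout = config["vllm_startup_timeout"]
    print(f"Waiting up to {timeout}s for vLLM to start")


submit_trial(load_config())
```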
🧬 Code graph analysis (1)
auto_tune_vllm/cli/main.py (1)
auto_tune_vllm/core/study_controller.py (1)
run_optimization(460-611)
🪛 GitHub Actions: lint-ci
auto_tune_vllm/core/study_controller.py
[error] 482-482: E501 Line too long (93 > 88)
[error] 482-482: Ruff check failed: Line exceeds maximum line length. Please wrap or refactor.
🪛 GitHub Check: ruff
auto_tune_vllm/core/study_controller.py
[failure] 493-493: Ruff (E501)
auto_tune_vllm/core/study_controller.py:493:89: E501 Line too long (91 > 88)
[failure] 482-482: Ruff (E501)
auto_tune_vllm/core/study_controller.py:482:88: E501 Line too long (93 > 88)
[failure] 613-613: Ruff (E501)
auto_tune_vllm/core/study_controller.py:613:89: E501 Line too long (92 > 88)
```diff
         # Validate n_startup_trials < n_trials to ensure the sampler algorithm runs
         # This applies to all samplers with startup trials (TPE, GP, BoTorch)
         if n_startup_trials >= n_trials:
             suggestion = max(1, n_trials // 10)
             min_trials = n_startup_trials + 1
             msg = (
                 f"n_startup_trials ({n_startup_trials}) must be less than "
                 f"n_trials ({n_trials}). Otherwise all trials would be random. "
                 f"Suggestion: Set n_startup_trials to {suggestion} "
                 f"or increase n_trials to at least {min_trials}."
             )
             raise ValueError(msg)

         # Log sampler configuration
         logger.info(
             f"Creating {sampler_name.upper()} sampler "
             f"(n_startup_trials={n_startup_trials}, n_trials={n_trials})"
         )

         if sampler_name == "tpe":
-            return TPESampler()
+            # TPESampler uses random sampling for first n_startup_trials
+            return TPESampler(n_startup_trials=n_startup_trials)
         elif sampler_name == "random":
             # RandomSampler is always random, no startup trials concept
             return RandomSampler()
         elif sampler_name == "gp":
-            return GPSampler()
+            # GPSampler uses random sampling for initial trials
+            return GPSampler(n_startup_trials=n_startup_trials)
         elif sampler_name == "botorch":
-            return optuna.integration.BoTorchSampler()
+            # BoTorchSampler uses random sampling for initial trials
+            return optuna.integration.BoTorchSampler(n_startup_trials=n_startup_trials)
         elif sampler_name == "nsga2":
             # NSGA2 is a genetic algorithm, no startup trials concept
             return NSGAIISampler()
         elif sampler_name == "grid":
             # Build search space for grid sampler
             # GridSampler is deterministic, no startup trials concept
             search_space = StudyController._create_search_space(config)
             grid_size = StudyController._calculate_grid_size(search_space)
```
Limit startup-trial validation to the samplers that use it.
We now raise whenever n_startup_trials >= n_trials, but that fires before we know whether the sampler even consumes startup trials. With the new default of 10, any config that sets n_trials ≤ 10 now fails—for example a quick random or grid search with five trials, which previously worked because those samplers ignore n_startup_trials. That’s a regression. Please gate the check to samplers that actually need it (TPE, GP, BoTorch) or otherwise adjust the default before validating.
```diff
-        # Validate n_startup_trials < n_trials to ensure the sampler algorithm runs
-        # This applies to all samplers with startup trials (TPE, GP, BoTorch)
-        if n_startup_trials >= n_trials:
+        # Validate only for samplers that honor startup trials (TPE, GP, BoTorch)
+        samplers_requiring_startup = {"tpe", "gp", "botorch"}
+        if sampler_name in samplers_requiring_startup and n_startup_trials >= n_trials:
             suggestion = max(1, n_trials // 10)
             min_trials = n_startup_trials + 1
             msg = (
                 f"n_startup_trials ({n_startup_trials}) must be less than "
                 f"n_trials ({n_trials}). Otherwise all trials would be random. "
                 f"Suggestion: Set n_startup_trials to {suggestion} "
                 f"or increase n_trials to at least {min_trials}."
             )
             raise ValueError(msg)
```

🤖 Prompt for AI Agents
auto_tune_vllm/core/study_controller.py lines 401-438: The current code
validates n_startup_trials >= n_trials for all samplers, causing valid samplers
that ignore startup trials (e.g., random, grid, nsga2) to fail; move or gate
this validation so it only runs for samplers that consume startup trials (tpe,
gp, botorch): check sampler_name first (or perform the check inside the
corresponding branches) and raise the ValueError only for those samplers,
leaving random/grid/nsga2 unaffected; keep the sampler_info log and
creation logic otherwise unchanged.
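For reference, a self-contained sketch of the gated check suggested above. It is illustrative only: the validation logic mirrors the suggestion, but the real code selects actual Optuna samplers, which are omitted here.

```python
# Only these samplers consume startup trials, so only they need the check.
SAMPLERS_REQUIRING_STARTUP = {"tpe", "gp", "botorch"}


def validate_startup_trials(sampler_name: str, n_startup_trials: int, n_trials: int) -> None:
    """Raise only when the chosen sampler actually consumes startup trials."""
    if sampler_name in SAMPLERS_REQUIRING_STARTUP and n_startup_trials >= n_trials:
        suggestion = max(1, n_trials // 10)
        raise ValueError(
            f"n_startup_trials ({n_startup_trials}) must be less than "
            f"n_trials ({n_trials}); try n_startup_trials={suggestion}."
        )


validate_startup_trials("random", n_startup_trials=10, n_trials=5)  # OK: random ignores startup trials
validate_startup_trials("tpe", n_startup_trials=2, n_trials=20)     # OK
# validate_startup_trials("tpe", n_startup_trials=10, n_trials=5)   # would raise ValueError
```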
thameem-abbas
left a comment
LGTM once the quick start instruction is updated as CodeRabbit suggests
thameem-abbas
left a comment
This needs to be rebased on main after merging the PR for startup trials
Rename the max_concurrent parameter to max_concurrent_trials to clarify that it controls the number of Optuna optimization trials running in parallel, not GuideLLM's internal concurrency settings.
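A purely illustrative sketch of what the renamed knob governs, not this project's implementation: capping how many optimization trials run at once, independent of any benchmark-internal concurrency.

```python
# Illustrative only: max_concurrent_trials conceptually bounds how many
# optimization trials are in flight at the same time. This stdlib sketch is
# not the auto_tune_vllm implementation.
from concurrent.futures import ThreadPoolExecutor
import time


def run_trial(trial_id: int) -> float:
    """Stand-in for one optimization trial (e.g., benchmarking one vLLM config)."""
    time.sleep(0.1)
    return float(trial_id)


max_concurrent_trials = 2  # cap on trials in flight, e.g. one per GPU

with ThreadPoolExecutor(max_workers=max_concurrent_trials) as pool:
    results = list(pool.map(run_trial, range(6)))

print(results)
```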
285f7aa to b01b775
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- auto_tune_vllm/cli/main.py (9 hunks)
- auto_tune_vllm/core/config.py (1 hunks)
- auto_tune_vllm/core/study_controller.py (4 hunks)
- docs/quick_start.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/quick_start.md
- auto_tune_vllm/cli/main.py
- auto_tune_vllm/core/config.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T20:40:13.235Z
Learnt from: ephoris
Repo: openshift-psap/auto-tuning-vllm PR: 104
File: auto_tune_vllm/core/study_controller.py:605-609
Timestamp: 2025-10-20T20:40:13.235Z
Learning: In the auto-tuning-vllm codebase, configuration validation is performed at entry points before reaching execution paths like trial submission in study_controller.py. Defensive try-except blocks around config value parsing (e.g., VLLM_STARTUP_TIMEOUT) are unnecessary and considered bloat since validation has already occurred upstream.
Applied to files:
auto_tune_vllm/core/study_controller.py
🪛 GitHub Actions: lint-ci
auto_tune_vllm/core/study_controller.py
[error] 482-482: Ruff: E501 Line too long (93 > 88).
🪛 GitHub Check: ruff
auto_tune_vllm/core/study_controller.py
[failure] 493-493: Ruff (E501)
auto_tune_vllm/core/study_controller.py:493:89: E501 Line too long (91 > 88)
[failure] 482-482: Ruff (E501)
auto_tune_vllm/core/study_controller.py:482:88: E501 Line too long (93 > 88)
[failure] 613-613: Ruff (E501)
auto_tune_vllm/core/study_controller.py:613:89: E501 Line too long (92 > 88)
@thameem-abbas Yes, it was. I have done a rebase and some ruff checks. Should be okay now?
thameem-abbas
left a comment
LGTM. Thanks
Summary by CodeRabbit

Refactor
- Renamed the `max_concurrent` option to `max_concurrent_trials` across the CLI, configuration, and study controller.

Documentation
- Updated the Quick Start guide to use the renamed flag and YAML key.