
Conversation

@aas008 aas008 (Collaborator) commented Nov 12, 2025

Summary by CodeRabbit

  • Refactor

    • CLI flag renamed from --max-concurrent to --max-concurrent-trials
    • Configuration key renamed from max_concurrent to max_concurrent_trials
    • Optimization startup trials default remains 10
    • Progress and error messages updated to reference the new flag/key
  • Documentation

    • Quick start and CLI guidance updated to use the new flag and config key
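
For orientation, here is a minimal sketch of how the renamed flag could be wired at the CLI layer. It assumes a Click-based command and invented parameter plumbing, so the actual optimize_command in auto_tune_vllm/cli/main.py will differ in detail.

import click

@click.command("optimize")
@click.option(
    "--max-concurrent-trials",  # renamed from --max-concurrent
    "max_concurrent_trials",
    type=int,
    default=None,
    help="Maximum number of Optuna trials to run in parallel (match your GPU count).",
)
def optimize_command(max_concurrent_trials):
    # The real command forwards this value to the study controller; here we just echo it.
    click.echo(f"max_concurrent_trials = {max_concurrent_trials}")

if __name__ == "__main__":
    optimize_command()

Invoking it as optimize --max-concurrent-trials 2 would then mirror the YAML key optimization.max_concurrent_trials referenced below.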

@coderabbitai coderabbitai bot (Contributor) commented Nov 12, 2025

Walkthrough

Rename of concurrency identifier from max_concurrent to max_concurrent_trials across CLI, config, and controller layers; function signatures, CLI flags, config keys, internal variables, and messages updated. No control-flow or behavioral changes aside from naming and message text.

Changes

  • CLI & commands (auto_tune_vllm/cli/main.py): CLI flag renamed from --max-concurrent to --max-concurrent-trials; function parameters and local variables renamed to max_concurrent_trials in optimize_command, resume_command, run_optimization_sync, and resume_study_sync; validation/error messages and config guidance strings updated; all controller calls pass max_concurrent_trials.
  • Core configuration (auto_tune_vllm/core/config.py): OptimizationConfig field renamed from max_concurrent to max_concurrent_trials (Optional[int], default None); n_startup_trials remains int = 10 (no behavioral change). A sketch of the renamed field follows this list.
  • Study controller / optimization (auto_tune_vllm/core/study_controller.py): Method signatures and internal parameter names changed from max_concurrent to max_concurrent_trials (run_optimization, _submit_available_trials, and related calls); validation and user-facing messages updated to reference --max-concurrent-trials and optimization.max_concurrent_trials; all internal invocations pass the renamed parameter.
  • Documentation (docs/quick_start.md): CLI and YAML examples updated to use --max-concurrent-trials and the max_concurrent_trials config key.
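
A minimal sketch of the renamed field, based only on the summary above (Optional[int] defaulting to None, n_startup_trials unchanged at 10). The actual OptimizationConfig in auto_tune_vllm/core/config.py may be defined differently (for example as a Pydantic model) and carry more fields.

from dataclasses import dataclass
from typing import Optional

@dataclass
class OptimizationConfig:
    # Renamed from max_concurrent; None means the value must be supplied via
    # --max-concurrent-trials or the optimization.max_concurrent_trials YAML key.
    max_concurrent_trials: Optional[int] = None
    # Unchanged by this PR: trials sampled randomly before the sampler's model kicks in.
    n_startup_trials: int = 10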

Sequence Diagram(s)

(omitted — changes are naming-only; control flow unchanged)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Special attention:
    • Verify there are no remaining max_concurrent occurrences (CLI flags, config keys, docs, logging).
    • Confirm public function signatures and dataclass field rename are backward-compatible or documented for users.
    • Validate updated help/error messages and example YAML reflect the new key exactly.
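
To back the first check above, a throwaway scan along these lines can surface leftover spellings of the old identifier. The script is illustrative only and not part of the PR.

import re
from pathlib import Path

# Match the old spellings (max_concurrent, --max-concurrent) but not the
# new max_concurrent_trials / --max-concurrent-trials.
OLD_NAME = re.compile(r"max[-_]concurrent(?![-_]trials)")

for path in Path(".").rglob("*"):
    if not path.is_file() or path.suffix not in {".py", ".md", ".yaml", ".yml"}:
        continue
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if OLD_NAME.search(line):
            print(f"{path}:{lineno}: {line.strip()}")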

Poem

🐇 A tiny hop, a tidy swap,
max_concurrent_trials hops to the top,
Flags updated, configs aligned,
Old names tucked away, neatly confined,
🥕🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: renaming the max_concurrent parameter/option to max_concurrent_trials across the codebase.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b01b775 and 3ff042a.

📒 Files selected for processing (1)
  • auto_tune_vllm/core/study_controller.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T20:40:13.235Z
Learnt from: ephoris
Repo: openshift-psap/auto-tuning-vllm PR: 104
File: auto_tune_vllm/core/study_controller.py:605-609
Timestamp: 2025-10-20T20:40:13.235Z
Learning: In the auto-tuning-vllm codebase, configuration validation is performed at entry points before reaching execution paths like trial submission in study_controller.py. Defensive try-except blocks around config value parsing (e.g., VLLM_STARTUP_TIMEOUT) are unnecessary and considered bloat since validation has already occurred upstream.

Applied to files:

  • auto_tune_vllm/core/study_controller.py
🔇 Additional comments (3)
auto_tune_vllm/core/study_controller.py (3)

460-476: LGTM! Parameter rename is clean and well-documented.

The signature change from max_concurrent to max_concurrent_trials is clear and the docstring accurately reflects the new parameter name.


510-515: LGTM! Argument passing updated correctly.

The call to _submit_available_trials now passes max_concurrent_trials, matching the updated method signature.


616-623: LGTM! Method signature properly formatted and rename is consistent.

The method signature for _submit_available_trials has been updated with the new max_concurrent_trials parameter name, and it's properly split across multiple lines to comply with line length limits. The usage in the loop condition is also correct.
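
For readers who have not opened study_controller.py, the gist of that loop condition is roughly the following. Every name here (the free function, running_trials, start_next_trial) is a placeholder for illustration, not the repository's actual code.

def submit_available_trials(
    running_trials: list,
    remaining_trials: int,
    max_concurrent_trials: float,
    start_next_trial,
) -> int:
    """Launch trials until the concurrency cap or the remaining trial budget is hit."""
    submitted = 0
    # float("inf") stands for "unlimited", so one comparison covers capped and uncapped runs.
    while (
        remaining_trials - submitted > 0
        and len(running_trials) < max_concurrent_trials
    ):
        running_trials.append(start_next_trial())
        submitted += 1
    return submitted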



@aas008 aas008 self-assigned this Nov 12, 2025
@aas008 aas008 linked an issue Nov 12, 2025 that may be closed by this pull request

@thameem-abbas thameem-abbas (Member) commented:

I think another PR's changes are also included in this PR.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/quick_start.md (1)

148-162: Update Quick Start commands to the renamed flag.

These examples still show --max-concurrent, but the CLI now exposes --max-concurrent-trials. Following the doc as written will throw “no such option” errors. Please replace every remaining occurrence here with the new flag (and the YAML key that pairs with it) so the instructions remain accurate.

auto_tune_vllm/core/study_controller.py (1)

481-615: Fix the Ruff E501 failures.

CI is red because these lines exceed 88 characters. Splitting the long strings and signature clears the lint errors.

-        msg = (
-            "❌ --max-concurrent-trials is required to prevent GPU memory conflicts!\n\n"
-            "Add to your YAML config:\n"
-            "  optimization:\n"
-            "    max_concurrent_trials: 2  # Match your GPU count\n\n"
-            "Or use CLI: --max-concurrent-trials 2"
-        )
+        msg = (
+            "❌ --max-concurrent-trials is required to prevent GPU memory "
+            "conflicts!\n\n"
+            "Add to your YAML config:\n"
+            "  optimization:\n"
+            "    max_concurrent_trials: 2  # Match your GPU count\n\n"
+            "Or use CLI: --max-concurrent-trials 2"
+        )
-        max_concurrent_str = (
-            max_concurrent_trials if max_concurrent_trials != float("inf") else "unlimited"
-        )
+        if max_concurrent_trials == float("inf"):
+            max_concurrent_str = "unlimited"
+        else:
+            max_concurrent_str = max_concurrent_trials
...
-    def _submit_available_trials(self, remaining_trials: int, max_concurrent_trials: float):
+    def _submit_available_trials(
+        self,
+        remaining_trials: int,
+        max_concurrent_trials: float,
+    ):
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6d0eab and 285f7aa.

📒 Files selected for processing (4)
  • auto_tune_vllm/cli/main.py (9 hunks)
  • auto_tune_vllm/core/config.py (1 hunks)
  • auto_tune_vllm/core/study_controller.py (5 hunks)
  • docs/quick_start.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T20:40:13.235Z
Learnt from: ephoris
Repo: openshift-psap/auto-tuning-vllm PR: 104
File: auto_tune_vllm/core/study_controller.py:605-609
Timestamp: 2025-10-20T20:40:13.235Z
Learning: In the auto-tuning-vllm codebase, configuration validation is performed at entry points before reaching execution paths like trial submission in study_controller.py. Defensive try-except blocks around config value parsing (e.g., VLLM_STARTUP_TIMEOUT) are unnecessary and considered bloat since validation has already occurred upstream.

Applied to files:

  • auto_tune_vllm/core/config.py
  • auto_tune_vllm/core/study_controller.py
🧬 Code graph analysis (1)
auto_tune_vllm/cli/main.py (1)
auto_tune_vllm/core/study_controller.py (1)
  • run_optimization (460-611)
🪛 GitHub Actions: lint-ci
auto_tune_vllm/core/study_controller.py

[error] 482-482: E501 Line too long (93 > 88)


[error] 482-482: Ruff check failed: Line exceeds maximum line length. Please wrap or refactor.

🪛 GitHub Check: ruff
auto_tune_vllm/core/study_controller.py

[failure] 493-493: Ruff (E501)
auto_tune_vllm/core/study_controller.py:493:89: E501 Line too long (91 > 88)


[failure] 482-482: Ruff (E501)
auto_tune_vllm/core/study_controller.py:482:88: E501 Line too long (93 > 88)


[failure] 613-613: Ruff (E501)
auto_tune_vllm/core/study_controller.py:613:89: E501 Line too long (92 > 88)

Comment on lines 401 to 438
        # Validate n_startup_trials < n_trials to ensure the sampler algorithm runs
        # This applies to all samplers with startup trials (TPE, GP, BoTorch)
        if n_startup_trials >= n_trials:
            suggestion = max(1, n_trials // 10)
            min_trials = n_startup_trials + 1
            msg = (
                f"n_startup_trials ({n_startup_trials}) must be less than "
                f"n_trials ({n_trials}). Otherwise all trials would be random. "
                f"Suggestion: Set n_startup_trials to {suggestion} "
                f"or increase n_trials to at least {min_trials}."
            )
            raise ValueError(msg)

        # Log sampler configuration
        logger.info(
            f"Creating {sampler_name.upper()} sampler "
            f"(n_startup_trials={n_startup_trials}, n_trials={n_trials})"
        )

        if sampler_name == "tpe":
            # TPESampler uses random sampling for first n_startup_trials
            return TPESampler(n_startup_trials=n_startup_trials)
        elif sampler_name == "random":
            # RandomSampler is always random, no startup trials concept
            return RandomSampler()
        elif sampler_name == "gp":
            # GPSampler uses random sampling for initial trials
            return GPSampler(n_startup_trials=n_startup_trials)
        elif sampler_name == "botorch":
            # BoTorchSampler uses random sampling for initial trials
            return optuna.integration.BoTorchSampler(n_startup_trials=n_startup_trials)
        elif sampler_name == "nsga2":
            # NSGA2 is a genetic algorithm, no startup trials concept
            return NSGAIISampler()
        elif sampler_name == "grid":
            # Build search space for grid sampler
            # GridSampler is deterministic, no startup trials concept
            search_space = StudyController._create_search_space(config)
            grid_size = StudyController._calculate_grid_size(search_space)

coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🟠 Major

Limit startup-trial validation to the samplers that use it.

We now raise whenever n_startup_trials >= n_trials, but that fires before we know whether the sampler even consumes startup trials. With the new default of 10, any config that sets n_trials ≤ 10 now fails—for example a quick random or grid search with five trials, which previously worked because those samplers ignore n_startup_trials. That’s a regression. Please gate the check to samplers that actually need it (TPE, GP, BoTorch) or otherwise adjust the default before validating.

-        # Validate n_startup_trials < n_trials to ensure the sampler algorithm runs
-        # This applies to all samplers with startup trials (TPE, GP, BoTorch)
-        if n_startup_trials >= n_trials:
+        # Validate only for samplers that honor startup trials (TPE, GP, BoTorch)
+        samplers_requiring_startup = {"tpe", "gp", "botorch"}
+        if sampler_name in samplers_requiring_startup and n_startup_trials >= n_trials:
             suggestion = max(1, n_trials // 10)
             min_trials = n_startup_trials + 1
             msg = (
                 f"n_startup_trials ({n_startup_trials}) must be less than "
                 f"n_trials ({n_trials}). Otherwise all trials would be random. "
                 f"Suggestion: Set n_startup_trials to {suggestion} "
                 f"or increase n_trials to at least {min_trials}."
             )
             raise ValueError(msg)
🤖 Prompt for AI Agents
auto_tune_vllm/core/study_controller.py lines 401-438: The current code
validates n_startup_trials >= n_trials for all samplers, causing valid samplers
that ignore startup trials (e.g., random, grid, nsga2) to fail; move or gate
this validation so it only runs for samplers that consume startup trials (tpe,
gp, botorch): check sampler_name first (or perform the check inside the
corresponding branches) and raise the ValueError only for those samplers,
leaving random/grid/nsga2 unaffected; keep the sampler_info log and
creation logic otherwise unchanged.
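
Pulling the suggested diff together, a standalone sketch of the gated check could look like the following. The helper name and module-level constant are assumptions rather than the actual structure of study_controller.py.

SAMPLERS_WITH_STARTUP_TRIALS = {"tpe", "gp", "botorch"}

def validate_startup_trials(sampler_name: str, n_startup_trials: int, n_trials: int) -> None:
    # random, grid, and nsga2 ignore n_startup_trials, so small studies using
    # those samplers should not be rejected by this check.
    if sampler_name not in SAMPLERS_WITH_STARTUP_TRIALS:
        return
    if n_startup_trials >= n_trials:
        suggestion = max(1, n_trials // 10)
        min_trials = n_startup_trials + 1
        raise ValueError(
            f"n_startup_trials ({n_startup_trials}) must be less than "
            f"n_trials ({n_trials}). Otherwise all trials would be random. "
            f"Suggestion: Set n_startup_trials to {suggestion} "
            f"or increase n_trials to at least {min_trials}."
        )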

@thameem-abbas thameem-abbas (Member) left a comment

LGTM once the quick start instructions are updated as CodeRabbit suggests

@thameem-abbas thameem-abbas (Member) left a comment

This needs to be rebased on main after merging the PR for startup trials

Rename the max_concurrent parameter to max_concurrent_trials to clarify
that it controls the number of Optuna optimization trials running in
parallel, not GuideLLM's internal concurrency settings.
@aas008 aas008 force-pushed the feature/rename-max-concurrent-to-max-concurrent-trials branch from 285f7aa to b01b775 on November 13, 2025 at 15:30

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 285f7aa and b01b775.

📒 Files selected for processing (4)
  • auto_tune_vllm/cli/main.py (9 hunks)
  • auto_tune_vllm/core/config.py (1 hunks)
  • auto_tune_vllm/core/study_controller.py (4 hunks)
  • docs/quick_start.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/quick_start.md
  • auto_tune_vllm/cli/main.py
  • auto_tune_vllm/core/config.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T20:40:13.235Z
Learnt from: ephoris
Repo: openshift-psap/auto-tuning-vllm PR: 104
File: auto_tune_vllm/core/study_controller.py:605-609
Timestamp: 2025-10-20T20:40:13.235Z
Learning: In the auto-tuning-vllm codebase, configuration validation is performed at entry points before reaching execution paths like trial submission in study_controller.py. Defensive try-except blocks around config value parsing (e.g., VLLM_STARTUP_TIMEOUT) are unnecessary and considered bloat since validation has already occurred upstream.

Applied to files:

  • auto_tune_vllm/core/study_controller.py
🪛 GitHub Actions: lint-ci
auto_tune_vllm/core/study_controller.py

[error] 482-482: Ruff: E501 Line too long (93 > 88).

🪛 GitHub Check: ruff
auto_tune_vllm/core/study_controller.py

[failure] 493-493: Ruff (E501)
auto_tune_vllm/core/study_controller.py:493:89: E501 Line too long (91 > 88)


[failure] 482-482: Ruff (E501)
auto_tune_vllm/core/study_controller.py:482:88: E501 Line too long (93 > 88)


[failure] 613-613: Ruff (E501)
auto_tune_vllm/core/study_controller.py:613:89: E501 Line too long (92 > 88)

@aas008 aas008 (Collaborator, Author) commented Nov 13, 2025

I think another PR's changes are also included in this PR.

@thameem-abbas Yes, it was. I have done a rebase and some ruff checks. Should be okay now?

@aas008 aas008 requested a review from thameem-abbas November 13, 2025 15:41

@thameem-abbas thameem-abbas (Member) left a comment

LGTM. Thanks

@aas008 aas008 merged commit 3b3c80d into main Nov 19, 2025
2 of 3 checks passed
@aas008 aas008 deleted the feature/rename-max-concurrent-to-max-concurrent-trials branch November 19, 2025 15:21

Labels: None yet

Projects: None yet

Development

Successfully merging this pull request may close these issues.

Rename Max_concurrent to max_concurrent_trials / max_parallel_trials

3 participants