Feature/rename max concurrent to max concurrent trials #122
Conversation
Walkthrough

Rename of the concurrency identifier from `max_concurrent` to `max_concurrent_trials`.

Sequence Diagram(s)
(omitted — changes are naming-only; control flow unchanged)

Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
I think another PR's changes are also included in this PR.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/quick_start.md (1)

148-162: Update Quick Start commands to the renamed flag. These examples still show `--max-concurrent`, but the CLI now exposes `--max-concurrent-trials`. Following the doc as written will throw "no such option" errors. Please replace every remaining occurrence here with the new flag (and the YAML key that pairs with it) so the instructions remain accurate.
auto_tune_vllm/core/study_controller.py (1)

481-615: Fix the Ruff E501 failures. CI is red because these lines exceed 88 characters. Splitting the long strings and signature clears the lint errors.
```diff
-        msg = (
-            "❌ --max-concurrent-trials is required to prevent GPU memory conflicts!\n\n"
-            "Add to your YAML config:\n"
-            "  optimization:\n"
-            "    max_concurrent_trials: 2  # Match your GPU count\n\n"
-            "Or use CLI: --max-concurrent-trials 2"
-        )
+        msg = (
+            "❌ --max-concurrent-trials is required to prevent GPU memory "
+            "conflicts!\n\n"
+            "Add to your YAML config:\n"
+            "  optimization:\n"
+            "    max_concurrent_trials: 2  # Match your GPU count\n\n"
+            "Or use CLI: --max-concurrent-trials 2"
+        )

-        max_concurrent_str = (
-            max_concurrent_trials if max_concurrent_trials != float("inf") else "unlimited"
-        )
+        if max_concurrent_trials == float("inf"):
+            max_concurrent_str = "unlimited"
+        else:
+            max_concurrent_str = max_concurrent_trials

 ...

-    def _submit_available_trials(self, remaining_trials: int, max_concurrent_trials: float):
+    def _submit_available_trials(
+        self,
+        remaining_trials: int,
+        max_concurrent_trials: float,
+    ):
```
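For illustration, the wrapping above relies on Python's implicit concatenation of adjacent string literals, so splitting a long message across lines does not change its value. A minimal standalone example (not code from this repository):

```python
# Adjacent string literals are concatenated at compile time, so a long message
# can be split to stay under the 88-character Ruff limit without changing it.
msg = (
    "❌ --max-concurrent-trials is required to prevent GPU memory "
    "conflicts!\n\n"
    "Or use CLI: --max-concurrent-trials 2"
)
assert "memory conflicts" in msg  # the fragments join into one string
print(msg)
```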
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- auto_tune_vllm/cli/main.py (9 hunks)
- auto_tune_vllm/core/config.py (1 hunks)
- auto_tune_vllm/core/study_controller.py (5 hunks)
- docs/quick_start.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T20:40:13.235Z
Learnt from: ephoris
Repo: openshift-psap/auto-tuning-vllm PR: 104
File: auto_tune_vllm/core/study_controller.py:605-609
Timestamp: 2025-10-20T20:40:13.235Z
Learning: In the auto-tuning-vllm codebase, configuration validation is performed at entry points before reaching execution paths like trial submission in study_controller.py. Defensive try-except blocks around config value parsing (e.g., VLLM_STARTUP_TIMEOUT) are unnecessary and considered bloat since validation has already occurred upstream.
Applied to files:
- auto_tune_vllm/core/config.py
- auto_tune_vllm/core/study_controller.py
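To make the pattern in this learning concrete, here is a minimal standalone sketch (not code from this repository; VLLM_STARTUP_TIMEOUT is used only as the example value the learning mentions):

```python
# Validate configuration once at the entry point, then trust the value
# downstream instead of wrapping every use in defensive try/except blocks.
import os


def load_config() -> dict:
    # Entry-point validation: fail fast on a bad value.
    raw = os.environ.get("VLLM_STARTUP_TIMEOUT", "300")
    try:
        timeout = int(raw)
    except ValueError as exc:
        raise ValueError(
            f"VLLM_STARTUP_TIMEOUT must be an integer, got {raw!r}"
        ) from exc
    return {"vllm_startup_timeout": timeout}


def submit_trial(config: dict) -> None:
    # Downstream code uses the validated value directly; no try/except needed.
    timeout = config["vllm_startup_timeout"]
    print(f"Waiting up to {timeout}s for vLLM to start")


submit_trial(load_config())
```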
🧬 Code graph analysis (1)
auto_tune_vllm/cli/main.py (1)
auto_tune_vllm/core/study_controller.py (1)
run_optimization(460-611)
🪛 GitHub Actions: lint-ci
auto_tune_vllm/core/study_controller.py
[error] 482-482: E501 Line too long (93 > 88)
[error] 482-482: Ruff check failed: Line exceeds maximum line length. Please wrap or refactor.
🪛 GitHub Check: ruff
auto_tune_vllm/core/study_controller.py
[failure] 493-493: Ruff (E501)
auto_tune_vllm/core/study_controller.py:493:89: E501 Line too long (91 > 88)
[failure] 482-482: Ruff (E501)
auto_tune_vllm/core/study_controller.py:482:88: E501 Line too long (93 > 88)
[failure] 613-613: Ruff (E501)
auto_tune_vllm/core/study_controller.py:613:89: E501 Line too long (92 > 88)
```diff
         # Validate n_startup_trials < n_trials to ensure the sampler algorithm runs
         # This applies to all samplers with startup trials (TPE, GP, BoTorch)
         if n_startup_trials >= n_trials:
             suggestion = max(1, n_trials // 10)
             min_trials = n_startup_trials + 1
             msg = (
                 f"n_startup_trials ({n_startup_trials}) must be less than "
                 f"n_trials ({n_trials}). Otherwise all trials would be random. "
                 f"Suggestion: Set n_startup_trials to {suggestion} "
                 f"or increase n_trials to at least {min_trials}."
             )
             raise ValueError(msg)

         # Log sampler configuration
         logger.info(
             f"Creating {sampler_name.upper()} sampler "
             f"(n_startup_trials={n_startup_trials}, n_trials={n_trials})"
         )

         if sampler_name == "tpe":
-            return TPESampler()
+            # TPESampler uses random sampling for first n_startup_trials
+            return TPESampler(n_startup_trials=n_startup_trials)
         elif sampler_name == "random":
             # RandomSampler is always random, no startup trials concept
             return RandomSampler()
         elif sampler_name == "gp":
-            return GPSampler()
+            # GPSampler uses random sampling for initial trials
+            return GPSampler(n_startup_trials=n_startup_trials)
         elif sampler_name == "botorch":
-            return optuna.integration.BoTorchSampler()
+            # BoTorchSampler uses random sampling for initial trials
+            return optuna.integration.BoTorchSampler(n_startup_trials=n_startup_trials)
         elif sampler_name == "nsga2":
             # NSGA2 is a genetic algorithm, no startup trials concept
             return NSGAIISampler()
         elif sampler_name == "grid":
             # Build search space for grid sampler
             # GridSampler is deterministic, no startup trials concept
             search_space = StudyController._create_search_space(config)
             grid_size = StudyController._calculate_grid_size(search_space)
```
Limit startup-trial validation to the samplers that use it.
We now raise whenever n_startup_trials >= n_trials, but that fires before we know whether the sampler even consumes startup trials. With the new default of 10, any config that sets n_trials ≤ 10 now fails—for example a quick random or grid search with five trials, which previously worked because those samplers ignore n_startup_trials. That’s a regression. Please gate the check to samplers that actually need it (TPE, GP, BoTorch) or otherwise adjust the default before validating.
```diff
-        # Validate n_startup_trials < n_trials to ensure the sampler algorithm runs
-        # This applies to all samplers with startup trials (TPE, GP, BoTorch)
-        if n_startup_trials >= n_trials:
+        # Validate only for samplers that honor startup trials (TPE, GP, BoTorch)
+        samplers_requiring_startup = {"tpe", "gp", "botorch"}
+        if sampler_name in samplers_requiring_startup and n_startup_trials >= n_trials:
             suggestion = max(1, n_trials // 10)
             min_trials = n_startup_trials + 1
             msg = (
                 f"n_startup_trials ({n_startup_trials}) must be less than "
                 f"n_trials ({n_trials}). Otherwise all trials would be random. "
                 f"Suggestion: Set n_startup_trials to {suggestion} "
                 f"or increase n_trials to at least {min_trials}."
             )
             raise ValueError(msg)
```

🤖 Prompt for AI Agents
auto_tune_vllm/core/study_controller.py lines 401-438: The current code
validates n_startup_trials >= n_trials for all samplers, causing valid samplers
that ignore startup trials (e.g., random, grid, nsga2) to fail; move or gate
this validation so it only runs for samplers that consume startup trials (tpe,
gp, botorch): check sampler_name first (or perform the check inside the
corresponding branches) and raise the ValueError only for those samplers,
leaving random/grid/nsga2 unaffected; keep the sampler_info log and
creation logic otherwise unchanged.
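For reference, a self-contained sketch of the gated check suggested above. It is illustrative only: the validation logic mirrors the suggestion, but the real code selects actual Optuna samplers, which are omitted here.

```python
# Only these samplers consume startup trials, so only they need the check.
SAMPLERS_REQUIRING_STARTUP = {"tpe", "gp", "botorch"}


def validate_startup_trials(sampler_name: str, n_startup_trials: int, n_trials: int) -> None:
    """Raise only when the chosen sampler actually consumes startup trials."""
    if sampler_name in SAMPLERS_REQUIRING_STARTUP and n_startup_trials >= n_trials:
        suggestion = max(1, n_trials // 10)
        raise ValueError(
            f"n_startup_trials ({n_startup_trials}) must be less than "
            f"n_trials ({n_trials}); try n_startup_trials={suggestion}."
        )


validate_startup_trials("random", n_startup_trials=10, n_trials=5)  # OK: random ignores startup trials
validate_startup_trials("tpe", n_startup_trials=2, n_trials=20)     # OK
# validate_startup_trials("tpe", n_startup_trials=10, n_trials=5)   # would raise ValueError
```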
thameem-abbas
left a comment
LGTM once the quick start instruction is updated as CodeRabbit suggests
thameem-abbas
left a comment
This needs to be rebased on main after merging the PR for startup trials
Rename the max_concurrent parameter to max_concurrent_trials to clarify that it controls the number of Optuna optimization trials running in parallel, not GuideLLM's internal concurrency settings.
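A purely illustrative sketch of what the renamed knob governs, not this project's implementation: capping how many optimization trials run at once, independent of any benchmark-internal concurrency.

```python
# Illustrative only: max_concurrent_trials conceptually bounds how many
# optimization trials are in flight at the same time. This stdlib sketch is
# not the auto_tune_vllm implementation.
from concurrent.futures import ThreadPoolExecutor
import time


def run_trial(trial_id: int) -> float:
    """Stand-in for one optimization trial (e.g., benchmarking one vLLM config)."""
    time.sleep(0.1)
    return float(trial_id)


max_concurrent_trials = 2  # cap on trials in flight, e.g. one per GPU

with ThreadPoolExecutor(max_workers=max_concurrent_trials) as pool:
    results = list(pool.map(run_trial, range(6)))

print(results)
```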
285f7aa to b01b775
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- auto_tune_vllm/cli/main.py (9 hunks)
- auto_tune_vllm/core/config.py (1 hunks)
- auto_tune_vllm/core/study_controller.py (4 hunks)
- docs/quick_start.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/quick_start.md
- auto_tune_vllm/cli/main.py
- auto_tune_vllm/core/config.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T20:40:13.235Z
Learnt from: ephoris
Repo: openshift-psap/auto-tuning-vllm PR: 104
File: auto_tune_vllm/core/study_controller.py:605-609
Timestamp: 2025-10-20T20:40:13.235Z
Learning: In the auto-tuning-vllm codebase, configuration validation is performed at entry points before reaching execution paths like trial submission in study_controller.py. Defensive try-except blocks around config value parsing (e.g., VLLM_STARTUP_TIMEOUT) are unnecessary and considered bloat since validation has already occurred upstream.
Applied to files:
auto_tune_vllm/core/study_controller.py
🪛 GitHub Actions: lint-ci
auto_tune_vllm/core/study_controller.py
[error] 482-482: Ruff: E501 Line too long (93 > 88).
🪛 GitHub Check: ruff
auto_tune_vllm/core/study_controller.py
[failure] 493-493: Ruff (E501)
auto_tune_vllm/core/study_controller.py:493:89: E501 Line too long (91 > 88)
[failure] 482-482: Ruff (E501)
auto_tune_vllm/core/study_controller.py:482:88: E501 Line too long (93 > 88)
[failure] 613-613: Ruff (E501)
auto_tune_vllm/core/study_controller.py:613:89: E501 Line too long (92 > 88)
@thameem-abbas Yes, it was. I have done a rebase and some ruff checks. Should be okay now?
thameem-abbas
left a comment
LGTM. Thanks
Summary by CodeRabbit

Refactor
- Renamed the `max_concurrent` option to `max_concurrent_trials` across the CLI, configuration, and study controller.

Documentation
- Updated the Quick Start guide to use the renamed flag and YAML key.