-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[rollout,docs] fix: improve error message (#4682) and docstrings (#1345) #4729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rollout,docs] fix: improve error message (#4682) and docstrings (#1345) #4729
Conversation
…olcengine#4682) The vLLMAsyncRollout.generate_sequences() method now provides a clear error message explaining: - Sync mode was retired in PR volcengine#4411 - Users should use async server interface (vLLMReplica, AsyncLLMServerManager) - HFRollout can be used for synchronous generation Also updated generation.yaml config to use async mode and document the current limitation with main_generation.py workflow. Fixes volcengine#4682 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
…olcengine#1345) Standardized all function docstrings in verl/utils/device.py to follow Google-style documentation format with proper Args, Returns, and Note sections: - is_torch_npu_available(): Added detailed description and return type - get_visible_devices_keyword(): Clarified purpose and return values - get_device_name(): Improved description of supported devices - get_torch_device(): Documented fallback behavior - get_device_id(): Concise description with example - get_nccl_backend(): Explained HCCL vs NCCL selection - set_expandable_segments(): Added OOM context and Note section - auto_set_ascend_device_name(): Documented NPU auto-configuration - get_device_capability(): Added proper type hints and description Contributes to volcengine#1345 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request delivers two valuable improvements. Firstly, it significantly enhances the developer experience by providing a much more informative error message for the deprecated synchronous generate_sequences method in vLLMAsyncRollout. This will prevent confusion and guide users toward the correct asynchronous implementation. Secondly, the standardization of docstrings in verl/utils/device.py to the Google style guide is a great step towards better code clarity and maintainability. I have added one high-severity comment to address a regression in type hinting within device.py to ensure consistency with the improvements being made. Overall, this is a solid contribution.
|
|
||
| def get_torch_device() -> any: | ||
| """Return the corresponding torch attribute based on the device type string. | ||
| def get_torch_device(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the previous type hint -> any was incorrect (it should be -> Any), removing the type hint altogether is a regression, especially in a PR that aims to improve documentation and typing. To maintain consistency with the other functions in this file and improve code clarity for developers and static analysis tools, a correct return type hint should be provided.
The function returns a module object (e.g., torch.cuda), so types.ModuleType is the most appropriate type hint. You will need to add import types at the top of the file.
| def get_torch_device(): | |
| def get_torch_device() -> "types.ModuleType": |
…rings (volcengine#1345) (volcengine#4729) ## Summary This PR contains two contributions: ### 1. Fix for Issue volcengine#4682 - Informative error message for `generate_sequences` - **Problem:** `vLLMAsyncRollout.generate_sequences()` raised a bare `NotImplementedError`, leaving users confused when running generation scripts - **Root cause:** The vLLM SPMD (sync) mode was retired in PR volcengine#4411, but the generation workflow (`main_generation.py`) still expects a synchronous `generate_sequences()` method - **Fix:** Added an informative error message explaining: - Sync mode was retired in PR volcengine#4411 - Users should use the async server interface (`vLLMReplica`, `AsyncLLMServerManager`) - Alternative: use `HFRollout` for synchronous generation - Links to issue volcengine#4682 for details - Also updated `generation.yaml` config comments to document the limitation ### 2. Documentation improvement for Issue volcengine#1345 - Google-style docstrings in `device.py` Standardized all function docstrings in `verl/utils/device.py` to follow Google-style documentation format: - `is_torch_npu_available()`: Added detailed description and return type - `get_visible_devices_keyword()`: Clarified purpose and return values - `get_device_name()`: Improved description of supported devices - `get_torch_device()`: Documented fallback behavior - `get_device_id()`: Concise description with example - `get_nccl_backend()`: Explained HCCL vs NCCL selection - `set_expandable_segments()`: Added OOM context and Note section - `auto_set_ascend_device_name()`: Documented NPU auto-configuration - `get_device_capability()`: Added proper type hints and description ## Test plan - [x] Python syntax verification passed for all modified files - [ ] CI tests should pass (no functional changes, only error messages and docstrings) Fixes volcengine#4682 Contributes to volcengine#1345 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: yurekami <[email protected]> Co-authored-by: Claude Opus 4.5 <[email protected]>
Summary
This PR contains two contributions:
1. Fix for Issue #4682 - Informative error message for
generate_sequencesvLLMAsyncRollout.generate_sequences()raised a bareNotImplementedError, leaving users confused when running generation scriptsmain_generation.py) still expects a synchronousgenerate_sequences()methodvLLMReplica,AsyncLLMServerManager)HFRolloutfor synchronous generationgeneration.yamlconfig comments to document the limitation2. Documentation improvement for Issue #1345 - Google-style docstrings in
device.pyStandardized all function docstrings in
verl/utils/device.pyto follow Google-style documentation format:is_torch_npu_available(): Added detailed description and return typeget_visible_devices_keyword(): Clarified purpose and return valuesget_device_name(): Improved description of supported devicesget_torch_device(): Documented fallback behaviorget_device_id(): Concise description with exampleget_nccl_backend(): Explained HCCL vs NCCL selectionset_expandable_segments(): Added OOM context and Note sectionauto_set_ascend_device_name(): Documented NPU auto-configurationget_device_capability(): Added proper type hints and descriptionFixes #4682
Contributes to #1345