-
Notifications
You must be signed in to change notification settings - Fork 10
Refactor ufs manager #101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ufs manager #101
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the UFS (Underlying File System) manager by extracting logging utilities into a dedicated module, improving code organization, and making several bug fixes and enhancements to the fallback mechanism.
Key changes:
- Extracted logging utilities (TagAdapter, TagFilter, setup_logger) from
utils.pyto a new dedicatedlog.pymodule - Added
parameters_adapterfunction to handle filesystem-specific parameter transformations - Refactored client initialization logic to reduce duplication and improve clarity
- Improved method naming in client/core.py for better code readability
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| alluxiofs/client/log.py | New module containing extracted logging utilities (TagAdapter, TagFilter, setup_logger) |
| alluxiofs/client/utils.py | Removed logging utilities and added parameters_adapter function for filesystem-specific parameter handling |
| alluxiofs/client/ufs_manager.py | Updated imports to use new log module, added path normalization logic, and integrated convert_ufs_info_to |
| alluxiofs/core.py | Refactored AlluxioClient initialization, improved fallback logic with VAR_KEYWORD parameter handling, added duplicate exception handler, and fixed AlluxioFile initialization |
| alluxiofs/client/core.py | Renamed methods for clarity, added S3_RANGE_URL_FORMAT constant usage, and created new _send_get_range_message_and_write_to_file method |
| alluxiofs/client/const.py | Added S3_RANGE_URL_FORMAT constant |
| alluxiofs/client/prefetch_policy.py | Updated imports to use new log module |
| alluxiofs/client/loadbalance.py | Updated imports to use new log module |
| alluxiofs/client/cache.py | Updated imports to use new log module |
| tests/local_cache/test_utils.py | Updated test imports and mock patches to reflect log module extraction, improved test assertion for HTTP errors |
| tests/local_cache/test_ufs_manager.py | Updated mock patches to reflect module rename from transfer to ufs_manager |
| config_template.yaml | Enhanced documentation with detailed log tag descriptions and examples |
Comments suppressed due to low confidence (1)
alluxiofs/core.py:669
- The code assumes 'alluxio' is not None when accessing 'alluxio.fallback_logger' and 'alluxio.alluxio'. However, on line 650, there's a check 'if alluxio:' suggesting that alluxio can be None. If alluxio is None, lines 658-669 will raise AttributeError when trying to access alluxio.fallback_logger and alluxio.alluxio. Add a null check or ensure alluxio is never None.
self.fallback_logger = alluxio.fallback_logger
if alluxio.alluxio and alluxio.alluxio.config.local_cache_enabled:
block_size = (
int(alluxio.alluxio.config.local_cache_block_size_mb)
* 1024
* 1024
)
self.prefetch_policy = get_prefetch_policy(
alluxio.alluxio.config, block_size
)
else:
self.prefetch_policy = None
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
alluxiofs/client/core.py
Outdated
| error=f"Error when reading file {alluxio_path} with offset {offset} and length {length}," | ||
| f" error: {e}", | ||
| ) | ||
| ) | ||
|
|
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message formatting is broken across two lines (1529-1530) without proper concatenation. This will result in a string that includes a line break and indentation in the middle. The f-string should be written on a single line or use parentheses for proper line continuation.
| error=f"Error when reading file {alluxio_path} with offset {offset} and length {length}," | |
| f" error: {e}", | |
| ) | |
| ) | |
| error=f"Error when reading file {alluxio_path} with offset {offset} and length {length}, error: {e}", | |
| ) | |
| ) |
| def _send_get_range_message_and_return_bytes( | ||
| self, worker_host, worker_port, alluxio_path, offset, length |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name '_send_get_range_message_and_return_bytes' is verbose and inconsistent with the original naming pattern. The old name '_all_file_range_generator' was misleading (it doesn't generate anything), but the new name could be simplified to something like '_read_range_bytes' or '_fetch_range' for better readability and consistency with similar methods.
| def _send_get_range_message_and_return_bytes_alluxiocommon( | ||
| self, worker_host, worker_http_port, file_path, offset, length |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name '_send_get_range_message_and_return_bytes_alluxiocommon' is excessively verbose. Consider simplifying to something like '_read_range_bytes_common' or '_fetch_range_common' for better readability.
alluxiofs/core.py
Outdated
| if ( | ||
| self.fs.fallback_to_ufs_enabled | ||
| and self.f is None | ||
| or not self.fs.fallback_to_ufs_enabled | ||
| ): |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition logic here is incorrect. The expression self.fs.fallback_to_ufs_enabled and self.f is None or not self.fs.fallback_to_ufs_enabled evaluates incorrectly due to operator precedence. It should be: (self.fs.fallback_to_ufs_enabled and self.f is None) or (not self.fs.fallback_to_ufs_enabled). Currently, due to operator precedence, this is evaluated as (self.fs.fallback_to_ufs_enabled and self.f is None) or (not self.fs.fallback_to_ufs_enabled), but the lack of explicit parentheses makes this confusing and error-prone. Additionally, the logic appears backwards - if fallback is enabled and we have a valid file handle (self.f is not None), we should use it, not raise an exception.
| if ( | |
| self.fs.fallback_to_ufs_enabled | |
| and self.f is None | |
| or not self.fs.fallback_to_ufs_enabled | |
| ): | |
| # Re-raise the original exception if fallback is disabled, or if | |
| # fallback is enabled but opening the UFS file failed. | |
| if (not self.fs.fallback_to_ufs_enabled) or self.f is None: |
alluxiofs/client/utils.py
Outdated
| adapted_kwargs["src_path"] = final_kwargs.pop("path1") | ||
| adapted_kwargs["target_path"] = final_kwargs.pop("path2") |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using 'pop' on 'final_kwargs' when the key might not exist ('mode' may not always be present) could raise a KeyError. The default value 'overwrite' is provided, which is good, but when using 'pop' on 'path1' and 'path2' on lines 43-44, there's no default value, which will raise KeyError if these keys don't exist. Consider using 'get' with defaults or adding existence checks.
| adapted_kwargs["src_path"] = final_kwargs.pop("path1") | |
| adapted_kwargs["target_path"] = final_kwargs.pop("path2") | |
| if "path1" in final_kwargs and "path2" in final_kwargs: | |
| adapted_kwargs["src_path"] = final_kwargs.pop("path1") | |
| adapted_kwargs["target_path"] = final_kwargs.pop("path2") |
alluxiofs/core.py
Outdated
| except Exception as e: | ||
| self.fallback_logger.error( | ||
| f"Fallback to UFS failed for {method_name}" | ||
| ) | ||
| # [Critical] Use 'from e' to preserve the original exception stack trace | ||
| raise RuntimeError( | ||
| f"Fallback to UFS failed for {method_name}" | ||
| ) from e |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate exception handler detected. Lines 384-391 and 392-399 are identical except handlers. The second except block (392-399) will never be reached because the first one (384-391) catches all exceptions. This is unreachable code that should be removed.
| def _send_get_range_message_and_write_to_file( | ||
| self, f, worker_host, worker_port, alluxio_path, offset, length |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name '_send_get_range_message_and_write_to_file' is verbose. Consider simplifying to something like '_fetch_range_to_file' or '_write_range_to_file' for better readability and consistency with similar methods.
| headers = {} | ||
| mode = adapted_kwargs.pop("mode", "overwrite") | ||
| if mode == "overwrite": | ||
| # Default behavior, explicitly allowing overwrite (optional, usually not needed) | ||
| headers["x-oss-forbid-overwrite"] = "false" | ||
| else: # Or whatever logic you use for "don't overwrite" | ||
| # This will cause the upload to fail if the file exists | ||
| headers["x-oss-forbid-overwrite"] = "true" |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'headers' dictionary is created but never used. After setting the headers based on the mode, they should be passed to the appropriate OSS API call or stored in adapted_kwargs. Currently, this code has no effect.
config_template.yaml
Outdated
| # log_dir: Log directory | ||
| # Purpose: Directory to store log files. If not set, logs might be output to console or default location. | ||
| # Default Value: null | ||
| # Default Value: null,meaning no specific log directory is set,only console output |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chinese punctuation (full-width comma ',') is used instead of standard ASCII comma. Replace ',' with ',' for consistency with English text.
| # Default Value: null,meaning no specific log directory is set,only console output | |
| # Default Value: null, meaning no specific log directory is set, only console output |
| error=f"Error when reading file {alluxio_path} with offset {offset} and length {length}," | ||
| f" error: {e}", |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message formatting is broken across two lines (1502-1503) without proper concatenation. This will result in a string that includes a line break and indentation in the middle. The f-string should be written on a single line or use parentheses for proper line continuation.
NO JIRA