-
Notifications
You must be signed in to change notification settings - Fork 10
fix the bug of prefetch policy #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in the adaptive prefetch policy where the tracking of the last read position was incorrect, and threads the prefetch policy parameter through the read path to allow per-file policy instances. Additionally, it changes the default log_dir to None to disable file logging by default.
- Fixed adaptive prefetch policy to track
last_end_blockinstead oflast_offsetand properly update it after each read - Threaded
prefetch_policyparameter throughread_file_rangeand related methods to support per-file policy instances - Changed default
log_dirfrom "./logs/" toNoneto disable file logging unless explicitly configured
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| config_template.yaml | Added blank line for formatting |
| alluxiofs/core.py | Modified _open to return tuple of (BufferedReader, raw_file); added prefetch_policy initialization in AlluxioFile and threading through read_file_range; added get_local_cache_prefetch_window_size method |
| alluxiofs/client/prefetch_policy.py | Fixed adaptive prefetch bug by changing from last_offset to last_end_block tracking; added get_windows_size method to both FixedWindow and AdaptiveWindow policies |
| alluxiofs/client/core.py | Added prefetch_policy parameter to read_file_range method; renamed data_manager variable to local_cache for clarity |
| alluxiofs/client/config.py | Changed log_dir default from "./logs/" to None and updated validation to allow None values |
| alluxiofs/client/cache.py | Added prefetch_policy parameter threading through read_file_range and related methods; added offset/length tracking in read loop; added get_blocks_prefetch method to use policy from parameter if provided |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
alluxiofs/client/prefetch_policy.py
Outdated
| self.prefetch_ahead = self.min_prefetch | ||
| self.last_offset = offset | ||
|
|
||
| def get_windows_size(self): |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method is named get_windows_size (plural "windows") instead of get_window_size (singular "window"). Based on the context where it returns the prefetch_ahead value representing a single window size, the singular form would be more accurate and consistent with the variable name "prefetch_window".
| def get_windows_size(self): | |
| def get_window_size(self): |
alluxiofs/core.py
Outdated
|
|
||
| def get_local_cache_prefetch_window_size(self): | ||
| if self.fs.alluxio and self.fs.alluxio.config.local_cache_enabled: | ||
| return self.prefetch_policy.get_windows_size() |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name should be get_window_size (singular) to match the naming used in this method call. This is inconsistent with the method definition which uses the plural form get_windows_size.
| return self.prefetch_policy.get_windows_size() | |
| return self.prefetch_policy.get_window_size() |
| self.logger = TagAdapter(base_logger, {"tag": "[PREFETCH]"}) | ||
| self.min_prefetch = 0 | ||
| self.last_offset = 0 | ||
| self.last_end_block = 0 |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing test file tests/local_cache/test_prefetch_policy.py uses last_offset which has been renamed to last_end_block in this PR. The tests at lines 127 and 144 directly set policy.last_offset which will fail since this attribute no longer exists. These tests need to be updated to use last_end_block instead.
| if chunk is not None: | ||
| offset += len(chunk) | ||
| length -= len(chunk) |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The offset and length variables are being modified during the loop (lines 748-749), but these same variables are used in subsequent iterations for _parallel_download_file (lines 726-727) and the fallback (lines 744-746). This causes incorrect behavior:
-
If _parallel_download_file is called in iteration 2 or later, it receives modified offset/length values instead of the original request range, causing it to calculate the wrong blocks for prefetching.
-
If fallback occurs after some chunks have been successfully read, it uses modified offset/length but returns immediately without including the already-collected chunks, resulting in data loss.
Consider using separate variables (e.g., current_offset, current_length) to track position, or saving the original offset/length at the start of the function.
alluxiofs/core.py
Outdated
| return ( | ||
| io.BufferedReader(raw_file, buffer_size=_read_buffer_size), | ||
| raw_file, | ||
| ) |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _open method now returns a tuple (BufferedReader, raw_file) instead of just the BufferedReader. This breaks the fsspec API contract where _open should return a file-like object. The AbstractFileSystem.open() method calls _open() and expects a single file object to be returned, not a tuple. This change will cause the open() method to return a tuple instead of a file object, breaking all code that calls alluxio.open().
| return ( | |
| io.BufferedReader(raw_file, buffer_size=_read_buffer_size), | |
| raw_file, | |
| ) | |
| return io.BufferedReader(raw_file, buffer_size=_read_buffer_size) |
| def get_windows_size(self): | ||
| return self.prefetch_ahead | ||
|
|
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method is named get_windows_size (plural "windows") instead of get_window_size (singular "window"). Based on the context where it returns the prefetch_ahead value representing a single window size, the singular form would be more accurate and consistent with the variable name "prefetch_window".
| def get_windows_size(self): | |
| return self.prefetch_ahead | |
| def get_window_size(self): | |
| """Return the prefetch window size.""" | |
| return self.prefetch_ahead | |
| def get_windows_size(self): | |
| """Deprecated: use get_window_size instead.""" | |
| return self.get_window_size() |
NO JIRA