Skip to content

Conversation

@littleEast7
Copy link
Collaborator

NO JIRA

@littleEast7 littleEast7 self-assigned this Dec 19, 2025
@littleEast7 littleEast7 added the bug Something isn't working label Dec 19, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in the adaptive prefetch policy where the tracking of the last read position was incorrect, and threads the prefetch policy parameter through the read path to allow per-file policy instances. Additionally, it changes the default log_dir to None to disable file logging by default.

  • Fixed adaptive prefetch policy to track last_end_block instead of last_offset and properly update it after each read
  • Threaded prefetch_policy parameter through read_file_range and related methods to support per-file policy instances
  • Changed default log_dir from "./logs/" to None to disable file logging unless explicitly configured

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
config_template.yaml Added blank line for formatting
alluxiofs/core.py Modified _open to return tuple of (BufferedReader, raw_file); added prefetch_policy initialization in AlluxioFile and threading through read_file_range; added get_local_cache_prefetch_window_size method
alluxiofs/client/prefetch_policy.py Fixed adaptive prefetch bug by changing from last_offset to last_end_block tracking; added get_windows_size method to both FixedWindow and AdaptiveWindow policies
alluxiofs/client/core.py Added prefetch_policy parameter to read_file_range method; renamed data_manager variable to local_cache for clarity
alluxiofs/client/config.py Changed log_dir default from "./logs/" to None and updated validation to allow None values
alluxiofs/client/cache.py Added prefetch_policy parameter threading through read_file_range and related methods; added offset/length tracking in read loop; added get_blocks_prefetch method to use policy from parameter if provided

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

self.prefetch_ahead = self.min_prefetch
self.last_offset = offset

def get_windows_size(self):
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is named get_windows_size (plural "windows") instead of get_window_size (singular "window"). Based on the context where it returns the prefetch_ahead value representing a single window size, the singular form would be more accurate and consistent with the variable name "prefetch_window".

Suggested change
def get_windows_size(self):
def get_window_size(self):

Copilot uses AI. Check for mistakes.

def get_local_cache_prefetch_window_size(self):
if self.fs.alluxio and self.fs.alluxio.config.local_cache_enabled:
return self.prefetch_policy.get_windows_size()
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name should be get_window_size (singular) to match the naming used in this method call. This is inconsistent with the method definition which uses the plural form get_windows_size.

Suggested change
return self.prefetch_policy.get_windows_size()
return self.prefetch_policy.get_window_size()

Copilot uses AI. Check for mistakes.
self.logger = TagAdapter(base_logger, {"tag": "[PREFETCH]"})
self.min_prefetch = 0
self.last_offset = 0
self.last_end_block = 0
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing test file tests/local_cache/test_prefetch_policy.py uses last_offset which has been renamed to last_end_block in this PR. The tests at lines 127 and 144 directly set policy.last_offset which will fail since this attribute no longer exists. These tests need to be updated to use last_end_block instead.

Copilot uses AI. Check for mistakes.
Comment on lines +747 to +749
if chunk is not None:
offset += len(chunk)
length -= len(chunk)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offset and length variables are being modified during the loop (lines 748-749), but these same variables are used in subsequent iterations for _parallel_download_file (lines 726-727) and the fallback (lines 744-746). This causes incorrect behavior:

  1. If _parallel_download_file is called in iteration 2 or later, it receives modified offset/length values instead of the original request range, causing it to calculate the wrong blocks for prefetching.

  2. If fallback occurs after some chunks have been successfully read, it uses modified offset/length but returns immediately without including the already-collected chunks, resulting in data loss.

Consider using separate variables (e.g., current_offset, current_length) to track position, or saving the original offset/length at the start of the function.

Copilot uses AI. Check for mistakes.
Comment on lines 471 to 474
return (
io.BufferedReader(raw_file, buffer_size=_read_buffer_size),
raw_file,
)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _open method now returns a tuple (BufferedReader, raw_file) instead of just the BufferedReader. This breaks the fsspec API contract where _open should return a file-like object. The AbstractFileSystem.open() method calls _open() and expects a single file object to be returned, not a tuple. This change will cause the open() method to return a tuple instead of a file object, breaking all code that calls alluxio.open().

Suggested change
return (
io.BufferedReader(raw_file, buffer_size=_read_buffer_size),
raw_file,
)
return io.BufferedReader(raw_file, buffer_size=_read_buffer_size)

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +86
def get_windows_size(self):
return self.prefetch_ahead

Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is named get_windows_size (plural "windows") instead of get_window_size (singular "window"). Based on the context where it returns the prefetch_ahead value representing a single window size, the singular form would be more accurate and consistent with the variable name "prefetch_window".

Suggested change
def get_windows_size(self):
return self.prefetch_ahead
def get_window_size(self):
"""Return the prefetch window size."""
return self.prefetch_ahead
def get_windows_size(self):
"""Deprecated: use get_window_size instead."""
return self.get_window_size()

Copilot uses AI. Check for mistakes.
@littleEast7 littleEast7 merged commit 8ada8d0 into fsspec:main Dec 23, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant