Skip to content

Conversation

@littleEast7
Copy link
Collaborator

NO JIRA

@littleEast7 littleEast7 requested a review from Copilot January 4, 2026 08:44
@littleEast7 littleEast7 self-assigned this Jan 4, 2026
@littleEast7 littleEast7 added the enhancement New feature or request label Jan 4, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for bypassing Alluxio and directly accessing the underlying file systems (UFS) by introducing a new skip_alluxio parameter. This enables use cases where Alluxio caching is not desired or available, while still leveraging the UFS access capabilities.

Key Changes:

  • Introduced UFSManager as a unified interface for managing UFS access, supporting both Alluxio-based and configuration-based UFS registration
  • Added LocalUFSUpdater to parse and register UFS configurations directly without requiring an Alluxio client
  • Refactored UFSUpdater to extend BaseUFSUpdater for better code reuse

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
alluxiofs/core.py Integrated skip_alluxio parameter, replaced UFSUpdater with UFSManager, refactored fallback execution logic into separate methods, added @fallback_handler decorator to open method
alluxiofs/client/core.py Renamed data_manager to local_cache_manager for clarity
alluxiofs/client/ufs_manager.py Introduced BaseUFSUpdater, LocalUFSUpdater, and UFSManager classes; refactored UFSUpdater to extend base class
alluxiofs/client/utils.py Enhanced get_protocol_from_path to support protocol-only strings (e.g., "s3") in addition to full URIs
tests/local_cache/test_utils.py Added test cases for updated get_protocol_from_path and convert_ufs_info_to functions
tests/local_cache/test_ufs_manager.py Added comprehensive tests for LocalUFSUpdater including config parsing and validation
setup.py Version bump from 1.1.18rc1 to 1.1.18rc3

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +344 to +348
ufs_full_path = ufs_full_path[:-1]

ufs_info = UfsInfo(
alluxio_path=ufs_full_path,
ufs_full_path=ufs_full_path,
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop variable ufs_full_path is reassigned inside the loop body (line 344), which modifies the value used in subsequent operations. While this works correctly in this case, it's a maintainability concern as it can be confusing to readers. Consider using a different variable name for the modified path (e.g., normalized_path) to make the code clearer and avoid potential bugs if the original value is needed later.

Suggested change
ufs_full_path = ufs_full_path[:-1]
ufs_info = UfsInfo(
alluxio_path=ufs_full_path,
ufs_full_path=ufs_full_path,
normalized_alluxio_path = ufs_full_path[:-1]
else:
normalized_alluxio_path = ufs_full_path
ufs_info = UfsInfo(
alluxio_path=normalized_alluxio_path,
ufs_full_path=normalized_alluxio_path,

Copilot uses AI. Check for mistakes.
Comment on lines 142 to 152
client = AlluxioClient(**kwargs)
self.ufs_updater = UFSUpdater(client)
self.ufs_updater.start_updater()
self.alluxio = None if skip_alluxio else client
if self.skip_alluxio is True:
ufs_config = kwargs.get("ufs_config", {})
if not ufs_config:
raise ValueError(
"ufs_config must be provided when skip_alluxio is True."
)
self.ufs_manager = UFSManager(config=ufs_config)
self.ufs_manager.initialize_ufs_manager()
self.alluxio = None
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When skip_alluxio is True, an AlluxioClient is still being instantiated even though it will never be used (self.alluxio is set to None on line 152). This is inefficient and could cause initialization failures or unnecessary resource allocation. Consider moving the client instantiation inside the else block (after line 153) so it's only created when actually needed.

Copilot uses AI. Check for mistakes.
)
local_cache = LocalCacheManager(self.config)
self.data_manager = CachedFileReader(
self.local_cache_manager = CachedFileReader(
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is being renamed from data_manager to local_cache_manager, but this appears to be an incomplete refactoring. The codebase still has many references to data_manager (e.g., at lines 546, 574, 638, 932, 934, 952, 987, 1255, 1414 in this file). This will cause AttributeError exceptions when those code paths are executed. All references to data_manager should be updated to local_cache_manager to complete this refactoring.

Copilot uses AI. Check for mistakes.
def get_protocol_from_path(self, path):
return get_protocol_from_path(path)

def register_ufs_fallback(self, ufs_info_list: list[UfsInfo]):
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint list[UfsInfo] uses the modern Python 3.9+ syntax. For compatibility with older Python versions (3.7-3.8), consider using List[UfsInfo] from the typing module instead, which is already imported at the top of the file. This ensures the code works across a wider range of Python versions unless the project explicitly requires Python 3.9+.

Copilot uses AI. Check for mistakes.
Comment on lines +356 to +412
class UFSManager:
"""
Class responsible for managing Ufs Info.
"""

def __init__(self, alluxio=None, config: Dict = None):
self.ufs_updater: Optional[BaseUFSUpdater] = None
if alluxio is not None:
self.ufs_updater = UFSUpdater(alluxio)
elif config is not None:
self.ufs_updater = LocalUFSUpdater(config)

def initialize_ufs_manager(self):
"""
Initialize UFS Manager with UFS Updater.
"""
if self.ufs_updater:
self.ufs_updater.start_updater()

def shutdown_ufs_manager(self):
"""
Shutdown UFS Manager and stop the UFS Updater.
"""
if self.ufs_updater:
self.ufs_updater.stop_updater()

def must_get_ufs_count(self):
if self.ufs_updater:
return self.ufs_updater.must_get_ufs_count()
return 0

def must_get_ufs_from_path(self, path: str):
if self.ufs_updater:
return self.ufs_updater.must_get_ufs_from_path(path)
return None

def must_get_alluxio_path_from_ufs_full_path(self, path: str):
if self.ufs_updater:
return self.ufs_updater.must_get_alluxio_path_from_ufs_full_path(
path
)
return None

def get_ufs_count(self):
if self.ufs_updater:
return self.ufs_updater.get_ufs_count()
return 0

def get_ufs_from_cache(self, path: str):
if self.ufs_updater:
return self.ufs_updater.get_ufs_from_cache(path)
return None

def get_alluxio_path_from_ufs_full_path(self, path: str):
if self.ufs_updater:
return self.ufs_updater.get_alluxio_path_from_ufs_full_path(path)
return None
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new UFSManager class lacks test coverage. While LocalUFSUpdater and the existing UFSUpdater have tests, there are no tests for the UFSManager wrapper class itself. Consider adding tests that verify the initialization with both alluxio and config parameters, the delegation to the underlying updater, and the lifecycle methods (initialize_ufs_manager, shutdown_ufs_manager).

Copilot uses AI. Check for mistakes.
Comment on lines 141 to 156
test_skip_alluxio = test_options.get("skip_alluxio") is True
self.skip_alluxio = kwargs.get("skip_alluxio")
client = AlluxioClient(**kwargs)
self.ufs_updater = UFSUpdater(client)
self.ufs_updater.start_updater()
self.alluxio = None if skip_alluxio else client
if self.skip_alluxio is True:
ufs_config = kwargs.get("ufs_config", {})
if not ufs_config:
raise ValueError(
"ufs_config must be provided when skip_alluxio is True."
)
self.ufs_manager = UFSManager(config=ufs_config)
self.ufs_manager.initialize_ufs_manager()
self.alluxio = None
else:
self.ufs_manager = UFSManager(alluxio=client)
self.ufs_manager.initialize_ufs_manager()
self.alluxio = None if test_skip_alluxio else client
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new skip_alluxio parameter and its integration with ufs_config lack test coverage. While tests exist for test_options={"skip_alluxio": True}, there are no tests verifying: 1) the behavior when skip_alluxio=True is passed as a top-level parameter, 2) that ufs_config is properly validated and used, 3) that operations are correctly routed to UFS when skip_alluxio is True, and 4) that the error is raised when ufs_config is missing. Consider adding integration tests for this new functionality.

Copilot uses AI. Check for mistakes.
Comment on lines 142 to 156
client = AlluxioClient(**kwargs)
self.ufs_updater = UFSUpdater(client)
self.ufs_updater.start_updater()
self.alluxio = None if skip_alluxio else client
if self.skip_alluxio is True:
ufs_config = kwargs.get("ufs_config", {})
if not ufs_config:
raise ValueError(
"ufs_config must be provided when skip_alluxio is True."
)
self.ufs_manager = UFSManager(config=ufs_config)
self.ufs_manager.initialize_ufs_manager()
self.alluxio = None
else:
self.ufs_manager = UFSManager(alluxio=client)
self.ufs_manager.initialize_ufs_manager()
self.alluxio = None if test_skip_alluxio else client
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When skip_alluxio is True, an AlluxioClient is still being instantiated on line 143 even though it will not be used. This is inefficient and could cause initialization failures or unnecessary resource allocation. The client should only be instantiated when it's actually needed (i.e., when skip_alluxio is False or when test_skip_alluxio is False).

Copilot uses AI. Check for mistakes.
Comment on lines 86 to 88
if "/" in path or "\\" in path:
return None
return path
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new logic at lines 86-88 returns the entire path as a protocol if it doesn't contain "://" and doesn't contain "/" or "\". This means get_protocol_from_path("s3") would return "s3", which seems correct for protocol-only strings. However, this could incorrectly identify other strings without slashes as protocols (e.g., a filename like "file.txt" would be considered a protocol). Consider adding validation to ensure the returned value is a known protocol, or document this behavior clearly.

Copilot uses AI. Check for mistakes.
import fsspec
from fsspec import filesystem

from alluxiofs import AlluxioClient
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import creates a circular dependency: alluxiofs/__init__.py imports from alluxiofs.core, which imports from alluxiofs.client.ufs_manager, which now imports from alluxiofs (top-level). While Python can sometimes handle circular imports if they're structured carefully, this is fragile and can cause import failures depending on the import order. Consider importing directly from alluxiofs.client.core instead: from alluxiofs.client.core import AlluxioClient.

Suggested change
from alluxiofs import AlluxioClient
from alluxiofs.client.core import AlluxioClient

Copilot uses AI. Check for mistakes.
super().__init__()
assert (
isinstance(alluxio, AlluxioClient) or alluxio is None
), "alluxio must be an instance of AlluxioClient or None"
Copy link

Copilot AI Jan 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion message could be more descriptive. Instead of just stating what the type should be, it should explain why this is a requirement or what the consequence of passing the wrong type would be. This helps developers understand the constraint better when debugging assertion failures.

Suggested change
), "alluxio must be an instance of AlluxioClient or None"
), (
"alluxio must be an instance of AlluxioClient or None so that "
"UFSUpdater can access the Alluxio configuration and correctly "
"initialize background UFS info refresh; passing any other type "
"will prevent proper setup of periodic updates and logging."
)

Copilot uses AI. Check for mistakes.
@littleEast7 littleEast7 merged commit 7b2ebb4 into fsspec:main Jan 5, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant