Fix: Add filename parameter support for ONNX file caching from Hugging Face Hub (Issue #2218) #2386
Open
ada-ggf25 wants to merge 7 commits into huggingface:main from
…del_files method

Add support for downloading specific files from model repositories with custom local filename handling in the TasksManager.get_model_files method.

Changes:
- Added filename parameter to allow downloading a specific file from the repository instead of listing all files
- Added local_filename parameter to specify a custom name for the cached file, useful for repositories with specific naming requirements (e.g., xenova)
- Implemented logic to use hf_hub_download via the download_file_with_filename utility when filename is specified
- Added comprehensive docstring documentation for all parameters and return values
- Enhanced error handling for file download operations

This enhancement enables more flexible file retrieval from Hugging Face Hub repositories, particularly for cases where custom local filenames are required for compatibility with downstream tools.
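The branching described in this commit can be sketched as follows. This is a hypothetical, simplified stand-in for `TasksManager.get_model_files`, not the PR's code: the name `get_model_files_sketch`, the injected `list_files` and `download_file` callables, and the choice to return `[filename]` in the single-file case are all assumptions, made so the dispatch can run without network access.

```python
from typing import Callable, List, Optional


def get_model_files_sketch(
    model_name_or_path: str,
    list_files: Callable[[str], List[str]],
    download_file: Callable[..., str],
    filename: Optional[str] = None,
    local_filename: Optional[str] = None,
) -> List[str]:
    """Hypothetical, simplified stand-in for TasksManager.get_model_files.

    list_files and download_file are injected (an assumption) so that the
    branching can be exercised without contacting the Hugging Face Hub.
    """
    if filename is None:
        # Default path: enumerate every file in the repository, as before.
        return list_files(model_name_or_path)
    # Single-file path: fetch only the requested file, optionally under a custom name.
    # The returned path is ignored, mirroring the commit that dropped `downloaded_path`.
    download_file(model_name_or_path, filename, local_filename=local_filename)
    # Assumption: report the repository-relative name of the requested file.
    return [filename]
```

Because `filename` defaults to `None`, callers that never pass it keep the original listing behaviour.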
Add exports for file utility functions to make them accessible from the optimum.utils module.

Changes:
- Export download_file_with_filename function for downloading files with custom local filename support
- Export find_files_matching_pattern function for pattern-based file searching
- Export validate_file_exists function for file existence validation

This enables users to import these utilities directly from optimum.utils instead of requiring direct imports from optimum.utils.file_utils.
…l filename support

Add new utility function to download files from Hugging Face Hub with support for custom local filenames, enabling better compatibility with repositories that require specific naming conventions.

Changes:
- Add download_file_with_filename function to file_utils module
- Import hf_hub_download from huggingface_hub to support custom local_filename parameter
- Implement support for subfolder paths in filename construction
- Add comprehensive docstring with parameter descriptions and usage example
- Support for optional cache directory, token authentication, and repository type specification

This function is particularly useful for repositories like xenova that may have specific naming requirements for cached files, allowing users to download files with custom local filenames while maintaining proper caching behaviour.
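A minimal sketch of a helper in the spirit of `download_file_with_filename`, assuming the subfolder is joined to the filename with `/` before calling `hf_hub_download`. The function names, the `target_dir` and `downloader` parameters, and the copy-to-rename step are assumptions for illustration. The sketch does not rely on any `local_filename` keyword of `hf_hub_download`; it realises a custom name by copying the cached file. `hf_hub_download` is only imported when no `downloader` is injected, so the logic can be exercised offline.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def build_repo_path(filename: str, subfolder: str = "") -> str:
    """Join a subfolder and a filename with "/", the separator used in Hub repo paths."""
    subfolder = subfolder.strip("/")
    return f"{subfolder}/{filename}" if subfolder else filename


def download_file_with_filename_sketch(
    repo_id: str,
    filename: str,
    subfolder: str = "",
    local_filename: Optional[str] = None,
    target_dir: str = ".",
    revision: Optional[str] = None,
    cache_dir: Optional[str] = None,
    token: Optional[str] = None,
    repo_type: Optional[str] = None,
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Hypothetical helper: download one file, optionally exposing it under a custom name."""
    if downloader is None:
        # Imported lazily so the sketch can be tested with a fake downloader.
        from huggingface_hub import hf_hub_download

        downloader = hf_hub_download
    cached_path = downloader(
        repo_id=repo_id,
        filename=build_repo_path(filename, subfolder),
        revision=revision,
        cache_dir=cache_dir,
        token=token,
        repo_type=repo_type,
    )
    if local_filename is None:
        # No custom name requested: return the path inside the Hub cache as-is.
        return cached_path
    # Assumption: the custom name is realised by copying the cached file into target_dir.
    destination = Path(target_dir) / local_filename
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copyfile follows symlinks, so the blob content is copied, not the cache symlink.
    shutil.copyfile(cached_path, destination)
    return str(destination)
```

Copying into a separate `target_dir` leaves the Hub cache untouched, which is one way to keep "proper caching behaviour" while still exposing the name a downstream tool expects.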
Add test suite for file utility functions to ensure proper functionality and edge case handling.

Changes:
- Add TestDownloadFileWithFilename class with test cases covering:
  * Downloading files with default filename
  * Downloading files with custom local filename
  * Downloading files from subfolders
  * Downloading files with revision and token parameters
  * Downloading files from different repository types (model, dataset)
- Add TestValidateFileExists class with test cases covering:
  * Validating file existence in local directories (root and subfolders)
  * Validating file existence in remote repositories
  * Handling subfolder paths in remote repositories
- Use unittest.mock for mocking Hugging Face Hub API calls
- Use tempfile for creating temporary test directories

These tests ensure the file utility functions work correctly across different scenarios and provide confidence for future changes.
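The testing approach (temporary directories for local checks, mocks in place of Hub calls) can be sketched with the standard library alone. `file_exists_sketch` is a hypothetical stand-in for `validate_file_exists`, and its injected `list_repo_files` callable plays the role of a Hub listing call such as `huggingface_hub.list_repo_files`; neither reflects the PR's actual signatures.

```python
import os
import tempfile
import unittest
from typing import Callable, Iterable, Optional
from unittest import mock


def file_exists_sketch(
    model_name_or_path: str,
    filename: str,
    subfolder: str = "",
    list_repo_files: Optional[Callable[[str], Iterable[str]]] = None,
) -> bool:
    """Hypothetical check: look in a local directory first, else in a remote file listing."""
    if os.path.isdir(model_name_or_path):
        return os.path.isfile(os.path.join(model_name_or_path, subfolder, filename))
    if list_repo_files is None:
        raise ValueError("a list_repo_files callable is required for remote repositories")
    # Remote repo paths always use "/" regardless of the local OS separator.
    target = f"{subfolder.strip('/')}/{filename}" if subfolder else filename
    return target in set(list_repo_files(model_name_or_path))


class TestFileExistsSketch(unittest.TestCase):
    def test_local_root_and_subfolder(self):
        with tempfile.TemporaryDirectory() as tmp:
            os.makedirs(os.path.join(tmp, "onnx"))
            for rel in ("config.json", os.path.join("onnx", "model.onnx")):
                with open(os.path.join(tmp, rel), "w") as fh:
                    fh.write("{}")
            self.assertTrue(file_exists_sketch(tmp, "config.json"))
            self.assertTrue(file_exists_sketch(tmp, "model.onnx", subfolder="onnx"))
            self.assertFalse(file_exists_sketch(tmp, "model.onnx"))

    def test_remote_uses_mocked_listing(self):
        # The mock replaces a network call; no Hub access happens here.
        fake_listing = mock.Mock(return_value=["config.json", "onnx/model.onnx"])
        self.assertTrue(
            file_exists_sketch("org/repo", "model.onnx", subfolder="onnx", list_repo_files=fake_listing)
        )
        self.assertFalse(file_exists_sketch("org/repo", "model.onnx", list_repo_files=fake_listing))
        fake_listing.assert_called_with("org/repo")
        self.assertEqual(fake_listing.call_count, 2)
```

In a real suite, `unittest.mock.patch` could replace a module-level Hub call instead of injecting a callable; injection just keeps this example self-contained.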
Clean up test file by removing unused imports and trailing whitespace.

Changes:
- Remove unused os import
- Remove unused pytest import
- Remove trailing blank line at end of file

This improves code cleanliness and follows best practices by only importing what is actually used in the test file.
Move file_utils imports to be positioned earlier in the file, right after constant imports, to improve import organisation and maintain a more logical grouping of related imports.

Changes:
- Move file_utils imports from after input_generators to after constant imports
- Maintain alphabetical and logical grouping of utility imports

This improves code organisation and makes the import structure more consistent and easier to navigate.
…iles

Remove unused downloaded_path variable assignment when calling download_file_with_filename, as the return value is not used in the subsequent code.

Changes:
- Remove unused downloaded_path variable assignment
- Keep the function call to maintain download functionality

This improves code cleanliness by removing unused variables and follows best practices for code maintenance.
This PR has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
Fix: Add filename parameter support for ONNX file caching from Hugging Face Hub
Fixes #2218
What does this PR do?
This PR fixes issue #2218 by adding support for the `filename` and `local_filename` parameters when downloading and caching files from Hugging Face Hub repositories. This is particularly important for repositories like xenova that require specific naming conventions for cached ONNX files.

Problem
Previously, when downloading files from Hugging Face Hub using `snapshot_download`, there was no way to specify a custom local filename for the cached file. This caused issues when users needed to cache files (especially ONNX models) with specific filenames that differ from the original repository filename.

Solution
- Added a `download_file_with_filename()` utility function that uses `hf_hub_download` instead of `snapshot_download` when a specific filename is requested
- Enhanced `TasksManager.get_model_files()` to accept `filename` and `local_filename` parameters
- When `filename` is provided, the function now routes the download through `hf_hub_download` so that the `local_filename` parameter can be honoured for custom caching

Changes
New function: `optimum.utils.file_utils.download_file_with_filename()`
- Uses `hf_hub_download` to fetch the file and handle the `local_filename` parameter

Enhanced function: `TasksManager.get_model_files()`
- `filename` parameter: when specified, downloads only that specific file
- `local_filename` parameter: allows custom naming for cached files

Tests: Added a comprehensive test suite in `tests/utils/test_file_utils.py`

Example Usage
Or using the utility function directly:
Before submitting
Testing
All new tests pass successfully:
Files Changed
- `optimum/utils/file_utils.py`: Added `download_file_with_filename()` function
- `optimum/utils/__init__.py`: Exported new function in public API
- `optimum/exporters/tasks.py`: Enhanced `get_model_files()` with filename support
- `tests/utils/test_file_utils.py`: Added comprehensive test suite

Backward Compatibility
This change is fully backward compatible. All existing code will continue to work without modification. The new parameters are optional and only affect behaviour when explicitly provided.