@fusawa-yugo
Contributor

Motivation

Currently, the package cache is never refreshed unless the user passes force_reload=True when calling load_module().

Description of the changes

I added a lifetime for the package cache.
When the lifetime expires, the package will be downloaded again.
Users can configure the lifetime through the OPTUNAHUB_CACHE_EXPIRATION_SECONDS environment variable (the default is 30 days).
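
As a rough illustration of the configuration described above, the lifetime could be read from the environment along these lines. This is only a sketch, not the code in this PR: the helper name `cache_expiration_seconds` is hypothetical, and falling back to the default on a malformed or negative value is an assumption.

```python
import os

# Environment variable name taken from the PR description.
_ENV_VAR = "OPTUNAHUB_CACHE_EXPIRATION_SECONDS"
# 30 days in seconds, the default stated in the PR description.
_DEFAULT_EXPIRATION_SECONDS = 30 * 24 * 60 * 60


def cache_expiration_seconds() -> int:
    """Hypothetical helper: return the cache lifetime in seconds."""
    raw = os.environ.get(_ENV_VAR)
    if raw is None:
        return _DEFAULT_EXPIRATION_SECONDS
    try:
        value = int(raw)
    except ValueError:
        # Assumption: a malformed value falls back to the default
        # instead of raising.
        return _DEFAULT_EXPIRATION_SECONDS
    # Assumption: negative lifetimes are treated as invalid.
    return value if value >= 0 else _DEFAULT_EXPIRATION_SECONDS
```

With this sketch, setting `OPTUNAHUB_CACHE_EXPIRATION_SECONDS=3600` before running a script would shorten the lifetime to one hour.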

@codecov

codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 80.95238% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.25%. Comparing base (4002fd4) to head (0169702).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| optunahub/_conf.py | 66.66% | 2 Missing ⚠️ |
| optunahub/hub.py | 86.66% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #108      +/-   ##
==========================================
- Coverage   81.37%   81.25%   -0.13%     
==========================================
  Files           9        9              
  Lines         204      224      +20     
==========================================
+ Hits          166      182      +16     
- Misses         38       42       +4     

@fusawa-yugo fusawa-yugo marked this pull request as draft August 15, 2025 05:21
@fusawa-yugo fusawa-yugo marked this pull request as ready for review August 15, 2025 07:06
@fusawa-yugo fusawa-yugo marked this pull request as draft August 15, 2025 07:25
@fusawa-yugo fusawa-yugo marked this pull request as ready for review August 18, 2025 07:42
@c-bata
Member

c-bata commented Aug 25, 2025

@nabenabe0928 Could you review this PR?

@c-bata c-bata assigned gen740 and unassigned nabenabe0928 Oct 3, 2025
cache_dir_prefix = os.path.join(_conf.cache_home(), hostname, repo_owner, repo_name, ref)
package_cache_dir = os.path.join(cache_dir_prefix, dir_path)
use_cache = not force_reload and os.path.exists(package_cache_dir)
print(f"package_cache_dir: {package_cache_dir}")
Member

@gen740 gen740 Oct 10, 2025

Is this print statement intended?

Member

I think load_module should not print anything. This function just loads the package, so side effects should be minimal.
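
One way to keep the path available for debugging without printing would be a debug-level log call. The following is a minimal sketch using the standard `logging` module; the logger name and the helper `resolve_package_cache_dir` are hypothetical and are not taken from this PR or from optunahub.

```python
import logging
import os

# Hypothetical logger name, used only for this illustration.
_logger = logging.getLogger("optunahub_example")


def resolve_package_cache_dir(cache_dir_prefix: str, dir_path: str) -> str:
    """Hypothetical helper: build the cache path and log it at DEBUG level."""
    package_cache_dir = os.path.join(cache_dir_prefix, dir_path)
    # Emitted only when the caller has enabled DEBUG logging, so the
    # function stays silent by default.
    _logger.debug("package_cache_dir: %s", package_cache_dir)
    return package_cache_dir
```

Callers who want the message can opt in with `logging.basicConfig(level=logging.DEBUG)`.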

@y0z
Member

y0z commented Oct 10, 2025

NOTE: This PR needs discussion to merge.

Member

@gen740 gen740 left a comment

There are a couple of issues with the current approach:

  • Unreliable last_modified_time
    • On macOS, Finder automatically creates .DS_Store files just by opening a directory.
    • Text editors may also update file metadata unintentionally when viewing code.
    • As a result, last_modified_time can change without any real modification.
  • Risks of using rglob
    • If a symbolic link exists inside the cache directory, rglob will follow it and traverse external files.
    • This can even cause infinite loops in certain cases.
    • rglob is also not efficient when the number of files becomes large, leading to unnecessary performance overhead.

A safer approach would be to create a dedicated file (e.g., last_update_time.txt) and explicitly store the update timestamp there.
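
The dedicated-file idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: the helper names `record_update` and `is_cache_expired` are hypothetical, the file stores a Unix timestamp as text, and a missing or unreadable file is treated as an expired cache.

```python
import time
from pathlib import Path
from typing import Optional

# File name suggested in the review comment.
_TIMESTAMP_FILE = "last_update_time.txt"


def record_update(cache_dir: Path, now: Optional[float] = None) -> None:
    """Hypothetical helper: store the update time as a Unix timestamp."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    timestamp = time.time() if now is None else now
    (cache_dir / _TIMESTAMP_FILE).write_text(repr(timestamp), encoding="utf-8")


def is_cache_expired(
    cache_dir: Path, lifetime_seconds: float, now: Optional[float] = None
) -> bool:
    """Hypothetical helper: check the stored timestamp against the lifetime."""
    try:
        text = (cache_dir / _TIMESTAMP_FILE).read_text(encoding="utf-8")
        last_update = float(text.strip())
    except (OSError, ValueError):
        # Assumption: no readable timestamp means the cache must be refreshed.
        return True
    current = time.time() if now is None else now
    return current - last_update > lifetime_seconds
```

Because only one known file is read, this check does not depend on file metadata and does not need to walk the cache directory.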

@y0z y0z marked this pull request as draft October 14, 2025 03:47
@c-bata
Member

c-bata commented Oct 14, 2025

Let me unassign the reviewer since this PR has been marked as a draft.


5 participants