[GSoC 2025] Load model JSON files from URLs#5137
[GSoC 2025] Load model JSON files from URLs#5137medha-14 wants to merge 12 commits intopybamm-team:mainfrom
Conversation
|
I've updated the Please let me know if any part of this approach needs changing or improvisation. |
|
@medha-14 can you change the base branch here? Would be much easier to review |
src/pybamm/dispatch/entry_points.py
Outdated
| def get_cache_path(url): | ||
| cache_dir = Path.home() / ".pybamm_cache" / "pybamm" / "models" | ||
| cache_dir.mkdir(parents=True, exist_ok=True) | ||
| file_hash = hashlib.md5(url.encode()).hexdigest() | ||
| return cache_dir / f"{file_hash}.json" | ||
|
|
||
|
|
||
| def clear_model_cache(): | ||
| cache_dir = Path.home() / ".pybamm_cache" / "pybamm" / "models" | ||
| if cache_dir.exists(): | ||
| for file in cache_dir.glob("*.json"): | ||
| file.unlink() |
There was a problem hiding this comment.
This should be handled using platformdirs.user_cache_dir: https://platformdirs.readthedocs.io/en/latest/api.html#cache-directory
Hi @medha-14, could you please create a PR within your fork from this link? medha-14/PyBaMM@GSoC...medha-14:PyBaMM:load_json_from_url. I see a better diff there. We could add comments on that PR until #5056 is merged. Otherwise, I think it is on the right track, modulo that we need to use |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5137 +/- ##
===========================================
- Coverage 98.88% 98.86% -0.03%
===========================================
Files 320 320
Lines 26949 26988 +39
===========================================
+ Hits 26648 26681 +33
- Misses 301 307 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| from pybamm.expression_tree.operations.serialise import Serialise | ||
|
|
||
| APP_NAME = "pybamm" | ||
| APP_AUTHOR = "pybamm" |
There was a problem hiding this comment.
I think we can keep just APP_NAME and drop APP_AUTHOR, and we can set Path(user_cache_dir(APP_NAME)) / "models" as a constant here instead.
| def get_cache_path(url: str) -> Path: | ||
| cache_dir = _get_cache_dir() | ||
| file_hash = hashlib.md5(url.encode()).hexdigest() | ||
| return cache_dir / f"{file_hash}.json" |
There was a problem hiding this comment.
Why MD5 and not SHA-256? :)
| try: | ||
| file.unlink() | ||
| except Exception as e: | ||
| # Optional: log error instead of failing silently |
There was a problem hiding this comment.
| # Optional: log error instead of failing silently |
| def clear_model_cache() -> None: | ||
| cache_dir = _get_cache_dir() | ||
| for file in cache_dir.glob("*.json"): | ||
| try: | ||
| file.unlink() | ||
| except Exception as e: |
There was a problem hiding this comment.
Note, we also store some of PyBaMM's data files in the cache dir using a pooch registry:
PyBaMM/src/pybamm/pybamm_data.py
Lines 126 to 139 in a1aa02c
I wonder if we could reuse some of the code here, because clearing the cache directory of JSON files could have unintended side effects. We do have JSON files there: https://github.com/pybamm-team/pybamm-data/releases/tag/v1.0.1
Could we perhaps rewrite the PR to use pooch instead? That will also provide safer defaults than using urllib.request.urlretrieve() directly, and we could rely on it for the checksum/caching bits to see if we need to download the model JSON again or not.
| def Model( | ||
| model=None, | ||
| url=None, | ||
| force_download=False, | ||
| *args, | ||
| **kwargs, | ||
| ): |
There was a problem hiding this comment.
I think we can have better typing here.
Description
This PR is based on #5056 and contains changes that build upon it.
Once #5056 is merged, the diff for this PR will automatically update to show only the relevant changes.
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #)
Important checks:
Please confirm the following before marking the PR as ready for review:
nox -s pre-commitnox -s testsnox -s doctests