Skip to content

Conversation

@kousu
Copy link

@kousu kousu commented Apr 6, 2021

Copy link
Member

@jcohenadad jcohenadad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure if i understand everything correctly, but i notice file name changed for e.g.:

spinalcordtoolbox/data/PAM50/atlas/PAM50_atlas_00.nii.gz 

so what happens if, e.g. someone has SCT installed under, e.g.: sct-v3.4.4 instead of spinalcordtoolbox. Will it still work? I suppose we are talking about the hierarchy of the python package, and there, it will always be spinalcordtoolbox, right?

@kousu
Copy link
Author

kousu commented Apr 6, 2021

i am not sure if i understand everything correctly, but i notice file name changed for e.g.:

spinalcordtoolbox/data/PAM50/atlas/PAM50_atlas_00.nii.gz 

so what happens if, e.g. someone has SCT installed under, e.g.: sct-v3.4.4 instead of spinalcordtoolbox. Will it still work? I suppose we are talking about the hierarchy of the python package, and there, it will always be spinalcordtoolbox, right?

Instead of just saying `sct/data/PAM50/atlas/PAM50_atlas_00.nii.gz' we'll use

import spinalcordtoolbox.data.PAM50

...

importlib.resources.path(spinalcordtoolbox.data.PAM50, 'atlas/PAM50_atlas_00.nii.gz').__enter__() # maybe a helper to shorten this...

This also works:

import spinalcordtoolbox.data.PAM50.atlas

...

importlib.resources.path(spinalcordtoolbox.data.PAM50.atlas, 'PAM50_atlas_00.nii.gz').__enter__() # maybe a helper to shorten this...

The advantage of this is:

  1. You get an ImportError if the data isn't installed, which should be a lot clearer where the problem is
  2. pip keeps a cache in ~/.cache/pip of previous versions of packages, so reinstalling doesn't require redownloading. And this is safe because it keys them by hash and version string, so it can't be fooled. If the new version of SCT keeps using the old version of the data, no one needs to redownload it.
    2. and we can exploit this to use CI caching to save a lot of bandwidth on testing
  3. The data is contained under site-packages/ (so, currently, $SCT_DIR/python/envs/venv_sct/lib/python3.6/site-packages/spinalcordtoolbox/data) so it can't get muddled up with a different install.
  4. In principle we can use RECORD to integrity check the datasets -- or even the entire spinalcordtoolbox -- at any time, even after it's installed.
    • This sounds like a good thing to build into sct_check_dependencies

The disadvantage is:

  1. You can't just go look at the data like you can when it's in your working directory, so pip isn't great for courseware like course_beijing or sct-example-data. I don't know what we should do about those. My plan for the next, like, 72 hours is to just leave them alone and keep them in sct_download_data. Maybe that stuff should just be curl | tar? Maybe we can talk about that today?

@joshuacwnewton joshuacwnewton marked this pull request as draft April 14, 2025 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants