Add download benchmarks using DANDI api #169
Looking back at our initial figure sketches, should these tests also include the time to open and slice data after downloading? Or is that timing negligible at that point? Edit: This would be covered by #163, so it probably doesn't need to be included here as well.
If the benchmarks can reuse the already downloaded file, then timing the file open and slicing would be fair. As per #163, that can be added in this PR or in a follow-up.
(Both of these are really more of a 'calibration' of average system bandwidth + disk.)
Agreed. Timing slicing from local files can be separate and use already downloaded files. |
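One way to keep that timing separate, sketched under assumptions: an asv-style benchmark class (the `setup`/`teardown`/`time_*` convention) prepares the file outside the timed method, so only the open + slice step is measured. The class name is hypothetical, and a locally written byte file plus a raw byte-range read stand in for an already downloaded NWB file and a dataset slice; a real version would open the file with h5py/pynwb or zarr instead.

```python
import os
import tempfile


class HypotheticalLocalSliceBenchmark:
    """Times open + slice on a file that already exists on local disk.

    Hypothetical sketch: a raw byte range stands in for an NWB dataset slice.
    """

    file_size = 1024 * 1024  # 1 MiB stand-in for an already downloaded file
    slice_start = 4096
    slice_length = 8192

    def setup(self):
        # Create the "downloaded" file outside the timed method, so that
        # only opening and slicing are measured.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmpdir.name, "downloaded_file.bin")
        with open(self.path, "wb") as f:
            f.write(os.urandom(self.file_size))

    def teardown(self):
        # Remove the temporary directory and the file inside it.
        self.tmpdir.cleanup()

    def time_open_and_slice(self):
        # Timed region: open the local file and read one contiguous slice.
        with open(self.path, "rb") as f:
            f.seek(self.slice_start)
            return f.read(self.slice_length)
```

Because the file lives on local disk, this measures disk and file-format overhead only, which is the "local" counterpart to the streaming benchmarks discussed in #163.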
`download(urls=params["https_url"], output_dir=self.tmpdir.name)`
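For context, a minimal sketch of how a call like the one above could sit inside an asv-style download benchmark. Assumptions: the benchmark is parameterized directly by an asset URL (the PR's actual `params` structure is not shown here), the URL is a placeholder, and the class name is hypothetical. `dandi.download.download` is imported inside the timed method so the module loads even without the DANDI client installed, and each call downloads into a fresh subdirectory so repeated calls never collide with a file left by an earlier call.

```python
import tempfile


class HypotheticalDandiDownloadBenchmark:
    """Hypothetical asv-style benchmark timing a full-file download via the DANDI API."""

    # Placeholder: replace with real DANDI asset URLs (hypothetical value).
    params = ["<dandi-asset-url>"]
    param_names = ["https_url"]

    # asv timing attributes: keep the number of full downloads small.
    number = 1
    repeat = 1

    def setup(self, https_url):
        # Scratch directory for downloaded files; removed in teardown.
        self.tmpdir = tempfile.TemporaryDirectory()

    def teardown(self, https_url):
        self.tmpdir.cleanup()

    def time_download(self, https_url):
        # Imported here so this module loads even without the DANDI client.
        from dandi.download import download

        # Fresh subdirectory per call, so a repeated call never finds a
        # file already written by a previous call.
        output_dir = tempfile.mkdtemp(dir=self.tmpdir.name)
        download(urls=https_url, output_dir=output_dir)
```

Creating the subdirectory adds negligible overhead compared with a full-file download, so it should not distort the timing.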
`class LindiDownloadFsspecBenchmark(BaseBenchmark):`
Suggested change: `class LindiDownloadBenchmark(BaseBenchmark):`
Should this just be `LindiDownloadBenchmark`?
Oh, I see: it uses `download_file`, which uses fsspec.
For consistency and maintainability, I think we should just use the DANDI API to download the LINDI files. I'll also update the `download_file` usage in a separate PR.
Users would most likely be using either the DANDI API/CLI or their browser to download these files, not fsspec, anyway.
That makes sense for consistency; I will update this to use the DANDI API across all file types.
Updated to use the DANDI API here, but left the `download_file` definition as is for now.
The LINDI files don't take very long to download. Does it make sense to remove the skip decorator in this case, or is that more confusing/inconsistent? We discussed adding these LINDI download times to our benchmark plots, which is another reason not to skip them.
> The lindi files don't take very long to download, does it make sense to remove the skip decorator in this case
For LINDI, I think it makes sense to always run the download test.
Looks good!
This PR adds benchmarks that measure download times for HDF5 and Zarr NWB files, for comparison against slicing extrapolations. The results are intended to help decide whether to download or stream an NWB file, depending on the amount of data being accessed.
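The intended download-vs-stream comparison can be sketched as a simple linear cost model. This is an assumption for illustration, not the PR's method: streaming time is extrapolated as the number of slices times a measured time per streamed slice, the full download is a one-off cost, and each local read after downloading has its own (possibly negligible) per-slice cost. All function and parameter names are hypothetical.

```python
def should_download(
    download_seconds: float,
    seconds_per_streamed_slice: float,
    num_slices: int,
    local_seconds_per_slice: float = 0.0,
) -> bool:
    """Return True if downloading the whole file is expected to beat streaming.

    Hypothetical cost model:
      streaming cost = num_slices * seconds_per_streamed_slice
      download cost  = download_seconds + num_slices * local_seconds_per_slice
    """
    if num_slices < 0:
        raise ValueError("num_slices must be non-negative")
    streaming_cost = num_slices * seconds_per_streamed_slice
    download_cost = download_seconds + num_slices * local_seconds_per_slice
    return download_cost < streaming_cost


def break_even_slices(
    download_seconds: float,
    seconds_per_streamed_slice: float,
    local_seconds_per_slice: float = 0.0,
) -> float:
    """Number of slices at which downloading and streaming cost the same."""
    saving_per_slice = seconds_per_streamed_slice - local_seconds_per_slice
    if saving_per_slice <= 0:
        # Streaming is never slower per slice, so downloading never pays off.
        return float("inf")
    return download_seconds / saving_per_slice
```

With made-up numbers (a 120 s download and 0.5 s per streamed slice), `break_even_slices(120.0, 0.5)` returns `240.0`: reading fewer than 240 slices favors streaming, and reading more favors downloading.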
We had discussed either putting these in a separate benchmarks folder or gating them behind an environment variable. I thought the environment variable plus a skip decorator was cleaner, since there are other download-related benchmarks and only two full-file download tests. The download benchmarks can then be run manually with:
RUN_DOWNLOAD_BENCHMARKS=true nwb_benchmarks run --bench "time_download.HDF5DownloadDandiAPIBenchmark.time_download_hdf5_dandi_api"
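A minimal sketch of such an environment-variable gate, assuming asv's documented behavior that a benchmark whose `setup` raises `NotImplementedError` is reported as skipped. The decorator name, the example class, and the accepted truthy values are hypothetical; the PR's actual skip decorator may be implemented differently.

```python
import functools
import os

# Hypothetical set of values treated as "enabled".
_TRUTHY = {"1", "true", "yes"}


def skip_unless_download_benchmarks_enabled(setup):
    """Hypothetical decorator for a benchmark's setup method.

    Raises NotImplementedError (which asv reports as a skipped benchmark)
    unless RUN_DOWNLOAD_BENCHMARKS is set to a truthy value.
    """

    @functools.wraps(setup)
    def wrapper(self, *args, **kwargs):
        value = os.environ.get("RUN_DOWNLOAD_BENCHMARKS", "")
        if value.strip().lower() not in _TRUTHY:
            raise NotImplementedError(
                "Set RUN_DOWNLOAD_BENCHMARKS=true to run download benchmarks."
            )
        return setup(self, *args, **kwargs)

    return wrapper


class ExampleFullDownloadBenchmark:
    """Hypothetical gated benchmark; LINDI downloads would simply omit the gate."""

    @skip_unless_download_benchmarks_enabled
    def setup(self):
        self.ready = True

    def time_download(self):
        pass  # the full-file download would be timed here
```

Gating in `setup` rather than in the `time_*` method means skipped benchmarks never allocate temporary directories or touch the network.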