ASB-30568: Adding read_product Function to load files from s3 to memory #3561
AlexReedy wants to merge 14 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3561      +/-   ##
==========================================
+ Coverage   73.22%   73.24%   +0.01%
==========================================
  Files         226      226
  Lines       21000    21026      +26
==========================================
+ Hits        15378    15400      +22
- Misses       5622     5626       +4
bsipocz left a comment
The commit history is a bit all over the place. Could you please clean it up and squash it into logical chunks rather than quasi-random back and forth? Thanks!
I suppose this will also need narrative documentation, but I'll leave the more detailed review to @snbianco.
except Exception as e:
    log.exception(f"Failed to open ASD File: {product_path} {e}")
else:
    print("Unsupported extension type")
Please don't print anything; either raise proper warning or error classes, or put this behind a verbose option.
Ah, thank you! I thought I had gotten rid of all of those.
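A minimal sketch of the reviewer's suggestion, assuming the extension check is factored into its own helper. `check_extension`, `UnsupportedFileTypeWarning`, and the `strict` flag are hypothetical names, not part of astroquery. By default the helper raises `ValueError`; with `strict=False` it emits a warning through the standard `warnings` module instead of printing.

```python
import warnings


class UnsupportedFileTypeWarning(UserWarning):
    """Hypothetical warning class; not an existing astroquery class."""


SUPPORTED_SUFFIXES = (".fits", ".asdf")


def check_extension(product_path, strict=True):
    """Hypothetical helper: validate the suffix instead of printing a message."""
    if product_path.lower().endswith(SUPPORTED_SUFFIXES):
        return True
    message = (f"Unsupported file extension for {product_path!r}; "
               f"expected one of {SUPPORTED_SUFFIXES}.")
    if strict:
        # Callers get a real exception they can catch and handle.
        raise ValueError(message)
    # Non-fatal path: a warning that users can filter or escalate to an error.
    warnings.warn(message, UnsupportedFileTypeWarning)
    return False
```

Unlike a bare `print`, the warning can be silenced or turned into an error with `warnings.filterwarnings`.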
    '`~astroquery.mast.ObservationsClass.enable_cloud_dataset` method.'
)

asdf_packages = ["asdf", "s3fs", "fsspec", "lz4", "gwcs"]
Do all of these really need to be checked here, even though they are not used directly? If asdf requires the whole list, these checks should be handled upstream in asdf itself.
This has been an ongoing question with the implementation. The function requires asdf, s3fs, and fsspec to work. For asdf specifically, lz4 seems to be the primary compression algorithm used in ASDF files, and gwcs handles the generalized WCS.
I know that gwcs is in asdf's test environment, and asdf also makes calls to lz4, but I don't believe lz4 is included when asdf is installed. I agree this should probably be handled upstream in asdf.
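One way to check only the strictly required packages without importing them is `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. This is a sketch under that assumption; `missing_packages` and `require_packages` are hypothetical helpers, and the tuple mirrors the three packages named as required in the reply above.

```python
import importlib.util

# The three packages described above as strictly required for the ASDF path.
REQUIRED_ASDF_PACKAGES = ("asdf", "s3fs", "fsspec")


def missing_packages(names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_packages(names, feature="read_product"):
    """Raise a single ImportError that lists every missing package at once."""
    missing = missing_packages(names)
    if missing:
        raise ImportError(
            f"{feature} requires the following missing package(s): {', '.join(missing)}"
        )
```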
…fits data products from s3
12ab349 to eb5a1b2
Thanks again for this PR, Alex! Can you add a quick section about this function to the This PR will also need some tests. The non-remote-access tests may be slightly tricky with mocking. Let me look into it and see if I can point you in the right direction. The remote-access tests should be more straightforward.
…required for testing
…on logs and tests
…re file extension is not supported
snbianco left a comment
Another round of review! This is looking good. A lot of my comments were questions or notes on the docs/docstrings. Let me know if you have questions or want to chat more about the switch to fsspec!
assert Observations._cloud_enabled_explicitly is False

@pytest.fixture
The convention we have for fixtures in this file is to put them near the top, before any of the tests. For this one and the s3_asdf_path specifically, though, I'm not sure if we need a fixture, since they're only being used once. I'd rather see the value of s3_fits_path be parametrized.
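A sketch of the parametrized form the reviewer asks for, with inline paths replacing the single-use fixtures. `infer_format` is a hypothetical stand-in for the suffix detection in `read_product`, and the URIs under `s3://example-bucket/` are placeholders, not real products in the STScI bucket.

```python
import pytest


def infer_format(product_path):
    """Hypothetical stand-in for read_product's suffix detection."""
    lowered = product_path.lower()
    if lowered.endswith(".fits"):
        return "fits"
    if lowered.endswith(".asdf"):
        return "asdf"
    raise ValueError(f"Unsupported extension: {product_path}")


# Placeholder URIs: substitute real test products from the bucket.
@pytest.mark.parametrize(
    ("product_path", "expected"),
    [
        ("s3://example-bucket/hst/image_flt.fits", "fits"),
        ("s3://example-bucket/roman/model.asdf", "asdf"),
    ],
)
def test_infer_format(product_path, expected):
    assert infer_format(product_path) == expected
```

Each parameter set runs as its own test case, so adding another product type means adding one tuple rather than another fixture.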
def test_read_product_fits(s3_fits_path, mock_fits_open, mocker):
    mocker.patch("astropy.__version__", "5.0.0")
Why do we need this line?
def test_observations_read_product_asdf(self):
    asdf = pytest.importorskip("asdf")

    product_path = "s3://stpubdata/roman/nexus/soc_simulations/tutorial_data" \
Are there any smaller files that we have available to us from the Nexus? This takes around 15 seconds to load in, and we try to keep this test suite as short in duration as we can.
Parameters
----------
product_path: str
    URI to the product in open bucket.
Suggested change:
- URI to the product in open bucket.
+ URI to the product in the STScI S3 open data bucket.
product_path: str
    URI to the product in open bucket.
read_as: str, optional
    How to read the file. Currently only .fits and .asdf is supported by "auto". Defaults to "auto".
Suggested change:
- How to read the file. Currently only .fits and .asdf is supported by "auto". Defaults to "auto".
+ How to read the file. Currently only FITS and ASDF file types are supported by "auto". Default is "auto".
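To make the documented `read_as` behavior concrete, here is a minimal sketch of how "auto" might dispatch on the suffix. It is not the implementation in this PR: it assumes astropy 5.2 or later (for the `use_fsspec` and `fsspec_kwargs` arguments of `astropy.io.fits.open`), plus asdf, fsspec, and s3fs, and anonymous access to a public bucket. The ASDF branch copies the whole object into an in-memory buffer before calling `asdf.open`, so the result does not depend on an open network handle.

```python
import io


def read_product(product_path, read_as="auto"):
    """Sketch only: open a FITS or ASDF object from a public S3 bucket in memory."""
    if read_as == "auto":
        # Infer the reader from the URI suffix, mirroring the documented behavior.
        lowered = product_path.lower()
        if lowered.endswith(".fits"):
            read_as = "fits"
        elif lowered.endswith(".asdf"):
            read_as = "asdf"
        else:
            raise ValueError(
                f"Cannot infer the file type of {product_path!r}; "
                "only FITS and ASDF are supported by 'auto'."
            )

    if read_as == "fits":
        from astropy.io import fits  # assumes astropy >= 5.2

        # astropy hands the URI to fsspec; anon=True is passed through to s3fs.
        return fits.open(product_path, use_fsspec=True, fsspec_kwargs={"anon": True})

    if read_as == "asdf":
        import asdf
        import fsspec

        # Copy the object into memory so the AsdfFile outlives the network handle.
        with fsspec.open(product_path, "rb", anon=True) as remote:
            buffer = io.BytesIO(remote.read())
        return asdf.open(buffer)

    raise ValueError(f"Unsupported read_as value: {read_as!r}")
```

Passing `read_as="fits"` or `read_as="asdf"` explicitly skips the suffix check, which would allow objects whose names lack a standard extension.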
Streaming Data Products from S3 to memory
-----------------------------------------
If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or ``~asdf.open``.
Suggested change:
- This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or ``~asdf.open``.
+ This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or `~asdf.open`.
Streaming Data Products from S3 to memory
-----------------------------------------
If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
Suggested change:
- If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
+ If instead of downloading you would like to load an S3 URI directly to memory, you can use the `~astroquery.mast.ObservationsClass.read_product` method.
-----------------------------------------
If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or ``~asdf.open``.
For ASDF data products additional packages may be required (e.g lz4 and roman-datamodels for ROMAN data).
Suggested change:
- For ASDF data products additional packages may be required (e.g lz4 and roman-datamodels for ROMAN data).
+ For ASDF data products, additional packages may be required (e.g. `~lz4` and `~roman-datamodels` for data from the Roman Space Telescope).
- The cloud dataset in ``Observations`` is now enabled by default if the ``boto3`` and ``botocore`` packages are installed. This
  default can be overridden by setting the ``enable_cloud_dataset`` configuration option to False. [#3534]

- Adding in ability to read FITS and ASDF dataproducts to memory from s3:// using ``Observations.read_product()`` function. [#3561]
Suggested change:
- - Adding in ability to read FITS and ASDF dataproducts to memory from s3:// using ``Observations.read_product()`` function. [#3561]
+ - Adding in ability to read FITS and ASDF data products to memory from STScI's S3 open data bucket using ``Observations.read_product()`` function. [#3561]
# Temp workaround for https://github.com/RKrahl/pytest-dependency/issues/91
"pytest-dependency; platform_system != 'Windows'",
"pytest-rerunfailures",
"fsspec[http]",
Suggested change:
- "fsspec[http]",
+ "fsspec[http,s3]",
Adding in ability to read FITS and ASDF data products to memory from s3:// using Observations.read_product() function