ASB-30568: Adding read_product Function to load files from s3 to memory #3561

Open
AlexReedy wants to merge 14 commits into astropy:main from AlexReedy:ASB-30568_read-product-function
@AlexReedy AlexReedy commented Mar 20, 2026

Adds the ability to read FITS and ASDF data products into memory from s3:// URIs using the Observations.read_product() function.

@bsipocz bsipocz added the mast label Apr 4, 2026
@snbianco snbianco self-requested a review April 9, 2026 14:53
codecov Bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.24%. Comparing base (e3702c4) to head (3508915).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| astroquery/mast/observations.py | 84.61% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3561      +/-   ##
==========================================
+ Coverage   73.22%   73.24%   +0.01%     
==========================================
  Files         226      226              
  Lines       21000    21026      +26     
==========================================
+ Hits        15378    15400      +22     
- Misses       5622     5626       +4     

@AlexReedy AlexReedy marked this pull request as ready for review April 17, 2026 16:57
Member

@bsipocz bsipocz left a comment

The commit history is a bit all over the place. Could you please clean it up and squash it into logical chunks rather than a quasi-random back and forth? Thanks!

I suppose this will also need narrative documentation; but I'll leave the more detailed review to @snbianco

Comment thread astroquery/mast/observations.py Outdated
    except Exception as e:
        log.exception(f"Failed to open ASD File: {product_path} {e}")
else:
    print("Unsupported extension type")
Member

Please don't print anything; either raise a proper warning or error class, or put this behind a verbose option.
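
For illustration only (not code from this PR): a minimal sketch of what replacing the `print` call could look like, using only the standard library. The names `UnsupportedProductWarning` and `handle_unsupported`, and the `strict` flag, are hypothetical; the PR may well prefer astroquery's own exception and warning classes instead.

```python
import warnings


class UnsupportedProductWarning(UserWarning):
    """Hypothetical warning category for this sketch; not an astroquery class."""


def handle_unsupported(product_path, strict=True):
    """Report an unsupported file type without printing to stdout."""
    message = f"Unsupported extension type: {product_path}"
    if strict:
        # Callers can handle this with an ordinary try/except.
        raise ValueError(message)
    # Callers can silence or escalate this through the warnings filters.
    warnings.warn(message, UnsupportedProductWarning, stacklevel=2)
```

Either branch lets the caller decide what to do with the message, which a bare `print` does not.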

Author

@AlexReedy AlexReedy Apr 20, 2026

Ah, thank you! I thought I got rid of all of those.

Comment thread astroquery/mast/observations.py Outdated
'`~astroquery.mast.ObservationsClass.enable_cloud_dataset` method.'
)

asdf_packages = ["asdf", "s3fs", "fsspec", "lz4", "gwcs"]
Member

Do all of these really need to be checked here, even though they are not used directly? If asdf requires the whole list, then these checks should be dealt with upstream in asdf itself.

Author

This has been an ongoing question with the implementation. asdf, s3fs, and fsspec are required for the function to work. For asdf specifically, lz4 seems to be the primary compression algorithm in use, and gwcs is for the generalized WCS.

I know that gwcs is in their test environment, and they also make calls to lz4, but I don't believe lz4 is included when asdf is installed. I agree it should probably be handled upstream in asdf.
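
For illustration only (not code from this PR): one way to limit the check to the packages named above as strictly required is to probe for them with `importlib.util.find_spec`, which does not import them. The helper names `missing_packages` and `require_packages` and the constant `REQUIRED_FOR_ASDF` are hypothetical.

```python
from importlib.util import find_spec

# Assumption: only the packages the thread calls required are checked here;
# lz4 and gwcs are left for asdf (or the data model package) to report.
REQUIRED_FOR_ASDF = ["asdf", "s3fs", "fsspec"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def require_packages(names):
    """Raise ImportError listing every missing package at once."""
    missing = missing_packages(names)
    if missing:
        raise ImportError(
            "Missing optional dependencies: " + ", ".join(missing)
        )
```

Calling `require_packages(REQUIRED_FOR_ASDF)` just before the ASDF branch would then report all missing packages in a single error.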

@AlexReedy AlexReedy force-pushed the ASB-30568_read-product-function branch from 12ab349 to eb5a1b2 Compare April 28, 2026 17:08
@snbianco
Contributor

Thanks again for this PR, Alex! Can you add a quick section about this function to the Observations docs (docs/mast/mast_obsquery.rst)? You'll probably want to put it in the cloud data access section.

This PR will also need some tests. The non-remote-access tests may be slightly tricky with mocking. Let me look into it and see if I can point you in the right direction. The remote-access tests should be more straightforward.
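
For illustration only (not the approach settled on in this PR): a rough sketch of how a non-remote test can avoid the network by patching the method that would open the remote file. `ProductReader` and `_open_remote` are hypothetical stand-ins, and the sketch uses the standard library's `unittest.mock` rather than the `mocker` fixture.

```python
from unittest import mock


class ProductReader:
    """Hypothetical stand-in for the class that owns read_product."""

    def _open_remote(self, uri):
        # In real code this would reach out to S3; tests must not get here.
        raise RuntimeError("network access is not available in unit tests")

    def read_product(self, uri):
        return self._open_remote(uri)


def test_read_product_without_network():
    reader = ProductReader()
    sentinel = object()  # stands in for an opened FITS or ASDF object
    with mock.patch.object(
        ProductReader, "_open_remote", return_value=sentinel
    ) as patched:
        result = reader.read_product("s3://example-bucket/file.fits")
    # The patched opener was used exactly once, with the URI we passed in.
    assert result is sentinel
    patched.assert_called_once_with("s3://example-bucket/file.fits")
```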

Contributor

@snbianco snbianco left a comment

Another round of review! This is looking good. A lot of my comments were questions or notes on the docs/docstrings. Let me know if you have questions or want to chat more about the switch to fsspec!

assert Observations._cloud_enabled_explicitly is False


@pytest.fixture
Contributor

The convention we have for fixtures in this file is to put them near the top, before any of the tests. For this one and the s3_asdf_path specifically, though, I'm not sure if we need a fixture, since they're only being used once. I'd rather see the value of s3_fits_path be parametrized.
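
For illustration only: a small sketch of replacing two single-use path fixtures with `pytest.mark.parametrize`. The bucket paths and the test name are hypothetical, and the assertion is a placeholder for the real check.

```python
from pathlib import PurePosixPath

import pytest


@pytest.mark.parametrize(
    "s3_path, expected_suffix",
    [
        # Hypothetical URIs; the real tests would use actual product paths.
        ("s3://example-bucket/path/product.fits", ".fits"),
        ("s3://example-bucket/path/product.asdf", ".asdf"),
    ],
)
def test_product_suffix(s3_path, expected_suffix):
    # Each parameter set runs as its own test case, so no fixture is needed.
    assert PurePosixPath(s3_path).suffix == expected_suffix
```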



def test_read_product_fits(s3_fits_path, mock_fits_open, mocker):
    mocker.patch("astropy.__version__", "5.0.0")
Contributor

Why do we need this line?

def test_observations_read_product_asdf(self):
    asdf = pytest.importorskip("asdf")

    product_path = "s3://stpubdata/roman/nexus/soc_simulations/tutorial_data" \
Contributor

Are there any smaller files that we have available to us from the Nexus? This takes around 15 seconds to load in, and we try to keep this test suite as short in duration as we can.

Parameters
----------
product_path: str
    URI to the product in open bucket.
Contributor

Suggested change
- URI to the product in open bucket.
+ URI to the product in the STScI S3 open data bucket.

product_path: str
    URI to the product in open bucket.
read_as: str, optional
    How to read the file. Currently only .fits and .asdf is supported by "auto". Defaults to "auto".
Contributor

Suggested change
- How to read the file. Currently only .fits and .asdf is supported by "auto". Defaults to "auto".
+ How to read the file. Currently only FITS and ASDF file types are supported by "auto". Default is "auto".
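
For illustration only (not the PR's implementation): a sketch of how a `read_as` argument might be resolved, with `"auto"` inferring the type from the file suffix. The helper name `resolve_read_as` and the explicit values `"fits"` and `"asdf"` are assumptions; the docstring above only confirms `"auto"`.

```python
from pathlib import PurePosixPath

# Assumed mapping from file suffix to reader name.
SUFFIX_TO_READER = {".fits": "fits", ".asdf": "asdf"}


def resolve_read_as(product_path, read_as="auto"):
    """Return which reader to use for a product URI."""
    if read_as != "auto":
        # An explicit choice overrides the suffix, if it names a known reader.
        if read_as not in SUFFIX_TO_READER.values():
            raise ValueError(f"Unsupported read_as value: {read_as!r}")
        return read_as
    suffix = PurePosixPath(product_path).suffix.lower()
    if suffix not in SUFFIX_TO_READER:
        raise ValueError(f"Cannot infer file type from suffix {suffix!r}")
    return SUFFIX_TO_READER[suffix]
```

Keeping the resolution separate from the actual file opening makes it possible to test this logic without any remote access.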

Streaming Data Products from S3 to memory
-----------------------------------------
If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or ``~asdf.open``.
Contributor

Suggested change
- This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or ``~asdf.open``.
+ This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or `~asdf.open`.


Streaming Data Products from S3 to memory
-----------------------------------------
If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
Contributor

Suggested change
- If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
+ If instead of downloading you would like to load an S3 URI directly to memory, you can use the `~astroquery.mast.ObservationsClass.read_product` method.

-----------------------------------------
If instead of downloading you would like to load an S3 URI directly to memory you can use `~astroquery.mast.ObservationsClass.read_product`.
This function supports FITS and ASDF data products and will automatically parse the file for the suffix and load it to memory using `~astropy.io.fits.open` or ``~asdf.open``.
For ASDF data products additional packages may be required (e.g lz4 and roman-datamodels for ROMAN data).
Contributor

Suggested change
- For ASDF data products additional packages may be required (e.g lz4 and roman-datamodels for ROMAN data).
+ For ASDF data products, additional packages may be required (e.g `~lz4` and `~roman-datamodels` for data from the Roman Space Telescope).

Comment thread CHANGES.rst
- The cloud dataset in ``Observations`` is now enabled by default if the ``boto3`` and ``botocore`` packages are installed. This
default can be overridden by setting the ``enable_cloud_dataset`` configuration option to False. [#3534]

- Adding in ability to read FITS and ASDF dataproducts to memory from s3:// using ``Observations.read_product()`` function. [#3561]
Contributor

Suggested change
- - Adding in ability to read FITS and ASDF dataproducts to memory from s3:// using ``Observations.read_product()`` function. [#3561]
+ - Adding in ability to read FITS and ASDF data products to memory from STScI's S3 open data bucket using ``Observations.read_product()`` function. [#3561]

Comment thread pyproject.toml
# Temp workaround for https://github.com/RKrahl/pytest-dependency/issues/91
"pytest-dependency; platform_system != 'Windows'",
"pytest-rerunfailures",
"fsspec[http]",
Contributor

Suggested change
- "fsspec[http]",
+ "fsspec[http,s3]",
