@jbusecke jbusecke commented May 1, 2024

I made some modifications that let me extract the download URLs for all datasets in the catalog.
I'm testing things on my end with this PR branch for now, but once that works I'm happy to polish this up further!

Related: leap-stc/cmip6-leap-feedstock#128

mnovovil commented Feb 6, 2025

Hello, will this be merged into the main branch at some point?

@nocollier
Member

Hey @mnovovil, thanks for writing. As this was marked WIP, I never really looked carefully at what it does. Instead, we took what was discussed with Julius at the ESGF meeting last April and are working on providing the functionality in a different way.

We do support "streaming" data. Essentially, you pass a keyword to to_dataset_dict() and get the OPeNDAP links back:

https://intake-esgf.readthedocs.io/en/latest/stream.html
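
A rough sketch of what that could look like. The keyword name `prefer_streaming` is my reading of the linked page, so treat it as an assumption and check the docs for your installed version. The `open_streaming` helper and its `catalog` argument are hypothetical, added only so the pattern is easy to try out:

```python
def open_streaming(catalog=None, **facets):
    """Search ESGF and open the results, preferring remote (OPeNDAP) access.

    `catalog` is a hypothetical injection point; by default an ESGFCatalog
    is created. `facets` are ordinary search facets such as variable_id.
    """
    if catalog is None:
        # Imported lazily so the helper can be read and exercised
        # without intake-esgf installed.
        from intake_esgf import ESGFCatalog

        catalog = ESGFCatalog()
    results = catalog.search(**facets)
    # Assumption: `prefer_streaming` is the keyword described on the
    # streaming page linked above.
    return results.to_dataset_dict(prefer_streaming=True)


# Example call (needs network access and intake-esgf installed):
# dsd = open_streaming(
#     experiment_id="historical",
#     source_id="CanESM5",
#     variable_id="tas",
#     variant_label="r1i1p1f1",
# )
```

If no OPeNDAP link is part of a record, I would expect the usual download to happen instead, but I have not verified that here.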

You can also return just the paths instead of the xarray datasets:

https://intake-esgf.readthedocs.io/en/latest/paths.html
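
If what you are after is a flat list of links, something like the following might help. It is only a sketch: `split_paths` is a hypothetical helper, not part of intake-esgf, and it assumes the returned mapping has the shape `{key: [paths]}` where remote entries are plain `http(s)` strings and local files may be `Path` objects.

```python
from pathlib import Path


def split_paths(path_dict):
    """Separate remote URLs from local files in a {key: [paths]} mapping.

    Returns two lists of (key, path_string) tuples: (remote, local).
    """
    remote, local = [], []
    for key, paths in path_dict.items():
        # Tolerate a single path as well as a list of paths.
        if isinstance(paths, (str, Path)):
            paths = [paths]
        for p in paths:
            # Only plain strings are treated as URLs; Path objects are local.
            is_url = isinstance(p, str) and p.startswith(("http://", "https://"))
            (remote if is_url else local).append((key, str(p)))
    return remote, local


# Example with made-up entries (not real ESGF records):
# remote, local = split_paths({"tas": ["https://example.org/dodsC/tas.nc"]})
```

The remote list is then what you would hand to whatever tool does the transfer on your side.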

However, at the moment our catalogs are still only pointed at Solr and Globus (soon STAC) indices. This means that the only "streaming" links we can find are the OPeNDAP ones that are part of those records. Reading in intake-esm style catalogs is still on our to-do list; once that works, other kinds of "streaming" links could be harvested. ESGF itself may provide other forms of data access, but at the moment the team is swamped with the backend structural changes for CMIP7.

Hope that gives you some context. Happy to revisit this if there is something we are missing.
