Description
I regularly get SSL errors such as the one below when downloading multiple lead times with FastHerbie:
2025-09-10 02:36:31,882 [ERROR] herbie.fast: Exception has occured : HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /global-forecast-system/gfs.20250909/18/atmos/gfs.t18z.pgrb2.0p25.f011 (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)')))
This type of error is logged but does not raise an exception, so the affected lead times are silently missing from the downloaded data. For instance, a dataset built with the fh.to_xarray() method will unexpectedly have missing timesteps. I have not seen errors like this when running from my local machine, but they happen relatively often when running on Google Cloud infrastructure.
In a fork, I have changed the download method of the Herbie class to try to prevent this type of error. The change replaces the urllib.request.urlretrieve call with a new _download method that uses the requests library and implements a retry strategy to make downloads more robust.
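A minimal sketch of that kind of retry logic follows. It is not the code from the fork: the names `download_with_retries` and `_fetch_with_requests`, the retry count, the backoff values, and the `.part` temporary file are all assumptions chosen for illustration. The point it shows is retrying with exponential backoff and raising once the attempts are exhausted, so a failed lead time cannot go unnoticed.

```python
import time
from pathlib import Path


def _fetch_with_requests(url, dest, timeout=60):
    """Stream `url` to `dest` with requests (hypothetical helper)."""
    import requests  # third-party; imported lazily so the retry loop has no hard dependency

    tmp = Path(f"{dest}.part")
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        with open(tmp, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    tmp.replace(dest)  # expose the file only once it is complete


def download_with_retries(url, dest, fetch=_fetch_with_requests,
                          retries=3, backoff=1.0, sleep=time.sleep):
    """Call `fetch(url, dest)`, retrying on network errors with exponential backoff.

    Raises RuntimeError after the final failed attempt instead of only logging.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            fetch(url, dest)
            return Path(dest)
        except OSError as err:  # requests and urllib network errors derive from OSError
            last_error = err
            if attempt < retries:
                sleep(backoff * 2 ** attempt)  # waits 1x, 2x, 4x ... the base delay
    raise RuntimeError(
        f"Failed to download {url} after {retries + 1} attempts"
    ) from last_error
```

Passing `fetch` and `sleep` as parameters is only a convenience for testing the retry behaviour without network access; a real implementation might instead configure retries on a `requests.Session`.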
If you think this is a good idea, I can submit a pull request for you to evaluate and merge. I have run a few tests and it does seem to improve things for me.
Thanks,
Rafael