
Avoiding SSL errors that cause missing timesteps #468

@rafa-guedes

I regularly get SSL errors such as the one below when downloading multiple lead times with FastHerbie:

2025-09-10 02:36:31,882 [ERROR] herbie.fast: Exception has occured : HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /global-forecast-system/gfs.20250909/18/atmos/gfs.t18z.pgrb2.0p25.f011 (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)')))

This type of error is logged but does not raise an exception, so it silently leaves gaps in the downloaded data. For instance, when building an xarray dataset with the fh.to_xarray() method, the resulting dataset unexpectedly lacks some timesteps. I have not seen this error when running from my local machine, but it happens fairly often when running on Google Cloud infrastructure.
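One way to make these silent gaps visible is to compare the requested lead times against the steps that actually ended up in the dataset. The sketch below makes two assumptions. First, the steps are available as `datetime.timedelta` objects; for an xarray dataset, something like `pd.to_timedelta(ds.step.values)` should provide these, but that conversion is not verified here. Second, `find_missing_lead_times` is a hypothetical helper and not part of Herbie.

```python
from datetime import timedelta


def find_missing_lead_times(requested_fxx, steps):
    """Return requested forecast hours (fxx) that are absent from the downloaded steps.

    Hypothetical helper: `steps` is assumed to be an iterable of timedelta objects,
    e.g. the step coordinate of a dataset converted to timedeltas.
    """
    # Convert each step to whole forecast hours.
    present = {int(step.total_seconds() // 3600) for step in steps}
    return sorted(set(requested_fxx) - present)


# Example: lead time f011 failed to download, as in the log above.
steps = [timedelta(hours=h) for h in range(0, 13) if h != 11]
print(find_missing_lead_times(range(0, 13), steps))  # [11]
```

Raising an error (or retrying) whenever this list is non-empty would at least stop a partial dataset from being used unnoticed.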

In a fork, I have changed the download method of the Herbie class to try to prevent this type of error. The change replaces the urllib.request.urlretrieve call with a new _download method that uses the requests library and adds a retry strategy to make downloading more robust.
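The general idea of a requests-based download with retries can be sketched as follows. This is not the _download method from the fork. `make_session` and `download_with_retry` are hypothetical names, and the retry counts, status codes and timeouts are illustrative choices. The sketch assumes requests and urllib3 1.26 or newer, which is where `Retry(allowed_methods=...)` is available. Retries happen at two levels. The adapter-level `Retry` covers connection-level failures and transient HTTP status codes. The outer loop re-issues the whole request if the connection drops while the body is being streamed.

```python
import time
from pathlib import Path

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=5, backoff_factor=1.0):
    """Hypothetical helper: a Session whose adapter retries connection errors and transient statuses."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff_factor,  # exponential sleep between adapter-level retries
        status_forcelist=(429, 500, 502, 503, 504),
        allowed_methods=frozenset({"GET", "HEAD"}),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def download_with_retry(url, outfile, session=None, attempts=3, timeout=60, chunk_size=1 << 20):
    """Hypothetical helper: stream url to outfile, re-issuing the request if the transfer fails."""
    session = session or make_session()
    outfile = Path(outfile)
    partial = outfile.with_name(outfile.name + ".part")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with session.get(url, stream=True, timeout=timeout) as response:
                response.raise_for_status()
                with open(partial, "wb") as f:
                    for chunk in response.iter_content(chunk_size=chunk_size):
                        f.write(chunk)
            # Only a complete transfer is moved into place, so no truncated file survives.
            partial.replace(outfile)
            return outfile
        except requests.exceptions.RequestException as err:
            # SSL errors, dropped connections and HTTP errors all end up here.
            last_error = err
            partial.unlink(missing_ok=True)
            if attempt < attempts:
                time.sleep(2 ** attempt)  # simple exponential backoff between full retries
    # Raise instead of only logging, so a failed lead time cannot go unnoticed.
    raise RuntimeError(f"Failed to download {url} after {attempts} attempts") from last_error
```

For the file in the log above, a call could look like `download_with_retry("https://storage.googleapis.com/global-forecast-system/gfs.20250909/18/atmos/gfs.t18z.pgrb2.0p25.f011", "gfs.t18z.pgrb2.0p25.f011")`. The file is written to a `.part` file and renamed only once complete, so an interrupted transfer cannot leave a truncated file in place.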

If you think this is a good idea, I can submit a pull request for you to evaluate and merge if you find it useful. I have run a few tests, and it does seem to improve things for me.

Thanks,
Rafael
