
Forecast jobs sometimes failing on reading ocean_geometry file #4490

@shlyaeva

Description


What is wrong?

Laura Slivinski, @JohnSteffen-NOAA, and I have noticed that forecast jobs sometimes fail with:

204: FATAL from PE 0: Permission denied: netcdf_file_open:MOM6_OUTPUT/ocean_geometry.nc

Here is a specific example from one of my runs. The experiment YAML is based on dev/ci/cases/gfsv17/C384mx025_hybAOWCDA.yaml.

  • First attempt: job enkfgdas_fcst_mem028 failed because it ran out of walltime.
  • Second attempt: the job failed with the Permission denied: netcdf_file_open:MOM6_OUTPUT/ocean_geometry.nc message.
  • Manual reboot of the job: the job failed again with the same Permission denied: netcdf_file_open:MOM6_OUTPUT/ocean_geometry.nc message.

Laura reported that, after an ocean_geometry failure, removing the offending RUNDIR and then rebooting the job solved the issue for her.

I confirmed that the RUNDIR does contain an unreadable ocean_geometry.nc file at RUNDIRS/<expname>/enkfgdas.2025101000/enkfgdasefcs028.2025101000/output/MOM6_OUTPUT/ocean_geometry.nc, and I removed it manually. The job succeeded when I rebooted it afterwards.
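Until the root cause is found, the manual workaround described above could be automated as a pre-run step. The Python sketch below is hypothetical: the name `remove_stale_ocean_geometry` and the idea of calling it at the start of every attempt are assumptions, not part of global-workflow. It assumes the RUNDIR layout from the path above and treats the leftover file as stale if the current user cannot both read and write it, on the assumption that MOM6 may reopen this file at startup.

```python
import os
from pathlib import Path


def remove_stale_ocean_geometry(rundir: str) -> bool:
    """Delete output/MOM6_OUTPUT/ocean_geometry.nc under rundir if it is not accessible.

    Hypothetical helper (not a global-workflow function). Returns True if a file was removed.
    """
    geom = Path(rundir) / "output" / "MOM6_OUTPUT" / "ocean_geometry.nc"
    if not geom.exists():
        return False  # no leftover from a previous attempt
    if os.access(geom, os.R_OK | os.W_OK):
        return False  # file is usable by this user, so leave it alone
    # Unlinking needs write permission on the parent directory, not on the file itself.
    geom.unlink()
    return True
```

Removing only this one file, rather than the whole RUNDIR as in Laura's workaround, would keep any other output from the earlier attempt intact.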

What should have happened?

I expect the job to rerun successfully without needing to manually remove files from run directories.

What machines are impacted?

All or N/A

What global-workflow hash are you using?

debf5c9

Steps to reproduce

See above.

Additional information

No response

Do you have a proposed solution?

Perhaps some cleanup needs to happen when the forecast executable fails; I am not sure.
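One possible shape for the cleanup suggested above is a wrapper that deletes the leftover file whenever the model exits with a nonzero status. This is only a sketch: `run_with_cleanup` is a hypothetical name, and `cmd` stands in for whatever launcher command (srun, mpiexec, etc.) the forecast job actually uses.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_with_cleanup(cmd: Sequence[str], rundir: str) -> int:
    """Run cmd inside rundir; if it fails, delete MOM6's ocean_geometry.nc.

    Hypothetical wrapper, not global-workflow's actual launcher. Returns the exit code.
    """
    result = subprocess.run(list(cmd), cwd=rundir)
    if result.returncode != 0:
        geom = Path(rundir) / "output" / "MOM6_OUTPUT" / "ocean_geometry.nc"
        # missing_ok=True: the model may have died before ever writing the file.
        geom.unlink(missing_ok=True)
    return result.returncode
```

One caveat: in the example above, the first attempt ran out of walltime, and in that case the scheduler typically kills the whole job, so a post-failure step like this may never run. Cleaning up at the start of each attempt instead would also cover that case.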
