Long runtimes and error messages in some gfs_gempak_f jobs on WCOSS2 #3630

### What is wrong?

Several `gfs_gempak_f*` jobs in the extended CI test case on WCOSS2 consistently hit their 30-minute walltime. The automated second attempt also hits the walltime, while a third attempt hours later always completes successfully. My first thought was a machine issue, but CI runs since the first instance continue to show the same problem in the same jobs.

The logs for those jobs show the following message repeated many times: `Error in message send = 22`. With those messages printed, the resulting job logs are about 15-20 GB in size (!), whereas gempak logs are normally measured in MB, not GB.

The affected jobs are `gfs_gempak_f123-f144` and `gfs_gempak_f147-f168`. I do not see the error messages in any of the other `gfs_gempak_f*` jobs in the same CI tests.
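For anyone who wants to double-check which jobs are affected, here is a minimal sketch that scans a directory of job logs for the message and reports each matching log's size. It reads in chunks so it can cope with multi-GB files. The function names, the `*.log` glob, and the flat directory layout are assumptions for illustration, not something taken from the workflow.

```python
from pathlib import Path

# Message text quoted in this report.
NEEDLE = b"Error in message send"


def log_contains(path, needle=NEEDLE, chunk_size=1 << 20):
    """Return True if `needle` occurs in the file, reading it chunk by chunk."""
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last few bytes so a match split across chunks is still found.
            tail = window[-overlap:] if overlap else b""


def find_affected_logs(log_dir, glob="*.log"):
    """List (file name, size in bytes) for logs that contain the message.

    The glob pattern is an assumption; adjust it to the real log naming.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(glob)):
        if path.is_file() and log_contains(path):
            hits.append((path.name, path.stat().st_size))
    return hits
```

Calling `find_affected_logs(...)` on a directory of saved logs would return only the logs that contain the message, which makes it easy to compare their sizes against the unaffected jobs.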

Snippet from log where the error first appears:
```
+ cpfs[13]cpdstfile=/lfs/h2/emc/ptmp/emc.global/PR/PR_3626/RUNTESTS/COMROOT/C96_atm3DVar_extended_3626/gfs.20211221/06//products/atmos/gempak/35km_pac/gfs_35km_pac_2021122106f168
Error in message send = 22
itype, ichan, nwords,2,22216705,2
Error in message send = 22
itype, ichan, nwords,2,22216705,2
Error in message send = 22
itype, ichan, nwords,2,22216705,2
...
```
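Because the logs are too large to open in an editor, a snippet like the one above can be pulled out by streaming the file once. The following is a rough sketch, assuming the message text is exactly as quoted; `first_error_context` and its parameters are hypothetical names, not part of global-workflow.

```python
from collections import deque


def first_error_context(path, needle="Error in message send", before=2):
    """Stream a log once and return (first_line_no, preceding_lines, total_count).

    first_line_no is 1-based, or None if the message never appears.
    preceding_lines holds up to `before` lines seen just before the first match.
    """
    recent = deque(maxlen=before)  # rolling window of lines before the first match
    first_line_no = None
    context = []
    count = 0
    # errors="replace" avoids a crash on stray non-UTF-8 bytes in the log.
    with open(path, "r", errors="replace") as fh:
        for line_no, line in enumerate(fh, start=1):
            if needle in line:
                count += 1
                if first_line_no is None:
                    first_line_no = line_no
                    context = list(recent)
            elif first_line_no is None:
                recent.append(line.rstrip("\n"))
    return first_line_no, context, count
```

The preceding lines show which command was running when the messages started, and the total count gives a sense of how much of the log is noise.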

See saved logs on Cactus: `/lfs/h2/emc/global/noscrub/emc.global/ci/SAVE_LOGS_3626`

### What should have happened?

No error messages, and the jobs complete within their walltime.

### What machines are impacted?

WCOSS2

### What global-workflow hash are you using?

`develop` and recent PR hashes

### Steps to reproduce

Run `develop` extended CI test case on WCOSS2 with `DO_GEMPAK=YES`.

### Additional information

_No response_

### Do you have a proposed solution?

_No response_
