Description
On supercomputers, a job is killed when it reaches the wall-time limit, usually 24 hours.
If we are unlucky, the run is killed while dumping diagnostics, in which case the whole h5/vtkhdf file can be corrupted.
A way to prevent this from happening could be:
- measure how long the last diag dump took and keep that duration (measured over all diags to be safe, but it could be per file if we want to be picky)
- compare that last timing to the time remaining before the user's wall-time limit
- if the time remaining before the wall-time limit < last timing, do not dump (and tell the user)
- we might consider an option for a preemptive kill, for users who do not want to keep running a job that will be killed before it can dump anything more (this should be optional, as users may still want the logs or other output)
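
The first three steps above could be sketched as follows. This is only a minimal illustration in Python, not code from the project: the `DumpGuard` class, its method names, and the `write_diagnostics` callback are all hypothetical, and the wall-time limit is assumed to be passed in by the user as a number of seconds.

```python
import time


class DumpGuard:
    """Hypothetical helper: skip a diag dump that may not finish before the wall-time limit."""

    def __init__(self, wall_time_limit_s, clock=time.monotonic):
        # `clock` is injectable so the logic can be tested without waiting.
        self._clock = clock
        self._start = clock()
        self._limit = wall_time_limit_s
        self.last_dump_duration = 0.0  # no dump measured yet

    def remaining(self):
        """Seconds left before the wall-time limit."""
        return self._limit - (self._clock() - self._start)

    def can_dump(self):
        # Do not dump if the remaining time is shorter than the last dump took.
        return self.remaining() >= self.last_dump_duration

    def dump(self, write_diagnostics):
        """Run `write_diagnostics` if it is expected to fit; return whether it ran."""
        if not self.can_dump():
            print(
                f"Skipping diag dump: {self.remaining():.1f}s left before wall time, "
                f"last dump took {self.last_dump_duration:.1f}s"
            )
            return False
        t0 = self._clock()
        write_diagnostics()
        # Keep the duration of the dump that was just performed.
        self.last_dump_duration = self._clock() - t0
        return True
```

A caller would create one `DumpGuard` at startup and route every dump through `guard.dump(...)`. The `False` return value is also a natural place to hook the optional preemptive kill, since it signals that no further dump is expected to fit.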