Slurm kills job, pav keeps running.  #354

Open
@CalvinDSeamons

This has happened to me enough times I figured I should make an issue on it.

While running a longer test with minimal output, Slurm will sometimes reach its default timeout. The job gets kicked off Slurm, but Pavilion still thinks it's running. I figure there's a way to inform Pavilion and show a 'Slurm Timeout reached' note or something similar in the status output. See example below:

[calvin@$machine build]$ pav status
 Test statuses
---------+-----------------------+--------------+----------+--------------------------------------
 Test id | Name                  | State        | Time     | Note
---------+-----------------------+--------------+----------+--------------------------------------
 17      | truchas-prebuilt.base | RUNNING      | 13:03:45 | Currently running.
         |                       |              |          | Last updated: 13:04:28
[calvin@machine build]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

[calvin@machine build]$
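
A rough sketch of the kind of check I have in mind, not Pavilion's actual code. It assumes Slurm accounting is enabled so `sacct` can still report on a job after it has left `squeue`, and that Pavilion can look up the Slurm job ID for a test. The function names (`parse_sacct_state`, `status_note`, `query_slurm_state`) are made up for illustration.

```python
import subprocess

# Slurm job states that mean the scheduler ended the job on its own.
SCHEDULER_KILLED = {"TIMEOUT", "CANCELLED", "NODE_FAIL", "PREEMPTED", "OUT_OF_MEMORY"}


def parse_sacct_state(sacct_output):
    """Return the first job state found in `sacct -n -X -P -o State` output.

    Returns None if the output has no non-blank lines.
    """
    for line in sacct_output.splitlines():
        line = line.strip()
        if line:
            # sacct may print e.g. "CANCELLED by 1234"; keep only the state word.
            return line.split()[0]
    return None


def status_note(test_state, slurm_state):
    """Hypothetical helper: choose a status note when Slurm has ended the job.

    Returns None when the existing note should be kept.
    """
    if test_state == "RUNNING" and slurm_state in SCHEDULER_KILLED:
        if slurm_state == "TIMEOUT":
            return "Slurm timeout reached."
        return "Job ended by Slurm ({}).".format(slurm_state)
    return None


def query_slurm_state(job_id):
    """Ask sacct for the job's state (needs a working Slurm install)."""
    result = subprocess.run(
        ["sacct", "-j", str(job_id), "-n", "-X", "-P", "-o", "State"],
        capture_output=True, text=True, check=False,
    )
    return parse_sacct_state(result.stdout)
```

With something like this, `pav status` could call `status_note("RUNNING", query_slurm_state(job_id))` for tests it believes are running, and swap in the returned note whenever it is not `None`.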

Labels: bug (Something isn't working)
