Pav Status/Result hangs, Run_Complete into BUILD_CREATED #334

Open
@CalvinDSeamons

I ran into a very strange bug while testing snow today. This happened on the Yellow front end, where roughly 11,000 more tests are sitting in the working_dir than on Turquoise. I mention it because that seems to be the only notable difference between the two.

After launching my tests I ran watch pav status. When the status table loaded (which took only 3-5 seconds), a few license-tests had already completed with PASS and the rest were all SCHEDULED, so everything seemed fine. About 20 minutes later, after everything had finished, my watch pav status (which updates every 3 seconds) showed every test as PASS. I quit out (don't ask me why) and ran a plain pav status to copy the contents into this ticket. The command hung, as did pav result and any permutation of pav log build/run $series/$id, etc. I even logged into snow from a different terminal session, loaded pavilion/2.0, and still could not access the test run. Using cat, I got the following status file from one of the tests that I had observed passing:

2020-10-21T11:21:08.484894 CREATED Created status file.
2020-10-21T11:21:08.485955 CREATED Test directory and status file created.
2020-10-21T11:21:08.490230 BUILD_CREATED Builder created.
2020-10-21T11:21:08.493727 CREATED Test directory setup complete.
2020-10-21T11:21:16.778993 BUILD_REUSED Test 171aceb2e5e39623 run 13242 reusing build.
2020-10-21T11:21:23.182369 SCHEDULED Test slurm has job ID 3814891.
2020-10-21T11:47:35.252408 PREPPING_RUN Converting run template into run script.
2020-10-21T11:47:35.255769 RUNNING Starting the run script.
2020-10-21T11:47:35.261001 RUNNING Currently running.
2020-10-21T11:47:35.282176 RUN_DONE Test run has completed.
2020-10-21T11:47:35.289546 RESULTS Parsing 6 result types.
2020-10-21T11:47:35.292427 RESULTS Performing 0 result evaluations.
2020-10-21T11:47:35.308790 COMPLETE The test completed with result: PASS
2020-10-21T12:02:03.342975 BUILD_CREATED Builder created.

The PASS is what I observed inside watch pav status. After I exited watch pav status, the test's status changed to BUILD_CREATED (the final 12:02:03 entry in the status file above) and the test became unreachable from pav status.
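To make the oddity in that status file easier to spot across many tests, here is a minimal Python sketch that flags any state written after COMPLETE. It assumes each line has the layout shown above (ISO timestamp, state name, free-form note). The names StatusEntry, parse_status_lines, and states_after_complete are hypothetical helpers for illustration and are not part of Pavilion.

```python
from datetime import datetime
from typing import List, NamedTuple


class StatusEntry(NamedTuple):
    when: datetime
    state: str
    note: str


def parse_status_lines(lines: List[str]) -> List[StatusEntry]:
    """Parse lines assumed to look like '<ISO timestamp> <STATE> <note>'."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue  # Skip lines that do not match the assumed layout.
        note = parts[2] if len(parts) > 2 else ""
        entries.append(StatusEntry(datetime.fromisoformat(parts[0]), parts[1], note))
    return entries


def states_after_complete(entries: List[StatusEntry]) -> List[StatusEntry]:
    """Return every entry recorded after the first COMPLETE entry."""
    for index, entry in enumerate(entries):
        if entry.state == "COMPLETE":
            return entries[index + 1:]
    return []


# The last two lines of the status file quoted in this issue.
sample = [
    "2020-10-21T11:47:35.308790 COMPLETE The test completed with result: PASS",
    "2020-10-21T12:02:03.342975 BUILD_CREATED Builder created.",
]
entries = parse_status_lines(sample)
late = states_after_complete(entries)
for entry in late:
    gap = (entry.when - entries[0].when).total_seconds()
    print(f"{entry.state} written {gap:.0f}s after COMPLETE")
```

On the two sample lines this reports the BUILD_CREATED entry, written roughly 14.5 minutes after the test completed.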

I thought I'd make a note of it, as @kjeverson also could not access anything through pav status. I was able to fix this by running scancel -u $user; pav cancel --all; module unload and then rerunning my tests. For whoever wants to investigate further, s377 still hangs when called and can be poked at on Yellow.
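For anyone poking at the hang, a small sketch of running a command under a timeout so a stuck call does not tie up the terminal. The helper name describe_run is hypothetical, and the exact pav arguments to pass for s377 are an assumption to adjust as needed. If the process is blocked in uninterruptible I/O, even this wrapper may not return promptly.

```python
import subprocess
import sys
from typing import List


def describe_run(cmd: List[str], timeout_s: float) -> str:
    """Run cmd and summarize whether it finished, timed out, or was missing."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return f"TIMEOUT after {timeout_s}s: {' '.join(cmd)}"
    except FileNotFoundError:
        return f"NOT FOUND: {cmd[0]}"
    return f"exit {proc.returncode}: {' '.join(cmd)}"


# Harmless self-check using the current Python interpreter.
print(describe_run([sys.executable, "-c", "pass"], 10.0))
```

With pavilion/2.0 loaded, a call such as describe_run(["pav", "status", "s377"], 30.0) would report a timeout instead of hanging indefinitely.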

Labels: bug