Fix WithContainerStepTest.stop and WithContainerStepTest.death #257
dwnusbaum merged 2 commits into jenkinsci:master from
src/test/java/org/jenkinsci/plugins/docker/workflow/WithContainerStepTest.java
src/main/java/org/jenkinsci/plugins/docker/workflow/WithContainerStep.java
Let's rebuild and hope that

It looks like the build might fail to record Linux test results, but
-    "    withDockerContainer('httpd:2.4.12') {\n" +
-    "      sh \"sleep 5; ps -e -o pid,command | egrep '${pwd tmp: true}/durable-.+/script.sh' | fgrep -v grep | sort -n | tr -s ' ' | cut -d ' ' -f2 | xargs kill -9\"\n" +
+    "    withDockerContainer('httpd:2.4.54-alpine') {\n" +
+    "      sh \"set -e; sleep 5; ps -e -o pid,args | egrep '${pwd tmp: true}/durable-.+/script.sh' | fgrep -v grep | sort -n | tr -s ' ' | cut -d ' ' -f2 | xargs kill -9\"\n" +
Doing this whole parsing looks so painful to me 😢
Would it be easier to test if the heartbeat file still existed/was being written to?
I think this test is trying to check whether sh steps inside of withContainer respond correctly if the remote script just suddenly dies, so it is using internal knowledge of durable-task to kill all of the processes related to the sh step from within, which is pretty awkward. It used to be simpler before jenkinsci/durable-task-plugin#49, but it had to be updated in #121 since the pid file didn't exist anymore.
I think that if we deleted the heartbeat file here it wouldn't do anything, since the control script would just touch the file 3 seconds later, recreating it. Not 100% sure though. Maybe we could chmod it to be unwritable or something, but IDK if that would be much easier to maintain.
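For anyone trying to follow the shell pipeline in the diff, here is a rough Java sketch of the same filtering applied to the output of ps -e -o pid,args. The class and method names (DurableScriptPids, findScriptPids) and the sample ps lines are hypothetical and not part of the plugin or its tests. The sketch assumes each relevant line contains the durable-task control directory path under the workspace's @tmp directory, and it treats the tmp path literally (egrep would treat any dots in it as wildcards).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: a rough Java equivalent of the shell pipeline used in the test.
public class DurableScriptPids {

    // Returns the PIDs (ascending) of ps lines that mention a durable-task script.sh
    // under tmpDir, skipping the lines for the grep processes themselves.
    static List<Long> findScriptPids(List<String> psLines, String tmpDir) {
        // egrep '<tmpDir>/durable-.+/script.sh' (tmpDir is matched literally here)
        Pattern script = Pattern.compile(Pattern.quote(tmpDir) + "/durable-.+/script.sh");
        List<Long> pids = new ArrayList<>();
        for (String line : psLines) {
            if (!script.matcher(line).find()) {
                continue;
            }
            // fgrep -v grep: drop the egrep process, whose own args contain the pattern
            if (line.contains("grep")) {
                continue;
            }
            // tr -s ' ' | cut -d ' ' -f2 relies on ps right-aligning the PID column so
            // that field 1 is empty; trimming and splitting on whitespace does not.
            String first = line.trim().split("\\s+")[0];
            pids.add(Long.parseLong(first));
        }
        // sort -n (null comparator means natural ordering)
        pids.sort(null);
        return pids;
    }

    public static void main(String[] args) {
        List<String> ps = List.of(
                "  PID COMMAND",
                "   42 sh -c ( /tmp/ws@tmp/durable-abc/script.sh ) > log 2>&1",
                "    7 sh /tmp/ws@tmp/durable-abc/script.sh",
                "   50 egrep /tmp/ws@tmp/durable-.+/script.sh",
                "    9 httpd -DFOREGROUND");
        System.out.println(findScriptPids(ps, "/tmp/ws@tmp")); // prints [7, 42]
    }
}
```

Spelling the steps out this way also makes the fragile parts visible: the pipeline depends on ps existing in the image at all, on the args/command column containing the script path, and on the PID column keeping a leading space.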
I saw these flake in #256 and didn't think much of it, but then I saw them also fail in #253, and took a look. There seem to be a lot of problems with our use of ps. The real head scratcher for me though is that the tests pass locally with httpd:2.4.12. They only fail if I use httpd:2.4.35 or newer. I believe this is due to the switch to a -slim version of Debian in docker-library/httpd#113, which removes ps from the container entirely, but it is hard to track things down because that repo does not have tags/releases. If I do update to 2.4.35 or newer then I get the exact error messages seen in CI. Is something in the CI environment forcing newer images to be used? Is that even possible?