Fix healthcheck failing silently with --transient-store#28498
Conversation
|
[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore. |
2 similar comments
|
[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore. |
|
[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore. |
Luap99
left a comment
There was a problem hiding this comment.
mhh, while correct If I look at this aren't most argument wrong if we look further, i.e. see CreateExitCommandArgs() for a proper list of argument we must pass through.
So I think it would be worth the effort to consolidate that further and share the same code
Sure |
|
I agree with the comment on consolidation. For what it's worth, the test LGTM, though it sucks to add more waits taking up time in the tests. |
6788638 to
efb2e0d
Compare
| run_podman --transient-store inspect $ctr --format "{{.State.Health.Status}} {{.State.Health.FailingStreak}}" | ||
| assert "$output" == "healthy 0" "health status and failing streak" | ||
|
|
||
| run_podman --transient-store rm -f -t0 $ctr |
There was a problem hiding this comment.
just one problem I noticed, if the test case fails then the container is leaked and the regular teardown has no way to know it exists due the --transient-store option.
I don't really want to add --transient-store handling to the general teardown as this would slow things a lot down if we would have to double all commands there...
I guess the best I can think of would be move this to a 221-healthcheck-transient.bats and define a custom teardown there that has the right --transient-store call to remove the cotnainer even on errors
The test is flaking a lot too so this cannot merged like that. The image of course has the file so I suspect it is a race condition around the mount point somehow? Maybe best if is in a extra file to not have this marked as parallel safe |
The systemd timer created for health checks did not pass global podman flags to the subprocess, causing it to use default storage settings instead of matching the parent process. This is most visible with --transient-store, where the healthcheck looks up the container in the default store instead of the volatile one. Extract GlobalPodmanArgs() from CreateExitCommandArgs so both the exit command and healthcheck timer share the same set of global flags (--root, --runroot, --transient-store, --storage-driver, etc.). Fixes: podman-container-tools#28483 Signed-off-by: Jan Rodák <hony.com@seznam.cz>
efb2e0d to
9598b30
Compare
It seems that it helped. Thanks, this would take me some time. I restarted only failed Docker-py Compat. job. Packit f44 jobs seem to be unrelated. |
|
PTAL @containers/podman-maintainers |
|
LGTM |
The systemd timer created for health checks did not pass
--transient-storeto the podman subprocess, causing it to look up the container in the default store instead of the volatile one.Fixes: #28483
Checklist
Ensure you have completed the following checklist for your pull request to be reviewed:
commits. (
git commit -s). (If needed, usegit commit -s --amend). The author email must matchthe sign-off email address. See CONTRIBUTING.md
for more information.
Fixes: #00000in commit message (if applicable)make validatepr(format/lint checks)Noneif no user-facing changes)Does this PR introduce a user-facing change?