Fix healthcheck failing silently with --transient-store #28498

Merged
Luap99 merged 1 commit into podman-container-tools:main from Honny1:hc-transient-store on Apr 15, 2026

@Honny1 Honny1 commented Apr 13, 2026

Contributor

The systemd timer created for health checks did not pass --transient-store to the podman subprocess, causing it to look up the container in the default store instead of the volatile one.

Fixes: #28483

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match
    the sign-off email address. See CONTRIBUTING.md
    for more information.
  • Referenced issues using Fixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed)
  • Documentation has been updated (or no documentation changes are needed)
  • All commits pass make validatepr (format/lint checks)
  • Release note entered in the section below (or None if no user-facing changes)

Does this PR introduce a user-facing change?

Fixed health checks silently failing for containers started with `--transient-store`

@packit-as-a-service


[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@Honny1 Honny1 marked this pull request as ready for review April 13, 2026 16:41

@Luap99 Luap99 left a comment

Member

Hmm, while this is correct, aren't most of the arguments wrong if we look further? See CreateExitCommandArgs() for a proper list of the arguments we must pass through.

So I think it would be worth the effort to consolidate that further and share the same code.

@Honny1

Honny1 commented Apr 13, 2026

Contributor Author

> Hmm, while this is correct, aren't most of the arguments wrong if we look further? See CreateExitCommandArgs() for a proper list of the arguments we must pass through.
>
> So I think it would be worth the effort to consolidate that further and share the same code.

Sure

@mheon

mheon commented Apr 13, 2026

Contributor

I agree with the comment on consolidation. For what it's worth, the test LGTM, though it sucks to add more waits taking up time in the tests.

@Honny1 Honny1 force-pushed the hc-transient-store branch from 6788638 to efb2e0d Compare April 14, 2026 09:12
Comment thread test/system/220-healthcheck.bats Outdated
run_podman --transient-store inspect $ctr --format "{{.State.Health.Status}} {{.State.Health.FailingStreak}}"
assert "$output" == "healthy 0" "health status and failing streak"

run_podman --transient-store rm -f -t0 $ctr

Member

Just one problem I noticed: if the test case fails, the container is leaked, and the regular teardown has no way to know it exists due to the --transient-store option.

I don't really want to add --transient-store handling to the general teardown, as this would slow things down a lot if we had to double all the commands there...

I guess the best I can think of would be to move this to a 221-healthcheck-transient.bats and define a custom teardown there that has the right --transient-store call to remove the container even on errors.
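
A custom teardown along the lines suggested above might look roughly like the sketch below. It is not the teardown that was eventually merged. The file name comes from the comment; `ctr_name` is a hypothetical variable, and the `run_podman` / `basic_teardown` fallbacks are illustrative stand-ins (in the real suite these are assumed to come from `helpers.bash`) so that the snippet can run on its own.

```shell
# Sketch for a hypothetical test/system/221-healthcheck-transient.bats.
# Stand-ins so the sketch runs outside the podman test suite; they are
# only defined when the real helpers are not already loaded.
command -v run_podman >/dev/null 2>&1 || run_podman() { echo "podman $*"; }
command -v basic_teardown >/dev/null 2>&1 || basic_teardown() { echo "basic_teardown"; }

# Hypothetical: name of the container created by the test body.
ctr_name="c-transient-example"

teardown() {
    # Runs after every test, including failed ones. The regular teardown
    # only inspects the default store, so the transient-store container
    # is removed explicitly here before the common cleanup.
    run_podman --transient-store rm -f -t0 "$ctr_name"
    basic_teardown
}
```

Keeping this in its own file means only one extra `--transient-store` call is paid per test in that file, rather than doubling every command in the general teardown.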

@Luap99

Luap99 commented Apr 14, 2026

Member
not ok 222 |220| podman healthcheck --transient-store in 974ms
         # tags: ci:parallel
         # (from function `bail-now' in file test/system/[helpers.bash, line 230](https://github.com/containers/podman/blob/efb2e0d9b521033b0f5d468245a6d7ef8238d76a/test/system/helpers.bash#L230),
         #  from function `die' in file test/system/[helpers.bash, line 967](https://github.com/containers/podman/blob/efb2e0d9b521033b0f5d468245a6d7ef8238d76a/test/system/helpers.bash#L967),
         #  from function `run_podman' in file test/system/[helpers.bash, line 608](https://github.com/containers/podman/blob/efb2e0d9b521033b0f5d468245a6d7ef8238d76a/test/system/helpers.bash#L608),
         #  in test file test/system/[220-healthcheck.bats, line 526](https://github.com/containers/podman/blob/efb2e0d9b521033b0f5d468245a6d7ef8238d76a/test/system/220-healthcheck.bats#L526))
         #   `run_podman run -d --name $ctr --transient-store \' failed
         #
<+     > # # podman  run -d --name c-h-t222-ksrgjc1q --transient-store --health-cmd /home/podman/healthcheck --health-interval 1s --health-retries 3 quay.io/libpod/testimage:20241011 /home/podman/pause
<+747ms> # Error: crun: executable file `/home/podman/pause` not found: No such file or directory: OCI runtime attempted to invoke a command that was not found
<+005ms> # [ rc=127 (** EXPECTED 0 **) ]
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #| FAIL: exit code is 127; expected 0
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         # # [teardown]

The test is also flaking a lot, so this cannot be merged like that. The image of course has the file, so I suspect it is a race condition around the mount point somehow? It may be best to put this in an extra file so that it is not marked as parallel safe.

The systemd timer created for health checks did not pass global
podman flags to the subprocess, causing it to use default storage
settings instead of matching the parent process. This is most
visible with --transient-store, where the healthcheck looks up
the container in the default store instead of the volatile one.

Extract GlobalPodmanArgs() from CreateExitCommandArgs so both the
exit command and healthcheck timer share the same set of global
flags (--root, --runroot, --transient-store, --storage-driver, etc.).

Fixes: podman-container-tools#28483

Signed-off-by: Jan Rodák <hony.com@seznam.cz>
@Honny1 Honny1 force-pushed the hc-transient-store branch from efb2e0d to 9598b30 Compare April 14, 2026 12:15
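
The consolidation described in the commit message above can be illustrated with a small shell sketch. This is not the Go code from the patch: the function names, the variable names, and the storage paths are assumptions made up for illustration, and only a few of the global flags named in the commit message are shown.

```shell
# Hypothetical settings of the parent podman process (illustrative values).
root="/var/lib/containers/storage"
runroot="/run/containers/storage"
transient_store="true"

# Emit the global flags in one place so that every re-invocation of podman
# resolves containers in the same store as the parent process.
global_podman_args() {
    printf '%s ' --root "$root" --runroot "$runroot"
    if [ "$transient_store" = "true" ]; then
        printf '%s ' --transient-store
    fi
}

# Both command lines are built from the same helper, so a flag added to
# global_podman_args reaches the exit command and the healthcheck alike.
exit_cmdline()        { echo "podman $(global_podman_args)container cleanup $1"; }
healthcheck_cmdline() { echo "podman $(global_podman_args)healthcheck run $1"; }

exit_cmdline abc123
healthcheck_cmdline abc123
```

Before the fix, the healthcheck command line was effectively assembled without the shared flag list, which is why it looked for the container in the default store.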

@Luap99 Luap99 left a comment

Member

LGTM

@Honny1

Honny1 commented Apr 14, 2026

Contributor Author

> The test is also flaking a lot, so this cannot be merged like that. The image of course has the file, so I suspect it is a race condition around the mount point somehow? It may be best to put this in an extra file so that it is not marked as parallel safe.

It seems that it helped. Thanks, this would have taken me some time. I restarted only the failed Docker-py Compat. job. The Packit f44 job failures seem to be unrelated.

@Honny1

Honny1 commented Apr 15, 2026

Contributor Author

PTAL @containers/podman-maintainers

@ashley-cui

Contributor

LGTM

@Luap99 Luap99 merged commit e4776a2 into podman-container-tools:main Apr 15, 2026
80 of 83 checks passed

Development

Successfully merging this pull request may close these issues.

Health checks fail silently for services in transient store

4 participants