Conversation

@anmaxvl (Contributor) commented Oct 14, 2025

When implementing the signal container process enforcement policy we
introduced a bug: instead of signalling just the container init
process, we ended up sending signals (SIGTERM or SIGKILL) to all
processes running inside a container (by invoking `runc kill --all`).

This results in unpleasant behavior: the init process may be handling
(e.g. ignoring) SIGTERM, whereas other processes inside the container
are not.

This PR changes the order in which the signal container policy is
enforced:

  • always call `EnforceSignalContainerProcessPolicy` before sending
    any signals. Previously we would never call
    `EnforceSignalContainerProcessPolicy` with
    `signalingInitProcess == true` for `SIGTERM` and `SIGKILL`,
    potentially bypassing policies which do not allow `SIGTERM` or
    `SIGKILL` to be sent to the init process.
  • no longer call `ShutdownContainer` and instead revert to calling
    `process.Kill`.

@anmaxvl anmaxvl requested a review from a team as a code owner October 14, 2025 05:24
@anmaxvl anmaxvl marked this pull request as draft October 14, 2025 06:32
@anmaxvl anmaxvl marked this pull request as ready for review October 14, 2025 07:21
signal_ok(signals) {
    count(signals) == 0
    input.isInitProcess
    input.signal in [9, 15]
}
Collaborator

I think this is up for debate.

Contributor Author

of course 😄 it's just that when running tests I realized that we don't have any "default" kill signals, and my change broke stop container/pod.

@micromaomao (Member) Oct 14, 2025

We can have an api_version-based check: if the policy is of an old version, we do this default allow (if it's a SIGTERM/SIGKILL, based on old behaviour); otherwise, for a new policy, we expect the tooling to put in the correct allowed signals list.
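
For example, something along these lines (a sketch only; which version variable to key off and the exact cutoff are debated further down in this thread):

```rego
# Sketch only: default-allow SIGKILL/SIGTERM to the init process, but
# only for policies written against an older API version.
signal_ok(signals) {
    count(signals) == 0
    input.isInitProcess
    input.signal in [9, 15]  # SIGKILL, SIGTERM
    semver.compare(policy_api_version, "0.11.0") < 0
}
```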

return err
}

// Don't allow signalProcessV2 to route around container shutdown policy.
Collaborator

Maybe I am missing something, but I don't think this path gates anything. The only difference `EnforceShutdownContainerPolicy` passing or failing makes is the error message: if it passes it will not do a shutdown, and if it fails it will not do a shutdown.

I will claim that if the signal is allowed by the user and that signal is going to cause a shutdown, then they are fine with being shut down. Maybe we could treat SIGKILL as shutdown. TBD with some experiments.

Contributor Author

I agree. The original code didn't quite make sense to me either, since the shutdown policy doesn't enforce much: it just removes the container from the metadata store? Are you suggesting we just not call the shutdown policy here and simply return on L695?

@micromaomao (Member) commented Oct 14, 2025

Tested with signals_test.cpp - now a child process of the init process no longer gets a SIGTERM, so this fix is good:

2025-10-14T15:56:35.3043696Z stdout F Parent started waiting for signal
2025-10-14T15:56:36.6938875Z stdout F Parent received signal SIGTERM (1th signal)
2025-10-14T15:56:36.6938875Z stdout F Waiting 2 seconds
2025-10-14T15:56:38.6959265Z stdout F Notifying child to exit via pipe
2025-10-14T15:56:38.6959265Z stdout F Child received exit notification. Waiting for 3 seconds.
2025-10-14T15:56:41.7000602Z stdout F Child exiting now.
// the child never gets a SIGTERM
2025-10-14T15:56:41.7030807Z stdout F Parent received signal SIGCHLD (2th signal)
2025-10-14T15:56:41.7030807Z stdout F Child exited successfully

old behaviour - child gets SIGTERM and is killed:

2025-10-14T15:27:14.5671534Z stdout F Parent started waiting for signal
2025-10-14T15:27:16.5088563Z stdout F Parent received signal SIGTERM (1th signal)
2025-10-14T15:27:16.5088563Z stdout F Waiting 2 seconds
2025-10-14T15:27:16.5088563Z stdout F ERROR: Child received SIGTERM, but we expect exit requests to come via pipe from parent.
2025-10-14T15:27:18.5199773Z stdout F Notifying child to exit via pipe
2025-10-14T15:27:18.5199773Z stdout F Parent received signal SIGCHLD (2th signal)
2025-10-14T15:27:18.5199773Z stdout F ERROR: Child exited with status 1

@anmaxvl anmaxvl force-pushed the gcs/fix-kill branch 2 times, most recently from a048664 to 8ffbc6f Compare October 15, 2025 05:49
signal_ok(signals) {
    count(signals) == 0
    input.isInitProcess
    semver.compare(data.api.version, "0.11.0") < 0
@micromaomao (Member) Oct 15, 2025

`data.api.version` would always be the version shipped with the GCS, I think; the policy API version would be `policy_api_version`.

However, that still doesn't account for fragments having a different version. Unfortunately, by the time we evaluate the rules we wouldn't know which fragment the container came from (and thus whether it is "old" or not). I think the way to do this is probably to update check_container to, instead of using raw_container.signals straight away, delegate to another rule that adds the two "default" signals into the array if the framework_version (instead of api_version) is old (fragments have no api_version, only framework_version; the main policy has both). A rough sketch follows below.

Also, since we're not bumping the version in this PR, this probably needs to be a `<=` instead of `<`.

@matajoh do you agree?
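
For illustration, the delegation could look something like this (a sketch only; rule and variable names here are hypothetical, and the actual implementation is linked in the next comment):

```rego
# check_container would consult effective_signals(raw_container.signals, fv)
# instead of reading raw_container.signals directly, where fv is the
# framework version of the policy or fragment the container came from.
effective_signals(raw_signals, framework_version) := signals {
    # Old framework version: tooling generated empty signals lists, so
    # treat SIGTERM/SIGKILL to the init process as implicitly allowed.
    semver.compare(framework_version, "0.4.1") <= 0
    signals := array.concat(raw_signals, [15, 9])  # SIGTERM, SIGKILL
}

effective_signals(raw_signals, framework_version) := raw_signals {
    # Newer policies must list allowed signals explicitly.
    semver.compare(framework_version, "0.4.1") > 0
}
```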

@micromaomao (Member)

After discussion with @KenGordon and @matajoh, I've made the suggested implementation:
micromaomao@5309018
There is also another commit to fix an error-reasoning bug:
micromaomao@63f144a

I also tested that the behaviour of signalling an exec process is not changed. This can be triggered with `crictl exec -s -t <timeout>`:

PS C:\lcow_signals> crictl exec --timeout 1 -s 3873bfc939e24 /bin/sleep infinity
time="2025-10-15T13:52:29Z" level=fatal msg="execing command in container synchronously: rpc error: code = Unknown desc = failed to exec in container: failed to kill exec \"91d6297211fcf000d755a94cda2c7c2e7b86d46ccb0f26cf062038384f902eee\": guest RPC failure: policyDecision< {"decision":"deny","input":{"argList":["/bin/sleep","infinity"],"containerID":"3873bfc939e2415892b5b74a7b1dbade0f7222e266df43df85968ddda59be56e","isInitProcess":false,"rule":"signal_container_process","signal":9},"reason":{"errors":["target isn't allowed to receive the signal"]}} >policyDecision: unknown"

(denial message base64 decoded)

We will need to update the tooling to bump the framework_version and add the default signals to generated policies (only to the container, not to exec processes; currently trying to kill an exec process is already denied unless the policy allows it).
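
Concretely, generated container definitions would then carry something like this (shape illustrative only):

```rego
# Illustrative shape only: a tooling-generated container entry that
# lists the default kill signals explicitly instead of leaving the
# signals list empty.
containers := [{
    "signals": [15, 9]  # SIGTERM, SIGKILL allowed for the init process
    # other generated fields (command, env_rules, ...) elided here
}]
```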

@anmaxvl Please feel free to review and take these 2 changes :)

Contributor Author

They make sense. Feel free to author them in a separate PR. I removed the rego changes to implement the immediate fix.

@micromaomao (Member) commented Oct 15, 2025

With this change I think we still have two (or three?) ways to kill a (specific, individual) container: via `signalProcess`, `killContainerV2`, or `shutdownContainerV2`; the latter two don't check the signal_container_process enforcement point (only shutdown_container). Currently shutdown_container is always allowed. Do we need to fix this? @KenGordon @matajoh

(I could not find a host-side command to send kill/shutdownContainerV2, but in principle the host could send it)

micromaomao added a commit to micromaomao/hcsshim that referenced this pull request Oct 15, 2025
… in old policies

We used to allow SIGTERM/SIGKILL to the container init process even if the
container's signals list is empty, due to a bug fixed in microsoft#2538. However,
because our tooling has been generating policies with an empty signals list, we
need to special-case this for old policies to maintain backwards compatibility.

Update framework.rego to have SIGTERM and SIGKILL as default kill signals for
the init process for framework API versions "0.4.1" and below. Newer policies
must explicitly have these signals present; otherwise sending the signal will
be denied.

Signed-off-by: Tingmao Wang <[email protected]>
Co-authored-by: Maksim An <[email protected]>
@KenGordon KenGordon self-requested a review October 16, 2025 14:30
@KenGordon (Collaborator) left a comment

Good now for the non-confidential case. Tingmao has a PR for the confidential case coming.

@jterry75 (Contributor) left a comment

We need to remove the "all" functionality from Kill; it is not supported in runc and we should not support it either.

When implementing the signal container process enforcement policy we
introduced a bug: instead of signalling just the container init
process, we ended up sending signals (SIGTERM or SIGKILL) to all
processes running inside a container (by invoking `runc kill --all`).
This resulted in unpleasant behavior: the init process may be handling
(e.g. ignoring) SIGTERM, whereas other processes inside the container
are not. `container.Kill` no longer sends signals to all container
processes.

This PR changes the order in which the signal container policy is
enforced:
  - always call `EnforceSignalContainerProcessPolicy` before sending
    any signals. Previously we would never call
    `EnforceSignalContainerProcessPolicy` with
    `signalingInitProcess == true` for `SIGTERM` and `SIGKILL`,
    potentially bypassing policies which do not allow `SIGTERM` or
    `SIGKILL` to be sent to the init process.
  - no longer call `ShutdownContainer` and instead revert to calling
    `process.Kill`.

Signed-off-by: Maksim An <[email protected]>
@jterry75 (Contributor) left a comment

We spoke offline. This change is OK for now. We should do a follow-up change where we modify the Kill opts on the Task/Cow interfaces to forward the `all` intent to the guest over the bridge, and use that to determine whether to set the `--all` flag in the runc case.

@anmaxvl anmaxvl merged commit 04735e0 into microsoft:main Oct 21, 2025
17 checks passed