Fix thick plugin STATUS and GC handling for plugin-level commands#1480
Fix thick plugin STATUS and GC handling for plugin-level commands#1480jkary wants to merge 2 commits intok8snetworkplumbingwg:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where the Multus thick plugin daemon failed to correctly process CNI STATUS and GC commands due to an erroneous requirement for pod-specific context. By aligning with the CNI 1.1.0 specification, these commands are now properly recognized as plugin-level operations, resolving failures encountered by kubelet. Additionally, the daemon now incorporates a graceful shutdown mechanism, enhancing its resilience and operational stability. The changes also include updates to build processes and CNI version handling to improve compatibility and maintainability. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request aims to fix an issue where the thick plugin daemon required pod context for plugin-level CNI commands like STATUS and GC, and also includes changes for graceful shutdown. However, it introduces critical security vulnerabilities in the new entrypoint script for the Multus CNI container. Specifically, the script is vulnerable to JSON injection via the CNI_VERSION parameter, path traversal leading to arbitrary file renames on the host via the MASTER_PLUGIN parameter, and potential configuration corruption due to insecure shell scripting and improper JSON escaping. These issues are particularly critical given that the Multus container typically runs with high privileges and host filesystem access. It is recommended to use JSON-aware tools for configuration generation and to strictly validate all command-line inputs to address these security concerns. Additionally, a bug was identified in the new STATUS handling logic, and there are opportunities for cleanup in the Dockerfiles and shell scripts.
images/entrypoint.sh
Outdated
|
|
||
| MASTER_PLUGIN_LOCATION=$MULTUS_AUTOCONF_DIR/$MASTER_PLUGIN | ||
| if [ "$FORCE_CNI_VERSION" == true ]; then | ||
| MASTER_PLUGIN_JSON="$(cat $MASTER_PLUGIN_LOCATION | sed -e "s/\"cniVersion.*/\"cniVersion\": \"$CNI_VERSION\",/g")" |
There was a problem hiding this comment.
The CNI_VERSION variable, which is populated from the --cni-version command-line argument, is used directly in a sed command without any sanitization or validation. An attacker who can control the arguments to the script can inject arbitrary JSON keys and values into the generated CNI configuration. This could be used to inject a malicious CNI plugin or alter the behavior of existing ones, potentially leading to a compromise of the node's networking stack.
Recommendation: Validate that CNI_VERSION matches an expected version format (e.g., ^[0-9]+\.[0-9]+\.[0-9]+$) before using it in the sed command. Alternatively, use a JSON-aware tool like jq or a Python script to modify the JSON content safely.
images/entrypoint.sh
Outdated
|
|
||
| # If we're not performing the cleanup on exit, we can safely rename the config file. | ||
| if [ "$RENAME_SOURCE_CONFIG_FILE" == true ]; then | ||
| mv ${MULTUS_AUTOCONF_DIR}/${MASTER_PLUGIN} ${MULTUS_AUTOCONF_DIR}/${MASTER_PLUGIN}.old |
There was a problem hiding this comment.
The MASTER_PLUGIN variable, which can be controlled via the --multus-master-cni-file-name command-line argument, is used to construct a file path for a mv command without validation. Since the container typically runs as privileged with host mounts, an attacker could use path traversal sequences (e.g., ../../etc/shadow) to rename arbitrary files on the host filesystem, potentially leading to a full system compromise or Denial of Service.
Recommendation: Validate that MASTER_PLUGIN is a simple filename and does not contain any path traversal characters like / or ...
| if gt, _ := cniversion.GreaterThanOrEqualTo(confList.CNIVersion, "1.1.0"); !gt { | ||
| logging.Debugf("conflistStatus: skipping STATUS for network list %q cniVersion %q (< 1.1.0)", confList.Name, confList.CNIVersion) | ||
| } |
There was a problem hiding this comment.
This if block logs that it is skipping the STATUS call for older CNI versions, but it doesn't actually skip the execution. It should return nil to prevent cniNet.GetStatusNetworkList from being called, similar to how it's handled in confStatus.
| if gt, _ := cniversion.GreaterThanOrEqualTo(confList.CNIVersion, "1.1.0"); !gt { | |
| logging.Debugf("conflistStatus: skipping STATUS for network list %q cniVersion %q (< 1.1.0)", confList.Name, confList.CNIVersion) | |
| } | |
| if gt, _ := cniversion.GreaterThanOrEqualTo(confList.CNIVersion, "1.1.0"); !gt { | |
| logging.Debugf("conflistStatus: skipping STATUS for network list %q cniVersion %q (< 1.1.0)", confList.Name, confList.CNIVersion) | |
| return nil | |
| } |
images/entrypoint.sh
Outdated
| EOF | ||
| ) | ||
| tmpfile=$(mktemp) | ||
| echo $CONF > $tmpfile |
There was a problem hiding this comment.
Multiple variables, including $CONF on line 454, are used without double quotes in shell commands. This causes the shell to perform word splitting and globbing on the variable's content. If the generated JSON configuration contains shell metacharacters (like *), they will be expanded to filenames in the current directory, leading to a corrupted configuration file. It also strips newlines and extra whitespace, which can result in invalid JSON.
Recommendation: Always wrap shell variables in double quotes when they are used in commands, especially when they contain data that should be preserved literally. For example, change line 454 to echo "$CONF" > "$tmpfile".
images/entrypoint.sh
Outdated
| CONF=$(cat <<-EOF | ||
| { | ||
| $CNI_VERSION_STRING | ||
| "name": "$MASTER_PLUGIN_NET_NAME", | ||
| "type": "multus", | ||
| $NESTED_CAPABILITIES_STRING | ||
| $ISOLATION_STRING | ||
| $GLOBAL_NAMESPACES_STRING | ||
| $LOG_TO_STDERR_STRING | ||
| $LOG_LEVEL_STRING | ||
| $LOG_FILE_STRING | ||
| $ADDITIONAL_BIN_DIR_STRING | ||
| $READINESS_INDICATOR_FILE_STRING | ||
| "kubeconfig": "$MULTUS_KUBECONFIG_FILE_HOST", | ||
| "delegates": [ | ||
| $MASTER_PLUGIN_JSON | ||
| ] | ||
| } | ||
| EOF |
There was a problem hiding this comment.
Values extracted from external CNI configuration files (such as MASTER_PLUGIN_NET_NAME on line 393) are embedded directly into a JSON template using a heredoc without proper JSON escaping. If these values contain double quotes or other characters that are special in JSON, the resulting 00-multus.conf file will be malformed. This can cause the Multus daemon to fail to start or fail to process networking requests for pods.
Recommendation: Use a JSON-aware tool to generate the configuration file instead of a shell heredoc. If shell must be used, ensure all variables are properly escaped for JSON using a tool like jq or a Python one-liner.
Dockerfile.openshift
Outdated
| RUN ./hack/build-go.sh && \ | ||
| cd /usr/src/multus-cni/bin |
There was a problem hiding this comment.
The cd /usr/src/multus-cni/bin command in this RUN instruction has no effect on subsequent instructions. Each RUN command executes in its own shell, and the working directory is reset for the next one based on the WORKDIR directive. This can be removed to avoid confusion. The same issue is present on lines 18-19.
RUN ./hack/build-go.sh
images/entrypoint.sh
Outdated
| --cni-bin-dir) | ||
| CNI_BIN_DIR=$VALUE | ||
| ;; |
cmd/multus-daemon/main.go
Outdated
| isInGracefulShutdownMode := func() bool { | ||
| if sigTermCtx.Err() == nil { | ||
| return false | ||
| } | ||
| return true | ||
| } |
STATUS and GC are plugin-level commands with no pod context per the CNI 1.1.0 spec. The thick plugin daemon incorrectly required CNI_CONTAINERID, CNI_NETNS, and K8S_POD_NAME/K8S_POD_NAMESPACE for these commands, causing failures when invoked by kubelet. Signed-off-by: Jason Kary <jkary@redhat.com>
d351be1 to
571a3ce
Compare
|
@jkary thanks for the contribution! The bot has left a pretty good review, take a look and address that and then I will take a look again |
The version check logged that STATUS would be skipped for older CNI versions but did not actually return, causing GetStatusNetworkList to be called regardless. Add the missing return nil.
|
@bpickard22 thanks for taking a look! I went through the Gemini comments and it looks like the bot reviewed the full branch diff against main rather than just this PR. Most of the comments are about files that aren't changed in this PR at all, like images/entrypoint.sh, Dockerfile.openshift, and cmd/multus-daemon/main.go. This PR only touches pkg/server/server.go and pkg/server/thick_cni_test.go, so those comments don't apply here. One thing the bot did catch that is worth fixing is a bug in pkg/multus/multus.go where conflistStatus logs that it will skip STATUS for older CNI versions but doesn't actually return, so it falls through and calls GetStatusNetworkList anyway. I pushed a fix for that. |
pliurh
left a comment
There was a problem hiding this comment.
Apart from the nits, it looks good to me.
| var err error | ||
| K8sClient = fakeK8sClient() | ||
| // Touch the default network file. | ||
| os.OpenFile(configPath, os.O_RDONLY|os.O_CREATE, 0755) |
There was a problem hiding this comment.
The return values are ignored. We shall ensure no error returned by this call.
| } | ||
| } | ||
|
|
||
| result, err := s.HandleCNIRequest(cmdType, k8sArgs, cniCmdArgs) |
There was a problem hiding this comment.
Consider documenting that k8sArgs may be nil for STATUS/GC in the HandleCNIRequest() signature, or adding a nil guard in it.
STATUS and GC are plugin-level commands with no pod context per the
CNI 1.1.0 spec. The thick plugin daemon incorrectly required
CNI_CONTAINERID, CNI_NETNS, and K8S_POD_NAME/K8S_POD_NAMESPACE for
these commands, causing failures when invoked by kubelet.