daemon: fix kernel argument management and logging errors #130
almaslennikov merged 1 commit into Mellanox:network-operator-26.1.x
Greptile Summary
This PR fixes two critical bugs in the sriov-network-config-daemon: a read-only filesystem compatibility issue in kernel argument management and a JSON serialization error in logging.
Confidence Score: 4/5
Sequence Diagram
sequenceDiagram
participant Daemon as sriov-config-daemon
participant Plugin as Main Plugin
participant Logger as Logger
participant KargsScript as kargs.sh
participant Host as Host Filesystem
Note over Daemon,Host: Scenario 1: Node State Change (daemon.go fix)
Daemon->>Plugin: OnNodeStateChange(desiredNodeState)
Plugin-->>Daemon: reqDrain, reqReboot, err
alt Error occurred
Daemon->>Logger: Error(err, "OnNodeStateChange plugin error", "mainPluginName", Name())
Note over Daemon,Logger: Fixed: Now calls Name() method<br/>instead of passing function pointer
Logger->>Logger: Serialize log with mainPluginName string
end
Note over Daemon,Host: Scenario 2: Kernel Args Update (kargs.sh fix)
Daemon->>KargsScript: Execute kargs.sh add/remove args
KargsScript->>Host: chroot mktemp (create temp file on host)
Host-->>KargsScript: /tmp/tmp.XXXXXX path
Note over KargsScript: tmp_grub_in_container=/host/tmp/tmp.XXXXXX
KargsScript->>Host: cp /host/etc/default/grub to temp file
KargsScript->>KargsScript: Modify temp grub config
alt Changes made
KargsScript->>Host: cp temp file back to /host/etc/default/grub
KargsScript->>Host: chroot update-grub
end
KargsScript->>Host: rm -f temp file (cleanup)
Note over KargsScript,Host: Fixed: Uses host /tmp instead of<br/>container /tmp for read-only root FS
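Scenario 2 can be sketched roughly as follows. This is a minimal sketch, not the actual kargs.sh code: the helper names run_on_host and make_host_tmp and the HOST_ROOT variable are hypothetical, and the /host mount point is taken from the paths in the diagram.

```shell
#!/bin/bash
# Assumption: the host root filesystem is mounted at /host inside the container.
HOST_ROOT="${HOST_ROOT:-/host}"

# Hypothetical wrapper: run a command inside the host filesystem.
run_on_host() {
    chroot "${HOST_ROOT}" "$@"
}

# Hypothetical helper: create a temp file in the host's /tmp and print the
# path at which the container sees it (e.g. /host/tmp/tmp.XXXXXX).
make_host_tmp() {
    local tmp_on_host
    tmp_on_host=$(run_on_host mktemp) || return 1
    printf '%s%s\n' "${HOST_ROOT}" "${tmp_on_host}"
}
```

A caller would then copy ${HOST_ROOT}/etc/default/grub to the returned path, edit the copy, copy it back only if something changed, and remove the temp file on exit. Because the file lives on the host, nothing is written to the container's own root filesystem.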
bindata/scripts/kargs.sh
Outdated
if [ $? -ne 0 ];then
exit 1
# Check if already present in any active GRUB_CMDLINE_LINUX line
if grep -E "^[[:space:]]*GRUB_CMDLINE_LINUX(_DEFAULT)?[[:space:]]*=.*([[:space:]]|\"|^)${t}([[:space:]]|\"|$)" "${tmp_grub_in_container}" > /dev/null; then
logic: ${t} is interpolated into the grep pattern ([[:space:]]|\"|^)${t}([[:space:]]|\"|$) without escaping. If ${t} contains regex metacharacters (such as ., *, or [), the pattern can match unintended strings or fail to compile. For example, net.ifnames=0 would also match netXifnames=0 because . matches any character.
Escape ${t} before interpolating it, or use grep -F (fixed-string matching) for more reliable matching.
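One way to apply the escaping suggestion is sketched below. The helpers escape_ere and karg_present are hypothetical names, not functions from kargs.sh; the grep pattern itself is the one from the line under review.

```shell
#!/bin/bash
# Hypothetical helper: backslash-escape ERE metacharacters so the argument
# is matched literally when interpolated into a grep -E pattern.
escape_ere() {
    printf '%s' "$1" | sed 's/[][\\.*^$(){}?+|]/\\&/g'
}

# Hypothetical helper: succeed if kernel argument $1 is already present on an
# active GRUB_CMDLINE_LINUX or GRUB_CMDLINE_LINUX_DEFAULT line of file $2.
karg_present() {
    local arg_re
    arg_re=$(escape_ere "$1")
    grep -Eq "^[[:space:]]*GRUB_CMDLINE_LINUX(_DEFAULT)?[[:space:]]*=.*([[:space:]]|\"|^)${arg_re}([[:space:]]|\"|\$)" "$2"
}
```

With this, net.ifnames=0 becomes net\.ifnames=0 before it reaches grep, so a line containing only netXifnames=0 no longer counts as a match. Plain grep -F would not work as a drop-in here, because the surrounding anchors and delimiters still need to be interpreted as a regex.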
if [[ $command == "add" ]];then
# Modify only GRUB_CMDLINE_LINUX_DEFAULT line if it's not already present
line=$(grep -P "^\s*GRUB_CMDLINE_LINUX_DEFAULT" /tmp/grub)
line=$(grep -P "^\s*GRUB_CMDLINE_LINUX_DEFAULT" "${tmp_grub_in_container}")
style: The PR description claims that grep -P was replaced with grep -E for better compatibility with minimal container environments, but this line still uses grep -P. The -P flag (Perl-compatible regex) is not supported by every grep build, so it may be unavailable in minimal containers.
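A grep -E equivalent of the flagged line could look like the sketch below. The function name grub_default_line is hypothetical; the only substantive change is replacing the Perl-style \s with the POSIX class [[:space:]].

```shell
#!/bin/bash
# Hypothetical helper: print the active (non-commented)
# GRUB_CMDLINE_LINUX_DEFAULT line(s) of file $1 using POSIX ERE only.
grub_default_line() {
    grep -E "^[[:space:]]*GRUB_CMDLINE_LINUX_DEFAULT" "$1"
}
```

In the script this would be used as line=$(grub_default_line "${tmp_grub_in_container}").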
new_param="${arr[@]} ${t}"
sed -i "s/\(^\s*$g\"\)\(.*\)\"/\1${new_param}\"/" /tmp/grub
sed -i "s/\(^\s*$g\"\)\(.*\)\"/\1${new_param}\"/" "${tmp_grub_in_container}"
let ret++
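style: The PR description claims to have replaced let with standard arithmetic expansion for consistent exit codes, but let ret++ is still used throughout the script (lines 67, 111, 119, 135, 143). Consider using ret=$((ret + 1)), which is POSIX arithmetic expansion and always returns a zero status. Note that ((ret++)) is a bash extension and, like let ret++, returns a non-zero status when ret was 0 before the increment.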
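The exit-code difference can be seen in a short bash sketch. The variable names let_status and expansion_status are illustrative only.

```shell
#!/bin/bash
# let returns status 1 when the expression evaluates to 0.
# Post-increment yields the old value (0), so the first let ret++ "fails".
ret=0
let_status=0
let ret++ || let_status=$?

# Arithmetic expansion inside a plain assignment always returns status 0.
ret=0
ret=$((ret + 1))
expansion_status=$?
```

After running this, let_status holds 1 and expansion_status holds 0, while ret is 1 in both cases. Under set -e, or in any if/&& chain, the let form can therefore abort or branch unexpectedly on the first increment.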
This commit addresses several issues in the sriov-network-config-daemon relating to host configuration management and observability.

1. Fix kargs.sh for read-only root filesystems: The daemon now uses the host's /tmp directory (via chroot) for temporary workspace files. This allows the script to function correctly when the container is deployed with 'readOnlyRootFilesystem: true', which previously caused 'cp' and 'grep' failures.
2. Fix logging serialization bug: Updated NodeReconciler to correctly call the Name() method of the main plugin when logging errors. Previously, passing the function pointer directly caused "json: unsupported type: func() string" errors, masking the actual reconciliation failures.

Signed-off-by: Fred Rolland <frolland@nvidia.com>
Merged commit c34e929 into Mellanox:network-operator-26.1.x