Skip to content

daemon: fix kernel argument management and logging errors#130

Merged
almaslennikov merged 1 commit intoMellanox:network-operator-26.1.xfrom
rollandf:karg-temp
Jan 15, 2026
Merged

daemon: fix kernel argument management and logging errors#130
almaslennikov merged 1 commit intoMellanox:network-operator-26.1.xfrom
rollandf:karg-temp

Conversation

@rollandf
Copy link
Member

@rollandf rollandf commented Jan 15, 2026

This commit addresses several issues in the sriov-network-config-daemon relating to host configuration management and observability.

  1. Fix kargs.sh for read-only root filesystems: The daemon now uses the host's /tmp directory (via chroot) for temporary workspace files. This allows the script to function correctly when the container is deployed with 'readOnlyRootFilesystem: true', which previously caused 'cp' and 'grep' failures.

  2. Fix logging serialization bug: Updated NodeReconciler to correctly call the Name() method of the main plugin when logging errors. Previously, passing the function pointer directly caused "json: unsupported type: func() string" errors, masking the actual reconciliation failures.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@greptile-apps
Copy link

greptile-apps bot commented Jan 15, 2026

Greptile Summary

This PR fixes two critical bugs in the sriov-network-config-daemon: a read-only filesystem compatibility issue in kernel argument management and a JSON serialization error in logging.

Changes:

  • bindata/scripts/kargs.sh: Changed to use host's /tmp directory (via chroot mktemp) instead of container's /tmp for temporary grub config files, allowing the script to work when the container runs with readOnlyRootFilesystem: true
  • pkg/daemon/daemon.go: Fixed logging bug by calling dn.mainPlugin.Name() method instead of passing the function pointer, which was causing "json: unsupported type: func() string" serialization errors

Key Points:

  • The kargs.sh fix properly creates and cleans up temporary files on the host filesystem in all code paths
  • The daemon.go fix is a trivial one-character change that correctly invokes the Name() method
  • Both fixes address production issues that prevented proper operation and observability

Confidence Score: 4/5

  • This PR is safe to merge with low risk - fixes two well-isolated bugs with straightforward solutions
  • Score reflects high confidence in the daemon.go fix (trivial method call correction) and the kargs.sh fix (proper temp file handling with cleanup). Score is 4/5 rather than 5/5 because kargs.sh still has pre-existing issues noted in previous review threads (unescaped regex patterns, use of grep -P and let commands) that weren't addressed in this PR, though they don't affect the correctness of the new changes
  • No files require special attention - both changes are straightforward bug fixes

Important Files Changed

Filename Overview
bindata/scripts/kargs.sh Fixed read-only root filesystem issue by using host's /tmp via mktemp for temporary grub config copies; proper cleanup added on all exit paths
pkg/daemon/daemon.go Fixed JSON serialization bug by calling Name() method instead of passing function pointer to logger

Sequence Diagram

sequenceDiagram
    participant Daemon as sriov-config-daemon
    participant Plugin as Main Plugin
    participant Logger as Logger
    participant KargsScript as kargs.sh
    participant Host as Host Filesystem

    Note over Daemon,Host: Scenario 1: Node State Change (daemon.go fix)
    Daemon->>Plugin: OnNodeStateChange(desiredNodeState)
    Plugin-->>Daemon: reqDrain, reqReboot, err
    alt Error occurred
        Daemon->>Logger: Error(err, "OnNodeStateChange plugin error", "mainPluginName", Name())
        Note over Daemon,Logger: Fixed: Now calls Name() method<br/>instead of passing function pointer
        Logger->>Logger: Serialize log with mainPluginName string
    end

    Note over Daemon,Host: Scenario 2: Kernel Args Update (kargs.sh fix)
    Daemon->>KargsScript: Execute kargs.sh add/remove args
    KargsScript->>Host: chroot mktemp (create temp file on host)
    Host-->>KargsScript: /tmp/tmp.XXXXXX path
    Note over KargsScript: tmp_grub_in_container=/host/tmp/tmp.XXXXXX
    KargsScript->>Host: cp /host/etc/default/grub to temp file
    KargsScript->>KargsScript: Modify temp grub config
    alt Changes made
        KargsScript->>Host: cp temp file back to /host/etc/default/grub
        KargsScript->>Host: chroot update-grub
    end
    KargsScript->>Host: rm -f temp file (cleanup)
    Note over KargsScript,Host: Fixed: Uses host /tmp instead of<br/>container /tmp for read-only root FS
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

if [ $? -ne 0 ];then
exit 1
# Check if already present in any active GRUB_CMDLINE_LINUX line
if grep -E "^[[:space:]]*GRUB_CMDLINE_LINUX(_DEFAULT)?[[:space:]]*=.*([[:space:]]|\"|^)${t}([[:space:]]|\"|$)" "${tmp_grub_in_container}" > /dev/null; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The regex pattern ([[:space:]]|\"|^)${t}([[:space:]]|\"|$) is unescaped in the grep command. If ${t} contains regex metacharacters (like ., *, [, etc.), it could match unintended strings or fail. For example, net.ifnames=0 would match netXifnames=0 due to . being a wildcard.

Escape ${t} or use grep -F (fixed string matching) for more reliable matching.

Copy link
Collaborator

@heyvister1 heyvister1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

if [[ $command == "add" ]];then
# Modify only GRUB_CMDLINE_LINUX_DEFAULT line if it's not already present
line=$(grep -P "^\s*GRUB_CMDLINE_LINUX_DEFAULT" /tmp/grub)
line=$(grep -P "^\s*GRUB_CMDLINE_LINUX_DEFAULT" "${tmp_grub_in_container}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: PR description claims switching from grep -P to grep -E for better compatibility with minimal container environments, but this line still uses grep -P. The -P flag (Perl regex) may not be available in all minimal containers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description updated

new_param="${arr[@]} ${t}"
sed -i "s/\(^\s*$g\"\)\(.*\)\"/\1${new_param}\"/" /tmp/grub
sed -i "s/\(^\s*$g\"\)\(.*\)\"/\1${new_param}\"/" "${tmp_grub_in_container}"
let ret++
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: PR description claims to have replaced let with standard arithmetic expansion for consistent exit codes, but let ret++ is still used throughout the script (lines 67, 111, 119, 135, 143). Consider using ret=$((ret + 1)) or ((ret++)) for POSIX compliance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description updated

This commit addresses several issues in the sriov-network-config-daemon
relating to host configuration management and observability.

1. Fix kargs.sh for read-only root filesystems:
   The daemon now uses the host's /tmp directory (via chroot) for temporary
   workspace files. This allows the script to function correctly when the
   container is deployed with 'readOnlyRootFilesystem: true', which previously
   caused 'cp' and 'grep' failures.

2. Fix logging serialization bug:
   Updated NodeReconciler to correctly call the Name() method of the main
   plugin when logging errors. Previously, passing the function pointer directly
   caused "json: unsupported type: func() string" errors, masking the actual
   reconciliation failures.

Signed-off-by: Fred Rolland <frolland@nvidia.com>
@almaslennikov almaslennikov merged commit c34e929 into Mellanox:network-operator-26.1.x Jan 15, 2026
10 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants