Skip to content

daemon: fix kernel argument management and logging errors#144

Merged
almaslennikov merged 1 commit intoMellanox:network-operator-26.1.xfrom
rollandf:karg-temp-restore
Jan 22, 2026
Merged

daemon: fix kernel argument management and logging errors#144
almaslennikov merged 1 commit intoMellanox:network-operator-26.1.xfrom
rollandf:karg-temp-restore

Conversation

@rollandf
Copy link
Member

This commit addresses several issues in the sriov-network-config-daemon relating to host configuration management and observability.

  1. Fix kargs.sh for read-only root filesystems: The daemon now uses the host's /tmp directory (via chroot) for temporary workspace files. This allows the script to function correctly when the container is deployed with 'readOnlyRootFilesystem: true', which previously caused 'cp' and 'grep' failures.

  2. Fix logging serialization bug: Updated NodeReconciler to correctly call the Name() method of the main plugin when logging errors. Previously, passing the function pointer directly caused "json: unsupported type: func() string" errors, masking the actual reconciliation failures.

This commit addresses several issues in the sriov-network-config-daemon
relating to host configuration management and observability.

1. Fix kargs.sh for read-only root filesystems:
   The daemon now uses the host's /tmp directory (via chroot) for temporary
   workspace files. This allows the script to function correctly when the
   container is deployed with 'readOnlyRootFilesystem: true', which previously
   caused 'cp' and 'grep' failures.

2. Fix logging serialization bug:
   Updated NodeReconciler to correctly call the Name() method of the main
   plugin when logging errors. Previously, passing the function pointer directly
   caused "json: unsupported type: func() string" errors, masking the actual
   reconciliation failures.

Signed-off-by: Fred Rolland <frolland@nvidia.com>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@almaslennikov almaslennikov merged commit 2b8c2ea into Mellanox:network-operator-26.1.x Jan 22, 2026
11 of 12 checks passed
@greptile-apps
Copy link

greptile-apps bot commented Jan 22, 2026

Greptile Summary

Fixed two critical bugs in the SR-IOV network config daemon: (1) kernel argument management now works with read-only container filesystems by using the host's /tmp directory via chroot mktemp, and (2) logging serialization errors resolved by calling Name() method instead of passing the function pointer.

  • Modified kargs.sh to create temporary files in host filesystem instead of container root
  • Added proper cleanup of temporary files on all exit paths including error cases
  • Fixed JSON marshaling error in daemon.go by invoking the Name() method to get string value
  • Both changes are minimal, focused fixes that address specific runtime failures

Confidence Score: 5/5

  • Safe to merge - both fixes address specific, well-understood bugs with minimal, correct changes
  • The changes are straightforward bug fixes with clear benefits: the kargs.sh change enables operation with read-only root filesystems (a security best practice), and the daemon.go change fixes a logging bug that was masking actual errors. Both fixes are minimal and targeted, with no architectural changes or risky refactoring
  • No files require special attention

Important Files Changed

Filename Overview
bindata/scripts/kargs.sh Fixed read-only filesystem issue by using host's /tmp via mktemp instead of container's /tmp, with proper cleanup on all paths
pkg/daemon/daemon.go Fixed JSON serialization bug by calling Name() method instead of passing function pointer for logging

Sequence Diagram

sequenceDiagram
    participant Daemon as sriov-config-daemon
    participant Container as Container FS
    participant Host as Host FS (chroot)
    participant Logger as Error Logger
    participant Plugin as VendorPlugin

    Note over Daemon,Plugin: Scenario 1: Kernel Args Management (kargs.sh)
    
    Daemon->>Container: Check if container root is read-only
    Container-->>Daemon: readOnlyRootFilesystem: true
    
    Daemon->>Host: chroot mktemp (create temp file in host /tmp)
    Host-->>Daemon: /tmp/tmp.XXXXXX
    
    Daemon->>Daemon: Map host path to container: /host/tmp/tmp.XXXXXX
    Daemon->>Host: cp /etc/default/grub to temp file
    
    Daemon->>Daemon: Modify kernel args in temp file
    
    alt Changes detected
        Daemon->>Host: cp temp file back to /etc/default/grub
        Daemon->>Host: chroot update-grub
        Host-->>Daemon: grub updated
    end
    
    Daemon->>Daemon: rm -f temp file (cleanup)
    
    Note over Daemon,Plugin: Scenario 2: Logging Fix (daemon.go)
    
    Daemon->>Plugin: OnNodeStateChange(desiredState)
    Plugin-->>Daemon: error occurred
    
    Daemon->>Logger: funcLog.Error(err, mainPluginName: dn.mainPlugin.Name())
    Note over Logger: Before: passed dn.mainPlugin.Name (func pointer)<br/>After: passed dn.mainPlugin.Name() (string result)
    Logger-->>Daemon: Successfully logged error with plugin name

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants