SRIOV persistent logging enhancement#1057

Open
gavrielg1 wants to merge 2 commits into k8snetworkplumbingwg:master from gavrielg1:dev/sriov-presistent-logging

Conversation

@gavrielg1

SRIOV persistent logging enhancement: host-based log storage with rotation for the sriov-network-config-daemon
@github-actions

Thanks for your PR.
To run the vendor CIs, maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendor CIs, maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

@github-actions github-actions bot added the docs label Mar 15, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the SR-IOV network configuration daemon by implementing persistent, host-based logging with rotation. This change addresses a critical debugging gap where daemon logs were lost after container restarts or node reboots, making it difficult to diagnose issues or understand the reasons behind system reconfigurations. By persisting logs to the host filesystem and providing configurable rotation, this feature ensures a reliable audit trail and improves the overall observability and troubleshooting experience for operators.

Highlights

  • Persistent Logging: Implemented host-based persistent logging with rotation for the sriov-network-config-daemon to ensure logs survive container restarts and node reboots, improving debugging capabilities.
  • Log Rotation: Integrated the lumberjack library to manage log rotation, including configurable maximum file size, number of retained files, and age-based cleanup.
  • Configuration: Extended the SriovOperatorConfig Custom Resource Definition (CRD) to allow operators to configure log persistence settings such as enablement, size limits, and retention policies.
  • Chroot Compatibility: Designed the logging mechanism to function correctly even when the daemon performs chroot operations, ensuring continuous log capture during critical system changes.
  • Design Document: Added a comprehensive design document outlining the motivation, goals, proposal, implementation details, and test plan for the persistent logging enhancement.

Changelog

  • doc/design/sriov-persistent-logging-enhancement.md
    • Added a new design document detailing the persistent logging enhancement for the sriov-network-config-daemon.

Activity

  • No human activity has been recorded on this pull request yet.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a design document for adding persistent logging to the sriov-network-config-daemon. This is a valuable enhancement for debuggability. The design is mostly well-thought-out, covering motivation, use cases, API changes, and testing. However, I've identified a few areas that need clarification and correction to ensure a robust implementation. My main concerns are around the handling of chroot, the logic for enabling logging by default, and the log level for file-based logging. Please see the detailed comments.

Comment on lines +204 to +212
if operatorConfig.Spec.LogConfig != nil && operatorConfig.Spec.LogConfig.Enabled {
    snolog.InitLogWithFile(
        filepath.Join("/host/var/log/sriov-network-config-daemon", "config-daemon.log"),
        operatorConfig.Spec.LogConfig.MaxSizeMB,
        operatorConfig.Spec.LogConfig.MaxFiles,
        operatorConfig.Spec.LogConfig.MaxAgeDays,
        operatorConfig.Spec.LogConfig.Compress,
    )
}

high

There's an inconsistency between the described default behavior and the proposed implementation logic for enabling logging.

The "Upgrade & Downgrade considerations" section on line 237 states: "On upgrade, if LogConfig is not set, persistent logging defaults to enabled with default values." This implies an opt-out model where logging is on by default.

However, the code snippet for runStartCmd shows an opt-in logic:

if operatorConfig.Spec.LogConfig != nil && operatorConfig.Spec.LogConfig.Enabled {
    // ... initialize file logging
}

This code has two issues:

  1. It will not enable logging if LogConfig is nil, which contradicts the "enabled by default" requirement.
  2. The condition operatorConfig.Spec.LogConfig.Enabled is incorrect because Enabled is a *bool. It should be dereferenced, e.g., *operatorConfig.Spec.LogConfig.Enabled, and the nil case for the pointer must be handled.

The logic should be adjusted to correctly handle the pointer and reflect the intended "enabled by default" behavior. For example:

logCfg := operatorConfig.Spec.LogConfig
if logCfg == nil || logCfg.Enabled == nil || *logCfg.Enabled {
    // ... initialize file logging with defaults if needed
}

Please update the design to reflect the correct, consistent logic.
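
To make the three cases concrete (LogConfig unset, Enabled unset, Enabled set explicitly), here is a minimal, self-contained sketch of the opt-out check. The LogConfig struct below only mirrors the fields mentioned in this review, and the helper name persistentLoggingEnabled is hypothetical; the actual CRD type and function names in the operator may differ.

```go
package main

import "fmt"

// LogConfig mirrors only the fields referenced in this review.
// Assumption: the real CRD type may have different fields or tags.
type LogConfig struct {
	Enabled    *bool
	MaxSizeMB  int
	MaxFiles   int
	MaxAgeDays int
	Compress   bool
}

// persistentLoggingEnabled (hypothetical helper) implements the opt-out
// model: file logging is on unless Enabled is explicitly set to false.
func persistentLoggingEnabled(cfg *LogConfig) bool {
	if cfg == nil || cfg.Enabled == nil {
		return true // nothing configured: default to enabled
	}
	return *cfg.Enabled
}

func main() {
	on, off := true, false
	fmt.Println(persistentLoggingEnabled(nil))                      // LogConfig not set
	fmt.Println(persistentLoggingEnabled(&LogConfig{}))             // Enabled not set
	fmt.Println(persistentLoggingEnabled(&LogConfig{Enabled: &on})) // explicitly enabled
	fmt.Println(persistentLoggingEnabled(&LogConfig{Enabled: &off})) // explicitly disabled
}
```

Only the last call returns false. Keeping the check in one helper means the nil handling for both the struct pointer and the *bool lives in a single place.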

Comment on lines +157 to +159
// Create a zap core for file output
fileEncoder := zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig())
fileCore := zapcore.NewCore(fileEncoder, zapcore.AddSync(fileWriter), zapcore.DebugLevel)

medium

The proposed InitLogWithFile function hardcodes the log level for the file core to zapcore.DebugLevel:

// Create a zap core for file output
fileEncoder := zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig())
fileCore := zapcore.NewCore(fileEncoder, zapcore.AddSync(fileWriter), zapcore.DebugLevel)

This might not be the desired behavior. The file logs should likely adhere to the same log level as the console logs, which is configurable via the SriovOperatorConfig. The design should specify that the file logger will use the same dynamic log level as the console logger.
