fix: add timeout to sysfs writes to prevent daemon hang#1033
fix: add timeout to sysfs writes to prevent daemon hang#1033zeeke wants to merge 1 commit intok8snetworkplumbingwg:masterfrom
Conversation
|
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
Summary of ChangesHello @zeeke, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where kernel drivers can cause the daemon to hang indefinitely when writing to sysfs files, specifically Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a timeout for sysfs write operations to prevent the daemon from hanging when a kernel driver is in a bad state. This is achieved by replacing direct os.WriteFile calls with a new utils.WriteFileWithTimeout utility function. The new utility is well-implemented and comes with comprehensive unit tests covering success, error, and timeout scenarios. My feedback includes a suggestion to improve resource management in the new timeout utility.
Pull Request Test Coverage Report for Build 22092217312Details
💛 - Coveralls |
28d8065 to
d0924a4
Compare
|
Hi @zeeke can you check the tests please |
02d1571 to
3d9aa3c
Compare
Kernel drivers (e.g. i40e) can block indefinitely when writing to sriov_numvfs if the device is in a bad state. For example, the following error has been hit on a `Intel XXV710` NIC: ``` Feb 16 13:53:01 worker0 kernel: 06c73374b594186: left promiscuous mode Feb 16 13:53:01 worker0 kernel: i40e 0000:3b:00.0: Setting MAC 5e:28:32:f0:80:20 on VF 1 Feb 16 13:53:02 worker0 kernel: i40e 0000:3b:00.0: Bring down and up the VF interface to make this change effective. Feb 16 13:53:02 worker0 kernel: i40e 0000:3b:00.0: Unable to configure VFs, other operation is pending. Feb 16 13:53:02 worker0 kernel: i40e 0000:3b:00.0: Unable to configure VFs, other operation is pending. Feb 16 13:53:02 worker0 kernel: 152a5b6a3b44739: left promiscuous mode ``` Replace direct `os.WriteFile` calls in SetSriovNumVfs with a new `WriteFileWithTimeout` utility that runs the write in a goroutine and returns a timeout error after 2 minutes. Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
3d9aa3c to
a3c884f
Compare
|
Unit test flake is fixed in |
Kernel drivers (e.g. i40e) can block indefinitely when writing to sriov_numvfs if the device is in a bad state. For example, the following error has been hit on a
Intel XXV710NIC:Replace direct
os.WriteFilecalls in SetSriovNumVfs with a newWriteFileWithTimeoututility that runs the write in a goroutine and returns a timeout error after 2 minutes.