MTV-5829 | Add retry backoff and auth detection to Hyper-V collector #7230
Hey @coderabbitai - please take a look
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
gh pr view 7230 --repo kubev2v/forklift --json title,body,files,additions,deletions
Length of output: 1587
🏁 Script executed:
#!/bin/bash
gh pr diff 7230 --repo kubev2v/forklift
Length of output: 8965
🏁 Script executed:
#!/bin/bash
# Get full content of the modified files for deeper review
cat pkg/lib/hyperv/driver/winrm.go
echo "---FILE_SEPARATOR---"
cat pkg/controller/provider/container/hyperv/client.go
echo "---FILE_SEPARATOR---"
cat pkg/controller/provider/container/hyperv/collector.go
Length of output: 40031
🏁 Script executed:
#!/bin/bash
# Check Go version in go.mod
head -5 go.mod
# Check RetryInterval constant exists and value
grep -n "RetryInterval" pkg/controller/provider/container/hyperv/collector.go
# Check how the provider controller uses Test() return values (to understand what happens with status=0)
rg -n "Test()" pkg/controller/provider/ --type=go -A5 -B5 | head -80
# Check how ConnectionAuthFailed is determined
rg -n "ConnectionAuthFailed\|StatusUnauthorized\|\.Test(" pkg/controller/provider/ --type=go -C3 | head -60
Length of output: 6324
🏁 Script executed:
#!/bin/bash
# How provider controller uses Test() results
rg -n "ConnectionAuthFailed" pkg/controller/provider/ --type=go -C5
echo "==="
# How .Test() is called
rg -n "\.Test\b" pkg/controller/provider/ --type=go -C5
echo "==="
# How StatusUnauthorized is checked in provider controller
rg -n "StatusUnauthorized" pkg/controller/provider/ --type=go -C5
Length of output: 8061
Overall Assessment
The PR is well-structured and addresses the core problem cleanly. The 🔴
| Change | Assessment |
|---|---|
| RetryInterval = 5s in Start() loop | ✓ Matches oVirt / OpenStack / OVA pattern |
| WrapCommandError applied in ExecuteCommandWithTimeout | ✓ All WinRM commands benefit automatically |
| SMB non-auth errors downgraded from Error to Info | ✓ Reduces log noise for transient mount failures |
| winrm.go log → "WinRM client initialized." | ✓ Accurate: winrm.NewClient creates no socket |
| "Connected" log moved to after IsAlive() in Test() | ✓ Now reflects actual confirmed connectivity |
| fmt.Errorf("%w: %w", ErrUnauthorized, err) | ✓ Go 1.20+ multi-error wrapping; Go 1.25 in go.mod |
| errors_test.go coverage of httpStatus + WrapCommandError | ✓ Good table-driven tests covering all branches |
📝 Walkthrough
A new
Changes
Hyper-V WinRM Auth Error Propagation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/controller/provider/container/hyperv/client.go (1)
87-87: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Include the discovery error in the reconnect log context.
Line 87 logs a generic message without pErr, which makes repeated reconnect failures harder to diagnose in production.
Suggested patch
- r.Log.Info("SMB Windows prefix not yet discovered, will attempt on next reconnect")
+ r.Log.Info("SMB Windows prefix not yet discovered, will attempt on next reconnect", "error", pErr)
As per coding guidelines, "Observability: ... Logs should provide context, not just messages."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/provider/container/hyperv/client.go` at line 87, The log statement using r.Log.Info with message "SMB Windows prefix not yet discovered, will attempt on next reconnect" is missing the pErr error variable that contains the discovery error details. Modify this log call to include pErr as a field or in the message itself to provide the actual error context needed for diagnosing reconnect failures. This ensures the log output contains actionable error information rather than just a generic status message.
Source: Coding guidelines
pkg/controller/provider/container/hyperv/collector.go (1)
167-178: 🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift
Add unit tests for the new Test() and retry-loop branches.
This PR introduces new control-flow that should be locked with tests: unauthorized from Connect(), unauthorized from IsAlive(), and retry behavior in Start().
As per coding guidelines, "coverage: Make sure that the code has unit tests."
Also applies to: 206-212
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/provider/container/hyperv/collector.go` around lines 167 - 178, The Test() and Start() methods now contain new control-flow branches handling unauthorized errors and retry behavior that lack unit test coverage. Add unit tests to cover: the unauthorized error path from the r.client.Connect() call in the Test() method, the unauthorized error path from the r.client.driver.IsAlive() call in the Test() method where it checks errors.Is(err, driver.ErrUnauthorized) and returns http.StatusUnauthorized, and the retry loop behavior in the Start() method (lines 206-212). Ensure each test case verifies the correct status code and error handling for the unauthorized scenarios, and that the retry logic executes as expected.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/provider/container/hyperv/collector.go`:
- Around line 209-212: The time.Sleep(RetryInterval) call in the error handling
block of the r.run(&ctx) failure case is not cancellation-aware, meaning it will
block for the full interval even after the context is cancelled during shutdown.
Replace this blocking sleep with a context-aware sleep mechanism that respects
the context's Done channel, allowing the retry loop to be interrupted
immediately upon cancellation instead of waiting for the full retry interval to
elapse.
- Around line 169-170: In the Test() method, the Connect() call at line 169
returns immediately on error without checking if the error is a
driver.ErrUnauthorized. Instead of returning the raw error, add a check after
the Connect() call to detect if the returned error is driver.ErrUnauthorized,
and when it is, map it to an http.StatusUnauthorized response before returning.
This ensures that unauthorized SMB failures from the Connect() method are
properly converted to HTTP 401 status codes for the caller.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eda45e4d-66c0-4128-be01-457d042c0f6c
📒 Files selected for processing (5)
pkg/controller/provider/container/hyperv/client.go
pkg/controller/provider/container/hyperv/collector.go
pkg/lib/hyperv/driver/errors.go
pkg/lib/hyperv/driver/errors_test.go
pkg/lib/hyperv/driver/winrm.go
The Hyper-V collector retried failed runs with no delay, spamming the host with WinRM requests when credentials were incorrect or basic auth was not enabled.

- Add RetryInterval (5s) sleep on run() failure in Start() loop, matching the pattern used by oVirt, OpenStack, and OVA collectors.
- Introduce ErrUnauthorized sentinel error in the driver package, wrapping WinRM HTTP 401/403 responses so callers use errors.Is() instead of fragile string matching.
- Update Test() to execute a real WinRM command (IsAlive) and return http.StatusUnauthorized on auth failure, triggering ConnectionAuthFailed in the provider controller.
- Propagate auth errors from SMB prefix discovery in Connect()

Ref: https://redhat.atlassian.net/browse/MTV-5829
Resolves: MTV-5829
Signed-off-by: Elad Hazan <ehazan@redhat.com>
ba8ff2b to 24784da
Thanks @coderabbitai - please follow-up
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
echo "=== collector.go Test() method ==="
rg -n "func.*Test\(\)" pkg/controller/provider/container/hyperv/collector.go -A 20
echo "=== errors.go ==="
cat pkg/lib/hyperv/driver/errors.go
Length of output: 1988
🔴
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #7230 +/- ##
==========================================
- Coverage 15.45% 11.03% -4.42%
==========================================
Files 112 524 +412
Lines 23377 62892 +39515
==========================================
+ Hits 3613 6943 +3330
- Misses 19479 55354 +35875
- Partials 285 595 +310
/backport release-2.12
✅ PR #7230 backported to
…er-V collector (#7232)

**Backport:** #7230

The Hyper-V collector retried failed runs with no delay, spamming the host with WinRM requests when credentials were incorrect or basic auth was not enabled.

- Add RetryInterval (5s) sleep on run() failure in Start() loop, matching the pattern used by oVirt, OpenStack, and OVA collectors.
- Introduce ErrUnauthorized sentinel error in the driver package, wrapping WinRM HTTP 401/403 responses so callers use errors.Is() instead of fragile string matching.
- Update Test() to execute a real WinRM command (IsAlive) and return http.StatusUnauthorized on auth failure, triggering ConnectionAuthFailed in the provider controller.
- Propagate auth errors from SMB prefix discovery in Connect()

Ref: https://redhat.atlassian.net/browse/MTV-5829
Resolves: MTV-5829
Signed-off-by: Elad Hazan <ehazan@redhat.com>
Co-authored-by: Elad Hazan <ehazan@redhat.com>


