Collect Kueue diagnostics on test failure #928
When Kueue integration tests fail, the existing namespace-scoped artifacts (pod logs, events) don't capture the operator-level state needed to diagnose issues like missing default LocalQueue auto-creation. This adds automatic collection of Kueue CR state, full DSC, and Kueue controller-manager logs via SetupKueue's cleanup hook, gated on test failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
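A minimal sketch of the failure-gated cleanup pattern described above, relying only on standard `testing` semantics: `Cleanup` functions run after the test finishes, and `Failed` reports whether it failed by then. `cleanupTB`, `collectOnFailure`, and `fakeT` are hypothetical names used for illustration; in the PR the hook is registered on `t.T()` inside `SetupKueue()`, and the collect step is `StoreKueueDiagnostics()`, whose exact signature is not shown here.

```go
package main

// cleanupTB is the subset of testing.TB this sketch needs; *testing.T
// satisfies it, and the hypothetical fakeT below lets it run outside `go test`.
type cleanupTB interface {
	Helper()
	Cleanup(func())
	Failed() bool
}

// collectOnFailure registers collect as a cleanup that runs only if the
// test has failed by the time cleanups execute; passing tests skip it.
func collectOnFailure(t cleanupTB, collect func()) {
	t.Helper()
	t.Cleanup(func() {
		if t.Failed() {
			collect()
		}
	})
}

// fakeT is a hypothetical stand-in for *testing.T, for demonstration only.
type fakeT struct {
	failed   bool
	cleanups []func()
}

func (f *fakeT) Helper()           {}
func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }
func (f *fakeT) Failed() bool      { return f.failed }

// runCleanups mimics the testing package: last registered, first called.
func (f *fakeT) runCleanups() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}
```

Because `testing` runs cleanups in last-added, first-called order, registering the diagnostics hook after any cleanup that deletes Kueue resources makes it run before that teardown, while the state is still live.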
📝 Walkthrough
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Direct findings, no praise:
- CWE-532 (Information Exposure Through Log Files):
- CWE-20 (Improper Input Validation): hardcoded namespace list in
- Log tail truncation
- Error handling in
🚥 Pre-merge checks | ✅ 9 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (9 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/common/support/kueue_operator.go`:
- Around lines 148-149: The code persists full resource JSON and operator logs
through WriteToOutputDir calls at lines 148, 164, and 193 without redacting
sensitive fields like Password, Token, APIKey, and Secret.Data, which can expose
credentials in CI artifacts. Create a helper function to redact these sensitive
field values from the data before it is written to the output directory, then
apply this redaction function to the data parameter in all three
WriteToOutputDir calls (the kueue-cr-state, operator log, and any other artifact
writes) to ensure sensitive information is masked in the diagnostic output.
- Around lines 180-187: The io.ReadAll(stream) call on line 186 reads pod logs
without any byte limit, which can cause memory exhaustion if logs grow
unbounded. Wrap the stream returned from GetLogs().Stream() with io.LimitReader
to set a maximum byte limit before passing it to io.ReadAll. This prevents
unbounded memory allocation during pod log retrieval in the test support
function.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d5d53227-7e4e-48fe-8ae8-e5efa84ea2d4
📒 Files selected for processing (1)
tests/common/support/kueue_operator.go
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ChughShilpa, sutaakar
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Summary
- Adds `StoreKueueDiagnostics()` to collect Kueue CR state, full DSC, and Kueue controller-manager pod logs when Kueue integration tests fail
- Hooked into `SetupKueue()` via `t.T().Cleanup()` gated on `t.T().Failed()`, so passing tests incur zero overhead
- Searches the `openshift-kueue-operator` and `kueue-system` namespaces for controller pods using the `control-plane=controller-manager,app.kubernetes.io/name=kueue` labels
Motivation
When Kueue integration tests fail (e.g. "default LocalQueue not found"), the current namespace-scoped artifacts (pod logs, events) don't capture the operator-level state needed to diagnose why LocalQueues aren't being auto-created.
Artifacts produced on failure
- `kueue-cr-state.log`
- `dsc-state.log`
- `kueue-operator-{pod}-{container}.log`
Test plan
- `go vet ./tests/common/support/...`: compiles clean
- `golangci-lint`: 0 issues
- `verify-imports`: passes

🤖 Generated with Claude Code