WIP OCPNODE-4560: Migrate OCP-56266 verify kubelet/crio deletes netns when pod deleted#31242
WIP OCPNODE-4560: Migrate OCP-56266 verify kubelet/crio deletes netns when pod deleted#31242BhargaviGudi wants to merge 1 commit into
Conversation
|
@BhargaviGudi: No Jira issue with key OCP-56266 exists in the tracker at https://redhat.atlassian.net. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds node utilities to extract a pod's NetNS path from CRI-O journal and verify its removal, plus a Ginkgo e2e that creates a pod, captures its NetNS, deletes the pod, and asserts on-node NetNS cleanup; documents the test in the node README. ChangesPod network namespace cleanup validation
sequenceDiagram
participant E2E as E2E test
participant NodeExec as Node exec/chroot
participant CRIO as CRI-O journal
participant NodeFS as Node filesystem
E2E->>NodeExec: run journalctl filter for pod NetNS (GetPodNetNs)
NodeExec->>CRIO: query NetNS entries
CRIO-->>NodeExec: returns NetNS:<path>
NodeExec-->>E2E: netns path
E2E->>Kubelet: delete pod
E2E->>NodeExec: run 'test -e <netns>' (CheckNetNsCleaned)
NodeExec->>NodeFS: check path existence
NodeFS-->>NodeExec: path absent -> success
NodeExec-->>E2E: return success
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: BhargaviGudi The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@BhargaviGudi: No Jira issue with key OCP-56266 exists in the tracker at https://redhat.atlassian.net. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
033cec1 to
07bfcca
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/node/node_e2e/netns_cleanup.go`:
- Around line 102-105: Replace the string match on "not found" with the
Kubernetes API error helper: in the wait.PollUntilContextTimeout callback check
apierrors.IsNotFound(pollErr) instead of strings.Contains(pollErr.Error(), "not
found") (i.e., use k8s.io/apimachinery/pkg/api/errors as apierrors and call
apierrors.IsNotFound(pollErr)); keep the same Pod GET via
oc.KubeClient().CoreV1().Pods(namespace).Get(...) and the e2e.Logf call, and
remove the unused "strings" import after this change.
In `@test/extended/node/node_utils.go`:
- Line 779: The test currently logs raw CRI-O journal output via
framework.Logf("NetNs journal output: %v", netNsStr) which may expose
host/runtime details; replace that call so it does not print the full journal
content — log only the sanitized NetNS identifier/path (the extracted token you
already compute, e.g., netNsToken or netNsPath) or omit the value entirely, and
remove any usage of netNsStr in logs; update the framework.Logf invocation in
node_utils.go to output only the safe identifier instead of netNsStr.
- Around line 805-813: CheckNetNsCleaned currently inspects stdout for the "No
such file or directory" message but ExecOnNodeWithChroot discards stderr; modify
the ExecOnNodeWithChroot invocation in CheckNetNsCleaned so the error text is
captured (e.g., run via shell with stderr redirected into stdout like invoking
"sh -c 'ls -l <netNsPath> 2>&1'" or call a combined-output helper if available)
so the function can reliably detect the "No such file or directory" string from
the returned result; update the call that uses ExecOnNodeWithChroot(nodeName,
"ls", "-l", netNsPath) accordingly.
- Around line 773-774: The code builds a shell pipeline into cmd using raw
podName which allows command/regex injection; change the call that constructs
cmd and the ExecOnNodeWithChroot invocation to pass a safely quoted/escaped
podName (or use grep -F --) instead of raw interpolation. Concretely, update the
cmd creation (the variable named cmd) to use an escaping/quoting function (e.g.,
fmt.Sprintf with %q or otherwise escape shell metacharacters) and prefer grep -F
-- %s to force literal matching, then call ExecOnNodeWithChroot(oc, nodeName,
"/bin/bash", "-c", cmd) with that sanitized value so podName cannot inject extra
shell syntax (target symbols: cmd, podName, ExecOnNodeWithChroot, nodeName).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 46e20e07-b293-4e83-a1cb-1a5460a50299
📒 Files selected for processing (3)
test/extended/node/README.mdtest/extended/node/node_e2e/netns_cleanup.gotest/extended/node/node_utils.go
|
@BhargaviGudi: No Jira issue with key OCP-56266 exists in the tracker at https://redhat.atlassian.net. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/node/README.md`:
- Line 12: Update the README entry for "node_e2e/netns_cleanup.go" that
currently lists the invalid Jira key "OCP-56266": either replace "OCP-56266"
with the correct tracker ID if known, or remove the tracker ID entirely (leaving
the description "Network namespace cleanup (OCP-56266) - Verifies
kubelet/CRI-O..." adjusted accordingly) so the catalog no longer references the
non-existent key; ensure the line still reads clearly and retains the test name
"node_e2e/netns_cleanup.go" and its description.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 8ab06ab2-79c0-4bf7-8487-90c7bb09ad79
📒 Files selected for processing (3)
test/extended/node/README.mdtest/extended/node/node_e2e/netns_cleanup.gotest/extended/node/node_utils.go
🚧 Files skipped from review as they are similar to previous changes (2)
- test/extended/node/node_e2e/netns_cleanup.go
- test/extended/node/node_utils.go
07bfcca to
4cd2aca
Compare
|
@BhargaviGudi: No Jira issue with key OCP-56266 exists in the tracker at https://redhat.atlassian.net. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
4cd2aca to
1212a5f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/extended/node/node_e2e/netns_cleanup.go (1)
117-119: ⚡ Quick winConsider polling the on-node cleanup check.
NetNS teardown by kubelet/CRI-O can lag slightly behind removal of the pod object from the API. A single
CheckNetNsCleanedcall immediately after the pod isNotFoundcan flake; wrapping it in a short poll/Eventuallymakes the test resilient to that timing.♻️ Suggested poll
- g.By("Verify that the NetNS file has been cleaned up on the node") - err = nodeutils.CheckNetNsCleaned(oc, nodeName, netNsPath) - o.Expect(err).NotTo(o.HaveOccurred(), "NetNS file was not cleaned up") + g.By("Verify that the NetNS file has been cleaned up on the node") + err = wait.PollUntilContextTimeout(ctx, 2*time.Second, 1*time.Minute, true, func(ctx context.Context) (bool, error) { + if cleanErr := nodeutils.CheckNetNsCleaned(oc, nodeName, netNsPath); cleanErr != nil { + e2e.Logf("NetNS not cleaned yet: %v", cleanErr) + return false, nil + } + return true, nil + }) + o.Expect(err).NotTo(o.HaveOccurred(), "NetNS file was not cleaned up")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/netns_cleanup.go` around lines 117 - 119, Wrap the single call to nodeutils.CheckNetNsCleaned(oc, nodeName, netNsPath) in a short poll using Gomega's o.Eventually (or wait.PollImmediate) so the test retries until the NetNS is actually gone; e.g., poll the function that calls nodeutils.CheckNetNsCleaned and assert the returned error is nil with a reasonable timeout (e.g., ~1m) and short interval (e.g., ~2-5s) before calling o.Expect(...).NotTo(o.HaveOccurred()), referencing the existing nodeutils.CheckNetNsCleaned, oc, nodeName and netNsPath symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/node/node_utils.go`:
- Around line 805-810: Capture the command output and error from
ExecOnNodeWithChroot (replace "_, err := ExecOnNodeWithChroot(...)" with "out,
err := ExecOnNodeWithChroot(...)"), then: if err == nil treat that as the file
still present and return an error; if err != nil inspect it—if it encodes an
exit status indicating "file missing" (exit code 1 or stderr like "No such file
or directory") then log success via framework.Logf and return nil, otherwise log
the full error and output with framework.Logf and return that error so real exec
failures are not masked; reference ExecOnNodeWithChroot, netNsPath, and
framework.Logf when making these changes.
---
Nitpick comments:
In `@test/extended/node/node_e2e/netns_cleanup.go`:
- Around line 117-119: Wrap the single call to nodeutils.CheckNetNsCleaned(oc,
nodeName, netNsPath) in a short poll using Gomega's o.Eventually (or
wait.PollImmediate) so the test retries until the NetNS is actually gone; e.g.,
poll the function that calls nodeutils.CheckNetNsCleaned and assert the returned
error is nil with a reasonable timeout (e.g., ~1m) and short interval (e.g.,
~2-5s) before calling o.Expect(...).NotTo(o.HaveOccurred()), referencing the
existing nodeutils.CheckNetNsCleaned, oc, nodeName and netNsPath symbols.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a57532f7-ba8e-42e3-9094-557d1b7c7533
📒 Files selected for processing (3)
test/extended/node/README.mdtest/extended/node/node_e2e/netns_cleanup.gotest/extended/node/node_utils.go
✅ Files skipped from review due to trivial changes (1)
- test/extended/node/README.md
|
@BhargaviGudi: No Jira issue with key OCP-56266 exists in the tracker at https://redhat.atlassian.net. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/extended/node/node_e2e/netns_cleanup.go (1)
117-119: ⚡ Quick winConsider polling the on-node NetNS check to avoid flakiness.
CheckNetNsCleanedperforms a singletest -ewith no retry. Although the pod object reachingNotFoundusually implies sandbox/network teardown has completed, NetNS removal on a slow node can lag slightly behind API-server pod deletion, making this assertion intermittently flaky. Wrapping the check in a short poll makes the test more robust.♻️ Proposed polling wrapper
g.By("Verify that the NetNS file has been cleaned up on the node") - err = nodeutils.CheckNetNsCleaned(oc, nodeName, netNsPath) - o.Expect(err).NotTo(o.HaveOccurred(), "NetNS file was not cleaned up") + err = wait.PollUntilContextTimeout(ctx, 2*time.Second, 1*time.Minute, true, func(ctx context.Context) (bool, error) { + if cleanErr := nodeutils.CheckNetNsCleaned(oc, nodeName, netNsPath); cleanErr != nil { + e2e.Logf("NetNS not yet cleaned: %v", cleanErr) + return false, nil + } + return true, nil + }) + o.Expect(err).NotTo(o.HaveOccurred(), "NetNS file was not cleaned up")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/node/node_e2e/netns_cleanup.go` around lines 117 - 119, Replace the single one-off call to nodeutils.CheckNetNsCleaned with a short polling retry so the on-node test -e check is retried until success or timeout; e.g., use a polling helper (wait.PollImmediate or o.Eventually) to call nodeutils.CheckNetNsCleaned(oc, nodeName, netNsPath) repeatedly with a small interval (e.g., 500ms) and a reasonable timeout (e.g., 20–30s), and fail the test if the poll times out while still returning an error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/extended/node/node_e2e/netns_cleanup.go`:
- Around line 117-119: Replace the single one-off call to
nodeutils.CheckNetNsCleaned with a short polling retry so the on-node test -e
check is retried until success or timeout; e.g., use a polling helper
(wait.PollImmediate or o.Eventually) to call nodeutils.CheckNetNsCleaned(oc,
nodeName, netNsPath) repeatedly with a small interval (e.g., 500ms) and a
reasonable timeout (e.g., 20–30s), and fail the test if the poll times out while
still returning an error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 39136102-75cb-4671-875e-402c4b5df910
📒 Files selected for processing (3)
test/extended/node/README.mdtest/extended/node/node_e2e/netns_cleanup.gotest/extended/node/node_utils.go
🚧 Files skipped from review as they are similar to previous changes (1)
- test/extended/node/node_utils.go
1212a5f to
bb67b91
Compare
|
@BhargaviGudi: No Jira issue with key OCP-56266 exists in the tracker at https://redhat.atlassian.net. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@BhargaviGudi: This pull request references OCPNODE-4560 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling required tests: |
|
/retest-required |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: bb67b91
New tests seen in this PR at sha: bb67b91
|
|
/test periodic-ci-openshift-origin-release-5.0-e2e-aws-ovn |
bb67b91 to
3e0ee2d
Compare
|
Scheduling required tests: |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 3e0ee2d
New tests seen in this PR at sha: 3e0ee2d
|
3e0ee2d to
6edab1a
Compare
|
Scheduling required tests: |
6edab1a to
89bd515
Compare
|
Scheduling required tests: |
|
@BhargaviGudi: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: 89bd515
|
Migrates testcase OCP-56266 from openshift-tests-private to origin.
What this test validates
This test verifies that kubelet/CRI-O properly clean up the network namespace file when a pod is deleted.
The test:
Implementation details
node_utils.gofollow Ginkgo best practices (no assertions, return errors)framework.Logf()for logging instead of assertions in helpersSummary by CodeRabbit