OCPBUGS-62702: AA: latency-e2e: skip tests on HT-disabled systems #1386

SargunNarula · 2025-08-29T12:35:16Z

This PR addresses an issue with the BZ 2094046 test cases for oslat and cyclictest.

These tests were originally negative tests, expecting to fail on Hyperthreading enabled systems. However, on HT-disabled systems, the tests executed successfully and passed unexpectedly, leading to false positives.

Changes in this PR:

Added Hyperthreading detection in the test execution path.
Skip BZ 2094046 tests when HT is disabled, preventing false passes on systems without Hyperthreading.

Assisted-by: Cursor v1.24.2
AI Attribution: AIA HAb Ce Hin R Claude-4-sonnet v1.0

shajmakh

Thanks for catching and addressing this.
I left a few comments.
/approve

test/e2e/performanceprofile/functests/5_latency_testing/latency_testing.go

openshift-ci · 2025-09-25T15:40:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SargunNarula, shajmakh
Once this PR has been reviewed and has the lgtm label, please assign jmencak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shajmakh

the updates lgtm. regarding the commit (and PR title), I think it should highlight that this fix is derived from the fact that there might be different HT configurations: with HT enabled and without, rather than having the max latency missing or not.

mrniranjan · 2025-09-26T11:45:18Z

/lgtm

openshift-ci-robot · 2025-09-26T11:50:36Z

@SargunNarula: This pull request references CNF-18648 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.21." or "openshift-4.21.", but it targets "openshift-4.19" instead.

shajmakh · 2025-09-26T11:52:06Z

LGTM
let's please confirm the tests pass on both configurations (HT enabled and disabled) before merging this

shajmakh · 2025-09-26T12:21:05Z

/hold

SargunNarula · 2025-09-29T22:46:31Z

/retest

ffromani

conceptually OK, but questions about the implementation

ffromani · 2025-09-30T11:09:29Z

test/e2e/performanceprofile/functests/5_latency_testing/latency_testing.go

+	workerNodes, err := nodes.GetByLabels(testutils.NodeSelectorLabels)
+	if err != nil {
+		return false, fmt.Errorf("get worker nodes: %w", err)
+	}
+	workerNode := &workerNodes[0]


why do we need to pick a random node which matches the labels? can't we just pick the node by name?

By specifying index 0, we fix the node among those that have the appropriate labels. This ensures that if a performance profile has applied any kernel argument, such as nosmt, we can verify it through an actual runtime check.

ok, but I still don't follow why we need to use the node selector labels vs picking a specific node and checking that node

ffromani · 2025-09-30T11:10:41Z

test/e2e/performanceprofile/functests/5_latency_testing/latency_testing.go

+	}
+	cpuID := set.List()[0]
+
+	isHTEnabled := nodes.IsHyperthreadingEnabled(ctx, cpuID, workerNode)


I'd check the node settings (possibly /proc/cmdline) or actually any random CPU. To put it differently, why is the first isolated CPU significant, and why is it better than, say, cpu#0?

There is no particular significance in choosing the first isolated CPU. An ID was simply required to perform the check, so I selected one from the isolated set. Do you suggest checking any random CPU?
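For readers following the /proc/cmdline suggestion above, here is a minimal sketch of what such a check could look like. It is not code from this PR: the function name `smtDisabledByCmdline` and the sample command line are hypothetical, and the sketch only looks for the `nosmt` and `nosmt=force` kernel arguments. Note that Hyperthreading disabled in firmware would not appear on the kernel command line, so this kind of check alone would miss that case.

```go
package main

import (
	"fmt"
	"strings"
)

// smtDisabledByCmdline reports whether a kernel command line (for example
// the contents of /proc/cmdline) carries the nosmt or nosmt=force argument.
// Hypothetical helper, not part of the PR under review.
func smtDisabledByCmdline(cmdline string) bool {
	// Kernel arguments are whitespace-separated tokens.
	for _, arg := range strings.Fields(cmdline) {
		if arg == "nosmt" || arg == "nosmt=force" {
			return true
		}
	}
	return false
}

func main() {
	// Sample command line, made up for illustration only.
	sample := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro nosmt isolcpus=2-7"
	fmt.Println("SMT disabled via cmdline:", smtDisabledByCmdline(sample))
}
```

Matching whole tokens rather than substrings avoids false hits on unrelated arguments that merely contain the text "nosmt".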

ffromani · 2025-09-30T11:11:13Z

test/e2e/performanceprofile/functests/utils/nodes/nodes.go

+func IsHyperthreadingEnabled(ctx context.Context, cpuID int, node *corev1.Node) bool {
+	smtLevel := GetSMTLevel(ctx, cpuID, node)
+	return smtLevel > 1


I'd just inline GetSMTLevel in the one and only calling site

Resolved, with latest commit.
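The `smtLevel > 1` test in the snippet above can be illustrated with a small standalone sketch. This thread does not show how `GetSMTLevel` obtains its value, so the following assumes the level is the number of logical CPUs listed in a sysfs `thread_siblings_list` entry (for example `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list`). The function name `smtLevel` and the sample inputs are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// smtLevel counts the logical CPUs named in a thread_siblings_list value
// such as "0,4" or "0-1". Hypothetical helper: it assumes the SMT level is
// the size of that sibling list.
func smtLevel(siblings string) (int, error) {
	count := 0
	for _, part := range strings.Split(strings.TrimSpace(siblings), ",") {
		if part == "" {
			continue
		}
		// Each entry is either a single CPU ("3") or a range ("0-1").
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return 0, fmt.Errorf("parse %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return 0, fmt.Errorf("parse %q: %w", part, err)
			}
		}
		if end < start {
			return 0, fmt.Errorf("invalid range %q", part)
		}
		count += end - start + 1
	}
	if count == 0 {
		return 0, fmt.Errorf("empty siblings list")
	}
	return count, nil
}

func main() {
	// Sample values, made up for illustration only.
	for _, s := range []string{"0,4", "0-1", "3"} {
		level, err := smtLevel(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q: SMT level %d, HT enabled: %v\n", s, level, level > 1)
	}
}
```

Under this assumption, a CPU whose sibling list contains only itself has level 1, which is the case the PR treats as Hyperthreading disabled.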

SargunNarula · 2025-10-01T12:08:39Z

/retest

The BZ 2094046 test cases for oslat and cyclictest were negative tests expecting to fail on HT-enabled systems, but they passed unexpectedly on HT-disabled systems because the tools executed successfully.

Changes:
- Add hyperthreading detection in the test execution path
- Skip BZ 2094046 tests when HT is disabled to prevent false passes

Signed-Off-by: Sargun Narula <[email protected]>

openshift-ci · 2025-10-02T16:55:47Z

@SargunNarula: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/okd-scos-e2e-aws-ovn	`41ddbcf`	link	false	`/test okd-scos-e2e-aws-ovn`

SargunNarula · 2025-10-03T11:26:41Z

LGTM
let's please confirm the tests pass on both configurations (HT enabled and disabled) before merging this

@shajmakh I can now confirm the tests behave as expected on both HT-enabled and HT-disabled environments. More specifically, they pass on HT-enabled systems and are skipped on HT-disabled ones.

Note: The Hyperthreading check was performed on a BM node, because it needs more online CPUs than a VM node provides.

SargunNarula · 2025-10-03T11:26:55Z

/verified by @SargunNarula

openshift-ci-robot · 2025-10-03T11:27:07Z

@SargunNarula: This PR has been marked as verified by @SargunNarula.

openshift-ci-robot · 2025-10-03T11:38:11Z

@SargunNarula: This pull request references Jira Issue OCPBUGS-62702, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.21.0) matches configured target version for branch (4.21.0)
bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mrniranjan

The bug has been updated to refer to the pull request using the external bug tracker.

shajmakh · 2025-10-03T11:40:15Z

/lgtm
Thanks for the updates, I'll leave room for other reviewers if they still have comments
/hold

ffromani · 2025-10-03T12:03:42Z

test/e2e/performanceprofile/functests/5_latency_testing/latency_testing.go

+	workerNodes, err := nodes.GetByLabels(testutils.NodeSelectorLabels)
+	if err != nil {
+		return false, fmt.Errorf("get worker nodes: %w", err)
+	}
+	workerNode := &workerNodes[0]


ok, but I still don't follow why we need to use the node selector labels vs picking a specific node and checking that node

ffromani · 2025-10-03T12:04:41Z

test/e2e/performanceprofile/functests/5_latency_testing/latency_testing.go

+	set, err := cpuset.Parse(string(*profile.Spec.CPU.Isolated))
+	if err != nil || set.Size() == 0 {
+		return false, fmt.Errorf("failed to parse isolated CPUs from profile")
+	}
+	cpuID := set.List()[0]


I still don't get why this code is better than just checking cpuID 0 (which is much simpler) or the kernel command line arguments (/proc/cmdline)

openshift-ci bot requested review from jmencak and swatisehgal August 29, 2025 12:36

SargunNarula force-pushed the latency_test branch 2 times, most recently from 6fbff19 to ab26087 Compare September 25, 2025 11:20

shajmakh reviewed Sep 25, 2025

SargunNarula force-pushed the latency_test branch from ab26087 to f77ebb7 Compare September 26, 2025 09:22

shajmakh reviewed Sep 26, 2025

SargunNarula changed the title ~~Fixed oslat & cyclictest failure due to missing max latency value~~ AA: e2e: CNF:18648 Fix BZ 2094046 oslat & cyclictest HT tests to prevent false passes on HT-disabled systems Sep 26, 2025

openshift-ci bot assigned mrniranjan Sep 26, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 26, 2025

SargunNarula changed the title ~~AA: e2e: CNF:18648 Fix BZ 2094046 oslat & cyclictest HT tests to prevent false passes on HT-disabled systems~~ CNF-18648: AA: latency-e2e: skip tests on HT-disabled systems Sep 26, 2025

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 26, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 26, 2025

ffromani reviewed Sep 30, 2025

SargunNarula force-pushed the latency_test branch from f77ebb7 to 28fb228 Compare September 30, 2025 12:16

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 30, 2025

SargunNarula force-pushed the latency_test branch from 28fb228 to 30c01e3 Compare September 30, 2025 12:41

SargunNarula force-pushed the latency_test branch from 30c01e3 to 41ddbcf Compare October 2, 2025 12:40

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 3, 2025

SargunNarula changed the title ~~CNF-18648: AA: latency-e2e: skip tests on HT-disabled systems~~ OCPBUGS-62702: AA: latency-e2e: skip tests on HT-disabled systems Oct 3, 2025

openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 3, 2025

openshift-ci bot requested a review from mrniranjan October 3, 2025 11:38

openshift-ci bot assigned shajmakh Oct 3, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2025

ffromani reviewed Oct 3, 2025
