autosize control plane support for performance profile creator #1349

ffromani · 2025-06-26T13:06:14Z

Implement reserved CPU (aka infra + control plane) sizing using linear programming optimization (gonum/optimize).

The core idea is to model the constraints and let the optimization package compute the desired target.

These changes were AI-assisted (hence the AA tag), then largely amended by a human (hence the HI tag, Human Intervention).

The initial penalty cost structure was suggested by Google Gemini 2.5 Flash, and then amended by human intervention.
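
To make the "model the constraints, let the optimizer pick the target" idea concrete, here is a minimal sketch. It is not the code from this PR: it swaps gonum/optimize for a plain exhaustive search over integer CPU counts so that it stays dependency-free, and every name and number in it (`params`, `cost`, `sizeReserved`, the penalty weights) is a hypothetical chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// params holds hypothetical inputs; none of these names come from the PR.
type params struct {
	TotalCPUs    int     // logical CPUs available on the node
	MinReserved  int     // hard lower bound on reserved CPUs
	DemandCPUs   float64 // estimated infra + control plane demand, in CPUs
	UnderPenalty float64 // cost per CPU of demand left uncovered
	OverPenalty  float64 // cost per CPU reserved beyond the demand
}

// cost scores a candidate reserved-CPU count; lower is better.
// Under-provisioning is meant to be weighted more heavily than over-provisioning.
func cost(p params, reserved int) float64 {
	r := float64(reserved)
	if r < p.DemandCPUs {
		return p.UnderPenalty * (p.DemandCPUs - r)
	}
	return p.OverPenalty * (r - p.DemandCPUs)
}

// sizeReserved returns the feasible reserved count with the lowest cost.
// The upper bound is exclusive so at least one CPU is left for workloads.
func sizeReserved(p params) (int, error) {
	best, bestCost := -1, math.Inf(1)
	for r := p.MinReserved; r < p.TotalCPUs; r++ {
		if c := cost(p, r); c < bestCost {
			best, bestCost = r, c
		}
	}
	if best < 0 {
		return 0, fmt.Errorf("no feasible reserved CPU count in [%d, %d)", p.MinReserved, p.TotalCPUs)
	}
	return best, nil
}

func main() {
	// Made-up inputs, for illustration only.
	p := params{TotalCPUs: 64, MinReserved: 2, DemandCPUs: 5.5, UnderPenalty: 10, OverPenalty: 1}
	r, err := sizeReserved(p)
	if err != nil {
		panic(err)
	}
	fmt.Println("reserved CPUs:", r)
}
```

With an asymmetric penalty like this one, the search settles on the smallest count that covers the estimated demand, which is the behavior a penalty cost structure is meant to encode.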

openshift-ci · 2025-06-26T13:07:57Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ffromani]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ffromani · 2025-07-05T07:52:27Z

/test e2e-no-cluster

ffromani · 2025-07-09T12:31:22Z

/cc @MarSik

MarSik · 2025-07-16T13:27:25Z

This is an interesting approach. I wonder if we can express constraints such as allocating whole L3 CCDs (or multiples of them) for reserved CPUs, or integrating the NIC queue count (no more than 16/32) and interrupt counts (224 per CPU).
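
As a rough illustration of how such constraints could become hard lower bounds on the reserved CPU count, consider the sketch below. It is an assumption-laden example, not code from this PR: the helper names (`ceilDiv`, `minCPUsForIRQs`, `roundUpToGroup`) are hypothetical, the 224 interrupts per CPU figure is the one quoted above, and the interrupt total and CPUs-per-L3 value are made up.

```go
package main

import "fmt"

// ceilDiv returns ceil(a/b) for a >= 0 and b > 0.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// minCPUsForIRQs returns the fewest CPUs able to host irqCount interrupts
// when each CPU can take at most irqsPerCPU of them.
func minCPUsForIRQs(irqCount, irqsPerCPU int) int {
	if irqCount <= 0 {
		return 0
	}
	return ceilDiv(irqCount, irqsPerCPU)
}

// roundUpToGroup rounds n up to a whole multiple of groupSize,
// for example the number of logical CPUs sharing one L3 domain.
func roundUpToGroup(n, groupSize int) int {
	return ceilDiv(n, groupSize) * groupSize
}

func main() {
	const irqsPerCPU = 224 // figure quoted in the comment above
	const cpusPerL3 = 8    // hypothetical topology value

	need := minCPUsForIRQs(1000, irqsPerCPU) // 1000 is a made-up interrupt total
	fmt.Println("CPUs needed for IRQs:", need)
	fmt.Println("rounded to whole L3 domains:", roundUpToGroup(need, cpusPerL3))
}
```

Bounds computed this way could feed the optimizer as a minimum, leaving the cost function to decide how far above the minimum to go.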

ffromani · 2025-07-16T13:35:00Z

> This is an interesting approach. I wonder if we can express constraints such as allocating whole L3 CCDs (or multiples of them) for reserved CPUs, or integrating the NIC queue count (no more than 16/32) and interrupt counts (224 per CPU).

We can introduce a quantitative limitation, like I did for SMT, so that we allocate in a way that minimizes the LLC count, for example. The problem, however, is that we only have the quantitative axis, because kubelet owns the exact placement. We can say that, overall, 7 CPUs is better than 9 because everything is aligned and we can then make the best use of compute resources (completely made-up numbers, I hope it is clear enough).

ffromani · 2025-07-16T13:35:59Z

pkg/performanceprofile/profilecreator/autosize/autosize.go

+
+// Assumptions:
+// 1. All the machines in the node pool have identical HW specs and need identical sizing.
+// 2. We cannot distinguyish betwee infra/OS CPU requirements and control plane CPU requirement.


typo: distinguish

MarSik · 2025-07-16T13:51:21Z

@ffromani Well the PPC can (and should) select the specific cpus for the reserved/isolated split based on the capacity computation. It has all the hardware topology information to be able to do that. Or are we talking about different aspects of PPC here?

Implement reserved CPU (aka infra + control plane) sizing using linear programming optimization (gonum/optimize).

The core idea is to model the constraints and let the optimization package compute the desired target.

These changes were AI-assisted (hence the AA tag), then largely amended by a human (hence the HI tag, Human Intervention).

The initial penalty cost structure was suggested by Google Gemini 2.5 Flash, and then amended by human intervention.

Assisted-by: Google Gemini
Assisted-by-model: gemini-2.5-flash
Signed-off-by: Francesco Romani <[email protected]>

openshift-ci · 2025-07-16T21:48:05Z

@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/lint | `6005e98` | link | true | `/test lint` |
| ci/prow/e2e-aws-ovn-techpreview | `6005e98` | link | true | `/test e2e-aws-ovn-techpreview` |
| ci/prow/e2e-aws-ovn | `6005e98` | link | true | `/test e2e-aws-ovn` |
| ci/prow/okd-scos-e2e-aws-ovn | `6005e98` | link | false | `/test okd-scos-e2e-aws-ovn` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

ffromani · 2025-07-17T06:41:44Z

> @ffromani Well the PPC can (and should) select the specific cpus for the reserved/isolated split based on the capacity computation. It has all the hardware topology information to be able to do that. Or are we talking about different aspects of PPC here?

Uhm, we can achieve that by rethinking the whole core allocation stage. I added an add-on step to demo the autosizing, but then we are looking at a possible full rewrite of the allocation logic, embedding some form of optimization. This is a much bigger endeavour. Do we want to go in this direction?

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2025

openshift-ci bot requested review from Tal-or and jmencak June 26, 2025 13:07

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2025

ffromani force-pushed the perfprof-creator-autosize-sched-ctrlplane branch 4 times, most recently from 6955fc6 to 99f8d42 Compare July 4, 2025 11:21

ffromani force-pushed the perfprof-creator-autosize-sched-ctrlplane branch from 99f8d42 to 71242c5 Compare July 7, 2025 12:48

ffromani mentioned this pull request Jul 7, 2025

NO-JIRA: performance profile: internal cleanup #1359

Merged

ffromani force-pushed the perfprof-creator-autosize-sched-ctrlplane branch 2 times, most recently from be7eda2 to 6912796 Compare July 9, 2025 12:30

openshift-ci bot requested a review from MarSik July 9, 2025 12:31

ffromani changed the title ~~WIP: autosize control plane support for performance profile creator~~ autosize control plane support for performance profile creator Jul 16, 2025

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2025

ffromani added 6 commits July 16, 2025 19:18

WIP: SMT postprocessing

04da524

TODO explain why Signed-off-by: Francesco Romani <[email protected]>

autosize: handle SMT in autosizing

6d44922

consider the real SMT Level when doing autosize computations. Signed-off-by: Francesco Romani <[email protected]>

WIP

cc52c5f

Signed-off-by: Francesco Romani <[email protected]>

WIP: prevalidate and postvalidate

877818c

Signed-off-by: Francesco Romani <[email protected]>

WIP: minimal CPU to fit IRQ count

6005e98

Signed-off-by: Francesco Romani <[email protected]>

ffromani force-pushed the perfprof-creator-autosize-sched-ctrlplane branch from 6912796 to 6005e98 Compare July 16, 2025 17:32
