-
Notifications
You must be signed in to change notification settings - Fork 461
Fix the issue Iluvatar device scheduling policy binpack and spread are reversed #1596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: lixd <[email protected]>
…t-HAMi#1138) * fix: override node socre failure for kunlun Project-HAMi#1137 Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> * Merge branch 'master' into bug-fix Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: update Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> --------- Signed-off-by: ouyangluwei(riseunion) <[email protected]> Co-authored-by: ouyangluwei(riseunion) <[email protected]>
Signed-off-by: Jifei Wang <[email protected]>
Signed-off-by: ouyangluwei(riseunion) <[email protected]> Co-authored-by: ouyangluwei(riseunion) <[email protected]>
Signed-off-by: antvirf <[email protected]>
Signed-off-by: calvin chen <[email protected]>
…roject-HAMi#1041) Signed-off-by: ghostloda <[email protected]>
* fix: Multi-node scoring nodes are inaccurate Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> * fix: ut Signed-off-by: ouyangluwei(riseunion) <[email protected]> --------- Signed-off-by: ouyangluwei(riseunion) <[email protected]> Co-authored-by: ouyangluwei(riseunion) <[email protected]>
Signed-off-by: ouyangluwei(riseunion) <[email protected]> Co-authored-by: ouyangluwei(riseunion) <[email protected]>
…scheduler roles + a namespace-scoped role for leader election Signed-off-by: antvirf <[email protected]>
Signed-off-by: antvirf <[email protected]>
…-in-release-changelog feat: Add new labels in .github/release.yml
…perms feat(scheduler-role): use a scoped-down role for scheduler
Signed-off-by: Jifei wang <[email protected]> update vendor
…oject-HAMi#1161) Bumps [aquasecurity/trivy-action](https://github.com/aquasecurity/trivy-action) from 0.31.0 to 0.32.0. - [Release notes](https://github.com/aquasecurity/trivy-action/releases) - [Commits](aquasecurity/trivy-action@0.31.0...0.32.0) --- updated-dependencies: - dependency-name: aquasecurity/trivy-action dependency-version: 0.32.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat(helm): optionally disable admission webhook This simplifies the deployment considerably and makes HAMi less intrusive inclusters where only a minority of workloads actually require GPU scheduling. Signed-off-by: antvirf <[email protected]> * fix(chart.yaml): keep original version from master Signed-off-by: antvirf <[email protected]> * docs(helm): comment to explain impact of disabling admissionWebhook Signed-off-by: antvirf <[email protected]> --------- Signed-off-by: antvirf <[email protected]>
Signed-off-by: Yunlu Wen <[email protected]> Co-authored-by: Yunlu Wen <[email protected]>
Fix e2e CI Signed-off-by: limengxuan <[email protected]>
Signed-off-by: Shouren Yang <[email protected]>
Signed-off-by: Shouren Yang <[email protected]>
Signed-off-by: Shouren Yang <[email protected]>
Signed-off-by: Shouren Yang <[email protected]>
…5.0 to 1.17.8 (Project-HAMi#1170) Signed-off-by: Shouren Yang <[email protected]>
…t-HAMi#1183) Signed-off-by: Shouren Yang <[email protected]>
…Mi#1189) Signed-off-by: Shouren Yang <[email protected]>
Signed-off-by: Jifei Wang <[email protected]>
…vidia Add basic test cases for enflame,hygon, metax, mthreads, nvidia module to verify Fit function. Includes positive and negative test scenarios. Signed-off-by: wangmin <[email protected]> Co-authored-by: wangmin <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Jifei Wang <[email protected]>
…HAMi#1092) Signed-off-by: Shouren Yang <[email protected]>
Signed-off-by: james <[email protected]>
Signed-off-by: qinxiaowen <[email protected]>
…here are multiple containers in one pod (Project-HAMi#1579) * fix: fix resource quota Signed-off-by: james <[email protected]> * fix: fix test case Signed-off-by: james <[email protected]> --------- Signed-off-by: james <[email protected]>
Signed-off-by: Jifei Wang <[email protected]>
* add modernize check Signed-off-by: dongjiang1989 <[email protected]> * add unittest case Signed-off-by: dongjiang1989 <[email protected]> --------- Signed-off-by: dongjiang1989 <[email protected]>
…oject-HAMi#1584) Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.27.3 to 2.27.5. - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.3...v2.27.5) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.27.5 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…Mi#1586) Bumps [golang.org/x/term](https://github.com/golang/term) from 0.38.0 to 0.39.0. - [Commits](golang/term@v0.38.0...v0.39.0) --- updated-dependencies: - dependency-name: golang.org/x/term dependency-version: 0.39.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* update Signed-off-by: limengxuan <[email protected]> * update Signed-off-by: limengxuan <[email protected]> * update chart for CDI Signed-off-by: limengxuan <[email protected]> * update docs Signed-off-by: limengxuan <[email protected]> * Update docs/config_cn.md Signed-off-by: limengxuan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update docs/config.md Signed-off-by: limengxuan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * update documents Signed-off-by: limengxuan <[email protected]> * fix kunlunxin vxpu issue Signed-off-by: limengxuan <[email protected]> * Update pkg/device/kunlun/vdevice.go Signed-off-by: limengxuan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * update Signed-off-by: limengxuan <[email protected]> * update Makefile for helm package Signed-off-by: limengxuan <[email protected]> * update discord invitation Signed-off-by: limengxuan <[email protected]> * Update README.md Signed-off-by: limengxuan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update README_cn.md Signed-off-by: limengxuan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update README_ja.md Signed-off-by: limengxuan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Signed-off-by: limengxuan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…i#1587) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.48.0 to 0.49.0. - [Commits](golang/net@v0.48.0...v0.49.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-version: 0.49.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ct-HAMi#1585) Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.38.3 to 1.39.0. - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.38.3...v1.39.0) --- updated-dependencies: - dependency-name: github.com/onsi/gomega dependency-version: 1.39.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ject-HAMi#1592) Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.3 to 1.9.4. - [Release notes](https://github.com/sirupsen/logrus/releases) - [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md) - [Commits](sirupsen/logrus@v1.9.3...v1.9.4) --- updated-dependencies: - dependency-name: github.com/sirupsen/logrus dependency-version: 1.9.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…AMi#1588) Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.40.0 to 0.41.0. - [Release notes](https://github.com/golang/tools/releases) - [Commits](golang/tools@v0.40.0...v0.41.0) --- updated-dependencies: - dependency-name: golang.org/x/tools dependency-version: 0.41.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: james <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: qiangwei1983 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @qiangwei1983, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a bug in the Iluvatar device scheduling mechanism where the binpack and spread policies were effectively reversed. The fix involves a targeted modification to the device iteration order within the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses an issue where the Iluvatar device scheduling policies for binpack and spread were reversed. The fix involves reversing the loop iteration order in the Fit function, which is a correct approach to solve the described problem. However, the unit tests have not been updated to account for this change in behavior, which will cause them to fail. I have left a comment detailing the required changes in the test file.
| tmpDevs = make(map[string]device.ContainerDevices) | ||
| reason := make(map[string]int) | ||
| for i := range devices { | ||
| for i := len(devices) - 1; i >= 0; i-- { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change correctly reverses the iteration order to fix the scheduling policy issue. However, it appears the unit tests have not been updated to reflect this change.
Specifically, Test_Fit in pkg/device/iluvatar/device_test.go has a "fit success" test case that provides two identical devices, dev-0 and dev-1, and expects dev-0 to be selected. With this change, the loop will iterate in reverse, so dev-1 will be selected first, causing the test to fail.
Please update the test case to expect dev-1 to be selected to ensure the tests pass and correctly reflect the new behavior.
|
please update the unit_test to pass the CI |
…e reversed Signed-off-by: 魏强 <[email protected]>
| tmpDevs = make(map[string]device.ContainerDevices) | ||
| reason := make(map[string]int) | ||
| for i := range devices { | ||
| for i := len(devices) - 1; i >= 0; i-- { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiangwei1983 I think it it better to use slices.Backward for traversing in reverse order with descending indices
|
@ouyangluwei163 do you recall why we iterate iluvatar device in forward order? please reply here if you do |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fix the issue Iluvatar device scheduling policy binpack and spread are reversed
Which issue(s) this PR fixes:
Fixes #1589