feat(gpu): implement gpu plugins #1008
Open
JustinChengLZ wants to merge 53 commits into kubewharf:main from JustinChengLZ:dev/support-gpu-plugins
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1008      +/-   ##
==========================================
+ Coverage   60.48%   60.62%   +0.14%
==========================================
  Files         698      735      +37
  Lines       66360    69182    +2822
==========================================
+ Hits        40138    41942    +1804
- Misses      21677    22508     +831
- Partials     4545     4732     +187
Flags with carried forward coverage won't be shown.
e57c066 to 3a1552c
4e39a97 to 969a658
eac1657 to 95fc12f
The nil device request check was moved to after qos validation to ensure proper resource allocation sequence. This change maintains the same behavior but improves the logical flow of the allocation process.
- remove redundant gpuCount and gpuNames logging in GetTopologyHints and Allocate
- move GetGPUCount call after initial checks in GetTopologyHints
- use deviceReq.DeviceRequest instead of gpuCount for memory calculation
- filter out unhealthy gpu devices when calculating topology hints
- skip numa binding hints for non-numa binding requests
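A minimal sketch of the filtering described in this commit, under stated assumptions: the `Device` type, its fields, and `candidateNUMANodes` are hypothetical names for illustration and are not taken from the PR's code. Unhealthy devices contribute no NUMA candidates, and requests without NUMA binding get no NUMA hints at all.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a hypothetical stand-in for a GPU device record; it is not the PR's type.
type Device struct {
	Name    string
	NUMA    int
	Healthy bool
}

// candidateNUMANodes returns the sorted NUMA nodes that host at least one
// healthy device. It returns nil for requests that do not ask for NUMA binding.
func candidateNUMANodes(devices []Device, numaBinding bool) []int {
	if !numaBinding {
		return nil
	}
	seen := make(map[int]bool)
	var nodes []int
	for _, d := range devices {
		if !d.Healthy || seen[d.NUMA] {
			continue
		}
		seen[d.NUMA] = true
		nodes = append(nodes, d.NUMA)
	}
	sort.Ints(nodes)
	return nodes
}

func main() {
	devices := []Device{
		{Name: "gpu-0", NUMA: 0, Healthy: true},
		{Name: "gpu-1", NUMA: 1, Healthy: false},
		{Name: "gpu-2", NUMA: 2, Healthy: true},
	}
	fmt.Println(candidateNUMANodes(devices, true))
	fmt.Println(candidateNUMANodes(devices, false))
}
```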
Set allocatable memory to zero for unhealthy GPU devices and use separate capacity values instead of reusing allocatable values. This ensures accurate resource accounting for both healthy and unhealthy devices.
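The accounting rule in this commit can be sketched as follows. This is an illustration only: `GPU`, `MemoryReport`, and `reportMemory` are assumed names, not the plugin's actual types. Capacity always reflects the device's total memory, while allocatable drops to zero once the device is unhealthy.

```go
package main

import "fmt"

// GPU is a hypothetical device record used only for this sketch.
type GPU struct {
	Name        string
	Healthy     bool
	MemoryBytes int64
}

// MemoryReport keeps capacity and allocatable as separate values.
type MemoryReport struct {
	Capacity    int64
	Allocatable int64
}

// reportMemory returns the full memory as capacity for every device,
// and zero allocatable memory for unhealthy devices.
func reportMemory(g GPU) MemoryReport {
	r := MemoryReport{Capacity: g.MemoryBytes, Allocatable: g.MemoryBytes}
	if !g.Healthy {
		r.Allocatable = 0
	}
	return r
}

func main() {
	for _, g := range []GPU{
		{Name: "gpu-0", Healthy: true, MemoryBytes: 16 << 30},
		{Name: "gpu-1", Healthy: false, MemoryBytes: 16 << 30},
	} {
		fmt.Printf("%s: %+v\n", g.Name, reportMemory(g))
	}
}
```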
Clean up GenericQRMPluginConfiguration by removing unused StateFileDirectory and InMemoryStateFileDirectory fields to simplify the struct.
Return nil instead of error when numa topology is not ready and log the error
fix: handle error gracefully
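One way to read "return nil instead of error and log the error" is sketched below. The function names and the loader callback are assumptions made for this example; the PR's real call sites are not shown in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// notReadyLoader is a hypothetical loader that simulates a NUMA topology
// that has not been populated yet.
func notReadyLoader() ([]int, error) {
	return nil, errors.New("numa topology is not ready")
}

// readyLoader is a hypothetical loader that returns two NUMA nodes.
func readyLoader() ([]int, error) {
	return []int{0, 1}, nil
}

// topologyHints logs a loader failure and returns (nil, nil) so the caller
// treats "not ready" as "no hints" rather than as a hard failure.
func topologyHints(load func() ([]int, error)) ([]int, error) {
	nodes, err := load()
	if err != nil {
		log.Printf("skipping topology hints: %v", err)
		return nil, nil
	}
	return nodes, nil
}

func main() {
	hints, err := topologyHints(notReadyLoader)
	fmt.Println(hints, err)
	hints, err = topologyHints(readyLoader)
	fmt.Println(hints, err)
}
```

The trade-off is that callers can no longer distinguish "no hints available" from "topology not ready" without reading the log.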
- Add DeviceName field to AllocationInfo struct to track GPU device names
- Implement GetQuantityAllocatedWithFilter to support filtered allocation queries
- Modify GPU memory plugin to consider device names during allocation
- Remove NUMA binding check and use device name filtering instead
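The idea behind a filtered allocation query can be sketched like this. The names `AllocationInfo`, `DeviceName`, and `GetQuantityAllocatedWithFilter` come from the commit message, but the fields, the signature, and the predicate-based design below are assumptions for illustration, not the PR's implementation.

```go
package main

import "fmt"

// AllocationInfo here is a simplified, hypothetical version of the struct
// named in the commit message; only DeviceName is taken from that message.
type AllocationInfo struct {
	PodUID     string
	DeviceName string
	Quantity   int64
}

// GetQuantityAllocatedWithFilter sums Quantity over the entries accepted by keep.
// The signature is assumed for this sketch.
func GetQuantityAllocatedWithFilter(allocs []AllocationInfo, keep func(AllocationInfo) bool) int64 {
	var total int64
	for _, a := range allocs {
		if keep(a) {
			total += a.Quantity
		}
	}
	return total
}

// byDeviceName builds a filter that accepts only allocations on the given device.
func byDeviceName(name string) func(AllocationInfo) bool {
	return func(a AllocationInfo) bool { return a.DeviceName == name }
}

func main() {
	allocs := []AllocationInfo{
		{PodUID: "pod-a", DeviceName: "gpu-0", Quantity: 4},
		{PodUID: "pod-b", DeviceName: "gpu-1", Quantity: 8},
		{PodUID: "pod-c", DeviceName: "gpu-0", Quantity: 2},
	}
	fmt.Println(GetQuantityAllocatedWithFilter(allocs, byDeviceName("gpu-0")))
}
```

Filtering by device name, rather than by a NUMA binding flag, lets the same query serve any device type whose allocations carry a name.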
9221413 to f72aa18
6b4272b to eb87c70
eb87c70 to 33b37df
workaround for fixing syncNics failure
Signed-off-by: 张浩宇 <[email protected]>
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for your reviewer: